Architecting for the Cloud

AWS Principles

Scalability

While AWS provides virtually unlimited on-demand capacity, the architecture should be designed to take advantage of those resources. There are two ways to scale an IT architecture.

  • Vertical Scaling

Scaling vertically means increasing the specifications of an individual resource, e.g. moving to a larger EC2 instance type with more RAM, CPU, IOPS, or networking capability. This approach eventually hits a limit and is not always a cost-effective or highly available one.

  • Horizontal Scaling

Scaling horizontally means increasing the number of resources, e.g. adding more EC2 instances or EBS volumes, which leverages the elasticity of cloud computing. However, not all architectures can distribute their workload across multiple resources.

Applications should be designed to be stateless: they need no knowledge of previous interactions and store no session information, so capacity can be increased or decreased once any running tasks have been drained. Where state is needed, it can be handled using low-latency external stores, e.g. DynamoDB or Redis, to maintain state information, or session affinity, e.g. ELB sticky sessions, to bind all the transactions of a session to a specific compute resource.
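For example, a minimal sketch of keeping session state in a low-latency external store, here DynamoDB via boto3 (the table name app-sessions, its key schema, and the TTL attribute are assumptions for illustration):

```python
# Hypothetical session store: assumes a DynamoDB table "app-sessions" exists
# with partition key "session_id" and TTL enabled on the "expires_at" attribute.
import time
import boto3

dynamodb = boto3.resource("dynamodb")
sessions = dynamodb.Table("app-sessions")

def save_session(session_id, data, ttl_seconds=1800):
    # Store the session with an expiry so stale entries age out automatically.
    sessions.put_item(Item={
        "session_id": session_id,
        "data": data,
        "expires_at": int(time.time()) + ttl_seconds,
    })

def load_session(session_id):
    # Any instance behind the load balancer can read the same session state.
    response = sessions.get_item(Key={"session_id": session_id})
    return response.get("Item")
```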

With session affinity, however, existing sessions cannot take advantage of newly added resources. Load can be distributed across multiple resources using a push model, e.g. an ELB distributing requests across multiple EC2 instances, or a pull model, e.g. SQS or Kinesis, where multiple consumers subscribe to and consume the work. Distributed processing, e.g. using EMR or Kinesis, helps process large amounts of data by dividing a task and its data into many small fragments of work.
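As an illustration of the pull model, a minimal worker sketch using SQS with boto3 (the queue URL and the processing logic are placeholders):

```python
# Each worker polls the same queue; SQS delivers a message to one consumer at a
# time, so adding more workers scales out processing horizontally.
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/work-queue"  # placeholder

def process_message(body):
    print("processing", body)  # placeholder for the real work

def worker():
    while True:
        # Long polling (WaitTimeSeconds) reduces empty responses and API calls.
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20
        )
        for msg in resp.get("Messages", []):
            process_message(msg["Body"])
            # Delete only after successful processing so failed work is retried.
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```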

Disposable Resources Instead of Fixed Servers

Resources should be treated as temporary and disposable rather than as fixed, permanent on-premises resources. AWS encourages the concept of immutable infrastructure: servers, once launched, are never updated throughout their lifetime; instead, updates are performed by launching new servers with the latest configuration to replace the old ones.

This ensures resources are always in a consistent (and tested) state and makes rollbacks easier. AWS provides multiple ways to instantiate compute resources in an automated and repeatable way:

  • Bootstrapping: Scripts that configure and set up a resource at launch, e.g. using EC2 user data scripts and cloud-init to install software or copy resources and code (a sketch follows below).
  • Golden Images: A snapshot of a particular state of a resource, which gives faster start times and removes dependencies on configuration services or third-party repositories.
  • Containers: AWS supports Docker images through Elastic Beanstalk and ECS (Elastic Container Service). Docker allows packaging a piece of software in a Docker image, a standardized unit for software development that contains everything the software needs to run: code, runtime, system tools, system libraries, etc.
  • Infrastructure as Code: AWS assets are programmable.   Techniques, practices, and tools from software development can be applied to make the whole infrastructure reusable, maintainable, extensible, and testable.

AWS provides services like CloudFormation and OpsWorks for deployment
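As a sketch of the bootstrapping approach, the snippet below launches an EC2 instance whose user data script installs software at first boot via cloud-init (the AMI ID, instance type, and package are placeholders, not a recommended configuration):

```python
# Launch an instance that configures itself at boot through a user data script.
import boto3

ec2 = boto3.client("ec2")

user_data = """#!/bin/bash
yum install -y httpd          # placeholder: install and start a web server
systemctl enable --now httpd
"""

ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
    UserData=user_data,                # executed by cloud-init on first boot
)
```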

Automation

AWS provides various automation tools and services which help improve the system’s stability, efficiency and time to market.

  • Elastic Beanstalk

A PaaS (Platform as a Service) that allows quick application deployment while handling resource provisioning, load balancing, auto-scaling, monitoring, etc.

  • EC2 Auto Recovery

Creates a CloudWatch alarm that monitors an EC2 instance and automatically recovers it if it becomes impaired. A recovered instance is identical to the original, including the instance ID, private and Elastic IP addresses, and all instance metadata. The instance is migrated through a reboot, so in-memory contents are lost.
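A minimal sketch of such an alarm with boto3 (region, alarm name, and instance ID are placeholders): if the system status check fails for two consecutive minutes, CloudWatch triggers the built-in EC2 recover action.

```python
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="ec2-auto-recover-i-0abc",                      # placeholder name
    Namespace="AWS/EC2",
    MetricName="StatusCheckFailed_System",                    # system-level impairment
    Dimensions=[{"Name": "InstanceId", "Value": "i-0abc1234567890def"}],  # placeholder
    Statistic="Minimum",
    Period=60,
    EvaluationPeriods=2,
    Threshold=0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:automate:us-east-1:ec2:recover"],  # built-in recover action
)
```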

  • Auto Scaling

Allows you to maintain application availability and scale capacity up or down automatically according to defined conditions.
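For example, a minimal sketch of a target-tracking policy that keeps average CPU around 50% for an existing Auto Scaling group (the group and policy names are placeholders):

```python
import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",              # placeholder: an existing group
    PolicyName="keep-cpu-at-50-percent",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 50.0,                     # add/remove instances to hold ~50% CPU
    },
)
```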

  • CloudWatch Alarms

Allows SNS notifications to be triggered when a particular metric goes beyond a specified threshold for a specified number of periods

  • CloudWatch Events

Delivers a near real-time stream of system events that describe changes in AWS resources

  • OpsWorks

Allows continuous configuration through lifecycle events that automatically update the instances’ configuration to adapt to environmental changes. Events can be used to trigger Chef recipes on each instance to perform specific configuration tasks

  • Lambda Scheduled Events

Allows a Lambda function to be created and directs AWS Lambda to execute it on a regular schedule.
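A minimal sketch of wiring this up with boto3 (the rule name, function name, and ARNs are placeholders): an EventBridge (CloudWatch Events) rule fires on a schedule and invokes an existing Lambda function.

```python
import boto3

events = boto3.client("events")
lambda_client = boto3.client("lambda")

# Fire once an hour.
rule = events.put_rule(Name="hourly-cleanup", ScheduleExpression="rate(1 hour)")

# Allow the rule to invoke the function, then register the function as a target.
lambda_client.add_permission(
    FunctionName="cleanup-function",             # placeholder: an existing function
    StatementId="allow-eventbridge-hourly",
    Action="lambda:InvokeFunction",
    Principal="events.amazonaws.com",
    SourceArn=rule["RuleArn"],
)
events.put_targets(
    Rule="hourly-cleanup",
    Targets=[{
        "Id": "cleanup-target",
        "Arn": "arn:aws:lambda:us-east-1:123456789012:function:cleanup-function",  # placeholder
    }],
)
```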

Loose Coupling

AWS helps build loosely coupled architectures that reduce interdependencies, so that a change in or failure of one component does not cascade to other components.

Asynchronous Integration

Asynchronous integration does not involve direct point-to-point interaction; instead, communication usually goes through an intermediate durable storage layer, e.g. SQS or Kinesis. This decouples the components and introduces additional resiliency, and is suitable for any interaction that does not need an immediate response, where an acknowledgement that a request has been registered will suffice.
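A minimal producer-side sketch (the queue URL is a placeholder): the caller durably registers the request on SQS and returns an acknowledgement immediately, leaving the actual processing to a separate consumer.

```python
import json
import boto3

sqs = boto3.client("sqs")

def submit_order(order):
    # Enqueue the request; the only response needed is "registered".
    resp = sqs.send_message(
        QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/orders",  # placeholder
        MessageBody=json.dumps(order),
    )
    return {"status": "accepted", "message_id": resp["MessageId"]}
```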

Service Discovery

Allows new resources to be launched or terminated at any point in time and still be discovered, e.g. using an ELB as a single point of contact that hides the underlying instance details, or Route 53 zones to abstract the load balancer’s endpoint behind a stable DNS name.
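As a sketch of the Route 53 approach (hosted zone ID, domain, and load balancer DNS name are placeholders), clients keep calling a stable name while the record can be repointed as resources change:

```python
import boto3

route53 = boto3.client("route53")

route53.change_resource_record_sets(
    HostedZoneId="Z0000000000EXAMPLE",  # placeholder hosted zone
    ChangeBatch={
        "Changes": [{
            "Action": "UPSERT",  # create or update the record in place
            "ResourceRecordSet": {
                "Name": "api.example.com",
                "Type": "CNAME",
                "TTL": 60,
                "ResourceRecords": [
                    {"Value": "my-alb-1234567890.us-east-1.elb.amazonaws.com"}  # placeholder
                ],
            },
        }]
    },
)
```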

Well-Defined Interfaces

Allows various components to interact with each other through specific, technology-agnostic interfaces, e.g. RESTful APIs with API Gateway.

Services, Not Servers

Databases
AWS provides different categories of database technologies

  • Relational Databases (RDS)
  1. Normalizes data into well-defined tabular structures known as tables, which consist of rows and columns.
  2. Provides a powerful query language, flexible indexing capabilities, strong integrity controls, and the ability to combine data from multiple tables in a fast and efficient manner
  3. Allows vertical scalability by increasing resources, and horizontal scalability using Read Replicas for read capacity (see the sketch after this list) and sharding or data partitioning for write capacity
  4. Provides High Availability using Multi-AZ deployment, where data is synchronously replicated 
  • NoSQL Databases (DynamoDB)
  1. Provides databases that trade some of the query and transaction capabilities of relational databases for a more flexible data model that seamlessly scales horizontally
  2. Perform data partitioning and replication to scale both the reads and writes in a horizontal fashion
  3. DynamoDB service synchronously replicates data across three facilities in an AWS region to provide fault tolerance in the event of a server failure or Availability Zone disruption
  • Data Warehouse (Redshift)
  1. A specialized type of relational database, optimized for analysis and reporting of large amounts of data
  2. Redshift achieves efficient storage and optimum query performance through a combination of massively parallel processing (MPP), columnar data storage, and targeted data compression encoding schemes
  3. Redshift MPP architecture enables increased performance by increasing the number of nodes in the data warehouse cluster
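To illustrate the Read Replica point above, a minimal sketch that adds read capacity by creating an RDS Read Replica of an existing instance (identifiers and instance class are placeholders):

```python
import boto3

rds = boto3.client("rds")

rds.create_db_instance_read_replica(
    DBInstanceIdentifier="app-db-replica-1",      # new replica (placeholder name)
    SourceDBInstanceIdentifier="app-db-primary",  # existing source instance (placeholder)
    DBInstanceClass="db.t3.medium",
)
```

Read-heavy traffic can then be directed at the replica endpoint while writes continue to go to the primary.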

Removing Single Points of Failure

AWS provides ways to implement redundancy, automate the recovery and reduce disruption at every layer of the architecture.
AWS supports redundancy in the following ways

  • Standby Redundancy
  1. When a resource fails, functionality is recovered on a secondary resource using a process called failover.
  2. Failover will typically require some time before it completes, and during that period the resource remains unavailable.
  3. Secondary resources can either be launched automatically only when needed (to reduce cost), or they can be already running idle (to accelerate failover and minimize disruption).
  4. Standby redundancy is often used for stateful components such as relational databases.
  • Active Redundancy
  1. Requests are distributed to multiple redundant compute resources; if one fails, the rest simply absorb a larger share of the workload.
  2. Compared to standby redundancy, it can achieve better utilization and affect a smaller population when there is a failure. 

AWS supports the following modes of replication

Synchronous replication

  1. Acknowledges a transaction after it has been durably stored in both the primary location and its replicas.
  2. Protects data integrity in the event of a primary node failure.
  3. Used to scale read capacity for queries that require the most up-to-date data (strong consistency).
  4. Compromises performance and availability.

 

Asynchronous replication

  1. Decouples the primary node from its replicas at the expense of introducing replication lag.
  2. Used to horizontally scale the system’s read capacity for queries that can tolerate the replication lag.

Quorum-based replication

 

  1. Combines synchronous and asynchronous replication to overcome the challenges of large-scale distributed database systems.
  2. Replication to multiple nodes can be managed by defining a minimum number of nodes that must participate in a successful write operation 

AWS provides services that help reduce or remove single points of failure

 

  • Regions, Availability Zones with multiple data centres
  • ELB or Route 53 to configure health checks and mask failure by routing traffic to healthy endpoints
  • Auto Scaling to automatically replace unhealthy nodes
  • EC2 auto-recovery to recover unhealthy impaired nodes
  • S3, DynamoDB with data redundantly stored across multiple facilities
  • Multi-AZ RDS and Read Replicas
  • ElastiCache Redis engine supports replication with automatic failover 

Optimize for Cost

  • AWS can help organizations reduce capital expenses and drive savings as a result of the AWS economies of scale
  • AWS provides different options which should be utilized as per the use case –
  • EC2 instance types – On Demand, Reserved and Spot
  • Trusted Advisor or EC2 usage reports to identify the compute resources and their usage
  • S3 storage class – Standard, Reduced Redundancy, and Standard-Infrequent Access
  • EBS volumes – Magnetic, General Purpose SSD, Provisioned IOPS SSD
  • Cost Allocation tags to identify costs based on tags
  • Auto Scaling to horizontally scale the capacity up or down based on demand
  • Lambda-based architectures to never pay for idle or redundant resources
  • Utilize managed services where scaling is handled by AWS e.g. ELB, CloudFront, Kinesis, SQS, CloudSearch etc.

Caching

Caching improves application performance and increases the cost efficiency of an implementation

Application Data Caching

  1. Provides services that help store and retrieve information from fast, managed, in-memory caches.
  2. ElastiCache is a web service that makes it easy to deploy, operate, and scale an in-memory cache in the cloud and supports two open-source in-memory caching engines: Memcached and Redis 
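For example, a minimal cache-aside sketch against a Redis ElastiCache endpoint (the endpoint, key scheme, and database lookup are placeholders):

```python
import json
import redis

# Placeholder ElastiCache Redis endpoint.
cache = redis.Redis(host="my-cache.abc123.0001.use1.cache.amazonaws.com", port=6379)

def load_product_from_db(product_id):
    # Placeholder for the real database lookup.
    return {"id": product_id, "name": "example"}

def get_product(product_id):
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)                 # cache hit
    product = load_product_from_db(product_id)    # cache miss: query the database
    cache.setex(key, 300, json.dumps(product))    # keep the result for 5 minutes
    return product
```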

Edge Caching

  1. Allows content to be served by infrastructure that is closer to viewers, lowering latency and giving high, sustained data transfer rates needed to deliver large popular objects to end users at scale.
  2. CloudFront is a Content Delivery Network (CDN) consisting of multiple edge locations that allow copies of static and dynamic content to be cached

Security

  • AWS works on a shared security responsibility model
  • AWS is responsible for the security of the underlying cloud infrastructure
  • You are responsible for securing the workloads you deploy in AWS
  • AWS also provides ample security features 
    • IAM roles define a granular set of policies and assign them to users, groups, and AWS resources.
    • IAM roles assign short-term credentials to resources, which are automatically distributed and rotated (see the sketch after this list).
    • Amazon Cognito, for mobile applications, allows client devices to get controlled access to AWS resources via temporary tokens.
    • VPC isolates parts of infrastructure through the use of subnets, security groups, and routing controls.
    • WAF (Web Application Firewall) to help protect web applications from SQL injection and other vulnerabilities in the application code.
    • CloudWatch Logs to collect logs centrally, since individual servers are temporary.
    • CloudTrail for auditing AWS API calls, which delivers a log file to an S3 bucket. Logs can then be stored in an immutable manner and automatically processed to either notify or even take action on your behalf, protecting your organization from non-compliance.
    • AWS Config, Amazon Inspector, and AWS Trusted Advisor to continually monitor for compliance or vulnerabilities giving a clear overview of which IT resources are in compliance, and which are not.
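Building on the short-term credentials point above, a minimal sketch of assuming an IAM role with STS so that code works with temporary, automatically expiring keys (the role ARN is a placeholder):

```python
import boto3

sts = boto3.client("sts")

creds = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/read-only-audit",  # placeholder role
    RoleSessionName="audit-session",
)["Credentials"]

# Use the temporary credentials for scoped access instead of long-lived keys.
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
```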

It is advised to use these best practices for designing your AWS architecture.


Priyam Vaidya

A certified cloud architect (Azure and AWS) with over 15 years of experience in IT. Currently working as Sr Cloud Infrastructure Engineer. Love to explore and train others on new technology
