Solutions Architect Series – Part 3: Principles of Solution Architecture Design 1/2

These are my learning notes from the book Solutions Architect’s Handbook by Saurabh Shrivastava and Neelanjali Srivastav. Most of the content is distilled or copied from the book. I recommend buying the book to support the authors.

Another series: Fundamentals of Software Architecture: An Engineering Approach

Scaling workload

Scaling could be predictive if you are aware of your workload, which is often the case; or it could be reactive if you get a sudden spike or if you have never handled that kind of load before.

Predictive scaling

Predictive scaling is the best-case approach that any organization wants to take. Often, you can collect historical data on the application workload. For example, an e-commerce website such as Amazon may have a sudden traffic spike, and you need predictive scaling to add capacity ahead of it and avoid latency issues. Traffic patterns may include the following:

  • Weekends have three times more traffic than a weekday.
  • Daytime has five times more traffic than at night.
  • Shopping seasons, such as Thanksgiving or Boxing Day, have 20 times more traffic than regular days.
  • Overall, the holiday season in November and December has 8 to 10 times more traffic than during other months.
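As a rough illustration of how such multipliers could feed a capacity plan, here is a minimal sketch. The multipliers come from the list above; the baseline request rate, the per-server throughput, the 20% headroom, and the function name `servers_needed` are all assumptions made for the example, not figures from the book.

```python
import math

# Multipliers taken from the traffic patterns listed above.
WEEKEND_FACTOR = 3     # weekends vs. a weekday
DAYTIME_FACTOR = 5     # daytime vs. night
PEAK_DAY_FACTOR = 20   # Thanksgiving or Boxing Day vs. a regular day


def servers_needed(baseline_rps: float, factor: float,
                   rps_per_server: float, headroom: float = 0.2) -> int:
    """Estimate the server count for baseline traffic scaled by `factor`.

    `headroom` adds spare capacity on top of the forecast (assumed 20%).
    """
    expected_rps = baseline_rps * factor
    return math.ceil(expected_rps * (1 + headroom) / rps_per_server)


if __name__ == "__main__":
    # Hypothetical numbers: 100 requests/s on a regular day, 250 requests/s per server.
    for label, factor in [("regular day", 1),
                          ("weekend", WEEKEND_FACTOR),
                          ("peak shopping day", PEAK_DAY_FACTOR)]:
        print(label, servers_needed(100, factor, 250))
```

With these assumed inputs, the plan grows from 1 server on a regular day to 2 on a weekend and 10 on a peak shopping day, which is the kind of schedule you could provision in advance.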

Reactive scaling

You will need to understand your existing architecture and traffic patterns, along with an estimate of the desired traffic. You also need to understand the navigation path of the website. For example, the user has to log in to buy a product, which can lead to more traffic on the login page.

To plan the scaling of your server resources for traffic handling, you need to determine the following:

  • Which web pages are read-only and can be cached?
  • Which user queries only read data, rather than write or update anything in the database?
  • Does a user frequently request the same or repeated data, such as their own user profile?

To offload your web-layer traffic, you can move static content, such as images and videos, from your web server to a content distribution network.

At the server fleet level, you need to use a load balancer to distribute traffic, and you need to use auto-scaling to increase or reduce the number of servers in order to apply horizontal scaling. To reduce the database load, use the right database for the right need: a NoSQL database to store user sessions and review comments, a relational database for transactions, and caching to serve frequent queries.
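To make the auto-scaling idea concrete, here is a minimal sketch of a proportional scaling rule: grow or shrink the fleet so that average CPU utilization moves back toward a target. The function name, the 60% target, and the fleet limits are assumptions chosen for the example; a real auto-scaling service has its own policies and cooldowns.

```python
import math


def desired_server_count(current_servers: int, avg_cpu_percent: float,
                         target_cpu_percent: float = 60.0,
                         min_servers: int = 2, max_servers: int = 20) -> int:
    """Return how many servers the fleet should have.

    Scales the current count by the ratio of observed to target CPU,
    then clamps the result to the allowed fleet size.
    """
    raw = math.ceil(current_servers * avg_cpu_percent / target_cpu_percent)
    return max(min_servers, min(max_servers, raw))


if __name__ == "__main__":
    print(desired_server_count(4, 90.0))    # overloaded fleet: scale out
    print(desired_server_count(4, 30.0))    # idle fleet: scale in
    print(desired_server_count(10, 150.0))  # capped at max_servers
```

Keeping a minimum above one server means a single server failure does not take the whole fleet down, and the maximum caps cost during an unexpected spike.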

Building resilient architecture

Design for failure, and nothing will fail. Having a resilient architecture means that your application should be available for customers while also recovering from failure.

From the security perspective, a Distributed Denial of Service (DDoS) attack has the potential to impact the availability of services and applications. Exposing your application through a content distribution network (CDN) provides built-in capability to absorb such traffic, and adding Web Application Firewall (WAF) rules can help to block unwanted traffic.

To design for failure, resiliency needs to be applied in all the critical layers that affect the application’s availability. To achieve resiliency, the following best practices need to be applied in order to create a redundant environment:

  • Use the DNS server to route traffic between different physical locations so that your application will still be able to run in case of an entire region failure.
  • Use the CDN to distribute and cache static content such as videos, images, and static web pages near the user location, so that your application will still be available in case of a DDoS attack or local point of presence (PoP) location failure.
  • Once traffic reaches a region, use a load balancer to route traffic to a fleet of servers so that your application is still able to run even if one location fails within your region.
  • Use auto-scaling to add or remove servers based on user demand. As a result, your application should not be impacted by individual server failure.
  • Create a standby database to ensure the high availability of the database, meaning that your application remains available in the event of a database failure.

At the application level, it is essential to avoid cascading failure, where the failure of one component can bring down the entire system. There are different mechanisms available to handle cascading failure, such as applying timeouts, rejecting traffic, implementing idempotent operations, and using the circuit breaker pattern.
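The circuit breaker pattern mentioned above can be sketched as follows. This is a simplified, single-threaded illustration rather than a production implementation; the class names, the failure threshold, and the reset timeout are assumptions made for the example.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock          # injectable so tests can control time
        self._failures = 0
        self._opened_at = None       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                # Fail fast instead of sending more load to a failing dependency.
                raise CircuitOpenError("circuit open; call rejected")
            # Reset timeout elapsed: let one trial call through (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if (self._opened_at is not None
                    or self._failures >= self.failure_threshold):
                self._opened_at = self._clock()  # open or re-open the circuit
            raise
        # A success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller would wrap each request to a downstream dependency in `breaker.call(...)` and treat `CircuitOpenError` as a signal to return a fallback response. Rejecting calls quickly while the circuit is open is what stops one slow component from tying up every caller upstream of it.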

Design for performance

As with resiliency, the solution architect needs to consider performance at every layer of architecture design. The team needs to put monitoring in place to confirm that the application continues to perform effectively, and work to improve performance continuously. Better performance means more user engagement and a higher return on investment. High-performance applications are also designed to handle slowness caused by external factors such as a slow internet connection.

At the server level, you need to choose the right kind of server depending upon your workload. For example, choose the right amount of memory and compute to handle the workload, as memory congestion can slow down application performance, and eventually, the server may crash. For storage, it is important to choose the right input/output operations per second (IOPS). For write-intensive applications, you need high IOPS to reduce latency and to increase disk write speed.

To achieve great performance, apply caching at every layer of your architecture design. The following are the considerations that are required to add caching to various layers of your application design:

  • Use browser cache on the user’s system to load frequently requested web pages.
  • Use the DNS cache for quick website lookup.
  • Use the CDN cache to serve high-resolution images and videos from a location near the user.
  • At the server level, maximize the memory cache to serve user requests.
  • Use cache engines such as Redis and Memcached to serve frequent queries from the caching engine.
  • Use the database cache to serve frequent queries from memory.
  • Take care of cache expiration and cache eviction at every layer.
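To show what cache expiration and cache eviction mean in practice, here is a minimal in-memory sketch: entries expire after a time-to-live (TTL), and the least recently used entry is evicted when the cache is full. The class name `TTLCache` and its default sizes are assumptions made for the example; engines such as Redis and Memcached provide their own expiration and eviction settings.

```python
import time
from collections import OrderedDict


class TTLCache:
    def __init__(self, max_entries=128, ttl_seconds=60.0, clock=time.monotonic):
        self.max_entries = max_entries
        self.ttl_seconds = ttl_seconds
        self._clock = clock              # injectable so tests can control time
        self._data = OrderedDict()       # key -> (expires_at, value)

    def get(self, key, default=None):
        item = self._data.get(key)
        if item is None:
            return default
        expires_at, value = item
        if self._clock() >= expires_at:
            del self._data[key]          # expiration: the entry is too old
            return default
        self._data.move_to_end(key)      # mark as most recently used
        return value

    def put(self, key, value):
        self._data[key] = (self._clock() + self.ttl_seconds, value)
        self._data.move_to_end(key)
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # eviction: drop least recently used
```

A short TTL keeps data fresh at the cost of more cache misses, while the size limit bounds memory use. Both values usually need tuning per layer.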
