Solutions Architect Series – Part 6: Performance Considerations

This is my learning note from the book Solutions Architect’s Handbook by Saurabh Shrivastava and Neelanjali Srivastav. Most of the content is distilled and quoted from the book. I recommend buying it to support the authors.

Another series: Fundamentals of Software Architecture: An Engineering Approach

Design principles for architecture performance

For high-performance applications, you need low latency and high throughput at every layer of the architecture. Concurrency helps you process a large number of requests. You also learned the difference between parallelism and concurrency and got an insight into how caching can improve overall application performance.
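As a minimal sketch of the caching idea above, the snippet below memoizes an expensive lookup with Python's standard `functools.lru_cache`. The `fetch_profile` function and its simulated latency are hypothetical stand-ins for a real database or network call.

```python
import functools
import time

@functools.lru_cache(maxsize=128)
def fetch_profile(user_id):
    """Simulate an expensive lookup (e.g., a database query)."""
    time.sleep(0.05)  # stand-in for network/disk latency
    return {"id": user_id, "name": f"user-{user_id}"}

# The first call pays the full cost; repeats are served from the in-process cache.
start = time.perf_counter()
fetch_profile(42)
cold = time.perf_counter() - start

start = time.perf_counter()
fetch_profile(42)
warm = time.perf_counter() - start

print(f"cold: {cold:.4f}s, warm: {warm:.6f}s")
```

In a distributed system, the same pattern is usually applied with an external cache (such as a Redis or Memcached tier) so that all application nodes share the cached results.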

Making a computational choice

Containers are also becoming popular as the need for automation and efficient resource utilization increases. They are becoming the preferred choice, especially for microservice application deployment. The optimal choice of computing, whether server instances, containers, or serverless, depends upon the application use case.

Choosing storage

Storage is one of the critical factors for your application’s performance. Any software application needs to interact with storage for installation, logging, and accessing files.

Choosing the database

There are multiple factors to consider when choosing a database; for example, the access pattern can significantly impact the selection of database technology. You should optimize your database based on its access pattern.

  • Online transactional processing (OLTP): Most traditional relational databases are considered OLTP. The transactional database is the oldest and most popular method of storing and handling application data. Scaling can be tricky for a relational database, as it scales vertically and eventually hits the upper limit of system capacity. For horizontal scaling, you have to use read replicas for read scaling and partitioning (sharding) for write scaling.
  • Nonrelational databases (NoSQL): NoSQL databases can store a large amount of data and provide low access latency. They are easy to scale by adding more nodes when required and support horizontal scaling out of the box. They can be an excellent choice for storing user session data, making your application stateless to achieve horizontal scaling without compromising user experience. You can develop a distributed application on top of a NoSQL database with good latency and scaling, but NoSQL databases don’t support complex queries such as joins across tables and entities, so query joining has to be handled at the application layer.
  • Online analytical processing (OLAP): A query for a large volume of structured data for analytics purposes is better served by a data warehouse platform designed for faster access to structured data. Modern data warehouse technologies adopt the columnar format and use massive parallel processing (MPP), which helps to fetch and analyze data faster.
  • Building a data search: Oftentimes, you will need to search a large volume of data to solve issues quickly or get business insights. The ability to search your application data helps you access detailed information and analyze it from different views. To search data with low latency and high throughput, you need a search engine as your technology choice.
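The OLAP point above hinges on the columnar format: an aggregate query only needs to scan the one column it touches. A toy sketch of that layout difference, not a warehouse engine, with made-up order data:

```python
# Row-oriented layout: each record is a dict; aggregating one field
# still walks every full record (typical of OLTP storage).
rows = [{"order_id": i, "region": "eu", "amount": i * 1.5} for i in range(1000)]
total_row = sum(r["amount"] for r in rows)

# Column-oriented layout: each field is a contiguous array, so an
# aggregate scans only the column it needs. Warehouse engines exploit
# this, and MPP engines further split each column across many nodes.
columns = {
    "order_id": list(range(1000)),
    "region": ["eu"] * 1000,
    "amount": [i * 1.5 for i in range(1000)],
}
total_col = sum(columns["amount"])

assert total_row == total_col
print(total_col)
```

The same answer comes out of both layouts; the difference is how much data had to be read to produce it, which is why columnar formats dominate analytics workloads.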

Making the networking choice

DNS routing strategy

  • Simple routing policy: As the name suggests, this is the most straightforward routing policy and doesn’t involve any complications. It is useful to route traffic to a single resource—for example, a web server that serves content for a particular website.
  • Failover routing policy: This routing policy helps you achieve high availability by configuring active–passive failover. If your application goes down in one region, then all the traffic can be routed to another region automatically.
  • Geolocation routing policy: Use a geolocation policy when users in a particular location should be served by a specific region; it routes traffic based on where the request originates.
  • Geoproximity routing policy: This is like a geolocation policy, but you have the option to shift traffic to other nearby locations when needed.
  • Latency routing policy: If your application is running in multiple regions, you can use a latency policy to serve traffic from the region where the lowest latency can be achieved.
  • Weighted routing policy: A weighted routing policy is used for A/B testing, where you send a certain share of traffic to one region and gradually increase that share as your trial proves successful.

Implementing a load balancer

The load balancer can be physical or virtual. You need to choose a load balancer based on your application’s needs. Commonly, two types of load balancers can be utilized by an application:

  • Layer 4 or network load balancer: Layer 4 load balancing routes packets based on information in the packet header—for example, source/destination IP addresses and ports. Layer 4 load balancing does not inspect the contents of a packet, which makes it less compute intensive and therefore faster. A network load balancer can handle millions of requests per second.
  • Layer 7 or application load balancer: Layer 7 load balancing inspects packets and routes requests based on their full contents. Layer 7 is used in conjunction with HTTP requests; routing decisions are informed by factors such as HTTP headers, the URI path, and content type. This allows for more robust routing rules but requires more compute time to route packets. The application load balancer can route requests to containers in your cluster based on their distinctive port numbers.
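The contrast between the two layers can be sketched as two routing functions: one that sees only the connection tuple, and one that parses the request. The backend names and rules below are invented for illustration:

```python
# Layer 4 decision: only packet-header fields (IPs and ports) are visible.
def l4_route(src_ip, src_port, dst_ip, dst_port, backends):
    """Hash the connection tuple so one flow always hits the same backend."""
    return backends[hash((src_ip, src_port, dst_ip, dst_port)) % len(backends)]

# Layer 7 decision: the HTTP request itself is inspected.
def l7_route(method, path, headers):
    """Route on URI path and headers: richer rules, but the request must be parsed."""
    if path.startswith("/api/"):
        return "api-service:8080"
    if headers.get("Content-Type", "").startswith("image/"):
        return "media-service:8081"
    return "web-service:8000"

print(l7_route("GET", "/api/orders", {}))  # api-service:8080
print(l7_route("GET", "/index.html", {}))  # web-service:8000
```

The extra parsing in `l7_route` is exactly the compute cost the text mentions: a network load balancer never has to look past the packet header, which is why it can sustain a far higher request rate.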

Managing performance monitoring

Monitoring solutions can be categorized into active monitoring and passive monitoring solutions:

  • Active monitoring: You simulate user activity to identify any performance gap upfront. Application data and workload situations are always changing, which requires continuous proactive monitoring. Active monitoring works alongside passive monitoring, as you run known scenarios to replicate the user experience. You should run active monitoring across all dev, test, and prod environments to catch issues before they reach the user.
  • Passive monitoring: This tries to identify unknown patterns in real time. For a web-based application, passive monitoring collects important metrics from the browser that can reveal performance issues. You can gather metrics from users regarding their geolocation, browser type, and device type to understand user experience and the geographic performance of your application. Monitoring is all about data, and it includes the ingestion, processing, and visualization of lots of data.
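At its core, active monitoring is a synthetic probe: run a known scenario, time it, and flag breaches of a latency threshold. A minimal sketch, where `fake_health_check` is a hypothetical stand-in for a real request (in practice an HTTP GET against the application):

```python
import time

def probe(check, threshold_s):
    """Run a synthetic check and flag it if latency exceeds the threshold."""
    start = time.perf_counter()
    ok = check()
    latency = time.perf_counter() - start
    return {"ok": ok, "latency_s": latency, "breach": latency > threshold_s}

# Stand-in for a real health check endpoint.
def fake_health_check():
    time.sleep(0.02)  # simulated response time
    return True

result = probe(fake_health_check, threshold_s=0.5)
print(result["ok"], result["breach"])
```

A real setup would run such probes on a schedule from multiple regions and feed the results into the same dashboards and alarms as the passive metrics, so both known scenarios and unknown real-user patterns are covered.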
