Solutions Architect Series – Part 12: Learning Soft Skills to Become a Better Solution Architect (2/2)

This is my learning note from the book Solutions Architect’s Handbook, written by Saurabh Shrivastava and Neelanjali Srivastav. Most of the content is distilled or copied from the book. I recommend buying the book to support the authors.

Another series: Fundamentals of Software Architecture: An Engineering Approach

Design thinking

A solution architect has the primary role of system design, which makes design thinking an essential skill. Design thinking is one of the most successful approaches adopted across industries to solve challenging and unclear problems. It helps you to look at problems and solutions from a different perspective, which you might not have considered in the first instance. Design thinking focuses on delivering results by taking a solution-based approach to the problem. It helps you to question the problem, the solution, and the associated risk, to come up with the most optimized strategy.

Being a builder by engaging in coding hands-on

A solution architect is a builder who learns by doing. A prototype is worth a thousand pictures. It helps to reduce miscommunication and ideate solutions. Presenting a POC and prototyping is an integral part of the solution architect’s role. Prototyping is the pre-solution phase, which helps to deepen your understanding of the application design and user. It helps you to think and build multiple solution paths. With the testing of the prototype, you can refine your solution and inspire others, such as teams, customers, and investors, by demoing your vision.

Becoming better with continuous learning

Solution architects need to continually absorb new knowledge and enhance their skill set to help the organization make better decisions. Continuous learning keeps you relevant and builds confidence. It opens up your mind and changes your perspective. Learning can be challenging with a full-time job and a busy family life. Continuous learning is about developing the habit of always learning something new, which requires motivation and discipline. You first need to set learning goals and apply effective time management to achieve them. This often slips through the net when you get busy with regular daily work.

Here are some of the ways to engage yourself in constant learning:

  • Learning new technologies, frameworks, and languages by trying them out
  • Learning new skills by reading books and tutorials
  • Keeping up with technology news and developments by reading articles on websites and blogs
  • Writing your own blog, whitepaper, or book
  • Solidifying your knowledge by teaching others
  • Taking online classes
  • Learning from teammates
  • Attending and participating in user groups and conferences

Being a mentor to others

Mentoring is about helping others and setting them up for success based on your learning and experience. It is an effective way to develop leaders by having one-to-one mentor/mentee relationships. To be a good mentor, you need to establish an informal communication style where the mentee can develop a comfort zone. The mentee can seek advice in multiple areas such as career development, or in personal aspects such as work-life balance. You should do an informal needs assessment and set up mutual goals and expectations.

Becoming a technology evangelist and thought leader

Technology evangelism is about being an expert who advocates for technology and your product. Some organizations with a large product base roll out a separate technology evangelist role, but often, a solution architect needs to take the role of an evangelist as part of their job. As a technology evangelist, you need to be among people to understand real-world problems and advocate your technology to solve their business concerns.

Solutions Architect Series – Part 11: Learning Soft Skills to Become a Better Solution Architect (1/2)

Acquiring pre-sales skills

Pre-sales is a critical phase for complex technology procurement, whereby the customer collects detailed information to make a buying decision. In the customer organization, a solution architect is involved in the pre-sales cycle to procure technology and infrastructure resources from various vendors. In the vendor organization, the solution architect needs to respond to customers’ requests for proposals (RFP) and present a potential solution to acquire new business for an organization. Pre-sales requires a unique skill set that combines strong technical knowledge with soft skills, including:

  • Communication and negotiation skills: Solution architects need to have excellent communication skills to engage the customer with the correct and latest details. Presenting precise details of the solution along with industry relevance helps customers to understand how your solution can address their business concerns. Solution architects work as a bridge between the sales and technical teams, which makes communication and coordination a critical skill.
  • Listening and problem-solving skills: Solution architects need to have strong analytical skills to identify the right solution as per the customer need. The first thing is to listen to and understand customer use cases by asking the right questions to create a good solution.
  • Customer-facing skills: Often, the solution architect needs to work with both the internal team and the external customer’s team.
  • Working with teams: The solution architect establishes a relationship with both the business team and the product team.

Presenting to C-level executives

A solution architect needs to handle various challenges from a technical and business perspective. However, one of the most challenging tasks could be to get executive buy-in. Senior executives such as the Chief Executive Officer (CEO), Chief Technology Officer (CTO), Chief Financial Officer (CFO), and Chief Information Officer (CIO) are regarded as C-level. They have tight schedules and need to make lots of high-stakes decisions. As a solution architect, you may have lots of details to present, but your C-level meetings are very time-bound. You need to get the maximum value out of the meeting in the allotted time slot.

  • The primary question is: How do you get senior executives’ attention and support in a limited time? Often, during any presentation, people tend to put a summary slide at the end, while, in the case of executive meetings, your time may be cut further depending on their priorities and agenda. The key to an executive presentation is to summarize the primary points upfront in the first 5 minutes. You should prepare in such a way that if your 30-minute slot shrinks to 5 minutes, you are still able to convey your points and get buy-in for the next step.
  • Explain your agenda and meeting structure even before the summary. Executives ask lots of questions to make proper utilization of their time, and your agenda should convey that they will get the chance to ask a clarification question.
  • Don’t try to present everything in detail by stating information that may seem relevant from your perspective but may not make much sense for an executive audience.

You should be ready to answer the following questions that concern executives more:

  • How will the proposed solution benefit our customers?
  • What assumptions did you make to baseline the solution?
  • What will be my ROI?: Executives are always looking for ROI by determining the total cost of ownership (TCO). Be ready with data to provide an estimated cost of ownership, solution maintenance cost, training cost, overall cost savings, and so on.
  • What happens if we continue as it is today and do nothing?
  • What will be our competitor’s reaction in regard to your solution?
  • What is your suggestion, and how can I help?
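
The ROI and TCO question above comes down to simple arithmetic, so it can help to have the calculation ready. The following is a minimal sketch, assuming the common definition ROI = (benefit - cost) / cost and a TCO made up of a one-time build cost plus yearly maintenance and training costs. All figures and function names are hypothetical and only illustrate the shape of the calculation.

```python
def total_cost_of_ownership(build_cost, yearly_maintenance, yearly_training, years):
    """One-time build cost plus recurring costs over the evaluation period."""
    return build_cost + years * (yearly_maintenance + yearly_training)


def return_on_investment(total_benefit, total_cost):
    """ROI as a fraction of cost: 0.5 means a 50% return."""
    return (total_benefit - total_cost) / total_cost


# Hypothetical three-year figures, for illustration only.
tco = total_cost_of_ownership(
    build_cost=200_000,
    yearly_maintenance=50_000,
    yearly_training=10_000,
    years=3,
)
three_year_benefit = 3 * 190_000  # assumed yearly savings and new revenue
three_year_roi = return_on_investment(three_year_benefit, tco)

print(f"TCO: {tco}, ROI: {three_year_roi:.0%}")
```

Running the same calculation for the "do nothing" scenario gives executives a baseline to compare against.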

Taking ownership and accountability

  • Taking ownership and positioning yourself as a leader helps you to win trust with accountability. Ownership doesn’t mean that you need to execute things alone; it is more about taking new initiatives and holding on to them as if it were your own organization.
  • Accountability is about taking responsibility to drive the outcome. Ownership and accountability go hand in hand, where you are not only creating initiative but working on getting the result. People can trust you to execute any job and drive results. Accountability helps you to build trust with your customers and team, which ultimately results in a better work environment and achieving a goal.

Thinking big

Solution architects should have the ability to see the big picture and think ahead. A solution architect creates a foundation upon which the team puts building blocks and launches the product. Thinking big is one of the critical skills that solution architects should possess to think about the long-term sustainability of an application. Thinking big doesn’t mean you need to take a very unrealistic goal. Your goal should be high enough to challenge you and bring you out of your comfort zone. Thinking big is critical for success at both a personal and an organizational level.

Being flexible and adaptable

Adaptability and flexibility go hand in hand, and you need to be flexible to adapt to the new environment, working culture, and technology. Adaptability means you are always open to new ideas and to working with the team. Teams may adopt a process and technology that is best suited for them. As a solution architect, you need to be flexible in accommodating team requirements during solution design.

Solutions Architect Series – Part 10: Solution Architecture Document (2/2)

Structure of the SAD

Solution overview: In the solution overview section, you need to give a brief introduction about the solution in a couple of paragraphs, describing the functioning of the solution and its different components at a very high level.

  • Solution purpose: This provides a brief about a business concern that the solution is solving and the justification to build a given solution.
  • Solution scope: This states the business scope that the proposed solution will address. Clearly describe out-of-scope items that the solution will not accommodate.
  • Solution assumptions: List all the assumptions on which the solution architect based the solution, for example, minimum network bandwidth availability.
  • Solution constraints: List all technical, business, and resource constraints. Often, constraints come from industry and government compliances, and these need to be listed in this section. You can also highlight the risk and mitigation plan.
  • Solution dependencies: List all upstream and downstream dependencies. For example, an e-commerce website needs to communicate with a shipping system such as UPS or FedEx to ship a package to customers.
  • Key architecture decisions: List major problem statements and the corresponding proposed solution options. Describe the pros and cons of each option, why a particular decision was made, and the rationale behind it.

Business context: In the business context section, the solution architect needs to provide a high-level overview of business capabilities and requirements that the solution is going to address.

  • Business capabilities: Provide a brief description of business capabilities for which the solution is being designed. Make sure to include the benefits of capabilities and how they will address customer needs.
  • Key business requirements: List all key business concerns that the solution is going to address. Provide a high-level view of key requirements and add a reference to the detailed requirements document.
  • Key business processes: Solution architects should show key processes with a business process document.
  • Business stakeholders: List stakeholders who are directly or indirectly impacted by the project. This includes sponsors, developers, end users, vendors, partners, and so on.
  • NFRs: Solution architects need to focus more on NFRs as these often get missed by the business user and development team.

Conceptual solution overview: The conceptual solution overview section provides an abstract-level diagram that captures a big-picture view of the whole solution, which includes both business and technical aspects.

Solution architecture: The solution architecture section dives deep into each part of the architecture and provides different views that the technical team can use to create a detailed design and work on implementation.

  • Information architecture: This section provides a user navigation flow to the application. At a high level, the solution architect needs to put in an application navigation structure.
  • Application architecture: This section targets the development team. It provides more implementation details upon which a software architect or development team can build a detailed design.
  • Data architecture: This section is primarily utilized by the database admin and development team to understand database schemas and how tables are related to each other.
  • Integration architecture: This section mainly targets vendors, partners, and other teams.
  • Infrastructure architecture: This section is primarily targeted at the infrastructure team and system engineers. The solution architect needs to include the deployment diagram, which can give a view of the logical server location and its dependencies.
  • Security architecture: This section includes all the security and compliance aspects of the application.

Solution delivery: The solution delivery section includes essential considerations to develop and deploy a solution.

  • Development: This section is essential for the development team. It talks about development tools, programming language, code repository, code versioning, and branching, with the rationale behind choices.
  • Deployment: This section mainly focuses on DevOps engineers, and talks about the deployment approach, deployment tools, various deployment components, and deployment checklist, with the rationale behind choices.
  • Data migration: This section helps the team to understand data migration and the ingestion approach, scope of data migration, various data objects, data ingestion tools used, source of data and data format, and so on.
  • Application decommissioning: This section lists existing systems that need to be decommissioned and an exit strategy for the current system if the return on investment (ROI) is not being realized. The solution architect needs to provide an approach and timeline for decommissioning the old system and carry out an overall impact assessment.

Solution management: The solution management section is focused on production support and ongoing system maintenance across other non-production environments. The solution management section is primarily targeted at the operations management team.

  • Operational management such as system patching and upgrades of dev, test, staging, and prod environments.
  • Tools to manage application upgrades and new releases.
  • Tools to manage system infrastructure.
  • System monitoring and alerts; operations dashboard.
  • Production support, SLA, and incident management.
  • Disaster recovery and Business Process Continuation (BPC).

Solutions Architect Series – Part 9: Solution Architecture Document (1/2)

The Solution Architecture Document (SAD) provides an end-to-end view of the application and helps everyone to be on the same page. In this chapter, you will learn about various aspects of the SAD, which addresses the needs of all stakeholders associated with the development of the application.

The SAD helps to achieve the following purposes:

  • Communicate the end-to-end application solution to all stakeholders.
  • Provide high-level architecture and different views of the application design to address the application’s service-quality requirements such as reliability, security, performance, and scalability.
  • Provide traceability of the solution back to business requirements and look at how the application is going to meet all functional and non-functional requirements (NFRs).
  • Provide all views of the solution required for design, build, testing, and implementation.
  • Define the impacts of the solution for estimation, planning, and delivery purposes.
  • Define the business process, continuation, and operations needed for a solution to work uninterrupted after the production launch.

Views of the SAD

  • Business View: Architecture design is all about addressing business concerns and solving business purposes. The Business View shows the value proposition of the overall solution and product. To simplify, the solution architect may choose to detect high-level scenarios related to business and present these as a use-case diagram. The Business View also describes stakeholders and the required resources to execute the project. You can define the Business View as a use-case view as well.
  • Logical View: This presents various packages on the system so that business users and designers can understand the various logical components of the system. The logical view offers a chronological order in which the system should be built. It shows how the multiple packages of the system are connected and how the user can interact with them. For example, in a banking application, the user first needs to authenticate and authorize using a security package, and then log in to the account using the account package, or apply for a loan using a loan package, and so on. Here, each package represents a different module and can be built as a microservice.
  • Process View: This presents more details, showing how the key processes of the system work together. It can be reflected using a state diagram. The solution architect can create a sequence diagram to show more details. In a banking application, a process view can present the approval of a loan or account.
  • Deployment View: This presents how the application is going to work in the production environment. It shows how different components of the system (such as network firewall, load balancer, application servers, database, and so on) are connected. The solution architect should create a simple block diagram that business users can understand. You can add more details to the UML deployment diagram to show various node components and their dependencies for technical users, such as the development and DevOps teams. The deployment view represents the physical layout of the system.
  • Implementation View: This is the core of the SAD, and represents architectural and technology choices. The solution architect needs to put the architecture diagram here—for example, if it is 3-tier, N-tier, or event-driven architecture, along with the reasoning behind it. You also need to detail technology choices—for example, using Java versus Node.js, along with the pros and cons of using them. You want to justify the resources and skills required to execute the project in the implementation view. The development team uses an implementation view to create a detailed design such as a class diagram, but that doesn’t need to be part of the SAD.
  • Data View: As most applications are data-driven, this makes the data view important. The data view represents how data is going to flow between the different components and how it will be stored. It can also be used to explain data security and data integrity. The solution architect can use the entity-relationship (ER) diagram to show the relationship between different tables and schemas in the database. The data view also explains the reports and analytics needed.
  • Operational View: This explains how the system is going to be maintained post-launch. Often, you define service-level agreements (SLAs), alert and monitoring functionality, a disaster recovery plan, and a support plan for the system. The operational view also provides details of how system maintenance is going to be carried out, such as by deployment of a bug fix, patching, backup and recovery, handling security incidents, and so on.

You may choose to include additional views—such as a physical architecture view, a network architecture view, and a security (controls) architecture view, and so on—as per the stakeholder’s requirement.
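
The state diagram mentioned for the Process View can be sketched in code as a small state machine. The following is only an illustration for the loan approval example: the state names and the allowed transitions are hypothetical and not taken from the book.

```python
from enum import Enum


class LoanState(Enum):
    SUBMITTED = "submitted"
    UNDER_REVIEW = "under_review"
    APPROVED = "approved"
    REJECTED = "rejected"


# Allowed transitions: each state maps to the states that may follow it.
# (Hypothetical process, for illustration only.)
TRANSITIONS = {
    LoanState.SUBMITTED: {LoanState.UNDER_REVIEW},
    LoanState.UNDER_REVIEW: {LoanState.APPROVED, LoanState.REJECTED},
    LoanState.APPROVED: set(),
    LoanState.REJECTED: set(),
}


def advance(current, target):
    """Move to the target state, or raise if the diagram has no such edge."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot go from {current.name} to {target.name}")
    return target


state = LoanState.SUBMITTED
state = advance(state, LoanState.UNDER_REVIEW)
state = advance(state, LoanState.APPROVED)
print(state.name)
```

Writing the transitions down as data keeps the diagram and the code easy to compare during design reviews.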

Solutions Architect Series – Part 8: Architectural Reliability Considerations

Design principles for architectural reliability

The goal of reliability is to keep the impact of any failure to the smallest area possible. By preparing your system for the worst, you can implement a variety of mitigation strategies for the different components of your infrastructure and applications.

  • Making systems self-healing: System failure needs to be predicted in advance, and in the case of failure incidence, you should have an automated response for system recovery, which is called system self-healing.
  • Applying automation: Automation is the key to improving your application’s reliability. Try to automate everything from application deployment and configuration to the overall infrastructure.
  • Creating a distributed system: Monolithic applications have low reliability when it comes to system uptime, as one small issue in a particular module can bring down the entire system. Dividing your application into multiple small services reduces the impact area, so that an issue in one part of the application shouldn’t impact the whole system, and the application can continue to serve critical functionality.
    However, the communication mechanism can be complicated in a distributed system. You need to take care of system dependencies by utilizing the circuit breaker pattern.
  • Monitoring capacity: Resource saturation is the most common reason for application failure. Often, you will encounter the issue where your applications start rejecting requests due to CPU, memory, or hard disk overload. Adding more resources is not always a straightforward task as you should have additional capacity available when needed.
  • Performing recovery validation: When it comes to infrastructure validation, most of the time, organizations focus on validating a happy path where everything is working. Instead, you should validate how your system fails and how well your recovery procedures work. Validate your application, assuming everything fails all the time. Don’t just expect that your recovery and failover strategies will work. Make sure to test them regularly, so you’re not surprised if something does go wrong.
  • Recoverability is sometimes overlooked as a component of availability. To improve the system’s Recovery Point Objective (RPO) and Recovery Time Objective (RTO), you should back up data and applications along with their configuration as a machine image. In the event that a natural disaster makes one or more of your components unavailable or destroys your primary data source, you should be able to restore the service quickly and without data loss.
  • Start small and build as needed: Make sure to streamline the first step of taking a backup. Most of the time, organizations lose data as they didn’t have an efficient backup strategy. Take a backup of everything, whether it is your file server, machine image, or databases.
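
The circuit breaker pattern mentioned in the list above can be sketched as follows. This is a minimal, single-threaded illustration and not a production implementation: the class name, the threshold, and the timeout values are assumptions made for the example. After a number of consecutive failures the breaker opens and fails fast; once the timeout has passed it lets one trial call through (the half-open state).

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and the call is rejected immediately."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # failures before opening
        self.reset_timeout = reset_timeout          # seconds to stay open
        self._clock = clock                         # injectable for testing
        self._failures = 0
        self._opened_at = None                      # None means closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit is open; failing fast")
            # Timeout elapsed: half-open, so allow this one trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller would wrap each call to a dependency, for example `breaker.call(fetch_shipping_quote, order_id)` where `fetch_shipping_quote` is a hypothetical downstream call, and serve a fallback when `CircuitOpenError` is raised.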

Improving reliability with the cloud

In the cloud, easy monitoring and tracking help to make sure your application is highly available as per the SLA. The cloud enables you to have fine control over IT resources, cost, and handling trade-offs for RPO/RTO requirements. Data recovery is critical for application reliability. Data resources and locations must align with RTOs and RPOs.
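
To make the RPO/RTO trade-off concrete: RPO is the maximum amount of data loss you can tolerate, measured in time, and RTO is the maximum acceptable downtime. The sketch below assumes that the worst-case data loss equals the interval between backups. The class and the numbers are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryPlan:
    backup_interval: timedelta         # worst-case data loss window
    estimated_restore_time: timedelta  # expected time to restore service


def meets_objectives(plan, rpo, rto):
    """True if the plan stays within both the RPO and the RTO."""
    return plan.backup_interval <= rpo and plan.estimated_restore_time <= rto


# Hypothetical plan: hourly backups, restore takes about 30 minutes.
plan = RecoveryPlan(
    backup_interval=timedelta(hours=1),
    estimated_restore_time=timedelta(minutes=30),
)
print(meets_objectives(plan, rpo=timedelta(hours=4), rto=timedelta(hours=1)))
print(meets_objectives(plan, rpo=timedelta(minutes=15), rto=timedelta(hours=1)))
```

Tighter objectives usually mean more frequent backups or replication, which is where the cost trade-off comes in.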

With the cloud, you can design a scalable system, which can provide flexibility to add and remove resources automatically to match the current demand. Data is one of the essential aspects of any application’s reliability. The cloud provides out-of-the-box data backup and replication tools, including machine images, databases, and files. In the event of a disaster, all of your data is backed up and appropriately saved in the cloud, which helps the system to recover quickly.
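
Adding and removing resources to match demand can be illustrated with a simple proportional scaling rule. This is a sketch under stated assumptions, not how any particular cloud provider implements autoscaling: the function name, the utilization metric, and the limits are all hypothetical.

```python
import math


def desired_instances(current, observed_utilization, target_utilization,
                      min_instances=1, max_instances=10):
    """Scale the instance count in proportion to observed vs. target load."""
    raw = math.ceil(current * observed_utilization / target_utilization)
    # Clamp to the configured limits so scaling never runs away.
    return max(min_instances, min(max_instances, raw))


# 4 instances at 90% CPU with a 60% target: scale out.
print(desired_instances(4, observed_utilization=90, target_utilization=60))
# 4 instances at 15% CPU with a 60% target: scale in.
print(desired_instances(4, observed_utilization=15, target_utilization=60))
```

The minimum and maximum limits reflect the trade-off between cost and reserved capacity.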

Solutions Architect Series – Part 7: Security Considerations

Security is always at the center of architecture design

Designing principles for architectural security

  • Implementing authentication and authorization control: The purpose of authentication is to determine if a user can access the system with the provided credentials of user ID and password, while authorization determines what a user can do once they are inside the system. You should create a centralized system to manage your users’ authentication and authorization.
  • Applying security everywhere: Often, organizations have a main focus of ensuring the physical safety of their data center and protecting the outer networking layer from any attack. Instead of just focusing on a single outer layer, ensure that security is applied at every layer of the application.
  • Reducing blast radius: While applying security measures at every layer, you should always keep your system isolated in a small pocket to reduce the blast radius. If attackers get access to one part of the system, you should be able to limit a security breach to the smallest possible area of the application.
  • Monitoring and auditing everything all the time: Put the logging mechanism for every activity in your system and conduct a regular audit. Audit capabilities are often also required from various industry-compliance regulations.
  • Automating everything: Automation is an essential way to apply quick mitigation for any security-rule violation. You can use automation to revert changes against desired configurations and alert the security team.
  • Protecting data: Data is at the center of your architecture, and it is essential to secure and protect it. Most of the compliance and regulation in place are there to protect customer data and identity.
  • Preparing a response: Keep yourself ready for any security events. Create an incident management process as per your organizational policy requirements.
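
The difference between authentication and authorization in the first principle above can be sketched with a small, centralized identity service. This is an in-memory illustration only: the class name, role names, and permissions are hypothetical, and a real system would rely on an established identity provider.

```python
import hashlib
import hmac
import os

# Hypothetical role-to-permission mapping, for illustration only.
ROLE_PERMISSIONS = {
    "viewer": {"read"},
    "admin": {"read", "write"},
}


def hash_password(password, salt):
    """Derive a password hash with PBKDF2 so plain passwords are never stored."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


class IdentityService:
    def __init__(self):
        self._users = {}  # user_id -> (salt, password_hash, role)

    def register(self, user_id, password, role):
        salt = os.urandom(16)
        self._users[user_id] = (salt, hash_password(password, salt), role)

    def authenticate(self, user_id, password):
        """Authentication: is this user who they claim to be?"""
        record = self._users.get(user_id)
        if record is None:
            return False
        salt, stored_hash, _ = record
        return hmac.compare_digest(stored_hash, hash_password(password, salt))

    def authorize(self, user_id, action):
        """Authorization: is this user allowed to perform the action?"""
        record = self._users.get(user_id)
        if record is None:
            return False
        return action in ROLE_PERMISSIONS.get(record[2], set())


service = IdentityService()
service.register("alice", "correct horse", role="viewer")
print(service.authenticate("alice", "correct horse"))  # valid credentials
print(service.authorize("alice", "write"))             # viewers cannot write
```

Keeping both checks in one service is what makes the control centralized: every application asks the same place.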

Web security mitigation

Security needs to be applied to every layer, and special attention is required for the web layer due to its exposure to the world. For web protection, important steps include keeping up with the latest security patches, following the best software development practices, and making sure proper authentication and authorization are carried out. There are several methods to protect and secure web applications, such as a Web Application Firewall (WAF) and DDoS mitigation.

Data security

Before architecting any solution, you should define basic security practices as per the application objective, such as complying with regulatory requirements. There are several different approaches used when addressing data protection. The following section describes how to use these approaches.

Data classification

At a high level, you can classify data into the following categories:

  • Restricted data: This contains information that could harm the customer directly if it got compromised. Mishandling of restricted data can damage a company’s reputation and impact a business adversely. Restricted data may include customer Personally Identifiable Information (PII) data such as social security numbers, passport details, credit card numbers, and payment information.
  • Private data: Data can be categorized as private (confidential) if it contains customer-sensitive information that an attacker can use to plan to obtain their restricted data. Private data may include customer email IDs, phone numbers, full names, and addresses.
  • Public data: This is available and accessible to everyone, and requires minimal protection—for example, customer ratings and reviews, customer location, and customer username if the user made it public.
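
The three categories above can be captured as a simple field-level mapping, so that a record is handled according to its most sensitive field. This is a minimal sketch: the field names are hypothetical, and treating unknown fields as restricted is an assumption chosen to fail safe.

```python
from enum import IntEnum


class Classification(IntEnum):
    PUBLIC = 1
    PRIVATE = 2
    RESTRICTED = 3


# Hypothetical field names mapped to the categories described above.
FIELD_CLASSIFICATION = {
    "review_text": Classification.PUBLIC,
    "rating": Classification.PUBLIC,
    "email": Classification.PRIVATE,
    "phone_number": Classification.PRIVATE,
    "credit_card_number": Classification.RESTRICTED,
    "passport_number": Classification.RESTRICTED,
}


def classify_record(record):
    """Return the highest classification among the record's fields.

    Unknown fields are treated as RESTRICTED (an assumption, to fail safe).
    """
    levels = (
        FIELD_CLASSIFICATION.get(field, Classification.RESTRICTED)
        for field in record
    )
    return max(levels, default=Classification.PUBLIC)


print(classify_record({"rating": 5, "review_text": "Great"}).name)
print(classify_record({"email": "a@example.com", "rating": 4}).name)
```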

Data encryption

  • Symmetric-key encryption: With symmetric encryption algorithms, the same key is used to encrypt and decrypt the data. Earlier, symmetric encryption used to be applied as per the Data Encryption Standard (DES), which used a 56-bit key. Now, the Advanced Encryption Standard (AES) is heavily used for symmetric encryption, which is more reliable as it uses a 128-bit, 192-bit, or 256-bit key.
  • Asymmetric-key encryption: With the help of asymmetric algorithms, two different keys can be used, one to encrypt and one to decrypt. In most cases, the encryption key is a public key and the decryption key is a private key. Asymmetric key encryption is also known as public-key encryption. Rivest–Shamir–Adleman (RSA) is one of the first and most popular public key-encryption algorithms used to secure data transmissions over the network. The private key is only available to one user, while the public key can be distributed across multiple resources. Only the user who has a private key can decrypt the data.
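
To show the public/private key idea in the smallest possible form, here is a toy RSA example using only built-in integer arithmetic. The primes are tiny textbook values chosen for illustration; real RSA uses keys of 2048 bits or more together with padding, so this sketch must not be used to protect data. It also prints how much larger the AES-128 key space is than the 56-bit DES key space.

```python
# Toy RSA with textbook-sized primes (illustration only, not secure).
p, q = 61, 53
n = p * q                    # modulus, part of both keys
phi = (p - 1) * (q - 1)      # Euler's totient of n
e = 17                       # public exponent, coprime with phi
d = pow(e, -1, phi)          # private exponent (requires Python 3.8+)

public_key = (e, n)
private_key = (d, n)


def encrypt(message, key):
    exponent, modulus = key
    return pow(message, exponent, modulus)


def decrypt(ciphertext, key):
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)


message = 65                 # must be an integer smaller than n
ciphertext = encrypt(message, public_key)
recovered = decrypt(ciphertext, private_key)
print(recovered == message)

# Key-space comparison for symmetric ciphers: 56-bit DES vs. 128-bit AES.
des_keys = 2 ** 56
aes128_keys = 2 ** 128
print(aes128_keys // des_keys == 2 ** 72)
```

Anyone can encrypt with the public key, but only the holder of the private exponent `d` can recover the message.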

Data encryption at rest and in transit

Data at rest means it is stored somewhere such as a storage area network (SAN) or network-attached storage (NAS) drive, or in cloud storage. All sensitive data needs to be protected by applying symmetric or asymmetric encryption, explained in the previous section, with proper key management.

The cloud’s shared security responsibility model

https://aws.amazon.com/compliance/shared-responsibility-model/

Security and compliance certifications

There are many compliance certifications depending on your industry and geographical location to protect customer privacy and secure data. For any solution design, compliance requirements are among the critical criteria that need to be evaluated. The following are some of the most popular industry-standard compliances:

  • Global compliance includes certifications that all organizations need to adhere to, regardless of their region. These include ISO 9001, ISO 27001, ISO 27017, ISO 27018, SOC 1, SOC 2, SOC 3, and CSA STAR for cloud security.
  • The US government requires various kinds of compliance to handle public sector workload. These include FedRAMP, DoD SRG Level-2, 4, and 5, FIPS 140, NIST SP 800, IRS 1075, ITAR, VPAT, and CJIS.
  • Industry-level compliance applies to a particular industry. These include PCI DSS, CDSA, MPAA, FERPA, CMS MARS-E, NHS IG Toolkit (in the UK), HIPAA, FDA, FISC (in Japan), FACT (in the UK), Shared Assessment, and GLBA.
  • Regional compliance certification applies to a particular country or region. These include EU GDPR, EU Model Clauses, UK G-Cloud, China DJCP, Singapore MTCS, Argentina PDPA, Australia IRAP, India MeitY, New Zealand GCIO, Japan CS Mark Gold, Spain ENS and DPA, Canada Privacy Law, and US Privacy Shield.

Solutions Architect Series – Part 6: Performance Considerations

Design principles for architecture performance

For high-performance applications, you need low latency and high throughput at every layer of the architecture. Concurrency helps to process a large number of requests. It is also worth understanding the difference between parallelism and concurrency, and how caching can help to improve overall application performance.

Making a computational choice

Containers are also becoming popular as the need for automation and better resource utilization increases. They are becoming the preferred choice, especially for microservice application deployment. The optimal choice of compute, whether server instances, containers, or serverless, depends upon the application use case.

Choosing a storage

Storage is one of the critical factors for your application’s performance. Any software application needs to interact with storage for installation, logging, and accessing files.

Choosing the database

There are multiple factors to consider when choosing to use a database—for example, the access pattern can significantly impact the selection of database technology. You should optimize your database based on the access pattern.

  • Online transactional processing (OLTP): Most of the traditional relational databases are considered OLTP. The transactional database is the oldest and most popular method of storing and handling application data. Scaling can be tricky for a relational database because it scales vertically and eventually hits the upper limit of system capacity. For horizontal scaling, you have to use read replicas for read scaling and partitioning for write scaling.
  • Nonrelational databases (NoSQL): NoSQL databases can store a large amount of data and provide low-latency access. They are easy to scale by adding more nodes when required and can support horizontal scaling out of the box. They can be an excellent choice to store user session data and can make your application stateless to achieve horizontal scaling without compromising user experience. You can develop a distributed application on top of a NoSQL database, which provides good latency and scaling, but query joining has to be handled at the application layer, because NoSQL databases typically don’t support complex queries such as joining tables and entities.
  • Online analytical processing (OLAP): A query for a large volume of structured data for analytics purposes is better served by a data warehouse platform designed for faster access to structured data. Modern data warehouse technologies adopt the columnar format and use massively parallel processing (MPP), which helps to fetch and analyze data faster.
  • Building a data search: Oftentimes, you will need to search a large volume of data to solve issues quickly or get business insights. The ability to search your application data will help you to access detailed information and analyze it from different views. To search for data with low latency and high throughput, you need to have search engines as your technology choice.
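
The read-replica and partitioning idea mentioned for relational databases can be sketched roughly as follows. This is only an illustration under stated assumptions: the ShardedRouter class, the shard names, and the hash-modulo placement are hypothetical and not taken from the book.

```python
import hashlib
from itertools import cycle


class ShardedRouter:
    """Send writes to a partition primary and spread reads over its replicas."""

    def __init__(self, shards: dict):
        # shards maps a primary name to the list of its read-replica names.
        self.primaries = sorted(shards)
        # Fall back to the primary itself when a shard has no replicas.
        self._replicas = {p: cycle(r or [p]) for p, r in shards.items()}

    def _primary_for(self, key: str) -> str:
        # A stable hash keeps the same key on the same partition.
        digest = hashlib.sha256(key.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.primaries)
        return self.primaries[index]

    def route_write(self, key: str) -> str:
        """Write scaling: each key is owned by exactly one partition."""
        return self._primary_for(key)

    def route_read(self, key: str) -> str:
        """Read scaling: rotate over the replicas of the owning partition."""
        return next(self._replicas[self._primary_for(key)])


# Hypothetical topology: two partitions, each with its own replicas.
router = ShardedRouter({"db-0": ["db-0-r1", "db-0-r2"], "db-1": ["db-1-r1"]})
write_target = router.route_write("user:42")
read_target = router.route_read("user:42")
```

Plain modulo hashing remaps most keys when the number of partitions changes, so a production design would more likely use consistent hashing or a lookup table.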

Making the networking choice

DNS routing strategy

  • Simple routing policy: As the name suggests, this is the most straightforward routing policy and doesn’t involve any complications. It is useful to route traffic to a single resource—for example, a web server that serves content for a particular website.
  • Failover routing policy: This routing policy lets you achieve high availability by configuring active-passive failover. If your application goes down in one region, then all the traffic can be routed to another region automatically.
  • Geolocation routing policy: A geolocation routing policy routes traffic to a specific region based on the location the user belongs to.
  • Geoproximity routing policy: This is like a geolocation policy, but you have the option to shift traffic to other nearby locations when needed.
  • Latency routing policy: If your application is running in multiple regions, you can use a latency policy to serve traffic from the region where the lowest latency can be achieved.
  • Weighted routing policy: A weighted routing policy is used for A/B testing, where you want to send a certain amount of traffic to one region and increase this traffic as your trial proves more and more successful.
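
A few of the routing policies above can be pictured as small selection functions. This is a hedged sketch, not how any particular DNS service implements them; the function names and the sample endpoints are assumptions made for illustration.

```python
import random


def pick_weighted(weights: dict, rng: random.Random) -> str:
    """Weighted policy: choose an endpoint in proportion to its weight."""
    endpoints = list(weights)
    return rng.choices(endpoints, weights=[weights[e] for e in endpoints], k=1)[0]


def pick_lowest_latency(latency_ms: dict) -> str:
    """Latency policy: choose the region with the lowest measured latency."""
    return min(latency_ms, key=latency_ms.get)


def pick_with_failover(primary: str, secondary: str, healthy: set) -> str:
    """Failover policy: use the passive endpoint only when the active one is down."""
    return primary if primary in healthy else secondary


# Hypothetical A/B test: send roughly 10% of traffic to the new version.
rng = random.Random(0)
picks = [pick_weighted({"blue": 90, "green": 10}, rng) for _ in range(10_000)]
green_share = picks.count("green") / len(picks)
```

Raising the "green" weight step by step is one way to shift more traffic as the trial proves successful.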

Implementing a load balancer

The load balancer can be physical or virtual. You need to choose a load balancer based on your application’s need. Commonly, two types of load balancer can be utilized by the application:

  • Layer 4 or network load balancer: Layer 4 load balancing routes packets based on information in the packet header—for example, source/destination IP addresses and ports. Layer 4 load balancing does not inspect the contents of a packet, which makes it less compute intensive and therefore faster. A network load balancer can handle millions of requests per second.
  • Layer 7 or application load balancer: Layer 7 load balancing inspects and routes packets based on their full contents. It is used in conjunction with HTTP requests. Routing decisions are informed by factors such as HTTP headers, URI path, and content type. This allows for more robust routing rules but requires more compute time to route packets. The application load balancer can route the request to containers in your cluster based on their distinctive port number.
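
The difference between the two types can be sketched as two routing functions: one that only looks at header fields, and one that inspects the HTTP request. The backend addresses, pool names, and rules below are hypothetical and exist only to illustrate the contrast.

```python
import hashlib

BACKENDS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]  # hypothetical backend IPs
PATH_RULES = [("/api/", "api-pool"), ("/static/", "static-pool")]  # hypothetical


def route_layer4(src_ip, src_port, dst_ip, dst_port, backends=BACKENDS):
    """Layer 4: hash the connection's addresses and ports; the payload is never read."""
    flow = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    index = int.from_bytes(hashlib.sha256(flow).digest()[:8], "big") % len(backends)
    return backends[index]


def route_layer7(path: str, headers: dict) -> str:
    """Layer 7: decide from request content such as headers and URI path."""
    if headers.get("Content-Type", "").startswith("image/"):
        return "media-pool"
    for prefix, pool in PATH_RULES:
        if path.startswith(prefix):
            return pool
    return "default-pool"
```

The Layer 4 function does a single hash per flow, while the Layer 7 function has to parse and compare request content, which is where the extra compute time comes from.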

Managing performance monitoring

Monitoring solutions can be categorized into active monitoring and passive monitoring solutions:

  • Active monitoring: You simulate user activity to identify any performance gap upfront. Application data and workload situations are always changing, which requires continuous proactive monitoring. Active monitoring works alongside passive monitoring as you run the known possible scenarios to replicate user experience. You should run active monitoring across all dev, test, and prod environments to catch any issue before it reaches the user.
  • Passive monitoring: This tries to identify unknown patterns in real time. For a web-based application, passive monitoring needs to collect important metrics from the browser that can reveal performance issues. You can gather metrics from users regarding their geolocation, browser types, and device types to understand user experience and the geographic performance of your application. Monitoring is all about data, and it includes the ingestion, processing, and visualization of lots of data.

Solutions Architect Series – Part 5: Avoiding anti-patterns in solution architecture

Often, teams drift away from best practices due to timeline pressure or the unavailability of resources. You always need to give special attention to the following architecture design anti-patterns:

In an anti-pattern (an example of a poorly designed system), scaling is done reactively and manually. When application servers reach their full capacity with no more room, users are prevented from accessing the application. On user complaints, the admin finds out that the servers are at their full capacity and starts launching a new instance to take some of the load off. Unfortunately, there is always a few minutes’ lag between the instance launch and its availability. During this period, users are not able to access the application.

With anti-patterns, automation is missing. When application servers crash, the admin manually launches and configures the new server and notifies the users manually. Detecting unhealthy resources and launching replacement resources can be automated, and you can even notify when resources are changed.

With anti-patterns, the server is kept for a long time with hardcoded IP addresses, which prevent flexibility. Over time, different servers end up in different configurations and resources are running when they are not needed. You should keep all of the servers identical and should have the ability to switch to a new IP address. You should automatically terminate any unused resources.

With anti-patterns, an application is built in a monolithic way, where all layers of architecture including web, application, and data layers are tightly coupled and server dependent. If one server crashes, it brings down the entire application. You should keep the application and web layer independent by adding a load balancer in between. If one of the app servers goes down, the load balancer automatically starts directing all of the traffic to the other healthy servers.
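
The load balancer behavior described above, skipping a failed app server and sending traffic to the healthy ones, can be sketched as a health-aware round robin. The class and server names are hypothetical, and a real load balancer would run its own health checks rather than being told which servers are down.

```python
class LoadBalancer:
    """Round-robin over servers, skipping any that are marked unhealthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def route(self):
        # Try each server at most once per request, in rotation order.
        for _ in range(len(self.servers)):
            server = self.servers[self._next % len(self.servers)]
            self._next += 1
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


lb = LoadBalancer(["app-1", "app-2", "app-3"])
lb.mark_down("app-2")  # simulate a crashed app server
picks = [lb.route() for _ in range(4)]
```

With app-2 marked down, requests alternate between the two remaining servers instead of failing.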

With anti-patterns, the application is server bound, and the servers communicate directly with each other. User authentication and sessions are stored locally on the server, and all static files are served from the local server. You should choose to create an SOA, where the services talk to each other using a standard protocol such as HTTP. User authentication and sessions should be stored in low-latency distributed storage so that the application can be scaled horizontally. Static assets should be stored in centralized object storage that is decoupled from the server.

With anti-patterns, a single type of database is used for all kinds of needs. You are using a relational database for all needs, which introduces performance and latency issues. You should use the right storage for the right need, such as the following:

  • NoSQL to store user session
  • Cache data store for low latency data availability
  • Data warehouse for reporting needs
  • Relational database for transactional data

With anti-patterns, you will find a single point of failure by having a single database instance to serve the application. Wherever possible, eliminate single points of failure from your architectures. Create a secondary server (standby) and replicate the data. If the primary database server goes offline, the secondary server can pick up the load.

With anti-patterns, static content such as high-resolution images and videos is served directly from the server without any caching. You should consider using a CDN to cache heavy content near the user location, which helps to reduce latency and page load time.

With anti-patterns, you can find security loopholes that open server access without a fine-grained security policy. You should always apply the principle of least privilege, which means starting with no access and only giving access to the required user group.
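
The principle of least privilege can be pictured as a default-deny check: nothing is allowed unless a grant says so explicitly. The group names, resources, and actions below are hypothetical examples, not a real policy format.

```python
# Hypothetical explicit grants; anything not listed here is denied.
GRANTS = {
    "ops-admins": {("server", "ssh"), ("server", "restart")},
    "developers": {("logs", "read")},
}


def is_allowed(groups, resource, action, grants=GRANTS):
    """Start from no access and allow only what a group was explicitly granted."""
    return any((resource, action) in grants.get(group, set()) for group in groups)
```

An unknown group or an unlisted action falls through to a denial, so adding a new resource never opens access by accident.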

Solutions Architect Series – Part 4: Principles of Solution Architecture Design 2/2

Think loose coupling

In modern design, microservice architecture is becoming highly popular, which facilitates the decoupling of an application component. The loosely coupled design has many benefits, from providing scalability and high availability, to ease of integration.

With loose coupling, you can add an intermediate layer such as a load balancer or a queue, which automatically handles failures or scaling for you.

Queue-based decoupling enables asynchronous linking of systems, where one server does not wait for a response from another server and works independently. This method lets you increase the number of virtual servers that receive and process the messages in parallel. If there are no messages left to process (for example, no images waiting in an image-processing queue), you can configure auto-scaling to terminate the excess servers.
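
Queue-based decoupling can be sketched in a single process with Python's standard queue and threading modules. Here threads stand in for the virtual servers and an in-memory queue stands in for a message queue service; the run_workers helper and the sample file names are assumptions made for this illustration.

```python
import queue
import threading


def run_workers(jobs, handler, worker_count=3):
    """Process jobs from a shared queue with several independent workers."""
    work = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            job = work.get()
            if job is None:  # sentinel: no more work, so this worker stops
                return
            outcome = handler(job)
            with lock:
                results.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    # The producer only enqueues; it never waits on a specific worker.
    for job in jobs:
        work.put(job)
    # One sentinel per worker, so each exits once the queue is drained.
    for _ in threads:
        work.put(None)
    for thread in threads:
        thread.join()
    return results


processed = run_workers(["a.png", "b.png", "c.png"], str.upper)
```

Completion order depends on thread scheduling, so callers should not rely on the order of the results. Because None is used as the stop signal, it cannot also be a job in this sketch.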

Using the right storage for the right need

Solution architects need to consider multiple factors while choosing the data storage to match the right technology. Here are the important ones:

  • Durability requirement: How should data be stored to prevent data corruption?
  • Data availability: Which data storage system should be available to deliver data?
  • Latency requirement: How fast should the data be available?
  • Data throughput: What is the data read and write need?
  • Data size: What is the data storage requirement?
  • Data load: How many concurrent users need to be supported?
  • Data integrity: How to maintain the accuracy and consistency of data?
  • Data queries: What will be the nature of queries?

While choosing storage options, you need to consider the temperature of the data, which could be hot, warm, or cold:

  • For hot data, you are looking for sub-millisecond latency and require cache data storage. Some examples of hot data are stock trading and making product recommendations at runtime.
  • For warm data, such as financial statement preparation or product performance reporting, you can live with a moderate amount of latency, from seconds to minutes, and you should use a data warehouse or a relational database.
  • For cold data, such as storing 3 years of financial records for audit purposes, you can plan latency in hours, and store it in archive storage.
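
The temperature rule above can be written down as a small lookup. The sub-millisecond limit comes from the text, but the exact boundary between warm and cold is an assumption chosen for this sketch, since the text only gives rough ranges.

```python
HOT_LIMIT_S = 0.001   # sub-millisecond, as stated in the text
WARM_LIMIT_S = 3600   # assumption: anything tolerating under an hour counts as warm


def classify(max_latency_s: float):
    """Map an acceptable latency to a data temperature and a storage type."""
    if max_latency_s < HOT_LIMIT_S:
        return ("hot", "cache")
    if max_latency_s < WARM_LIMIT_S:
        return ("warm", "data warehouse or relational database")
    return ("cold", "archive storage")
```

For example, a report that may take 30 seconds to load lands in the warm tier, while an audit record that can wait four hours lands in the cold tier.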

Adding security everywhere

The following are the security aspects that need to be considered during the design phase:

  • Physical security of data center: All IT resources in data centers should be secure from unauthorized access.
  • Network security: The network should be secure to prevent any unauthorized server access.
  • Identity and Access Management (IAM): Only authenticated users should have access to the application, and they can do the activity as per their authorization.
  • Data security in-transit: Data should be secure while traveling over the network or the internet.
  • Data security at rest: Data should be secure while stored in the database or any other storage.
  • Security monitoring: Any security incident should be captured, and the team alerted to act.

Automating everything

When designing a solution, think about what can be automated. Consider the following components to be automated in your solution:

  • Application testing: You need to test your application every time you make any changes to make sure that nothing breaks.
  • IT infrastructure: You can automate your infrastructure by using infrastructure as code scripting.
  • Logging, monitoring, and alerting: Monitoring is a critical component, and you want to monitor everything every time. Also, based on monitoring, you may want to take automated action such as scaling up your system or alerting your team to act.
  • Deployment automation: Deployment is a repeatable task that is very time consuming when done manually, and in many real-world scenarios it delays the launch at the last minute.
  • Security automation: While automating everything, don’t forget to add automation for security.

Solutions Architect Series – Part 3: Principles of Solution Architecture Design 1/2

Scaling workload

Scaling could be predictive if you are aware of your workload, which is often the case; or it could be reactive if you get a sudden spike or if you have never handled that kind of load before.

Predictive scaling

Predictive scaling is the best-case approach that any organization wants to take. Often, you can collect historical data on application workload. For example, an e-commerce website such as Amazon may have traffic spikes that follow known patterns, and you need predictive scaling to avoid any latency issues. Traffic patterns may include the following:

  • Weekends have three times more traffic than a weekday.
  • Daytime has five times more traffic than at night.
  • Shopping seasons, such as Thanksgiving or Boxing Day, have 20 times more traffic than regular days.
  • Overall, the holiday season in November and December has 8 to 10 times more traffic than during other months.
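
Multipliers like these translate directly into a capacity plan. The arithmetic below is a rough sketch: the baseline of 200 requests per second, the 150 requests per second each server can handle, and the 20% headroom are made-up assumptions, while the 3x and 20x multipliers come from the list above.

```python
# Multipliers from the traffic patterns above; a weekday is the baseline.
MULTIPLIERS = {"weekday": 1, "weekend": 3, "peak_shopping_day": 20}


def servers_needed(baseline_rps, multiplier, rps_per_server, headroom_pct=20):
    """Fleet size for the expected load plus headroom, rounded up."""
    required = baseline_rps * multiplier * (100 + headroom_pct)
    capacity = rps_per_server * 100
    return -(-required // capacity)  # integer ceiling division


# Hypothetical inputs: 200 rps on a weekday, 150 rps per server.
plan = {name: servers_needed(200, m, 150) for name, m in MULTIPLIERS.items()}
```

With these assumed inputs the plan comes out at 2 servers for a weekday, 5 for a weekend, and 32 for a peak shopping day, which can be scheduled ahead of the spike.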

Reactive scaling

You will need to understand your existing architecture and traffic patterns, along with an estimate of the desired traffic. You also need to understand the navigation path of the website. For example, the user has to log in to buy a product, which can lead to more traffic on the login page.

In order to plan for the scaling of your server resources for traffic handling, you need to determine the following patterns:

  • Which web pages are read-only and can be cached?
  • Which user queries need just to read that data, rather than write or update anything in the database?
  • Does a user query frequently, requesting the same or repeated data, such as their own user profile?

To offload your web-layer traffic, you can move static content, such as images and videos, to content distribution networks from your web server.

At the server fleet level, you need to use a load balancer to distribute traffic, and you need to use auto-scaling to increase or shrink the number of servers in order to apply horizontal scaling. To reduce the database load, use the right database for the right need: a NoSQL database to store user sessions and review comments, a relational database for transactions, and caching to store the results of frequent queries.

Building resilient architecture

Design for failure, and nothing will fail. Having a resilient architecture means that your application should be available for customers while also recovering from failure.

From the security perspective, a Distributed Denial of Service (DDoS) attack has the potential to impact the availability of services and applications. Exposing your application through a content distribution network (CDN) provides inbuilt DDoS mitigation capability, and adding Web Application Firewall (WAF) rules can help to block unwanted traffic.

Resiliency needs to be applied in all the critical layers that affect the application’s availability in order to implement design for failure. To achieve resiliency, the following best practices need to be applied in order to create a redundant environment:

  • Use the DNS server to route traffic between different physical locations so that your application will still be able to run in the case of entire region failure.
  • Use the CDN to distribute and cache static content such as videos, images, and static web pages near the user location, so that your application will still be available in case of a DDoS attack or local point of presence (PoP) location failure.
  • Once traffic reaches a region, use a load balancer to route traffic to a fleet of servers so that your application should still be able to run even if one location fails within your region.
  • Use auto-scaling to add or remove servers based on user demand. As a result, your application should not get impacted by individual server failure.
  • Create a standby database to ensure the high availability of the database, meaning that your application should be available in the event of a database failure.

At the application level, it is essential to avoid cascading failure, where the failure of one component can bring down the entire system. There are different mechanisms available to handle cascading failure, such as applying timeouts, rejecting traffic, implementing idempotent operations, and using the circuit-breaker pattern.
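
Of these mechanisms, the circuit-breaker pattern is the easiest to show in a few lines. The sketch below is one simple variant, with assumed names (CircuitBreaker, CircuitOpenError) and default thresholds picked for illustration: after repeated failures it rejects calls for a while instead of letting them pile up on a failing dependency.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without trying it."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # seconds to stay open before a retry
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; rejecting call")
            # Otherwise the breaker is half-open: let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # Open on reaching the threshold, or re-open after a failed trial.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Rejecting calls quickly keeps threads and connections from being tied up by a dependency that is already failing, which is what stops one failure from cascading.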

Design for performance

Like resiliency, the solution architect needs to consider performance at every layer of architecture design. The team needs to put monitoring in place to continue to perform effectively, and work to improve upon it continuously. Better performance means more user engagements and increases in return on investment—high-performance applications are designed to handle application slowness due to external factors such as a slow internet connection.

At the server level, you need to choose the right kind of server depending upon your workload. For example, choose the right amount of memory and compute to handle the workload, as memory congestion can slow down application performance, and eventually, the server may crash. For storage, it is important to choose the right input/output operations per second (IOPS). For write-intensive applications, you need high IOPS to reduce latency and to increase disk write speed.

To achieve great performance, apply caching at every layer of your architecture design. The following are the considerations that are required to add caching to various layers of your application design:

  • Use browser cache on the user’s system to load frequently requested web pages.
  • Use the DNS cache for quick website lookup.
  • Use the CDN cache to serve high-resolution images and videos from a location near the user.
  • At the server level, maximize the memory cache to serve user requests.
  • Use cache engines such as Redis and Memcached to serve frequent queries from the caching engine.
  • Use the database cache to serve frequent queries from memory.
  • Take care of cache expiration and cache eviction at every layer.
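
Cache expiration and eviction, the last point above, can be sketched with a small in-memory cache. This is an illustrative assumption rather than how Redis or Memcached work internally: entries expire after a time-to-live, and the least recently used entry is evicted when the cache is full.

```python
import time
from collections import OrderedDict


class TTLCache:
    """Small cache with time-based expiration and least-recently-used eviction."""

    def __init__(self, max_items=128, ttl_seconds=60.0, clock=time.monotonic):
        self.max_items = max_items
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._items = OrderedDict()  # key -> (expires_at, value)

    def get(self, key, default=None):
        entry = self._items.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._items[key]  # expiration: the entry is too old to serve
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._items[key] = (self.clock() + self.ttl_seconds, value)
        self._items.move_to_end(key)
        while len(self._items) > self.max_items:
            self._items.popitem(last=False)  # eviction: drop the oldest-used entry
```

Expiration bounds how stale a served value can be, while eviction bounds how much memory the cache uses; most layers need both.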