By Eric Zhu and Hariharan Ananthakrishnan
Setting up the infrastructure for a product line in an entirely new region with over 340 distinct AWS resources could easily take weeks. At Artera, it takes hours.
Artera’s Platform Engineering team has automated the process of bringing up a new AWS region using Infrastructure as Code (IaC). Built from parameterized, opinionated Terraform modules, the same code provisions the compute, network, and storage resources for any AWS region.
Artera is an AI startup that develops medical artificial intelligence tests to personalise therapy for cancer patients. Artera is on a mission to personalise medical decisions for patients and physicians on a global scale.
Artera is expanding its offerings to many countries to provide precision cancer therapy to patients. Because of the nature of protected health information (PHI) and various country-specific regulations, the cloud infrastructure and patient data must stay local to each country. Artera uses AWS as its cloud provider to host web applications and to run machine learning training and inference. To comply with these data governance requirements, Artera must bring up its tech stack separately in each region.
Challenges
The cloud infrastructure that Artera uses today consists of AWS services such as VPC, Transit Gateways, NAT Gateways, Elastic Kubernetes Service (EKS), and Elastic File System (EFS). While we strive to keep the ecosystem simple, combining a variety of technologies inevitably leads to challenges such as:
- Configuration Drift: Infrastructure configurations and default behaviours change between software versions. Keeping them in sync with IaC is critical for security, consistency, and reliability.
- Mean Time To Resolve (MTTR): With hundreds of resources in play, we need to understand resource configuration and dependencies quickly and declaratively. Our top priority is to keep the average time it takes to resolve a failure to a minimum.
- Security: To meet the highest security levels mandated by the healthcare industry and local governments, we need customizable code that stays maintainable rather than turning into spaghetti workflows.
- Slow Deployment: With various government and project timelines to meet, bringing up cloud infrastructure with multiple components in a new region is too slow without automation.
Region Expansion is a suite of Terraform modules, written as Infrastructure as Code (IaC), that automates and streamlines deployment to address the problems above. This approach eliminates manual toil and makes deployments reproducible and more secure across AWS regions.
Region Expansion
The Terraform modules are built around four core principles: Automation, Consistency, Security, and Rapid Deployment. The following components put those principles into practice:
- Orchestration: The modules streamline resource management using Terraform by adopting GitOps principles, focusing on consistency and automation. This method ensures that all infrastructure changes are systematically applied via code, significantly reducing deployment time and enhancing deployment uniformity. Changes to resources are carefully controlled through pull requests and peer review processes, reinforcing the consistency and reliability of infrastructure updates.
- StateHub: Each new region is assigned a dedicated and isolated workspace, which ensures that while the overall workflow benefits from centralised and shared code, each region’s infrastructure operates independently, providing clear state separation.
- Modularization: By creating parameterized opinionated modules, workflows are assembled like Lego blocks, providing the utmost flexibility. These modules encapsulate Artera’s policies and best practices.
- Continuous Scanning: By integrating an IaC scanning platform into the development lifecycle, potential security vulnerabilities and misconfigurations are identified and mitigated at the earliest stages. IaC scanning helps keep the infrastructure secure by design.
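To make the Modularization and StateHub ideas more concrete, here is a minimal sketch in Python rather than Terraform. It is not Artera's code: the default values, the locked setting, the product name, and the function names (`render_region_vars`, `workspace_name`) are all hypothetical, chosen only to illustrate how opinionated defaults, per-region parameters, and one isolated workspace per region might fit together.

```python
import json
from typing import Optional

# Hypothetical opinionated defaults; illustrative values only.
OPINIONATED_DEFAULTS = {
    "vpc_cidr": "10.0.0.0/16",
    "enable_nat_gateway": True,
    "efs_encrypted": True,
    "eks_node_count": 3,
}

# Hypothetical policy: settings that no region is allowed to override.
LOCKED_KEYS = {"efs_encrypted"}


def workspace_name(product: str, region: str) -> str:
    """Return one workspace name per product and region, so state stays separate."""
    return f"{product}-{region}"


def render_region_vars(region: str, overrides: Optional[dict] = None) -> dict:
    """Merge per-region overrides onto the shared defaults, enforcing policy."""
    overrides = overrides or {}
    unknown = set(overrides) - set(OPINIONATED_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    locked = set(overrides) & LOCKED_KEYS
    if locked:
        raise ValueError(f"parameters locked by policy: {sorted(locked)}")
    return {"aws_region": region, **OPINIONATED_DEFAULTS, **overrides}


if __name__ == "__main__":
    # Hypothetical regions and overrides, for illustration only.
    regions = {
        "eu-west-2": {"eks_node_count": 5},
        "ap-southeast-2": {"vpc_cidr": "10.1.0.0/16"},
    }
    for region, overrides in regions.items():
        print(workspace_name("platform", region))
        print(json.dumps(render_region_vars(region, overrides), indent=2))
```

The shared code is the same for every region; only a small set of parameters differs, and policy-critical settings cannot be changed per region. The resulting dictionary could be written out as a `.tfvars.json` file, a format Terraform accepts for variable values.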
Result
Previously, setting up and properly configuring all required resources in a new AWS region took a full-time employee (FTE) approximately 3 to 4 weeks.
With the Region Expansion modules, a new AWS region with around 340 resources can be brought up within 120 minutes.
Summary
By automating deployments and eliminating manual intervention, Region Expansion has significantly reduced deployment times. After several iterations of the architecture and some tuning, the solution has proven that it scales and performs well.
We are utilising the infrastructure to deploy in several regions, enhancing operational efficiency and consistency while embodying Artera’s best practices.