Recently, a client approached us with a request to migrate their product application's infrastructure from their current cloud provider to Azure. The primary goal was to unify their cloud environments onto a single platform.
We took this opportunity to take a closer look at the existing architecture. Over-provisioned architectures are common in practice, and there are many possible reasons for them. For example:
- Improper assumptions about the application’s scale
- Adding extra servers to accommodate dropped requests
- Expanding memory to cover memory leaks
- An inexperienced DevOps team
- Lack of maintenance procedures
Often, these changes are meant to be temporary but are forgotten once they have outlived their purpose. Whatever its origin, an over-provisioned architecture means that you pay for resources you don’t use. Over the course of a year, that adds up.
In this case, we were able to save our client nearly $40,000 annually on cloud hosting.
Initial Assessment
We began by defining and evaluating the existing architecture, which included:
- Two virtual machines and a load balancer
- A dedicated bare-metal machine for the Postgres database
- An additional virtual machine for a staging environment
- CircleCI for running the code linter and tests
The former architecture was significantly over-scaled for the project, resulting in unnecessarily high operational costs. This over-scaling stemmed from outdated assumptions that had not been re-evaluated. For instance, the database machine had 3 TB of storage, while the database's uncompressed dump was under one gigabyte. Given that the client's average throughput was just over 100 transactions per hour, it became evident that the existing infrastructure far exceeded operational needs.
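To put those figures in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: the dump size is treated as an upper bound of 1 GB, the throughput is rounded down to 100 transactions per hour, and a dump is not the same as on-disk size, so the real utilization would differ somewhat.

```python
# Figures taken from the assessment above; both are rounded approximations.
PROVISIONED_STORAGE_GB = 3 * 1000  # 3 TB of storage on the database machine
DUMP_SIZE_GB = 1                   # uncompressed dump was under 1 GB (upper bound)
TRANSACTIONS_PER_HOUR = 100        # "just over 100" transactions per hour

# Share of the provisioned storage that the data could plausibly occupy.
storage_utilization = DUMP_SIZE_GB / PROVISIONED_STORAGE_GB

# Average load expressed per second instead of per hour.
transactions_per_second = TRANSACTIONS_PER_HOUR / 3600

print(f"Storage used: at most {storage_utilization:.3%} of provisioned capacity")
print(f"Average load: about {transactions_per_second:.3f} transactions per second")
```

Even allowing generous headroom for indexes, growth, and traffic peaks, both numbers sit several orders of magnitude below what the old setup was sized for.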
Proposed New Architecture
Our next step was to design a new architecture tailored to the client's actual needs. The proposed streamlined architecture included:
- A single virtual machine using containerized services for the application, background jobs, Postgres database, and Redis database
- A second virtual machine mirroring the first for the staging environment
- Retention of CircleCI
After presenting the proposed architecture and the estimated new operational costs to the client, we received approval to proceed with the migration. Our primary considerations were to prevent downtime and ensure data integrity.
Migration Strategy
Two main focus points guided our migration strategy:
- Data Migration Between Postgres Instances: Neither the old nor the new Postgres database faced the internet directly, which ruled out a replication-based strategy. Given the relatively small amount of data, we opted for the traditional dump and restore method instead.
- DNS Changes Propagation Time and Data Integrity: DNS changes can take up to 48 hours to propagate. During this time, users could be routed to either the new or old application instance, each accessing different databases. It was crucial to prevent the old instance from accepting orders and saving them in the old database.
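As an illustration of the second point, one simple way to observe propagation is to check which addresses a hostname currently resolves to. The sketch below is not the procedure we used; the function names are hypothetical, and the result only reflects the resolver of the machine running the check, not what every user sees.

```python
import socket


def resolved_addresses(hostname, resolver=socket.getaddrinfo):
    """Return the set of IP addresses that `hostname` resolves to right now."""
    results = resolver(hostname, None)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # the first element of sockaddr is the IP address.
    return {result[4][0] for result in results}


def points_only_at(hostname, expected_ip, resolver=socket.getaddrinfo):
    """True when every address returned for `hostname` equals `expected_ip`."""
    return resolved_addresses(hostname, resolver) == {expected_ip}
```

A check like this, run from several networks, gives a rough sense of how far a DNS change has spread, but it cannot prove that no user still reaches the old instance. That is why blocking writes on the old instance matters regardless.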
To address these concerns, the final migration procedure included additional steps and the use of a temporary subdomain to redirect traffic to the new application instance. After thoroughly testing the migration procedure in a staging environment, we scheduled a migration window with the client and executed the plan. Thanks to proper planning and testing, the process went smoothly.
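For readers unfamiliar with the dump and restore approach, it can be sketched roughly as follows. This is a minimal illustration, not our actual migration script: the function names, connection strings, and file path are hypothetical, and the sketch runs both steps from one machine, whereas with databases that are not reachable from outside the dump file would be copied between hosts in between.

```python
import subprocess


def build_dump_command(source_dsn, dump_path):
    """pg_dump call that writes a custom-format archive to `dump_path`."""
    return ["pg_dump", "--format=custom", f"--file={dump_path}", f"--dbname={source_dsn}"]


def build_restore_command(target_dsn, dump_path):
    """pg_restore call that loads the archive into the target database."""
    return ["pg_restore", "--no-owner", "--exit-on-error", f"--dbname={target_dsn}", dump_path]


def dump_and_restore(source_dsn, target_dsn, dump_path, run=subprocess.run):
    """Dump the source database, then restore it into the target.

    check=True makes either step raise if the underlying command fails,
    so a failed dump is never followed by a restore.
    """
    run(build_dump_command(source_dsn, dump_path), check=True)
    run(build_restore_command(target_dsn, dump_path), check=True)


# Example call (hypothetical connection strings and path):
# dump_and_restore("postgresql://old-host/app", "postgresql://new-host/app", "/tmp/app.dump")
```

Rehearsing a script like this against the staging environment is what makes it possible to estimate how long the migration window needs to be.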
Conclusion
We successfully completed the client's migration, and as a result the monthly operational costs for running the application dropped from $3700 to $600. That is a saving of $3100 per month, or roughly $37,000 per year. This experience serves as a reminder that pre-emptive scaling is not always necessary. Regularly verifying resource utilization and making adjustments can lead to significant cost savings. If you need assistance with optimizing your infrastructure, please let us know!