Amazon RDS has long been lauded for its scalability, particularly the ease of expanding storage. Shrinking storage, however, has historically been a complex, error-prone process. With the December 2024 introduction of RDS blue/green deployments for resizing, AWS has provided a smoother pathway for this intricate operation. Here, we’ll explore the process, its nuances, and practical lessons learned along the way.
Blue/green deployment is a well-established strategy for rolling out changes with minimal disruption. It involves creating a copy of the production environment (the ‘green’ environment), applying updates, and then switching production traffic to this updated environment. The original production setup is the ‘blue’ environment. Once the green environment proves stable, the old blue environment can be retired.
The Amazon RDS blue/green deployment feature embraces this approach. It enables users to modify various aspects of an RDS instance, including storage and engine version. When reducing storage size, it’s critical to change storage only, as combining a storage reduction with an engine version upgrade can lead to conflicts.
1. Initiate a Blue/Green Deployment: Navigate to the RDS console, select the desired instance, click ‘Actions,’ and choose ‘Create Blue/Green Deployment.’ In the setup wizard, ensure you select the same engine version as the current instance and adjust the storage size to the desired reduced value.
2. Calculate New Storage Size: AWS requires the ‘green’ instance to have at least 20% more storage than the capacity currently in use on the ‘blue’ instance. For example, if your current instance has 10 TB of allocated storage with 6.5 TB in use, AWS documentation suggests you should be able to reduce to approximately 7.8 TB (6.5 TB × 1.2). However, real-world testing revealed discrepancies, and a margin of about 25% appears to be the practical limit. In the scenario above, you would actually need about 8.2 TB (6.5 TB × 1.25 ≈ 8.13 TB, rounded up) to guarantee the size reduction.
3. Beware of Silent Failures: If the reduced size is insufficient, the wizard completes without warnings, but the resulting green instance will retain the same size as the blue. Interestingly, in these cases, the deployment process uses RDS snapshots, making the operation faster.
4. Verify Storage Changes: The green instance’s size reduction becomes evident only after the storage configuration step. Initially, it mirrors the blue instance’s size.
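The sizing rule and the silent-failure check from the steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not a definitive implementation: `rds` is assumed to be a boto3 RDS client (for example `boto3.client("rds")`), the `TargetAllocatedStorage` parameter should be verified against your SDK version, sizes are in GiB (the unit RDS reports for `AllocatedStorage`), and all function names are hypothetical. The 25% default reflects the observation above rather than a documented AWS limit.

```python
import math


def min_target_storage_gib(used_gib: int, headroom_pct: int = 25) -> int:
    """Smallest whole-GiB target that leaves `headroom_pct` percent above used space.

    25 is the margin observed in testing above; AWS documentation states 20.
    """
    return math.ceil(used_gib * (100 + headroom_pct) / 100)


def request_shrink(rds, name: str, source_arn: str, target_gib: int) -> str:
    """Create a blue/green deployment that asks for a smaller green volume.

    No target engine version is passed, so only storage changes.
    """
    resp = rds.create_blue_green_deployment(
        BlueGreenDeploymentName=name,
        Source=source_arn,
        TargetAllocatedStorage=target_gib,  # assumption: check your boto3 version
    )
    return resp["BlueGreenDeployment"]["BlueGreenDeploymentIdentifier"]


def allocated_gib(rds, instance_id: str) -> int:
    """Allocated storage of one instance, in GiB."""
    resp = rds.describe_db_instances(DBInstanceIdentifier=instance_id)
    return resp["DBInstances"][0]["AllocatedStorage"]


def shrink_took_effect(rds, blue_id: str, green_id: str) -> bool:
    """True only if green really ended up smaller than blue.

    Call this after the deployment has finished configuring storage;
    before that point green still mirrors the blue size.
    """
    return allocated_gib(rds, green_id) < allocated_gib(rds, blue_id)
```

With 6,500 GiB in use, `min_target_storage_gib(6500, 20)` gives 7800 and the default 25% margin gives 8125. Because an undersized request fails silently, a `False` result from `shrink_took_effect` is the signal to delete the deployment and retry with a larger target.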
Once the deployment completes, thorough testing of the green instance is vital.
When satisfied, initiate the switchover:
- From the RDS console, select the deployment, click ‘Actions,’ and choose ‘Switch Over.’
- Replication Dependencies: Ensure there are no external replication slots. If Database Migration Service (DMS) tasks exist, back up their configurations and delete them before proceeding.
- Auto-Scaling Configuration: The green instance may have storage auto-scaling enabled by default. Adjust these settings to match your operational needs.
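The pre-switchover checks and the switchover itself can also be scripted. The sketch below makes several assumptions: the engine is PostgreSQL (replication slots are read from `pg_replication_slots`), `cursor` is any DB-API cursor connected to the blue instance, `rds` is a boto3 RDS client, and the function names are hypothetical. The slot listing is meant for human review, since the deployment's own replication may also appear in it.

```python
SLOT_QUERY = "SELECT slot_name FROM pg_replication_slots"


def replication_slots(cursor) -> list:
    """Names of all replication slots on the instance (PostgreSQL assumed)."""
    cursor.execute(SLOT_QUERY)
    return [row[0] for row in cursor.fetchall()]


def autoscaling_ceiling_gib(rds, instance_id: str):
    """Storage auto-scaling ceiling in GiB, or None if the field is absent."""
    resp = rds.describe_db_instances(DBInstanceIdentifier=instance_id)
    return resp["DBInstances"][0].get("MaxAllocatedStorage")


def switch_over(rds, deployment_id: str, timeout_s: int = 300) -> str:
    """Start the switchover and return the deployment status AWS reports."""
    resp = rds.switchover_blue_green_deployment(
        BlueGreenDeploymentIdentifier=deployment_id,
        SwitchoverTimeout=timeout_s,  # seconds
    )
    return resp["BlueGreenDeployment"]["Status"]
```

A typical run would print `replication_slots(cursor)` and `autoscaling_ceiling_gib(rds, green_id)`, resolve anything unexpected (for example, by removing DMS tasks that own a slot), and only then call `switch_over`.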
After confirming the green instance is stable:
1. Disable deletion protection on the old blue instance and its read replicas (if any).
2. Select the deployment, click ‘Actions,’ and choose ‘Delete.’
3. If necessary, manually delete any residual instances.
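The cleanup steps above map onto three boto3 calls. This is a hedged sketch: `rds` is assumed to be a boto3 RDS client, `retire_blue` is a hypothetical helper, and the caller is expected to pass the identifiers of the old instances (replicas before the primary). Deleting instances is opt-in and skips the final snapshot, so adjust that choice if you want one last backup.

```python
def retire_blue(rds, deployment_id: str, old_instance_ids, delete_instances: bool = False):
    """Remove deletion protection, delete the deployment, optionally delete old instances.

    old_instance_ids: the old blue primary and its read replicas, listed
    replicas first. Deleting the deployment does not delete the instances.
    """
    # Step 1: deletion protection blocks any later delete call.
    for instance_id in old_instance_ids:
        rds.modify_db_instance(
            DBInstanceIdentifier=instance_id,
            DeletionProtection=False,
        )

    # Step 2: delete the blue/green deployment object itself.
    rds.delete_blue_green_deployment(BlueGreenDeploymentIdentifier=deployment_id)

    # Step 3: manually delete residual instances, only when asked to.
    if delete_instances:
        for instance_id in old_instance_ids:
            rds.delete_db_instance(
                DBInstanceIdentifier=instance_id,
                SkipFinalSnapshot=True,  # assumption: no final snapshot wanted
            )
```

Leaving `delete_instances` at its default keeps the old instances around as a fallback until you are certain they are no longer needed.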
If your environment relies on DMS tasks, their definitions should be preserved and restored post-switchover. Export the JSON definitions for both task settings and table mappings.
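Exporting the definitions can be automated before the tasks are deleted. The following is a minimal sketch, assuming `dms` is a boto3 DMS client (for example `boto3.client("dms")`); the function name and the one-file-per-task layout are illustrative choices, not something the DMS API prescribes.

```python
import json
from pathlib import Path


def export_dms_tasks(dms, out_dir: str) -> list:
    """Write one JSON file per DMS task and return the file paths written.

    Task settings and table mappings arrive as JSON strings, so they are
    parsed before saving to keep the files readable and diffable.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    paginator = dms.get_paginator("describe_replication_tasks")
    for page in paginator.paginate():
        for task in page["ReplicationTasks"]:
            definition = {
                "ReplicationTaskIdentifier": task["ReplicationTaskIdentifier"],
                "MigrationType": task["MigrationType"],
                "SourceEndpointArn": task["SourceEndpointArn"],
                "TargetEndpointArn": task["TargetEndpointArn"],
                "TableMappings": json.loads(task["TableMappings"]),
                "ReplicationTaskSettings": json.loads(task["ReplicationTaskSettings"]),
            }
            path = out / f"{task['ReplicationTaskIdentifier']}.json"
            path.write_text(json.dumps(definition, indent=2))
            written.append(str(path))
    return written
```

Keeping the exported files under version control gives you a record of exactly what was running before the switchover.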
1. Truncate or drop tables in downstream systems (e.g., Redshift) as required by task definitions.
2. Use robust replication instances (e.g., `dms.c5.4xlarge`) during task re-creation for optimal performance.
3. Distribute tasks across multiple replication instances for efficiency.
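Spreading tasks across replication instances can be as simple as round-robin assignment at re-creation time. The sketch below assumes each saved definition is a dict with the keys `ReplicationTaskIdentifier`, `MigrationType`, `SourceEndpointArn`, `TargetEndpointArn`, `TableMappings`, and `ReplicationTaskSettings` (the last two already parsed from JSON), and that `dms` is a boto3 DMS client. The function names are hypothetical, and round-robin is just one reasonable policy: it ignores how heavy each task is.

```python
import json
from itertools import cycle


def assign_round_robin(task_ids, instance_arns) -> dict:
    """Map each task id to a replication instance ARN, cycling through the ARNs."""
    if not instance_arns:
        raise ValueError("at least one replication instance ARN is required")
    return dict(zip(task_ids, cycle(instance_arns)))


def recreate_tasks(dms, definitions, instance_arns) -> dict:
    """Re-create DMS tasks from saved definitions and return the placement used."""
    placement = assign_round_robin(
        [d["ReplicationTaskIdentifier"] for d in definitions], instance_arns
    )
    for d in definitions:
        task_id = d["ReplicationTaskIdentifier"]
        dms.create_replication_task(
            ReplicationTaskIdentifier=task_id,
            SourceEndpointArn=d["SourceEndpointArn"],
            TargetEndpointArn=d["TargetEndpointArn"],
            ReplicationInstanceArn=placement[task_id],
            MigrationType=d["MigrationType"],
            # The API expects both of these as JSON strings.
            TableMappings=json.dumps(d["TableMappings"]),
            ReplicationTaskSettings=json.dumps(d["ReplicationTaskSettings"]),
        )
    return placement
```

If the source endpoint still points at the old instance identifier, update it before re-creating tasks; that step is outside this sketch.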
1. AWS’s documentation provides general guidelines, but real-world scenarios may demand adjustments.
2. Silent failures in the size reduction process highlight the importance of monitoring and verification.
3. Efficient DMS task management and robust replication setups can streamline the transition and recovery phases.
Reducing RDS storage size using blue/green deployments is a significant improvement over traditional methods, offering greater speed and reliability. However, it requires careful planning, testing, and execution. With the right preparation and attention to detail, you can leverage this feature to optimize storage and maintain seamless operations.