If you work in a big company where scalability matters then your application must have been receiving a large amount of traffic like 40K RPM or more. Your database has millions of records and now your manager asks you to remove an existing column from the database because that column is of no importance now. In such cases, it becomes quite tricky to remove the columns without any error in the application and without facing any downtime. Let’s learn to drop a database column in the Production system
Assumption
Before jumping on to the solution let’s set some assumptions which will help you to understand more. For any large-scale application, there are multiple instances of the same application which point to a single database.
In the above system design
- we have one service only Service A
- we have deployed multiple instances of the same service Service A:1, Service A:2, Service A:3
- All the deployment instances are pointing to the same database.
Let’s say we have a column in the database as an address and we want to drop this address column. In the current code, we have a logic.
- Address is a mandatory value in the create user API flow
- We are inserting the address data in the database.
public void insertUser(User user) {
...
//check mandatory address params
if(createUserRequest.getAddress().isEmpty()) {
throw new Exception4XX("300", "first_name", "First Name can not be empty");
}
...
String sqlString = "INSERT INTO users (first_name, email, address) "+
" VALUES ('"+ user.getFirstName() +"','"+ user.getEmail() +"', '"+ user.address() +"')";
connection.createStatement().execute(sqlString);
}
Approach 1: Drop the address column from the database directly
Someone can suggest an approach that we can directly drop the address column from the database. If we do so then it has some drawbacks.
Disadvantages
- The insertUser method will start to fail because the insert statement has an address column in the query but we dropped it from the database so the request will start to fail and we may face downtime
Approach 2: Remove the dependent code then drop the column
This is the preferred approach and should be followed in large-scale production systems. This is a three-step process.
Remove dependent code first
All code and logic related to the database column which is to be dropped should be removed first. In our case we will remove the mandatory field check logic and then the address column from the insert statement.
So the important point to note here is you should remove all the codebase that has any dependency on the database column to be dropped.
Deploy the service on production
Once the codebase is cleaned then you need to deploy the latest code to production. In this way, the new instances present in the production server have no dependency on the address column
Drop column from the database
Now you can drop the column from the database as it has no importance now.
If we follow Approach 2 then our service will not have any error or downtime.