Today I ran into an interesting problem while deploying a new version to our production servers.
I have a Grails application running on Tomcat. To prevent deployment problems in production, we have a QA environment that is identical to the target production environment, and we test each deployment there. The deployment went smoothly on QA, but on production we got an "array index out of bounds" exception that we had never seen before.
Hitting such problems in production is serious. Luckily, we found the cause soon enough and weren't sent back by the integration department :)
Problem – stale query cache data
The installation procedure on the production servers is as follows:
1. Drain a single server (take it out of the LB)
2. Deploy the new version
3. Test the new version on the upgraded server and add it back to the LB
4. Deploy on the other servers
The problem was that one of our domain classes had changed, but the distributed query cache still contained objects with the old structure (because the other servers were still running the old version). So the new server tried to load some objects, got them from the cache, and wasn't able to build them.
In QA we usually deploy to all servers simultaneously, so the cache was cleared and the problem never showed up.
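To make the failure concrete, here is a toy model in plain Groovy. It is not Hibernate code: it simply assumes that a cache entry is a flat array of property values, and that the new version of a class expects one more value than the old version wrote. The Book class, its properties, and the StaleCacheDemo helper are all made up for illustration.

```groovy
// Toy model only: a shared cache that stores each object as an array of property values.
class StaleCacheDemo {
    static Map<String, Object[]> sharedCache = [:]

    // Old version of the hypothetical Book class: two properties.
    static Object[] dehydrateOld(Map book) {
        [book.title, book.author] as Object[]
    }

    // New version adds "isbn", so it reads three values per cache entry.
    static Map hydrateNew(Object[] state) {
        [title: state[0], author: state[1], isbn: state[2]]
    }

    // What the freshly deployed server does when it finds the entry in the cache.
    static String loadOnNewServer(String key) {
        try {
            hydrateNew(sharedCache[key])
            return 'ok'
        } catch (ArrayIndexOutOfBoundsException e) {
            return 'array index out of bounds'
        }
    }
}

// A server still running the old version populates the shared cache ...
StaleCacheDemo.sharedCache['Book#1'] =
        StaleCacheDemo.dehydrateOld(title: 'Dune', author: 'Frank Herbert')

// ... and the upgraded server fails to rebuild the object from that entry.
println StaleCacheDemo.loadOnNewServer('Book#1')
```

When all servers are deployed at once, as in our QA, no old-format entry is left in the cache, so the new code never sees the short array.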
Solution – make sure the cache is reloaded from the most up-to-date server
We had to shut down all the servers (losing a few minutes of service), reload the cache from the new server (by manually requesting the data on that server), start the other servers, and continue from step 3 above. This was only a workaround to complete the deployment within the current time frame.
Lessons learned
1. Treat the pre-production environment exactly the same way as production: even though our QA environment is identical to production, a small difference in the deployment procedure backfired on us.
2. When changing the structure of a cached domain class, be sure to clear the cache and reload it upon server startup. In Grails this can be done as follows:
//clear the cache
//make sure session factory is injected to bootstrap class
//reload all the relevant data
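
A minimal sketch of those three steps, under a few assumptions: the application uses Hibernate 3.5 or later, where the session factory exposes an org.hibernate.Cache object (older 3.x releases offer sessionFactory.evictQueries() instead), and Country stands in for whichever cached domain class changed. Country is a hypothetical name, not one from our application.

```groovy
// BootStrap.groovy (sketch, runs inside a Grails application)
class BootStrap {

    // Grails injects the Hibernate session factory by property name.
    def sessionFactory

    def init = { servletContext ->
        // Clear the cache: drop cached query results and cached entity state.
        // Assumes Hibernate 3.5+ (org.hibernate.Cache).
        sessionFactory.cache.evictQueryRegions()
        sessionFactory.cache.evictEntityRegions()

        // Reload the relevant data so the cache is filled with the new structure.
        // Country is a hypothetical cached domain class.
        Country.list(cache: true)
    }

    def destroy = {
    }
}
```

Keep in mind that with a distributed cache this only helps if no server still running the old version puts old-format entries back after the eviction.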