Beware of Hibernate query cache when deploying on the cloud

Today I ran into an interesting problem while deploying a new version to our production servers.

I have a Grails application running on Tomcat. To prevent deployment problems in production, we have a QA environment that is identical to the target production environment, and we test the deployment there first. The deployment to QA went smoothly, but in production we got an "array index out of bounds" exception that we had never seen before.

Hitting issues like this in production is serious. Luckily, we found the cause quickly enough and weren't sent back by the integration department :)

Problem – stale query cache data

The installation procedure on the production servers is as follows:

1. Drain a single server (take it out of the load balancer)
2. Deploy the new version
3. Verify the new version on that server and add it back to the load balancer (LB)
4. Deploy to the other servers

The problem was that one of our domain classes had changed, but the distributed query cache still contained objects with the old structure (because the other servers were still running the old version). So when the new server tried to load some objects, it got them from the cache and wasn't able to build them.

In QA we usually deploy to all servers simultaneously, so the cache was cleared and the problem never showed up.
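The failure mode can be illustrated with a deliberately simplified simulation in Java (the platform Grails and Hibernate run on). This is a hypothetical model, not Hibernate's actual cache format: the `SharedCache` and `AppServer` classes, the property counts and the `Book#1` key are all invented for illustration. Each cached entry is a flat array laid out for the class version that wrote it, so a server running the newer class reads past the end of an older entry, which is one way an "array index out of bounds" error can arise. The second half mimics the QA procedure, where all servers restart together and the cache starts empty.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a distributed cache shared by all servers behind the LB.
// Entries are flat Object[] rows of property values; this is a simplification and
// does NOT reproduce Hibernate's real query/entity cache format.
class SharedCache {
    private final Map<String, Object[]> entries = new HashMap<>();

    void put(String key, Object[] row) { entries.put(key, row); }
    Object[] get(String key) { return entries.get(key); }
    void clear() { entries.clear(); }
}

// Hypothetical server running a given version of a domain class:
// version 1 has 2 properties, version 2 adds a third one.
class AppServer {
    private final int version;
    private final SharedCache cache;

    AppServer(int version, SharedCache cache) {
        this.version = version;
        this.cache = cache;
    }

    int propertyCount() { return version == 1 ? 2 : 3; }

    // Cache a row laid out in this server's class structure.
    void store(String key) {
        Object[] row = new Object[propertyCount()];
        for (int i = 0; i < row.length; i++) {
            row[i] = "value" + i;
        }
        cache.put(key, row);
    }

    // Rebuild an object from the cached row, reading every property this version expects.
    String load(String key) {
        Object[] row = cache.get(key);
        if (row == null) {
            return null; // cache miss: a real server would go to the database
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < propertyCount(); i++) {
            sb.append(row[i]).append(';'); // throws on a row written by an older version
        }
        return sb.toString();
    }
}

public class StaleCacheDemo {
    public static void main(String[] args) {
        // Rolling deployment: one server upgraded while another still runs version 1.
        SharedCache cache = new SharedCache();
        AppServer oldServer = new AppServer(1, cache);
        AppServer newServer = new AppServer(2, cache);
        oldServer.store("Book#1");
        try {
            newServer.load("Book#1");
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("rolling: " + e.getClass().getSimpleName());
        }

        // QA-style deployment: all servers restarted together, so the cache starts empty.
        cache.clear();
        AppServer qaServer = new AppServer(2, cache);
        qaServer.store("Book#1");
        System.out.println("all at once: " + qaServer.load("Book#1"));
    }
}
```

Running `StaleCacheDemo` prints `rolling: ArrayIndexOutOfBoundsException` followed by `all at once: value0;value1;value2;`.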

Solution – make sure the cache is reloaded from the most up-to-date server

We had to shut down all the servers (losing a few minutes of service), reload the cache from the new server (by manually requesting the data on that server), start the other servers, and continue from step #3 above. This was only a workaround that let us complete the deployment within the current time frame.

Conclusion

1. Treat the pre-production environment exactly the same way as production: even though our QA environment is identical to production, a small difference in the deployment procedure backfired on us.

2. When changing the structure of a cached domain class, be sure to clear the cache and reload it on server startup. In Grails this can be done in the BootStrap class:

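class BootStrap {

    // make sure the session factory is injected into the bootstrap class
    def sessionFactory

    def init = { servletContext ->
        // clear the query cache
        sessionFactory.queryCache.clear()

        // reload all the relevant data
    }

    def destroy = {
    }
}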
