Wednesday, February 15, 2012

How do I keep a Cache and DB in sync?


While speaking with java architects, I often get asked the question - When I cache information, what happens if the data in the underlying database changes? How do I keep the cache and DB in sync. This blog addresses some of the common ways to address this issue.

Broadly speaking, there are 2 ways to approach this. Firstly you can put the onus on the underlying memory store to fetch the information periodically or when the store determines that the information is stale. Secondly you can put the onus on the database to "push" updates either periodically or whenever data changes. Lets discuss further.

The immediate and most straightforward approach that comes to mind is to set the TTI or TTL on the cache so that the data will be expired periodically. The next request after expiration will result in a cache miss and can be configured to pull in the current value from the underlying system.
Couple of things to note here. Obviously there is the potential for a window when the data is in the cache and not consistent with the underlying system. Plus, the "miss" is taken by the user thread that could be interpreted as a performance penalty.

An alternate approach is to perform cache updates or invalidation periodically - use a batch process (could be scheduled using open source Quartz) running in periodic intervals to either invalidate the cache (probably works for smaller values of cache size) or update the cache (for larger caches).
You could then use RMI, JGroups, or JMS to replicate the put or remove to other instances of the cache to keep them in sync. If using the Terracotta for distribution, updating or invalidating one node will be sufficient for a cluster wide change.

Now, lets work towards transferring the onus on the DB itself.
Below is a blog from Greg Luck (Ehcache founder) about how we could use Oracle Database change notification as a means of doing cache invalidation.
Oracle 11g provides a way to register a call back when any DB updates happen. This can be leveraged to either invalidate or update the cache store.
   
http://gregluck.com/blog/archives/2011/01/something-new-under-the-sun-a-new-way-of-doing-a-cache-invalidation-protocol-with-oracle-11g/

Alternatively you could also use middle ware technology like GoldenGate to capture DB changes when they occur to "push" notifications into the Memory Store.

Having said all this, keep in mind that you need to evaluate these options more in context with your use case, latency, consistency and load requirements before you decide on a design.