Sunday, November 4, 2012

"Go Big And Go Fast" with BigMemory Max

The sample below shows how easy it is to scale with BigMemory Max. It exercises BigMemory along the following facets (a hedged configuration sketch follows the list):

(a) Scale - number of objects (our test loads 1M objects)
(b) Scale - total size of the dataset (our test loads ~1.2 GB)
(c) Density - varying the size of each object
(d) Throughput - our test uses 1 client with 9 threads
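For a flavor of what such a setup looks like in code, here is a minimal sketch using the Ehcache 2.x fluent API. This is not the original test harness; the cache name, sizes, and payload are assumptions for illustration only.

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;
import net.sf.ehcache.config.CacheConfiguration;
import net.sf.ehcache.config.MemoryUnit;

public class BigMemorySketch {
    public static void main(String[] args) {
        // Run the JVM with -XX:MaxDirectMemorySize at least as large
        // as the configured off-heap size, e.g. -XX:MaxDirectMemorySize=2g
        CacheManager manager = CacheManager.create();
        Cache cache = new Cache(new CacheConfiguration("bigMemoryTest", 10000) // 10k entries on heap
                .overflowToOffHeap(true)
                .maxBytesLocalOffHeap(2, MemoryUnit.GIGABYTES));
        manager.addCache(cache);

        // Load 1M objects of ~1.2 KB each (~1.2 GB in total).
        byte[] payload = new byte[1200];
        for (int i = 0; i < 1000000; i++) {
            cache.put(new Element(i, payload));
        }
        System.out.println("Off-heap size: " + cache.calculateOffHeapSize() + " bytes");
        manager.shutdown();
    }
}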


Getting started with BigMemory Max

This screencast demonstrates how simple it is to get started with the BigMemory Max server. It shows you how to download, install, and verify the installation, and how to run your first distributed lab using BigMemory. Have fun!


Wednesday, February 15, 2012

How do I keep a Cache and DB in sync?


While speaking with Java architects, I often get asked: when I cache information, what happens if the data in the underlying database changes? How do I keep the cache and the DB in sync? This post covers some of the common ways to address the issue.

Broadly speaking, there are two ways to approach this. You can put the onus on the memory store to fetch the information periodically, or whenever the store determines that the information is stale. Alternatively, you can put the onus on the database to "push" updates, either periodically or whenever data changes. Let's discuss further.

The most immediate and straightforward approach is to set a TTI (time-to-idle) or TTL (time-to-live) on the cache so that entries expire periodically. The next request after expiration results in a cache miss, which can be configured to pull the current value from the underlying system.
A couple of things to note here. There is obviously a window during which the data in the cache is not consistent with the underlying system. Also, the "miss" is taken by the user thread, which can be perceived as a performance penalty.
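To make this concrete, here is a minimal sketch of the expiration-based approach using the Ehcache 2.x API; the cache name and timeout values are just example choices.

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;
import net.sf.ehcache.config.CacheConfiguration;

public class ExpiringCacheSketch {
    public static void main(String[] args) {
        CacheManager manager = CacheManager.create();
        // Entries live at most 10 minutes (TTL) and expire after
        // 5 minutes without access (TTI).
        Cache customers = new Cache(new CacheConfiguration("customers", 10000)
                .timeToLiveSeconds(600)
                .timeToIdleSeconds(300));
        manager.addCache(customers);

        customers.put(new Element("cust-42", "Jane Doe"));
        Element hit = customers.get("cust-42"); // null after expiration
        System.out.println(hit == null ? "miss, reload from DB" : hit.getObjectValue());
        manager.shutdown();
    }
}

Ehcache also ships a SelfPopulatingCache decorator that can turn the miss into an automatic read-through reload from the database.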

An alternate approach is to update or invalidate the cache periodically: use a batch process (which could be scheduled with the open source Quartz scheduler) running at periodic intervals to either invalidate the cache (which probably works for smaller caches) or update it (for larger caches). See the sketch below.
You could then use RMI, JGroups, or JMS to replicate the put or remove to other instances of the cache to keep them in sync. If using Terracotta for distribution, updating or invalidating one node is sufficient for a cluster-wide change.
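Here is a hedged sketch of that batch process using the Quartz 2.x API; the schedule, cache name, and blanket invalidation strategy are example choices, not a prescribed setup.

import static org.quartz.JobBuilder.newJob;
import static org.quartz.SimpleScheduleBuilder.simpleSchedule;
import static org.quartz.TriggerBuilder.newTrigger;

import net.sf.ehcache.CacheManager;
import org.quartz.Job;
import org.quartz.JobExecutionContext;
import org.quartz.Scheduler;
import org.quartz.Trigger;
import org.quartz.impl.StdSchedulerFactory;

public class CacheInvalidationJob implements Job {
    @Override
    public void execute(JobExecutionContext context) {
        // Blanket invalidation: fine for small caches. For large caches,
        // re-query the DB and put() fresh values instead.
        CacheManager.getInstance().getCache("customers").removeAll();
    }

    public static void main(String[] args) throws Exception {
        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
        Trigger everyFiveMinutes = newTrigger()
                .withSchedule(simpleSchedule().withIntervalInMinutes(5).repeatForever())
                .build();
        scheduler.scheduleJob(newJob(CacheInvalidationJob.class).build(), everyFiveMinutes);
        scheduler.start();
    }
}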

Now, let's work towards transferring the onus to the DB itself.
Below is a post from Greg Luck (the Ehcache founder) about using Oracle Database Change Notification as a means of doing cache invalidation.
Oracle 11g provides a way to register a callback when DB updates happen. This can be leveraged to either invalidate or update the cache store.

http://gregluck.com/blog/archives/2011/01/something-new-under-the-sun-a-new-way-of-doing-a-cache-invalidation-protocol-with-oracle-11g/
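For a flavor of the registration flow described there, here is a hedged sketch against the oracle.jdbc.dcn package of the Oracle 11g JDBC driver; the connection URL, credentials, table, and cache name are assumptions for illustration.

import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

import net.sf.ehcache.CacheManager;
import oracle.jdbc.OracleConnection;
import oracle.jdbc.OracleDriver;
import oracle.jdbc.OracleStatement;
import oracle.jdbc.dcn.DatabaseChangeEvent;
import oracle.jdbc.dcn.DatabaseChangeListener;
import oracle.jdbc.dcn.DatabaseChangeRegistration;

public class DbChangeInvalidator {
    public static void main(String[] args) throws Exception {
        Properties creds = new Properties();
        creds.setProperty("user", "scott");       // example credentials
        creds.setProperty("password", "tiger");
        OracleConnection conn = (OracleConnection) new OracleDriver()
                .connect("jdbc:oracle:thin:@//dbhost:1521/ORCL", creds);

        Properties options = new Properties();
        options.setProperty(OracleConnection.DCN_NOTIFY_ROWIDS, "true");
        DatabaseChangeRegistration dcr = conn.registerDatabaseChangeNotification(options);

        // Invalidate the cache whenever Oracle reports a change.
        dcr.addListener(new DatabaseChangeListener() {
            public void onDatabaseChangeNotification(DatabaseChangeEvent event) {
                CacheManager.getInstance().getCache("customers").removeAll();
            }
        });

        // Associate a query with the registration so its tables are watched.
        Statement stmt = conn.createStatement();
        ((OracleStatement) stmt).setDatabaseChangeRegistration(dcr);
        ResultSet rs = stmt.executeQuery("SELECT id FROM customers");
        while (rs.next()) { /* iterate to register interest in the rows */ }
        rs.close();
        stmt.close();
    }
}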

Alternatively, you could use middleware such as Oracle GoldenGate to capture DB changes as they occur and "push" notifications into the memory store.

Having said all this, keep in mind that you need to evaluate these options in the context of your use case and your latency, consistency, and load requirements before you decide on a design.

Wednesday, February 2, 2011

Scale up before you Scale Out.

Traditionally, scaling enterprise applications meant scaling out. With the advent of BigMemory scaling solutions, it's easier and more cost-effective to scale up before you eventually scale out.

Traditional Java applications use the heap to store the hot set of data. It makes sense to keep data that is used over and over again in a place where it can be accessed efficiently, with low latency.
However, as the business grows we deal with larger data sets and higher CPU and resource utilization. Java applications also have to contend with what I call a necessary evil: garbage collection. You cannot do without GC, but at the same time it periodically slows down your application. It is the bane of distributed caching environments.

Keeping this in mind, enterprise applications have been designed around a scale-out architecture: keep a small heap, but distribute it across multiple boxes. Depending on the size of the data sets, this might be the right solution architecture.
However, there is an easier way to scale up without GC pauses, providing a clean and low-latency solution: a BigMemory solution.
As of Java 1.4 there is an API that enables you to store and retrieve data in off-heap memory. This bypasses traditional GC, provides significantly higher storage with consistent and predictable latencies, and eliminates tuning and ineffective workarounds for GC. The constraint is now not the software or the architecture but the hardware. BigMemory has been tested on the largest box we could find, with 350GB of RAM, and so far no upper limit has been found.
One more advantage of this approach is that it sidesteps the coherency issues that have to be dealt with in distributed caching solutions.
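The Java 1.4 API in question is NIO's direct buffers, which allocate memory outside the garbage-collected heap. Here is a minimal sketch; the buffer size and payload are arbitrary example values.

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class OffHeapDemo {
    public static void main(String[] args) {
        // Allocate 64 MB outside the Java heap; this memory is not
        // scanned or moved by the garbage collector.
        ByteBuffer offHeap = ByteBuffer.allocateDirect(64 * 1024 * 1024);

        byte[] value = "hot data".getBytes(StandardCharsets.UTF_8);
        offHeap.putInt(value.length);   // length prefix
        offHeap.put(value);             // payload

        // Read it back.
        offHeap.flip();
        byte[] out = new byte[offHeap.getInt()];
        offHeap.get(out);
        System.out.println(new String(out, StandardCharsets.UTF_8));
    }
}

BigMemory manages this kind of off-heap storage for you, handling the serialization and memory management that raw direct buffers leave to the application.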

A distributed architecture with Terracotta is the right solution for very large data sets. The purpose of this post is to encourage architects to consider scaling up before scaling out.

Take a few minutes to review this white paper about Ehcache:
http://terracotta.org/resources/whitepapers/ehcache-user-survey-whitepaper
To get started with Ehcache, visit http://www.ehcache.org