Over the past few years as principal engineer / architect (and back again) I've faced a couple of hard problems which I'm sure many of you have too. Once you get to an architecture where you have more than one JVM or server, how do you quickly and easily share information across them? Yes, the database is one easy solution, but the performance / scalability hit of "always going to the database" quickly becomes onerous.
Then one decides to cache to reduce this load, but now how do you keep the caches in sync?
Then one might go down the path of deciding to use some messaging system (e.g. JMS, Tibco, WebSphereMQ etc.) to ease intercommunication but there the amount of coding, testing and debugging needed due to the complexity of this roll-your-own solution becomes quite problematic.
Another similar problem comes where you have a load balancer in front of your web / app servers for performance, DR and HA reasons and you have some stateful application (like most web sites are today). How can you easily share session information across servers to make for a seamless experience?
So for all of these problems that's where something like Terracotta comes in. It basically sits between your application and the JVM and allows you to declaratively define what information is to be shared across JVMs. It promises to do so without any code changes (nice!).
A compelling product to say the least. Anyway, in a bid to learn more about this product I decided to review this book.
TITLE: The Definitive Guide to Terracotta: Cluster the JVM for Spring, Hibernate and POJO Scalability
AUTHORS: Terracotta, Inc.
CHAPTER 1 Theory and Foundation: Forming a Common Understanding
The first chapter starts by laying some foundation and defining terms - always a good start. "Terracotta is a transparent clustering service for Java applications" is the mantra and they go on to explain what this term means. They proceed to talk about the underlying memory model that Terracotta gives you.
They discuss at a high level how the service provides you advantages such as high availability, scalability (scale-out) and improved performance (by not requiring a DB hit to share information).
As examples, they cover the classic use-cases of Terracotta
1) Distributed Caching
2) Database Offload
3) Session Replication
4) Workload partitioning
CHAPTER 2 History of Terracotta
Chapter two discusses the forces that resulted in the creation of Terracotta - chiefly the pressure to scale out rather than scale up (e.g., preference for loose coupling, availability of cheap commodity hardware, cheapness of Linux etc.).
But with scale-out comes the problem of keeping JVMs / Servers in synch. The solutions such as
1) Scale the Database
2) In-memory replication
3) Partitioning the data
each came with their own problems.
And so Terracotta came into being. Whereas folks such as Amazon (and eBay) took the approach of "eventual correctness" (aka "eventual consistency") where each application instance could complete transactions to local disk and eventually flush to the database, Terracotta's founders chose another solution as their business folks were "not prepared to discuss the ramifications of an eventually correct architecture, where users might be told that a previously confirmed purchase order could not be completed because of miscalculations in inventory long after checkout completed".
And so they sought to effectively create a "General Purpose L2" cache. The original implementation was too intrusive where developers would often forget to serialize and replicate changes to L2-based objects to keep things in synch and this led to regressions and eventually to a significant slow down in the pace of development.
With Scalability and Availability often becoming opposing forces it was refreshing that their solution aided both. The transparency of the solution also means it does not force one programming model over another e.g. EJB vs. Spring or JPA vs. Hibernate vs. iBatis.
CHAPTER 3 Jumping Into Terracotta
Ah here we go - C O D E! They literally start off with a simple (clustered) "Hello World!" example and start to get into how to configure Terracotta. I wish they had spent some more time here, perhaps a whole chapter, helping someone set-up a REALLY good environment (say multi-machine, or at least an env for multiple programs operating simultaneously) - a lot of this is left up to the user to figure out, let alone perform. That I think dilutes the message of Terracotta and doesn't give the reader a good "WOW" factor when they see this in operation.
In any event we start to get into the "meat" here and discover how Terracotta ensures all changes to shared data have been applied before a read is performed. And just after a write, Terracotta ensures that all memory changes are made available to other Java processes that might need them.
CHAPTER 4 POJO Clustering
So having seen a quick example they now need to dive in and explain first how Terracotta handles Java objects and the virtual heap, and second how Terracotta manages thread coordination between JVMs.
They define a "root" - a field in any class that you declare as being clustered. Terracotta traverses the graph of object references from that root to cluster those objects too. Since these objects are clustered and durable beyond the scope of a single JVM they are sometimes referred to as "superstatic" - having the same lifecycle as the virtual heap.
Typically data structures like Map, List and other Collection objects are chosen as root objects.
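To make the idea of a root a little more concrete, here is a minimal plain-Java sketch. The names (`Inventory`, `catalog`, `Product`) are hypothetical and nothing here calls Terracotta itself; the assumption is that `catalog` is the field you would declare as a root in the Terracotta configuration, so that everything reachable from it becomes part of the clustered object graph.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical domain object; it would become shared only by being reachable from the root.
class Product {
    final String sku;
    final int stock;

    Product(String sku, int stock) {
        this.sku = sku;
        this.stock = stock;
    }
}

// Hypothetical holder class. Assumption: "Inventory.catalog" is the field
// declared as a root in the Terracotta configuration (not shown here).
class Inventory {
    static final Map<String, Product> catalog = new ConcurrentHashMap<>();

    static void add(Product p) {
        catalog.put(p.sku, p); // p now hangs off the root's object graph
    }

    static int stockOf(String sku) {
        Product p = catalog.get(sku);
        return p == null ? 0 : p.stock;
    }
}
```

Run in a single JVM this is just an ordinary static map; the point is only that the application code has no clustering calls in it.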
Terracotta is pretty smart since not EVERY object on the virtual heap needs to be instantiated in every JVM. Terracotta can load an object as needed. Just like the virtual memory subsystem of a modern OS swaps contents to and from physical memory and disk, Terracotta lets your application behave as if there was an almost unlimited physical Java heap.
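As a rough analogy for "load an object as needed", the sketch below keeps a small local view and faults entries in from a backing store on first access. This is an illustration of the idea only, not how Terracotta is implemented; `LazyView` and its members are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Analogy only: a local view that faults entries in from a backing store on first use.
class LazyView {
    private final Map<String, String> backingStore;            // stands in for the full virtual heap
    private final Map<String, String> local = new HashMap<>(); // what this JVM has actually loaded
    private int faults = 0;

    LazyView(Map<String, String> backingStore) {
        this.backingStore = backingStore;
    }

    String get(String key) {
        String value = local.get(key);
        if (value == null && backingStore.containsKey(key)) {
            value = backingStore.get(key); // "fault in" the object on demand
            local.put(key, value);
            faults++;
        }
        return value;
    }

    int faults() {
        return faults;
    }

    int loadedCount() {
        return local.size();
    }
}
```

Only the keys that are actually read end up in the local map, however large the backing store is.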
Then comes information on such topics as Distributed Garbage Collection and how threads are coordinated - again the details are such that there are no real surprises or "tricks" here - the folks at Terracotta have really taken Transparency to heart. "Terracotta provides exactly the same access serialization, coordination and visibility guarantees to threads in different JVMs as the JVM itself provides to threads within the same JVM".
Then we get into more meaty topics of locks in Terracotta and how Terracotta extends the concept of locking. Again by using declarative methods they help keep much of the messy coding inherent in locking out of the developers hands and keep it where it belongs - in the Terracotta infrastructure.
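For reference, this is the kind of ordinary Java locking the chapter is talking about. It is a single-JVM sketch with hypothetical names (`SharedCounter`, `CounterDemo`); the assumption, following the book's claim, is that the same `synchronized` code would be given cluster-wide meaning through declarations in the Terracotta configuration rather than through changes to the code.

```java
// Plain Java monitor locking; nothing here is Terracotta-specific.
class SharedCounter {
    private int value = 0;

    // The monitor gives mutual exclusion and makes the write visible to the next locker.
    synchronized void increment() {
        value++;
    }

    synchronized int get() {
        return value;
    }
}

class CounterDemo {
    // Starts `threads` threads that each increment one shared counter `perThread` times.
    static int run(int threads, int perThread) {
        SharedCounter counter = new SharedCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join(); // wait for every worker before reading the result
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return counter.get();
    }
}
```

Without the `synchronized` keyword the final count could come up short, which is exactly the class of bug you do not want to debug across machines.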
From there we leap to Terracotta "transactions" wherein Terracotta batches changes made to objects into sets, helping to ensure that threads always see a consistent view of clustered objects.
From there we delve into what kinds of objects are not "Portable" (cluster-able with Terracotta) e.g. file-system related classes such as java.io.FileDescriptor (host-machine specific) and instances of java.lang.Thread (JVM-specific) are some examples.
Interestingly there is also the concept of "Physically vs. Logically managed objects". The former are objects whose field data values are distributed to the Terracotta server and from there to the other cluster members. The latter (Logically managed) are clustered by Terracotta by recording the method calls on those objects and their arguments and replaying them on the other members of the cluster.
Examples of logically-managed objects are Hashtable, HashMap and HashSet (spot the common theme?) - yes that's because the Hashcodes used to create the internal structure of the object are JVM-specific.
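The record-and-replay idea behind logically managed objects can be sketched in a few lines of plain Java. This is my own illustration, not Terracotta's implementation, and `RecordingMap` is a hypothetical name: operations are logged as (method, arguments) and replayed on a replica, so each side builds its own internal buckets.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only: replicate a map by replaying recorded calls, not by copying its internals.
class RecordingMap {
    // One recorded method call: the operation name and its arguments.
    private static final class Op {
        final String name;
        final String key;
        final String value; // null for "remove"

        Op(String name, String key, String value) {
            this.name = name;
            this.key = key;
            this.value = value;
        }
    }

    private final Map<String, String> delegate = new HashMap<>();
    private final List<Op> log = new ArrayList<>();

    void put(String key, String value) {
        delegate.put(key, value);
        log.add(new Op("put", key, value));
    }

    void remove(String key) {
        delegate.remove(key);
        log.add(new Op("remove", key, null));
    }

    Map<String, String> snapshot() {
        return new HashMap<>(delegate);
    }

    // Replays the recorded calls in order; the replica hashes the keys itself.
    void replayInto(Map<String, String> replica) {
        for (Op op : log) {
            if (op.name.equals("put")) {
                replica.put(op.key, op.value);
            } else {
                replica.remove(op.key);
            }
        }
    }
}
```

Because only the calls travel, it does not matter that two JVMs would lay out the hash buckets differently.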
From there we get more into understanding Clustered POJOs but personally I felt much of this information was repeated either earlier or later in the book. But after that there's a more fully formed example used to elucidate much of what was discussed earlier in the chapter.
CHAPTER 5 Caching
We begin this chapter with a discussion of caching and the trade-offs and problems it incurs
- Space for time
- Duplication of data across caches
From there we delve into which of the Map structures are best for such data structures within Terracotta. Interestingly ConcurrentHashMap is generally the best choice when sharing maps but sadly LinkedHashMap is not supported by Terracotta. Harumph!
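Here is a minimal sketch of a cache built on ConcurrentHashMap, to show why it is a comfortable fit for shared maps. `ReadThroughCache` and its loader are hypothetical names of mine, and `computeIfAbsent` is a Java 8 method that is newer than the book, so treat this as an illustration rather than the book's code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical read-through cache; the loader stands in for an expensive lookup such as a DB hit.
class ReadThroughCache<K, V> {
    private final ConcurrentMap<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;
    private final AtomicInteger loads = new AtomicInteger();

    ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        // ConcurrentHashMap applies the mapping function at most once per absent key.
        return entries.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return loader.apply(k);
        });
    }

    int loadCount() {
        return loads.get();
    }
}
```

For example, `new ReadThroughCache<String, Integer>(String::length)` calls the loader only on the first `get("abc")`; later reads are served from the map.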
Then we get into some of the gory details of caching that we all need to know
- Eviction and Expiration policies
- Distributed Caching (again I felt this was repetitive)
- Use of partitioned data
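To illustrate the expiration half of that list, here is a small time-to-live cache sketch. `ExpiringCache` is a hypothetical name and this is not Terracotta or Ehcache code; the clock is passed in as an assumption of mine so the behaviour can be checked without sleeping. Eviction by size is deliberately left out.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.LongSupplier;

// Hypothetical TTL cache: entries expire a fixed time after they were written.
class ExpiringCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAt;

        Entry(V value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final ConcurrentMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis in real use

    ExpiringCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    void put(K key, V value) {
        entries.put(key, new Entry<>(value, clock.getAsLong() + ttlMillis));
    }

    V get(K key) {
        Entry<V> e = entries.get(key);
        if (e == null) {
            return null;
        }
        if (clock.getAsLong() >= e.expiresAt) {
            entries.remove(key, e); // drop it only if it is still the same stale entry
            return null;
        }
        return e.value;
    }

    int size() {
        return entries.size();
    }
}
```

Expired entries are removed lazily on read here; a real cache would usually also sweep in the background and cap its size.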
From there we get a quick example with Ehcache (Easy Hibernate cache) and then onto chapter 6.
CHAPTER 6 Hibernate with Terracotta
I found this to be my favorite chapter - quite a bit more details in it than other chapters, good solid examples, and the benefits of the product become abundantly clear.
We start off with a great overview of Hibernate and how Terracotta can be used to improve it - by clustering the second-level Hibernate cache and also by using it to cluster Hibernate session objects.
From there we get a good example of how Hibernate and Terracotta together can be used to save on DB hits. Hibernate's cache runs the risk of staleness if another JVM updates data in the DB, and Terracotta helps fill this gap by preventing such staleness issues.
Then we get some great stuff lacking in other books - HARD NUMBERS!
Hibernate clustered with Terracotta gave a 4x boost over Hibernate with second-level cache alone when focused on DB updates. When focused on DB reads, we get a > 250x boost. Naturally your mileage may vary but at least we're getting some good ideas of what to expect.
We now get into the details of configuring Hibernate to be aware of Terracotta and vice-versa. All straightforward and relatively simple stuff.
CHAPTER 7 Extending HTTP Sessions with Terracotta
This was another good chapter on how to share HTTP Session information across JVMs and servers using Terracotta. A very useful feature that helps avoid most of the problems associated with persisting HTTP session information, while still affording your cluster the ability to scale-out, be HA etc.
Yet again we see the Transparency of Terracotta as it transparently (to your web app) hooks into the servlet container. From there we get a few nice examples to see how all of this works with Tomcat. Fortunately Terracotta supports the following web / app servers
- Apache Geronimo
- Struts 1.1
CHAPTER 8 Clustering Spring
Here we get a pretty short chapter where basically the point is you can point Terracotta at Spring beans rather than declare each class / field yet again in your Terracotta config file.
CHAPTER 9 Integration Modules
Terracotta supports the idea of external configuration for a component you might be shipping that takes advantage of Terracotta's features. This allows you to ship a Jar and the user of that jar then does not need to include Terracotta config information for this component into their own Terracotta config.
This feature is called a "Terracotta Integration Module" or TIM. It consists of config info and perhaps code that specifies how the component should be clustered, how locking is performed etc. They then go on to describe how TIMs are created, used and configured.
CHAPTER 10 Thread Coordination
This chapter seemed like it should have been more up front, also it's quite short and I thought there would be more here. They get into some of the details of thread coordination in relation to Terracotta. I got something out of this chapter - I'm just not sure exactly what it was.
CHAPTER 11 Grid Computing Using Terracotta
Naturally Terracotta lends itself to Grid computing i.e. supporting the splitting of a workload across nodes. From there we get into the "Master/Worker" pattern and an implementation in Java and then into how to refactor the original example for improved performance / scalability by reducing contention, batching work, multiple work queues, addressing fault tolerance.
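For readers who have not met the pattern, a single-JVM Master/Worker sketch looks roughly like this. It is my own minimal version, not the book's implementation, and the names (`MasterWorker`, `sumOfSquares`, `POISON`) are hypothetical; the assumption is that in the book's setup the work queue would be the shared, clustered structure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class MasterWorker {
    // Sentinel telling a worker to stop; real work items must not use this value.
    private static final int POISON = Integer.MIN_VALUE;

    // Master: queues the items, starts the workers, and sums the squares they produce.
    static long sumOfSquares(List<Integer> items, int workerCount) {
        BlockingQueue<Integer> work = new LinkedBlockingQueue<>();
        BlockingQueue<Long> results = new LinkedBlockingQueue<>();

        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < workerCount; i++) {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        int item = work.take();           // block until work arrives
                        if (item == POISON) {
                            return;                       // no more work for this worker
                        }
                        results.put((long) item * item);  // hand the result back
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            workers.add(t);
        }

        try {
            for (int item : items) {
                work.put(item);
            }
            for (int i = 0; i < workerCount; i++) {
                work.put(POISON); // one stop signal per worker
            }
            long total = 0;
            for (int i = 0; i < items.size(); i++) {
                total += results.take(); // exactly one result per work item
            }
            for (Thread t : workers) {
                t.join();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while coordinating workers", e);
        }
    }
}
```

A single shared queue like this is the obvious point of contention, which is presumably why the chapter moves on to batching and multiple work queues.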
CHAPTER 12 Visualizing Applications
Finally in chapter 12 we learn about visualization techniques and tools to help you comprehend what a cluster is doing and why it is going slow or fast. They show many metrics the tools capture and what they reveal and how they can be used to tune your application.
This is a rock-solid book with a solid introduction. I wouldn't agree that it's a "Definitive Guide" - but I guess that's just an Apress standard naming convention (the same way Manning has the "In Action" series).
I'd like to have seen more help up front in getting your environment set-up for the examples, some case-studies of how Terracotta has been used, more benchmarks, perhaps even benchmark code. But given the fact that it's the ONLY book I can find on Terracotta it's fortunately pretty good and gets you "in the game".
Clearly Terracotta can't have infinite scalability - Terracotta must communicate between JVMs over the network, so a guide to best practices and practical limits, covering things such as how to optimize network architecture and data structure design, would have been useful.
PROS: Relatively Short and to-the-point. Examples are simple and straightforward.
CONS: Not enough examples. Not enough code. Would be nice to have them help you set up an environment. Would have been nice to see some more hard numbers of performance / scalability boost over other solutions. Quite a bit of repetition - typical of books written by teams without a clear "separation of concerns".
BUY IT? If you're using or starting to use Terracotta - absolutely. If you're interested in making your application faster or scaling-out in general - probably.