Thursday, October 16, 2008

Hibernate best practices...

I have been on a very large enterprise project using hibernate for the persistence. Our data model is upwards of 200 objects. It is a clustered web application and thus requires high performance so we've implemented caching and have all our objects and relationship set to lazy load. The typical database interaction is lots of reads and few rights. The data of one user does not affect the data of another.

Here are some of my best practices which I would have loved to have had in place from the beginning. They would have saved a lot of time and reduced bugs...

Use id based Equality
This is contrary to a lot of the typical recommendations but I don't see a problem with it. It means that you have a working equality method without any further effort. Furthermore, you have a guaranteed equality, the same equality that the database uses. There are some issues with this when it comes to one-to-one relationships, but those can be resolved by adjusting the equals method. The other issue is that there are instances where there is no business based unique key.

Don't use auto generated Keys
So in your model classes set the id. This will mean you need another field as a persistence marker. The fundamental reason for this, coupled with the fact that you're using the id for equals, is that you have immediate equality, both prior to persisting the object and in your tests. By using auto generated keys you you have to wait until the object is persisted before its equals method works. You cannot thus use it in a set prior to persistence, and when you run tests you have to manually set the id every time you create the object.

Use instrumentation
Hibernate has an issue (a big issue IMHO) that if you execute the following code (Person is set to be Lazy loaded).

          Person p = session.load(Person.class, id);
      ContactDetails cd = p.getContactDetails();

And on the database that particular object (contact details) is actually a subclass of ContactDetails called AddressDetails, the real type of the variable cd will be ContactDetails and _not_ Address Details as you would have expected. In order to get the real type you have to do first ask hibernate for the real type of the object and then call load again with that real type. So in order for the variable cd to be the correct type you would require the following code:

          Person p = session.load(Person.class, id);

      ContactDetails cd = p.getContactDetails();
      cd = (ContactDetails)load(id, Hibernate.getClass(cd));

An extra line is required.

Instrumentation however injects code into the compiled class of ContactDetails and intercepts the call to getContactDetails to add the bit to load the real type. It's a small little ant job that runs within the context of our IDE and sorts it all out.

Use Field access over method access
Unfortunately, I only discovered that hibernate had the ability to directly act on the fields rather than go via methods too late to make use of this ability. It would have enabled me to setup some nice validation and/or side effects on the setting of a value in a hibernate managed object. I'm sorry I didn't do it earlier. This is a practice which I have thus not been able to test, it does sound like a good idea but maybe there are other issues I do not know about.

Think hard about your Cascades
Hibernate is not good at saving a whole object tree in one go. The problem is that as soon as any sql is executed on the database, the sql is validated against the state of the database. So any constraint violations are immediately raised. If you think hard on the cascades the ability so save whole trees is significantly improved though it is not totally enabled. One obvious practice is that if it has a not-null constraint the cascade must be set. If this is not set unless you save the foreign object before the local one you will get a constraint violation. The objects go together (thus the not-null) so the cascade should be on.

The above recommendations are fairly generic to any project that uses hibernate in the real world. There's a few that I've learnt on this project that are specific to a high performance and high concurrency web application...

Prefer trawling the object model over running queries
The kind of application we have is one where a user would login and probably send on average 10 - 20 minutes on the web site. Thus we would typically have a lot of the objects they require in cache already. Thus when you need to find some data for the person concerned it is a lot better to simply trawl the objects rather than use a hibernate query. The reason for this is twofold..
  • Querying in hibernate always causes a flush. Data is therefore written to the database at a time when it shouldn't necessarily be. Even though you know the query does not touch any of the pending writes, the flush will still happen.
  • A data connection is made. On our application we gained significant performance concurrency improvements by removing queries - the objects were already in the cache in any case. Where we had deadlocks we simply removed the query from the equation (changed to trawling the model) and the deadlocks were significantly reduced.
Make everying lazy
On our application where isn't a single relationship which is not lazy. We can't think of a case where having lazy off would be useful to us. Yes, I understand that it means the first read will be slow, but after that it'll be in the cache so it will be fast. If lazy is off even if the object is in the cache it will still load that object from the cache

In other words, if you have an object A which has a reference to object B and the reference is specified as non lazy. When you load A it will _always_ load B even if you don't go near B. Even if B is in the cache it will still load B from the cache. On our application we have a lot of static data. This static data was initially set to be lazy disabled (prefetching). Then we would preload all this static data. We found that the preload was very slow - even though it was coming from the cache - because it would have to load the root object plus all the non lazy relationship stemming from that object.

3 comments:

Anonymous said...

I would also suggest, don't use composite keys

Michael Wiles said...

agreed

Anonymous said...

Wonder what you mean with "trawling the object model".

How important is the programming language?

  Python, typescript, java, kotlin, C#, .net… Which is your technology of choice? Which one are you currently working in and which do you wa...