Saturday, October 25, 2008

Why ORM?

I like Spring JDBC Template - it is simple, clean and saves a lot from the JDBC drudgery. It provides nice abstraction, simplifying the usage and management of prepared statements and overall if all you want is to do an ocasional query, it doesn't get any simpler. Oh, and you have full control on the underlying JDBC connection, allowing you to exploit any vendor-specific functionality. The provided functionality only makes the data access easier and supports only very basic property to column mapping out of the box (or you can write your own mapper in Java). Because of its lightweight nature, JDBC Template does not allow for transparent caching, horizontal table partitioning, vertical sharding, etc. It also doesn't offer lazy fetching of relations, identity tracking and other instrumentation magic.

Another tool I like is iBATIS. In comparison with JDBC template, it offers more structure, organizing my SQL statements in a logical way and supports most common forms of mapping the result sets to beans (including graphs of objects and joins). Ibatis does support caching (only on a whole query level), still no sharding and partitioning, lazy fetching and identity tracking. IBATIS doesn't try to parse your queries.

The Abator tool can generate iBATIS CRUD mappings, DDL from object model, and object model from a database, but frankly - in its generated form all these suck. The DDL uses only a few datatypes, has no RI or semantic constraints, nor indexes (I use it sometimes, but only as a boilerplate which I extensively tweak afterwards). The generated beans are isomorphic with the underlying tables, which is great if you are doing CRUD, but I find that most of my applications do data analysis, so the queries tend to be quite hairy, half of them encapsulated in stored procedures (for transaction processing I'd rather use Spring or plain JDBC anyway). The bottom line is that Abator handles the trivial cases, but for most of my projects it doesn't help much, so I'm back to manual coding (which is not that bad anyway.)

The first time I looked at ORM was when I checked TopLink for a project around 2000. Back then I got the impression that it introduces a lot of new concepts in order to simplify something that's already simple enough - loading a couple of beans from a table. For that project I used plain JDBC, it took me perhaps 20 mins to write and I can't remember taking much time to maintain that code - the total cost should have been less than 3 man-hours.

The second time I tried Hibernate and Toplink last year (2006) on a prototype with the same dataset and relatively small domain model (for which I finally ended up using iBATIS). The first thing that stroke me was how slow everything was - the startup time was noticeably slower than with iBATIS (about 500-800%), spent in initializing the JPA entity manager and the queries were consistently 30-50% slower. It might have been that I didn't know how to tune it, but I just couldn't achieve satisfactory results. I researched the facilities for mapping to externally defined schema and it stroke me that both Hibernate and TL are way clunkier than iBatis. There were also a number of restrictions that suggested that I need to modify my schema just to enable certain features of the tool... in the end, the identity tracking is a leaky abstraction - the whole story with the attached and detached entities smells to me.

Now the thousand dolar question, given that most applications out there DO NOT use horizontal partitioning or sharding, caching in the application layer is usually more efficient anyway, identity tracking and the transparent (hidden) lazy fetching often causes more problems than it solves (and countless hours lost in app tuning with dubious results), why are so many people obsessed with the idea of cramming an ORM into their application?

The other day I wrote a small application that had four composite entities and a couple of fairly sophisticated queries, the the data access part took me about less than an hour (that's implementation, testing and tuning), so:

  • How long would it have taken me if I used JPA? (to get reasonably well performing, tested code)
  • If you are using ORM, how many entities do you have? How much time (percentage) do you spend working on your persistence logic? Are you using the database as operational storage or are you trating it as a long-term asset? Do you share the DB with other applications?
  • Are the productivity gains realized only when your persistent model is so big that it's unmanageable by other means? Is it to futureproof a product (what is a reasonable timeframe for ROI, any examples?)?
  • How important is for you the ability to scale your data access to multiple database servers?
  • Have you ever switched an application from one SQL server to another? How much effort was it? What does the application do with the DB?
  • What are the benefits of using ORM smart caching and partial queries in comparison with in-memory database like TimesTen or H2?

Please enlighten me...

1 comment:

Cristian said...
This comment has been removed by a blog administrator.

About Me: check my blogger profile for details.

About You: you've been tracked by Google Analytics and Google Feed Burner and Statcounter. If you feel this violates your privacy, feel free to disable your JavaScript for this domain.

Creative Commons License This work is licensed under a Creative Commons Attribution 3.0 Unported License.