Saturday, May 24, 2008

An Integration Story or 5 Ways to Transform a Message

It all started when we decided to replace Moxie with Devissa*. Moxie was a decent system and it had aged well, but its years had started to show: the rigid data schema, the inflexible order representation, the monolithic C++ server... Don't get me wrong, it was and still is working great, but over time we realized that we needed something more. Something that would let us define the way we do business, instead of making us change the business to fit its model.

* All names have been changed to protect the innocent

The global rollout of Devissa looked like a good opportunity to bring in a more capable trading system. Devissa itself was a huge beast, composed of hundreds of instances of several native processes running with a variety of configurations, held together by TCL code, cron jobs and a templated meta-configuration.

The Moxie communication protocol was simple - fixed-length records sent in one direction, a 32-bit status code in the other, over a TCP socket (actually two sockets - uplink and downlink). Devissa was much more complex - the messages were framed using an XML-like, self-describing hierarchical format (logically, the standard map of strings to arrays of maps... ending with primitive values at the leaf nodes). The session-level protocol was simple, and luckily there was a Java library for it (I'll bitch about it some other time). On top of the sessions sits a bunch of application-level protocols, each with different QoS and MEP. There is also a registry, an authentication service and an fcache/replicator/database/event-processor thingie that sits in the center, but I am digressing.
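
To make that shape concrete, here is roughly what a decoded Devissa frame looks like in Java terms (a sketch only; the class and all field names are hypothetical):

import java.util.*;

public class DevissaFrameShape {
    public static void main(String[] args) {
        // The "map of strings to arrays of maps" shape, with primitive
        // values at the leaf nodes.
        Map<String, List<Map<String, Object>>> frame =
                new HashMap<String, List<Map<String, Object>>>();

        Map<String, Object> leg = new HashMap<String, Object>();
        leg.put("SYMBOL", "VOD.L");              // made-up field names
        leg.put("QUANTITY", Long.valueOf(10000));

        frame.put("ORDER_LEGS", Collections.singletonList(leg));
        System.out.println(frame);
    }
}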

I actually started this article to share some interesting stuff I learned while we migrated the order flow from Moxie to Devissa. Phase zero was to make a point-to-point integration from Devissa to Moxie using the FIX gateways of the respective products, routing orders entered into Devissa to Moxie, so the traders could work them in the familiar Moxie interface. It allowed us to receive flow from other offices which were already on the Devissa bandwagon, and it was great because we didn't have to code data transformations and behaviour orchestration logic - it all 'just worked'.

The next task was to make sure that we could trade on Devissa and still be able to produce our end-of-day reports from a single point. At the time, all reporting was done from Moxie, so what seemed to make most sense was to capture the reportable events from Devissa and feed them back into Moxie. I'll spare you the BA minutiae for now.

As we were looking for a suitable base for creating a platform on which to build various applications around Devissa, I shortlisted a couple of ESB solutions (although it's an interesting topic, I won't talk about "what's an ESB and do I need one"). I looked at Artix, Tibco, AquaLogic, ServiceMix and Mule. I found that Artix ESB was great, and Artix DS looked like a good match for our data mapping needs; the only thing I was concerned about was the cost. Before getting in contact with the vendor, I asked my managers about our budget - they replied, almost with surprise, that they didn't know; if it was good and worth the money, we might try to pitch it to the global architecture group. In other words, a commercial product was not really an option. This ruled out pretty much everything, leaving ServiceMix and Mule (if I were starting now, I would also consider Spring Integration). I read a bit about JBI. I tried to like it, I really did... still, I couldn't swallow the idea of normalizing your data at each endpoint and being forced to handle all these chunks of XML flying around. At that time, Mule looked like the obvious answer for an open-source ESB.

The first thing I had to do was to build custom transports for Moxie and Devissa. That took about 2-3 days. They didn't have any fancy features (actually, they barely worked), but I was able to receive a message from one and stuff a message into the other. During the following year both transports evolved a lot, ending with a full rewrite last month that ported them to Mule 2 and added goodies like container-managed dispatcher threading, half-sync support, support for all Devissa application protocols and more.
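
For flavour, the Moxie half of the transport boiled down to framing like this (a minimal sketch; the host, port and record size are made up, and the real thing of course sat behind Mule's connector API):

import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.net.Socket;

public class MoxieUplinkSketch {
    private static final int RECORD_LENGTH = 256; // made-up record size

    public static void main(String[] args) throws Exception {
        // Uplink: fixed-length records out, 32-bit status codes back.
        Socket socket = new Socket("moxie-host", 9000); // hypothetical endpoint
        DataOutputStream out = new DataOutputStream(socket.getOutputStream());
        DataInputStream in = new DataInputStream(socket.getInputStream());

        byte[] record = new byte[RECORD_LENGTH];
        // ... populate the fixed-length record here ...
        out.write(record);
        out.flush();

        int status = in.readInt(); // the 32-bit status code
        if (status != 0) {
            throw new IllegalStateException("Moxie rejected record, status=" + status);
        }
        socket.close();
    }
}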

The second phase was to build a neutral domain model as described in Eric Evans's "Domain-Driven Design", which I had read recently. Then I wrote two transformers - Devissa2Domain and Domain2Moxie - implemented a simple POJO with about 15 lines of real code, and voila - all our Devissa orders and executions appeared in Moxie. Forking the flow to a database was really easy, since I could use the Mule JDBC connector, and it took only 10 lines of config. Storing the messages as XML was also easy with the Mule XStream transformer and the Mule File connector. The world was great.
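
In code, the double translation looked roughly like this (a compressed sketch; the real transformers were Mule transformers and the model had far more fields - all names below are illustrative):

import java.util.Map;

class Order {                                   // the neutral domain model
    private String symbol;
    private long quantity;
    public String getSymbol() { return symbol; }
    public void setSymbol(String symbol) { this.symbol = symbol; }
    public long getQuantity() { return quantity; }
    public void setQuantity(long quantity) { this.quantity = quantity; }
}

class Devissa2Domain {                          // inbound transformer
    public Order transform(Map<String, Object> devissaMessage) {
        Order order = new Order();
        order.setSymbol((String) devissaMessage.get("SYMBOL")); // made-up fields
        order.setQuantity(((Number) devissaMessage.get("QUANTITY")).longValue());
        return order;
    }
}

class Domain2Moxie {                            // outbound transformer
    public byte[] transform(Order order) {
        // Render the neutral bean into Moxie's fixed-length record format.
        String record = pad(order.getSymbol(), 12)
                      + pad(String.valueOf(order.getQuantity()), 10);
        return record.getBytes();
    }
    private static String pad(String value, int width) {
        StringBuilder sb = new StringBuilder(value == null ? "" : value);
        while (sb.length() < width) sb.append(' ');
        return sb.toString();
    }
}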

Not really. It turned out that the DB storage and the file-based audit were not real requirements, so we cut them really quickly (or perhaps they made the first release). Soon, during UAT, it turned out that even though the BAs had created quite detailed requirements, they didn't match what the business wanted. Even worse - the business itself wasn't sure what it wanted. We were going through a few iterations a day, discovering more data that needed to be mapped, formats that needed to be converted, and vital pieces of information that were present in one model and not in the other - these had to be either looked up from a static table or calculated from a couple of different fields, and sometimes ended up stuck in a field that had a different purpose but that we were not using at the time.

During all this time, the domain model was growing. Each new piece of information was captured clearly and unambiguously in a Java bean with strongly typed properties, validation and the like. We went live on December 14th. The next day, the system broke. We kept tweaking the business logic for quite some time, and for each tweak there were always three places to change - the domain model, the inbound transformer and the outbound transformer.

One day I decided to see what it would be like if we dropped the domain model altogether, replaced the inbound transformer with an isomorphic conversion from the Devissa data classes to standard Java collections, and then used a rule engine to build the outgoing Moxie message. Enter Drools. The experiment was a success - in a couple of days I was able to ditch my domain model (which had grown so specific to the application that it wasn't really neutral any more). Drools was working fine, though I had the feeling that something was wrong... I never asserted nor retracted any facts in my consequences - I was abusing the Rete engine. Actually, all I was doing was a glorified switch statement.
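
To illustrate what I mean, here is the kind of logic my consequences boiled down to, written as plain Java (field names are made up):

import java.util.Map;

class GlorifiedSwitch {
    // What the Drools rules effectively reduced to: match on a field,
    // set a field - never asserting or retracting facts.
    static void mapPriceType(Map<String, Object> devissa, Map<String, Object> moxie) {
        String orderType = (String) devissa.get("ORDER_TYPE");
        if ("LIMIT".equals(orderType)) {
            moxie.put("PRICE_TYPE", "L");
        } else if ("MARKET".equals(orderType)) {
            moxie.put("PRICE_TYPE", "M");
        }
    }
}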

While I was at it, I decided to ditch Drools as well and use MVEL - one of the consequence dialects of Drools, which turned out to be a nice, compact and easy-to-embed language. MVEL is designed mainly as an expression language, though it has control-flow statements and other goodies. With MVEL, all my transformations fitted on one screen and had the familiar imperative look and feel, but without the cruft. I was able to plug in some Java functions using the context object, which allowed me to hide some ugly processing; and custom resolvers allowed me to resolve MVEL variables directly from the Devissa message and assign them directly to the properties of the Moxie message beans.
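
Embedding MVEL from Java is about this much work (a minimal sketch, assuming the org.mvel2 package of MVEL 2.x; the field names and the 'payload'/'moxie' variables are made up for illustration):

import java.util.HashMap;
import java.util.Map;
import org.mvel2.MVEL;

public class MvelMappingSketch {
    public static void main(String[] args) {
        // The whole transformation is a script: read from 'payload',
        // write into 'moxie'.
        String script = "moxie.SYMBOL = payload.SYMBOL; moxie.QTY = payload.QUANTITY;";

        Map<String, Object> payload = new HashMap<String, Object>();
        payload.put("SYMBOL", "VOD.L");
        payload.put("QUANTITY", Long.valueOf(10000));

        Map<String, Object> vars = new HashMap<String, Object>();
        vars.put("payload", payload);
        vars.put("moxie", new HashMap<String, Object>());

        MVEL.eval(script, vars);
        System.out.println(vars.get("moxie"));   // {SYMBOL=VOD.L, QTY=10000}
    }
}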

Some time after that, for a different project building on the same foundation, I decided to see if I could infer an XML schema from the XML serialization of the Devissa messages. After some massaging, I used that schema to generate the domain model using JAXB and tried to see how it felt. It was a disaster. A typical Devissa message has more than 50 properties (often more than 100), and usually you need 10-20 of them. Also, the generated property names were ugly. Even after conversion from CONSTANT_CASE to camelCase, they were still ugly. The automatically generated beans were practically unusable, the XML did not look human-editable, and the XSD wasn't adding any real value since it lacked any semantic restrictions, so the whole thing felt like jumping through hoops. In the end I dropped the whole JAXB idea and went with MVEL again.

Third time lucky: at the beginning of this March, I started a new project. This time I again decided to try a new approach - in the inbound transformer, I wrapped the raw Devissa message in an adapter, exposing the fields I needed as bean properties but carrying the full dataset of the original message. It works well. One particular benefit is that you can always look at the source data and see if there is anything there that might be useful.
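
The adapter itself is nothing fancy - a minimal sketch, with made-up field names:

import java.util.Map;

// Wraps the raw Devissa message: exposes the fields we care about as
// bean properties, but keeps the full original dataset reachable.
class DevissaOrderAdapter {
    private final Map<String, Object> raw;

    DevissaOrderAdapter(Map<String, Object> raw) {
        this.raw = raw;
    }

    public String getSymbol() {                  // hypothetical field name
        return (String) raw.get("SYMBOL");
    }

    public long getQuantity() {
        return ((Number) raw.get("QUANTITY")).longValue();
    }

    public Map<String, Object> getRawMessage() { // the full source data, when you need it
        return raw;
    }
}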

In conclusion I'll try to summarize:

  • A neutral model plus double translation can yield benefits when the domain is well known, especially if it is externally defined (i.e. a standard). On the other hand, it's a pain in the ass to maintain, especially if the domain objects change frequently.
  • Rule engines are good when you have... ahem, rules. Think complex condition, simple consequence. Actually, in the original Rete paper, the consequences are only meant to assert and retract facts. Changing an object in the working memory or doing anything else with side effects behind the engine's back is considered bad practice at best and (usually) plain wrong. And even fact invalidation (truth maintenance), used properly, has a big performance impact.
  • Direct mapping using an expression language works well, especially for big and complex messages. The scripts are compact and deterministic, which makes them maintainable. You might need to write your own variable resolvers and extend the language with custom functions. Also, debugging can be a nuisance, but if you keep your control flow to a minimum and use plugged-in Java functions, it's quite OK.
  • Adapters are a middle ground between double translation and direct mapping. They tend to work well for providing an internal representation for the application, and you can also stuff some intelligence into them without worrying that somebody might regenerate them. With a bean-mapping framework like Dozer you can even automate the transformation to the output datatype, though for many cases that would be overkill (sometimes 200 lines of straight Java code are more maintainable than 50 lines of XML or 10 lines of LISP).
  • XML works well if your output format is XML, or if you need to apply transformations with XSLT or render with XSL-FO. And as we know, you can run XPath over bean and collection graphs using JXPath; any expression language can provide similar capabilities (see the sketch after this list).
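
To illustrate that last point, here is JXPath addressing a bean graph the way XPath addresses a document (a sketch; the bean is hypothetical):

import org.apache.commons.jxpath.JXPathContext;

public class JXPathSketch {
    public static class Order {                 // hypothetical bean
        private String symbol = "VOD.L";
        public String getSymbol() { return symbol; }
    }

    public static void main(String[] args) {
        JXPathContext context = JXPathContext.newContext(new Order());
        // XPath-style access to a bean property - no XML in sight.
        System.out.println(context.getValue("symbol"));  // prints VOD.L
    }
}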

Next time, I'll write about component decomposition, content-based routing vs coarse-grained components and how to decide whether to do the transformation in a component or in a transformer.

9 comments:

Unknown said...

Great post, thanks for sharing!

Regarding "Xml works well if your output format is XML": I had good success outputting to POJOs from raw XML thanks to XStream.

Dimitar said...

Thanks David, could you elaborate a bit more on this:

* Were you reusing existing domain model or were the POJOs part of your application?

* Was it a standard data schema or one specific for your application?

* Did you map the entire input XML or only the parts you used?

* Did you have the XML generated from the same POJOs or did you map them using XStream annotations and stuff?

* Did you massage the XML in any way (XSL, XPath extraction, etc.) before feeding it to XStream?

* Did you have anything fancy in the model? Lists, dates in funny formats, values derived from multiple fields, subclass discriminators, etc? Yes, I know that XStream can handle these, only that it adds to the work.

* Roughly, what was the complexity of the model? How many doctypes did you have, how many POJOs, how many fields per POJO?

Unknown said...

I do not use application domain model objects in the ESB, at least not when I deploy it as a common enterprise service (I had other application-embedded deployments where I was using domain model objects).

The POJOs I am targeting from XML through XStream are either from the JDK (for example, maps) or internal data transfer objects the external callers are unaware of.

As you guessed, the XML goes through XSL-T massaging to adapt it to the representation expected by XStream.

Tom Fennelly said...

Hi Dimitar.... came across your post here. Very interesting!

Of interest in this area is Smooks. It can handle many of the use cases you outline in your post here:

* Binding to Java Object models that don't "line up" with the source message model (which can be XML, EDI, CSV, Java etc). No need to "massage" messages into a different format for the sake of the binding process.

* Binding only the data entities of interest and ignoring the rest.

* Binding to multiple Java Object Models concurrently - useful for splitting and routing of messages.

* Binding to Virtual Object Models (no need to define a Physical Object Model) - useful for templating.

* Supports complex data decoding e.g. specifying date formats... merging multiple fields into a single property using Expression Based Bindings.

* ... and all sorts of other useful stuff.

BTW... Smooks is not just a binding framework! It just happens to have these features in their own right, as well as in support of other Smooks features!!

Smooks is well integrated in JBossESB. It's semi-integrated into Mule.

The following is an example of how Smooks is used in JBossESB. There's a flash demo with it. I think it will interest you: JBossESB Quickstart - "Huge Split, Enrich, Transform and Route"

Dimitar said...

Hi Tom,

My problem with Smooks is that I still can't quite understand what it is good for. At first look, the configuration is quite verbose (even for the example link you pasted - I know that I can do this with 2 lines of MVEL). I like some of the ideas (i.e. being able to process different sections of an XML doc in different ways), but these are niche scenarios, which don't justify going wholesale for a solution this complex. Then again, I might be wrong.

I do plan to read the manual soon and give it a try, and I shouldn't say anything more before I've used it 'in anger'.

BTW, it would be interesting if you can contrast a solution using Smooks transformations to the approaches I've outlined in this article (double translation, rule engine, EL/script, JAXB binding, payload adapters). If you want, I can create a couple of use cases, and demonstrate how one can implement them in different ways, where you can provide a Smooks implementation.

Tom Fennelly said...

Hi Dimitar.

Yeah, I think it would be useful to see some examples and make some concrete comparisons. I'm especially looking forward to seeing those 2 lines of MVEL ;-)

You're absolutely right about the Smooks v1.0 config being a bit verbose. This stems from the fact that the core is quite generic => so is the config => making it a bit verbose. For v1.1, we plan on introducing an extensible configuration model, which will hopefully make the config less verbose (and easier to read).

Re ELs, JAXB etc... maybe it'd be better to wait and do some of the comparisons, but sure... JAXB is great... if your message is XML, you have an XSD for it, and the Java object model to which you're binding was generated from that XSD using the JAXB binding tools (i.e. it's annotated). If not :-(

Dimitar said...

Hi Tom,

I didn't mean to imply that Smooks is not good and scripting is the way to go. What I meant is that the example in the Expression Based Bindings section is much longer than it needs to be.

In scripting language you would write something like this:

bean = new com.company.project.BeanName();
bean.date = new java.text.SimpleDateFormat("yyyy/MM/dd HH:mm:ss").parse(
    String.format("%s/%s/%s %s:%s:%s",
        payload.year, payload.month, payload.date,
        payload.hour, payload.minute, payload.second));

Assuming that you have bound your payload in the execution context, this snippet will create a new bean, assemble the date string and parse it, just like the Smooks example does. It is easier to read and easier to modify (e.g. to use Calendar instead of SimpleDateFormat).

In my experience, the problems with scripting start when you start adding control structures dependent on external conditions and use introspection.

From my point of view, the strongest point of Smooks is that it's fully declarative, which makes it relatively easy to round-trip the config or to generate it from a simpler model and templates. I'd love to see an Erudine-style GUI sitting on top of Smooks.

Tom Fennelly said...

Ah sorry Dimitar... I thought you were referring to what was behind the other link (JBossESB Quickstart - "Huge Split, Enrich, Transform and Route"). Implementing all that in 2 lines would have been quite an achievement :-)

So, just to clarify: much of what you're seeing behind the Expression Based Bindings link has to do with binding the actual source data from the message (XML, EDI, CSV etc.) into the context (i.e. populating the "payload" properties), making it available to the scripting language, whereas the example you provide assumes this is already done (via JAXB, or whatever). That's fine though... at least we know what each other is talking about now (I think).

So in Smooks, with that example... you could just as easily have used the expression you specified there for populating the date binding value. It was just an example! (BTW... Smooks is also using MVEL for this).

I can easily understand your comment from the earlier post re not really appreciating what Smooks was, because at its core... it's not really anything other than an event stream processor - no mention of message transformation, binding, splitting, routing etc. These other capabilities are built on top. The core is just a runtime that can process a data source, convert it to a stream of events and allow you to target processing logic at that event stream. That's kind of useless in itself, but it does allow you to build out many features on top of it. It also allows you to solve other problems, such as processing huge messages, because you can execute tasks on the message payload as you are filtering it, i.e. you don't need to wait until you've slurped the full message payload into memory before you can start executing tasks on that data (such as generating split messages using a templating engine, routing to file/database/JMS/ESB-Endpoint etc).

I absolutely agree that the pain point at the moment (or one of them, at least) is the config (that said... most people who use Smooks don't mind it so much after using it for a bit). Addressing this is a top priority, not just for us (within the project), but for others too (using the project). We're going to introduce the extensible config models, which should make the config less verbose and a lot more readable. Add to this that someone else (whom I can't mention right now) is actively looking at investing significant resources into building Eclipse tooling around Smooks... I think we can make some progress on this (and soon)!!

We should hook up on email (you should have my email - I sent you one) because I would be interested in seeing some of the solutions you've built using your ideas (or whatever you could show). I'd hope I can learn something from what you've done (and maybe you could get something from it too).

Anonymous said...
This comment has been removed by a blog administrator.
