
Monday, January 12, 2009

The other half of the puzzle

This post on service granularity from Ross Mason (the original developer of Mule ESB) got me into retrospective mode. I've been thinking about this for a while, and over time I have come up with some criteria for how to break functionality into services. Unfortunately, that turned out to be only half of the puzzle. The other half is an implementation and maintenance strategy that lets you achieve your nicely layered boxes-and-arrows vision while delivering ROI at quarterly intervals and not falling into the JBOWS or stovepipe-system pitfalls.

I personally favor starting top-down, focusing on specific applications and task-level services, and harvesting entity and utility services from working production code. The downside is that it is generally more work, is often seen as 'tinkering', and, when we consolidate utility code from multiple systems, will likely introduce temporary instabilities.

This makes it a difficult sell to management - what they see is the IT guys trying to get budget to work on some kind of metaphysical inner-beauty thing instead of delivering new features. Saying that this would reduce bugs is not a great argument either, because you will be reminded that's why we pay the QA team, and by the way, as a professional you are not supposed to write bugs in the first place.

On the other side, the PHBs of the world seem to love the idea of Centers of Excellence, located in China or Malaysia, full of trained and cheap labour, building the utility services and coding the entity diagrams designed by the Architecture Group in its ivory tower, while the solution engineers use point-and-click tools to churn out 'codeless' composite applications. Unfortunately, this breaks at the other end - unexpected interactions are discovered at deployment time, maintenance takes too long as the whole turns out to be more than the sum of its parts, and "emergent behavior" (also known as "underspecified features") pops up left and right.

So there is no quick and easy way. I guess we need to become IT salesmen and compete with the professional vendor scamsters and the consultancy loan-sharks (but that's a topic for another post).

Tuesday, September 2, 2008

Dynamic components in Mule

Imagine that you are given the following scenario to implement:

  • A system is composed of a number of uniform entities; each entity publishes its status on a separate pub-sub topic and accepts instructions over a separate control queue.
  • Your application receives a stream of instructions over a control channel, telling it to start monitoring and controlling, or to stop controlling, a certain entity.

Or you receive a stream of active orders and want to subscribe to the right market data to calculate a relative performance benchmark, or...

Usually I write about how to implement stuff with Mule, but this time I'm going to write about why Mule might not be your best choice if your use case is anything like the above.

The most straightforward approach for implementing the system above is to create a simple POJO encapsulating the management algorithm, and have a service listen to the control channel and instantiate/dispose a new component for each managed entity.
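
To make the idea concrete, here is roughly the shape of the POJO I have in mind - a minimal sketch only, with the MonitorFactory/EntityMonitor names invented for the example and the wiring to the actual Mule endpoints left out:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Sketch only: the interfaces below are made up for illustration,
    // they are not part of any Mule API.
    public class EntityControlService {

        public interface EntityMonitor {
            void onStatus(Object statusMessage);
            void dispose();
        }

        public interface MonitorFactory {
            EntityMonitor create(String entityId);
        }

        private final Map<String, EntityMonitor> monitors =
                new ConcurrentHashMap<String, EntityMonitor>();
        private final MonitorFactory factory;

        public EntityControlService(MonitorFactory factory) {
            this.factory = factory;
        }

        // Called by the service listening on the control channel
        // (assuming the control channel is consumed by a single thread).
        public void onControlInstruction(String entityId, boolean monitor) {
            if (monitor) {
                if (!monitors.containsKey(entityId)) {
                    monitors.put(entityId, factory.create(entityId));
                }
            } else {
                EntityMonitor m = monitors.remove(entityId);
                if (m != null) {
                    m.dispose();
                }
            }
        }
    }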

The problem is that in Mule the components are somewhat heavyweight, and it's not so nice when you start having hundreds (approaching thousands) of them. Another problem is that the code for manually creating and registering a component, along with the whole paraphernalia of routers, endpoints, filters and transformers, is quite verbose.

The latter problem is easier to solve. There are a couple of JIRA issues that aim to simplify the programmatic configuration (MULE-2228 and MULE-3495). The Annotations project at MuleForge is also worth watching (right now there's no documentation, so you'd need to dig into Subversion).

The problem with too many components is more fundamental. The main application domain for Mule is service integration. Though it also happens to be a fairly good application platform, it does fall short when it comes to managing a lot of components. One important missing piece is the ability to create groups of components with shared thread pools. Another is better support for variable subscriptions (an API to easily add and remove subscriptions to the same component using some kind of template). A further notable improvement would be to elaborate the models into full-blown scope controllers, allowing you to create composite applications with proper interfaces, routing policies, etc. All of these are addressable, but there is a long way to go.

There are also improvements that could be made to the JMX agent: use the JMX relation service to connect a component to its endpoints and statistics, to all components of a certain type, or to user-defined groups of components; and let the component exercise some control over the Mule-generated ObjectName, so users can organize components into groups and make monitoring easier for EMS tools by letting them query the parameters of a specific group.

Please share if you've had similar issues, how you dealt with them and whether you think (with the benefit of hindsight) that the solution was worth the effort. I'd also like to hear from users or developers of other integration frameworks or composite application containers (Newton?) how they handle these problems.

Monday, September 1, 2008

Transformers vs. Components - Take 2

In my previous article, I claimed that one of the main criteria for choosing which Mule artifact to use is the multiplicity of the inbound and outbound messages. I've been thinking about it and I want to add a few more points:

  • Transformers are instantiated once per endpoint reference. This means that even if you use pooled components, all messages go through a single transformer instance. In that case you need to make sure the transformer is thread-safe (which it is not if it is holding a JDBC connection or a Hibernate session). Components do not have to be thread-safe, though if you are injecting some shared state you still need to properly synchronize access to it (see the sketch after this list).
  • Mule 2 services allow you to configure different ways of instantiating the component (singleton, pooled, looked up from Spring/JNDI/OSGi/etc.). In contrast, all transformers defined using the mule-core namespace are prototype-scoped.
  • I actually agree that direct database access belongs in a component. I still think that accessing a cache is kind of a gray area, though.
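
To make the first point concrete, here is a minimal plain-Java sketch (deliberately not tied to the real Mule Transformer interface) of the difference between a transformer that is unsafe to share and one that is:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import javax.sql.DataSource;

    // Unsafe when shared: one Connection, many concurrent messages.
    class UnsafeEnricher {
        private final Connection connection; // shared mutable resource

        UnsafeEnricher(Connection connection) {
            this.connection = connection;
        }

        Object transform(Object payload) throws Exception {
            // concurrent calls interleave work on the same connection
            PreparedStatement ps =
                    connection.prepareStatement("select name from accounts where id = ?");
            try {
                // ... enrich the payload ...
                return payload;
            } finally {
                ps.close();
            }
        }
    }

    // Safe to share: borrow a connection per message from a pool.
    class SaferEnricher {
        private final DataSource dataSource;

        SaferEnricher(DataSource dataSource) {
            this.dataSource = dataSource;
        }

        Object transform(Object payload) throws Exception {
            Connection c = dataSource.getConnection();
            try {
                // ... enrich the payload with a connection local to this call ...
                return payload;
            } finally {
                c.close();
            }
        }
    }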

Wednesday, July 23, 2008

Transformers vs Components

When I started using Mule, for some time I wondered what the big difference is between a transformer and a component. Why don't we just stick with the plain pipes-and-filters model instead of complicating it with the 'component' concept?

In general, transformers have a conceptually simpler interface - they are supposed to get a payload and return a payload, possibly of a different type. You should prefer transformers for reusable transformation tasks. On the other hand, transformers cannot:

  • Swallow messages - to achieve this, use a filter or a router.
  • Return multiple messages - use a router.
  • Pause temporarily.

In addition, despite their more complex configuration, components provide some neat features:

  • Message queuing - if your message goes through multiple stages of enrichment, each taking variable time, you might be better off implementing the stages as components and separating them with VM queues.
  • Statistics - processing time and volume are measured automatically for each inbound or outbound endpoint.
  • Lifecycle control - you can start, stop or pause a component multiple times (while a component is not running, messages will queue up). Transformers are only initialized and disposed once.
  • A component allows you to tack on multiple filters, transformers, routers, etc. A transformer is just one part of this chain.

To put it all together, here is an overview of the roles Mule's pieces play in a simple message routing sequence (I'm trying to explain the roles, not the exact sequence):

  • The message enters through an inbound endpoint, where the transport extracts the payload and the message headers and bundles them into a message. (The message and some other objects are bundled into an EventContext, but it is still a mystery to me why we need it.)
  • After the message is created, it passes through a filter chain, where the filters are applied one after another and each of them votes true or false; full consensus is required, otherwise the message is rejected. Depending on the configuration, rejected messages can be routed somewhere else. Filters are not supposed to modify the message (in practice they can, but it's a very bad idea; a tiny filter sketch follows this list).
  • If a message passes all filters it goes through a transformer chain, where each transformer is applied to the output of the preceding one. A transformer is not able to gracefully reject a message, but it can throw an exception.
  • We might have an interceptor wrapped around the component, which can log stuff before or after invocation or measure latency, etc.
  • Based on the payload and on the component type, Mule resolves the method it should call and, if necessary, transforms the payload to something that fits. The return value of the method is used as the payload of a single outgoing message.
  • The outgoing message is passed to an outbound router, which can split it into multiple messages or just drop it. The router then usually sends the message(s) to an arbitrary number of outgoing endpoints, using the filter chains of the outgoing endpoints to decide which ones to use. The outgoing endpoints can be Mule-internal endpoints or periphery endpoints connecting the application to an external system.
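
And since filters came up: the contract really is just "vote, don't modify". A generic illustration (the real Mule filter interface differs in names, but the idea is the same; the Order class is made up for the example):

    // Hypothetical payload class used only for this example.
    class Order {
        private final String side;
        Order(String side) { this.side = side; }
        String getSide() { return side; }
    }

    // A filter votes true/false on the message and leaves it untouched.
    class BuyOrdersOnlyFilter {
        public boolean accept(Object payload) {
            return payload instanceof Order
                    && "BUY".equals(((Order) payload).getSide());
        }
    }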

In general, I would recommend that you start with a transformer and refactor it into a component if you realize you need any of the aspects mentioned above. You can even easily use transformers as POJO components.

Monday, July 21, 2008

Testing Mule applications

Antoine Borg has posted an interesting article at the Ricston Blog about the feasibility of using mock objects for testing Mule applications. The bottom line is that in an integration scenario it is usually difficult to capture the application specification in mocks, and it's often easier to write stub applications simulating the external systems (outside of your application process).

I've found that often I don't really test the whole application; instead I test single components with a few attached transformers, filters and routers. I start by writing a simple Mule configuration - a single component using VM endpoints with queuing. Then I use MuleClient to feed in canned input data and assert the output from the outbound endpoint(s). As the application takes shape, I add transformers and routers as needed to approximate the real usage.
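
To give an idea of the shape such a test takes, here is a rough sketch. It assumes Mule 2's test support (FunctionalTestCase and MuleClient) and a hypothetical test config with vm://test.in and vm://test.out endpoints, so treat the exact class names and signatures as something to double-check against your Mule version:

    import org.mule.api.MuleMessage;
    import org.mule.module.client.MuleClient;
    import org.mule.tck.FunctionalTestCase;

    public class EnricherComponentTest extends FunctionalTestCase {

        protected String getConfigResources() {
            // hypothetical test config: the component under test wired to VM queues
            return "enricher-test-config.xml";
        }

        public void testProcessesCannedOrder() throws Exception {
            MuleClient client = new MuleClient(muleContext);

            // feed canned input into the inbound VM endpoint
            client.dispatch("vm://test.in", "IBM,100,BUY", null);

            // read whatever comes out of the outbound VM endpoint
            MuleMessage result = client.request("vm://test.out", 5000);

            assertNotNull(result);
            assertTrue(result.getPayloadAsString().startsWith("IBM"));
        }
    }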

I imagine the next step would be to reuse my production configuration, extract the perimeter endpoint definitions into a new file (separating them from the model, connectors and internal endpoints) and pass the two configuration files to the config builder. This would allow me to create an alternative perimeter-endpoints file using VM endpoints, so I can instantiate it in a test case and use MuleClient for testing.

The benefit of the approach is that you are testing your component and routing logic and are not exposed to the peculiarities of the external system. Ideally we still want to have full integration tests, including ones covering crash-failure, failover, connectivity-loss and overload scenarios. We can achieve parts of this by stubbing the external system, but so far I usually find it difficult to reproduce the behaviour faithfully enough (especially when we have limited understanding of the external system).

No bulletpoints this time.

Tuesday, May 27, 2008

Pushing Files

In the next couple of posts I'll try to capture a few scenarios that I think Mule (and ESBs in general) fit well. The target audience is people like my friend who once asked me: 'Well, I got it - it's all about transports, transformers, routers and components. In the end, what would you use it for?'

When I started reading about Mule, the first thing that caught my eye was the huge collection of transports and third-party modules. It allows you, with relatively little configuration, to automate scenarios like picking up a file, uploading it to FTP and sending an email if it failed or moving it to an audit dir on success. Add a few lines and you can have Mule try to upload, post to a JMS queue if that fails, and automatically create a JIRA issue if it still fails.

Still, even though Mule is easy to use, it's not simple. Actually, even for basic use of Mule, you need to know XML, have an idea about Enterprise Integration Patterns and be familiar with Java server applications. Suffice to say that many of my colleagues do not meet these requirements. Compare this with writing a bash or Perl script - it's much easier and quite often it is good enough. It wouldn't be a stretch to say that the TCO of a Perl script for the previous scenario is 10 times less than the TCO of an application built using Mule or any other free or commercial integrated solution.

Things change a bit when you have a big number of integration jobs. If the tasks are similar, a smart scripter would abstract away the processing, capturing it in a parameterized subroutine or shell script with multiple simple wrappers taking different arguments. Unfortunately, much too often the strategy is copy-and-modify (especially when the guy who wrote the original script has left the company, and so has the guy who did the first 30 clones). Cloning is a pragmatic, low-risk strategy since it minimizes the chance that you break stuff that works. Such developers are usually praised for their get-it-done, goal-oriented attitude, and management likes their low-risk approach. Still, the dirty secret that nobody likes to talk about is that the system grows in complexity until the moment when nobody knows what it is doing anymore (I think Skynet started that way).

Even if we assume that the TCO of a single Perl script is 10 times less than that of a Mule implementation, a single Mule instance with relatively little custom development can easily handle the job of all 170 scripts that I counted in one directory on a production server I use. If we assume these scripts were derived from 10 archetypes (actually it's fewer), the Mule option still comes out roughly 1.7 times cheaper (170 script-equivalents of TCO versus about 100 for the Mule implementation). Even more important - the Mule and Spring configurations give you a roadmap which, while not trivial, is much easier to comprehend and audit than the 27,227 lines of repetitive Perl code.

To be fair, in this example I am comparing bad scripting code, written by multiple people over time without proper governance, to a hypothetical Mule solution written by decent developers. Let's assume that you have a team of good scripting developers, writing well-structured code and taking care to provide maintenance documentation and guidelines for extending the system. Actually, this would work fine. Effectively these guys would be defining a platform quite similar to what you get from an ESB (in the end it's all Turing-complete languages). The only catch is that, in my experience, in a typical enterprise it's much less probable to meet such Perl coders than the above-mentioned Java guys. The argument that Perl programs don't have to be messy is quite similar to "guns don't kill people".

Even if we assumed that the mythical Perl programmers did exist (and you had all three of them in your IT department), there are still a couple of points that make Mule an attractive solution.

  • Mule gives you a proven architectural framework, so you don't have to invent your own. This includes guidelines, some documentation (getting better) and abstract base classes for many of the EIP patterns.
  • The business code written against Mule is usually framework agnostic (POJO).
  • Various transports, transformers and out-of-the-box components.
  • Handles advanced stuff like XA transactions and threading models that are just not possible with Perl.
  • Since the bulk of your application will be built on a platform, it will be easier to find people who have some experience with it, which will shorten the learning curve.
  • Extensive JMX monitoring and management.

And here are some good reasons to keep cloning scripts:

  • Naturally resilient to regressions.
  • Allow you to hire cheaper developers (even if you hire expensive, highly qualified ones, they will be just as good as the cheap ones).
  • If you plan to scrap the whole thing in the near future (e.g. because the whole system is being replaced).
  • If the job is not critical, doesn't change often and forking a script is so easy that there's no point investing in anything better.
  • leg·a·cy [ˈle-gə-sē] - code that actually works.

Monday, May 26, 2008

Component Granularity Musings

This article was written based on my experience with Mule ESB, but the general principles should apply for other similar products as well.

Short description of an ESB application

A typical ESB (Enterprise Service Bus) application starts with an endpoint, which is an abstraction for a way to receive some 'event' that makes us do some work. The event carries some information, which could be structured (e.g. a JMS message or an HTTP request) or trivial (the timer fired) - we'll call this information the 'payload'. The endpoint passes the event (containing the payload) to a 'component', which does some kind of work and optionally emits a single output event. The output event goes to an outbound endpoint, which can send the result back to the client, write it to a database, relay it to an external system, or do something else.
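
In code, the 'component' in this description is typically just a POJO whose method takes the payload and optionally returns a new one. A trivial, made-up example (not anything shipped with an ESB):

    // The container binds this method to the inbound endpoint; the return
    // value becomes the payload of the single output event.
    public class GreetingComponent {

        public String greet(String name) {
            return "Hello, " + name;
        }
    }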

This was a high-level description - the benefits visible from this altitude are that we get to reuse the communication logic between applications, and since the endpoints are abstract, we can easily swap the actual transports. Still, a typical ESB provides some additional functionality. In the example above, when we received an event it was relayed to a predefined component, but that doesn't have to be the case. We can have multiple components configured in the ESB and use 'routers' to determine which component will process a particular event.

A router could determine the actual component based on the payload properties (in which case we call it a CBR - content-based router) or based on anything else (e.g. we could have a support-call router which routes to different endpoints based on the time of day). When a router receives an event, it optionally emits one or many events (unlike a component, which optionally emits one). Also, components are supposed to contain exclusively business logic, while routers are mostly concerned with mediating the events in the proper manner. Other useful types of routers include aggregating, splitting, resequencing and chaining routers.

For the sake of completeness, there are also 'transformers', which can be plugged between an endpoint and a component to apply some transformation to the event payload, converting the data or adding additional information (enriching); and 'filters', which can be used to discard messages based on some criteria.

Since the routers, filters and transformers allow us to conditionally specify the components and endpoints, we can choose whether to implement our logic inside the component or outside it (in the ESB configuration).

Ultra-fine grained routing (anemic components)

When each component contains no control-flow statements (straight-line imperative code), all the application logic is implemented in the ESB configuration. Some would argue that a system implemented in this fashion is more flexible and allows for faster turnaround and better introspection. While these statements are true, we need to consider the other side as well.

First, you are mixing business logic and communication logic at the same level, which is bad since it forces the maintainers to understand both (ideally they should be separated as much as possible). Not only that, but your code becomes platform-dependent (the configuration is code, and in this case the ESB becomes more of a platform and less of a glue). This in turn increases the chance that upgrades of the ESB software will break your stuff (especially if you use undocumented or experimental features). We also need to keep in mind that the domain complexity has not changed - we are just expressing it in a different language.

Second, chances are that you have more developers who know Java than who know the ESB configuration language. Also, during development it's much easier to step through the code of a single component than to trace a message as it goes through multiple queues and thread pools.

From an operations point of view, there are more knobs to turn, which increases the chance of turning the wrong ones. Monitoring can take advantage of the additional inspection points, but then you need to consider the usefulness of this information. Do you really care how many executions you got for sell orders and how many for buy orders?

Many vendors (IONA, TIBCO, Progress) offer proprietary process-modeling and metadata-management facilities, allowing you to express your business rules, routing and transformations in graphical notation and enforcing integrity based on metadata. These are expensive products and well worth their money if your project is big enough and you are willing to accept the vendor lock-in. In that case, make sure that you make the most of it and take the time to learn how to use them, instead of putting a half-assed simplification layer on top of them (just an example - so far I've seen two wrappers around Spring (in different companies) aiming to make it 'easier' to use; needless to say, neither of them had any documentation).

Ultra-coarse grained routing (fat components)

In this scenario we have only one component. Multiple inbound endpoints go in (possibly passing through transformers and filters), and a single stream of events comes out. We use a content-based router to dispatch each event from the output stream to one of multiple outbound endpoints.

This is a code-centric approach (you won't get much use out of a boxes-and-arrows editor). It allows you to use Java and standard Java tools and debuggers to implement, unit-test and debug the bulk of the application, while still abstracting the communication code and mundane stuff like transformations. If you want to expose application details for monitoring, you have to do it by manually registering your custom MBeans or by using the Spring JMX exporter.
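
For the manual route, plain JMX is enough. A minimal sketch - the ObjectName domain and the single attribute are arbitrary choices for the example:

    // ComponentStatsMBean.java
    public interface ComponentStatsMBean {
        long getProcessedCount();
    }

    // ComponentStats.java
    import java.lang.management.ManagementFactory;
    import java.util.concurrent.atomic.AtomicLong;
    import javax.management.ObjectName;

    // Standard MBean pair: the interface name must end in "MBean".
    public class ComponentStats implements ComponentStatsMBean {

        private final AtomicLong processed = new AtomicLong();

        public long getProcessedCount() {
            return processed.get();
        }

        public void messageProcessed() {
            processed.incrementAndGet();
        }

        public void register() throws Exception {
            ManagementFactory.getPlatformMBeanServer().registerMBean(
                    this, new ObjectName("myapp:type=component,name=stats"));
        }
    }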

One sign that you should consider splitting your component is if you start doing threading. This includes maintaining worker pools, doing half-sync/half-async dispatching, messing with locking, synchronization and notifications.

Sometimes there may be a better abstraction for a piece of code. For example, if your component implements a simple generic transformation and you are using Mule, consider extracting a transformer. On the other hand, if the transformation is simple but you feel it is not generic enough, then you have a choice: a) move it to a transformer and have your component focus on the business logic; or b) leave it there and reduce the number of classes you maintain.

The Fine Line between Fat and Voluptuous... Components

...and this time it is not J Lo's booty. Actually, as much as I've looked, I haven't found a good set of recommendations about how to structure components in an ESB. There's some stuff from the SOA guys, but they have the ill-founded assumption that all invocations between services have to be remote and marshaled through XML, which is not necessarily the case for a single application using an ESB as a platform.

I am in no way an authority on EAI, but here is my attempt at defining some guidelines (take them with a big lump of salt):

  • Multiple components make your application complex. A good starting point for a new application is sticking everything in one component and extracting components as necessary using the following guidelines. In general, I've found it easier to split components than to merge them (Java's method-invocation semantics are less expressive than ESB routing).
  • If you need to checkpoint your processing, consider splitting the stages into different components and separating them with durable queues. The queues could be either JMS or Mule VM queues with persistence enabled. With proper use of transactions, this would allow you to survive an application crash or do a failover (you still need to take care of any non-transactional endpoints). An alternative is to keep using one component and checkpoint using a distributed transactional cache configured with redundancy.
  • If you need to control the resources allocated to a certain part of the processing, you can extract it into a separate component and change the number of workers using a SEDA approach (check also this article, and the Scatter-Gather and Composed Message Processor patterns). Another valid approach is to use a compute grid like GridGain, giving you more functionality but increasing the total complexity of the application. We should also mention GigaSpaces, which provides a platform based on JINI and JavaSpaces with good integration capabilities.
  • When parts of the processing have different state lifecycles. This is best illustrated by example: we receive a stream of events on an inbound endpoint (let's call it the 'control stream'). Each event has a payload consisting of a condition and a 'data stream' endpoint address. On certain conditions, we want to start receiving and processing the events on the data stream in a way which requires independent state. The implementation would be to have one component that monitors the control stream and keeps track of the registered processors, creating new ones as necessary. The processors are concerned only with their own data stream of events and each of them has separate state. A clumsy alternative (antipattern?) is to have a single component subscribed to all data streams and use a cache to look up the state based on some key derived from the incoming event. A possibly better alternative for this specific case is to embed an ESP engine like Esper.
  • If you want to extract a component in order to reuse it, consider extracting it as a domain object first. If it encapsulates processing and is ESB-aware, then perhaps it's better off as a transformer, router or agent (using Mule terminology here). So far I haven't had the need to reuse a business-logic component between projects.
  • If two components are tightly coupled and do not take advantage of any ESB-supplied functionality on the connection between them, consider hiding them behind a single Java facade (a sketch follows this list). It's easier to unit-test a POJO than to integration-test a component.
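
To illustrate the last point, the facade is nothing fancy; the classes below are invented stand-ins for two previously separate components:

    // Stand-ins for what used to be two separate, tightly coupled components.
    class ReferenceDataEnricher {
        String enrich(String rawOrder) { return rawOrder + ";account=ACC1"; }
    }

    class Pricer {
        String price(String enrichedOrder) { return enrichedOrder + ";price=101.5"; }
    }

    // The ESB now sees a single component with one entry point, and the
    // whole chain is trivial to unit test as a plain POJO.
    public class EnrichAndPriceFacade {
        private final ReferenceDataEnricher enricher = new ReferenceDataEnricher();
        private final Pricer pricer = new Pricer();

        public String process(String rawOrder) {
            return pricer.price(enricher.enrich(rawOrder));
        }
    }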

As always, any opinions are welcome.

Saturday, May 24, 2008

An Integration Story or 5 Ways to Transform a Message

It all started when we decided to replace Moxie with Devissa*. Moxie was a decent system and it had aged well, but its years had started to show. The rigid data schema, the inflexible order representation, the monolithic C++ server... Don't get me wrong, it was and still is working great, but with time we realized that we needed something more. Something that would let us define the way we do business instead of having us change the business to fit its model.

* All names have been changed to protect the innocent

The global roll-out of Devissa looked like a good opportunity to bring in a more capable trading system. Devissa itself was a huge beast, composed of hundreds of instances of several native processes running with a variety of configurations, held together by TCL code, cron jobs and a templated meta-configuration.

The Moxie communication protocol was simple - fixed-length records sent in one direction, a 32-bit status code in the other, over a TCP socket (actually two sockets - uplink and downlink). Devissa was much more complex - the messages were framed using an XML-like self-describing hierarchical format (logically it was the standard map of strings to arrays of maps... ending up with primitive values at the leaf nodes). The session-level protocol was simple, and luckily there was a Java library for it (I'll bitch about it some other time). On top of the sessions sit a bunch of application-level protocols, each with different QoS and MEPs. There is also a registry, an authentication service and an fcache/replicator/database/event-processor thingie that sits in the center, but I am digressing.

I actually started this article to share some interesting stuff I learned while we migrated the order flow from Moxie to Devissa. Phase zero was to make a point-to-point integration from Devissa to Moxie using the FIX gateways of the respective products, routing orders entered into Devissa to Moxie, so the traders could work them in the familiar Moxie interface. It allowed us to receive flow from other offices which were already on the Devissa bandwagon, and it was great because we didn't have to code data transformations and behaviour-orchestration logic - it all 'just worked'.

The next task was to make sure that we could trade on Devissa and still be able to produce our end-of-day reports from a single point. At that time all reporting was done from Moxie, so what seemed to make most sense was to capture the reportable events from Devissa and feed them back into Moxie. I'll spare you the BA minutiae for now.

As we were looking for a suitable base for creating a platform on which to build various applications around Devissa, I shortlisted a couple of ESB solutions (although it's an interesting topic, I won't talk about "what's an ESB and do I need one"). I looked at Artix, Tibco, AquaLogic, ServiceMix and Mule. I found that Artix ESB was great and Artix DS looked like a good match for our data-mapping needs; the only thing I was concerned about was the cost. Before I got in contact with the vendor, I asked my managers about our budget - they replied, almost surprised, that we didn't know; if it was good and worth the money we might try to pitch it to the global architecture group - in other words, a commercial product was not really an option. This ruled out pretty much everything, leaving ServiceMix and Mule (if I were starting now I would also consider Spring Integration). I read a bit about JBI. I tried to like it, I really did... still, I couldn't swallow the idea of normalizing your data on each endpoint and being forced to handle all these chunks of XML flying around. At that time Mule looked like the obvious answer for an open-source ESB.

The first thing I had to do was to build custom transports for Moxie and Devissa. That took about 2-3 days. They didn't have any fancy features (actually they barely worked), but I was able to receive a message from one and stuff a message into the other. During the following year both transports evolved a lot, ending up with a full rewrite last month, porting them to Mule 2 and adding goodies like container-managed dispatcher threading, half-sync support, support for all Devissa application protocols and more.

The second phase was to build a neutral domain model as described in Eric Evans's "Domain-Driven Design", which I had read recently. Then I wrote two transformers - Devissa2Domain and Domain2Moxie - implemented a simple POJO with about 15 lines of real code, and voila - all our Devissa orders and executions appeared in Moxie. Forking the flow to a database was really easy, since I could use the Mule JDBC connector and it took only 10 lines of config. Storing the messages in XML was also easy with the Mule XStream transformer and the Mule File connector. The world was great.

Not really. It turned out that the DB storage and the file-based audit were not real requirements, so we cut them really quickly (or perhaps they made the first release). Soon, during UAT, it turned out that even though the BAs had created quite detailed requirements, they didn't match what the business wanted. Even worse - the business itself wasn't sure what it wanted. We were going through a few iterations a day, discovering more data that needed to be mapped, formats that needed to be converted, and vital pieces of information that were present in one model and not in the other and had to be either looked up from a static table or calculated from a couple of different fields - and sometimes ended up stuck in a field that had a different purpose, which we were not using at the time.

During all this time, the domain model was growing. Each new piece of information was captured clearly and unambiguously in a Java bean with strongly typed properties, validation and so on. We went live on December 14th. The next day the system broke. We kept tweaking the business logic for quite some time, and for each tweak there were always three places to change - the domain model, the inbound transformer and the outbound transformer.

One day I decided to see what it would look like if we dropped the domain model altogether, replaced the inbound transformer with an isomorphic conversion from the Devissa data classes to standard Java collections, and then used a rule engine to build the outgoing Moxie message. Enter Drools. The experiment was a success - in a couple of days I was able to ditch my domain model (which had grown so specific to the application that it wasn't really neutral any more). Drools was working fine, though I had the feeling that something was wrong... I never asserted, nor retracted, any facts in my consequences - I was abusing the Rete engine. Actually, all I was doing was a glorified switch statement.

While I was at it, I decided to ditch Drools as well and use MVEL - one of the consequence dialects of Drools, which turned out to be a nice, compact and easy-to-embed language. MVEL is designed mainly as an expression language, though it has control-flow statements and other stuff. With MVEL, my entire transformation fit on one screen and had the familiar imperative look and feel, but without the cruft. I was able to plug in some Java functions using the context object, which allowed me to hide some ugly processing; and the custom resolvers allowed me to resolve MVEL variables directly from the Devissa message and assign them directly to the properties of the Moxie message beans.
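
For a flavour of what the direct mapping looked like, here is a stripped-down sketch against the MVEL 2 API (older MVEL versions used the org.mvel package). The field names, the script and the MoxieOrder bean are made up, and in the real thing the variables came from custom resolvers rather than a plain map:

    import java.io.Serializable;
    import java.util.HashMap;
    import java.util.Map;
    import org.mvel2.MVEL;

    public class DevissaToMoxieMapping {

        // Made-up stand-in for the real Moxie message bean.
        public static class MoxieOrder {
            private String symbol;
            private int quantity;
            public String getSymbol() { return symbol; }
            public void setSymbol(String symbol) { this.symbol = symbol; }
            public int getQuantity() { return quantity; }
            public void setQuantity(int quantity) { this.quantity = quantity; }
        }

        public static void main(String[] args) {
            // Pretend this came out of the Devissa message
            // (in reality a nested map-of-maps).
            Map<String, Object> devissa = new HashMap<String, Object>();
            devissa.put("INSTRUMENT_ID", "VOD.L");
            devissa.put("ORDER_QTY", 1500);

            // The whole transformation is one compact script.
            String script =
                    "order.symbol = INSTRUMENT_ID;"
                  + "order.quantity = ORDER_QTY;"
                  + "order";

            Serializable compiled = MVEL.compileExpression(script);

            Map<String, Object> vars = new HashMap<String, Object>(devissa);
            vars.put("order", new MoxieOrder());

            MoxieOrder out = (MoxieOrder) MVEL.executeExpression(compiled, vars);
            System.out.println(out.getSymbol() + " x " + out.getQuantity());
        }
    }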

Some time after that, for a different project building on the same foundation, I decided to see if I could infer an XML schema from the XML serialization of the Devissa messages. After some massaging I used that schema to generate the domain model using JAXB and tried to see how it felt. It was a disaster. A typical Devissa message has more than 50 properties (often more than 100). Usually you need 10-20 of them. Also, the generated property names were ugly. Even after conversion from CONSTANT_CASE to camelCase, they were still ugly. The automatically generated beans were practically unusable, the XML looked not-human-editable, and the XSD was not adding any real value since it lacked any semantic restrictions, so the whole thing felt like jumping through hoops. In the end I dropped the whole JAXB idea and went with MVEL again.

Third time lucky: at the beginning of this March, I started a new project. This time I again decided to try a new approach - in the inbound transformer, I wrap the raw Devissa message in an adapter, exposing the fields I need as bean properties but carrying the full dataset of the original message. It works well. One particular benefit is that you can always look at the source data and see if there is anything there that might be useful.
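
The adapter itself is as boring as it sounds - a sketch with invented field names (the real payload is the nested map-of-maps described earlier):

    import java.util.Map;

    // Exposes the handful of fields this application cares about as bean
    // properties, but keeps the full raw message around in case something
    // else turns out to be useful.
    public class DevissaOrderAdapter {

        private final Map<String, Object> raw;

        public DevissaOrderAdapter(Map<String, Object> raw) {
            this.raw = raw;
        }

        public String getSymbol() {
            return (String) raw.get("INSTRUMENT_ID");              // invented field name
        }

        public double getLimitPrice() {
            return ((Number) raw.get("LIMIT_PX")).doubleValue();   // invented field name
        }

        // Escape hatch: the untouched source data.
        public Map<String, Object> getRaw() {
            return raw;
        }
    }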

In conclusion I'll try to summarize:

  • A neutral model plus double translation can yield benefits when the domain is well known, especially if it is externally defined (i.e. a standard). On the other hand, it's a pain in the ass to maintain, especially if the domain objects change frequently.
  • Rule engines are good when you have... ahem, rules. Think complex condition, simple consequence. Actually, in the original Rete paper, the consequences are only meant to assert and retract facts. Changing an object in the working memory, or doing anything else with side effects behind the engine's back, is considered bad practice at best or (usually) plain wrong. Even when using fact invalidation (truth maintenance), it has a big performance impact.
  • Direct mapping using an expression language works well, especially for big and complex messages. The scripts are compact and deterministic, which makes them maintainable. You might need to write your own variable resolvers and extend the language with custom functions. Also, debugging can be a nuisance, but if you keep your control flow to a minimum and use plugged-in Java functions, it's quite OK.
  • Adapters are a middle ground between double translation and direct mapping. They tend to work well for providing an internal representation for the application, and you can also stuff some intelligence in them without worrying that somebody might regenerate them. With a bean-mapping framework like Dozer you can even automate the transformation to the output datatype, though for many cases that would be overkill (sometimes 200 lines of straight Java code are more maintainable than 50 lines of XML or 10 lines of LISP).
  • XML works well if your output format is XML, or if you need to apply transformations with XSLT or render it using XSL-FO. As we know, you can run XPath over bean and collection graphs using JXPath; also, any expression language can provide similar capabilities.

Next time, I'll write about component decomposition, content-based routing vs coarse-grained components and how to decide whether to do the transformation in a component or in a transformer.
