Monday, May 26, 2008

Component Granularity Musings

This article is based on my experience with Mule ESB, but the general principles should apply to other similar products as well.

Short description of an ESB application

A typical ESB (Enterprise Service Bus) application starts with an endpoint, which is an abstraction for a way to receive some 'event' that makes us do some work. The event carries some information, which could be structured (e.g. a JMS message or an HTTP request) or simple (the timer fired) - we'll call this information the 'payload'. The endpoint passes the event (containing the payload) to a 'component', which does some kind of work and optionally emits a single output event. The output event goes to an outbound endpoint, which can send the result back to the client, write it into a database, relay it to an external system, or do something else.
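To make the flow above concrete, here is a minimal plain-Java sketch of an event passing from an inbound endpoint through a component to an outbound endpoint. All the names (`Event`, `Component`, `OutboundEndpoint`, `Service`) are made up for illustration and are not the Mule API; a real ESB would do this wiring from its configuration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical names throughout; this is not the Mule API.
final class Event {
    final Object payload;

    Event(Object payload) {
        this.payload = payload;
    }
}

// A component does some work and optionally emits a single output event.
interface Component extends Function<Event, Optional<Event>> {}

// An outbound endpoint hides where the result goes (client, database, external system).
interface OutboundEndpoint extends Consumer<Event> {}

// Wires one component to one outbound endpoint; the inbound endpoint calls onEvent().
final class Service {
    private final Component component;
    private final OutboundEndpoint outbound;

    Service(Component component, OutboundEndpoint outbound) {
        this.component = component;
        this.outbound = outbound;
    }

    void onEvent(Event in) {
        // Forward the output event only if the component produced one.
        component.apply(in).ifPresent(outbound);
    }
}

final class PipelineDemo {
    public static void main(String[] args) {
        List<Object> sent = new ArrayList<>();

        // Upper-cases string payloads and emits nothing for anything else.
        Component upperCase = e -> {
            if (e.payload instanceof String) {
                return Optional.of(new Event(((String) e.payload).toUpperCase()));
            }
            return Optional.empty();
        };

        // The outbound endpoint here just collects payloads in a list.
        Service service = new Service(upperCase, e -> sent.add(e.payload));
        service.onEvent(new Event("hello"));
        service.onEvent(new Event(42));
        System.out.println(sent);
    }
}
```

Because the component only sees `Event` objects, swapping the list-backed outbound endpoint for a JMS or HTTP one would not touch the component code, which is the reuse benefit discussed next.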

This was a high-level description - the benefits seen from here are that we get to reuse the communication logic between applications and, since the endpoints are abstract, we can easily swap the actual transports. Still, a typical ESB provides some additional functionality. In the example above, the event we received was relayed to a predefined component, but that doesn't have to be the case. We can have multiple components configured in the ESB and use 'routers' to determine the component that will process a particular event.

The routers could determine the actual component depending on the payload properties (in which case we call them CBR - content-based routers) or based on anything else (e.g. we can have a support call router which routes to different endpoints based on the time of day). When a router receives an event, it optionally emits one or many events (unlike a component, which optionally emits one). Also, components are supposed to contain exclusively business logic, while routers are mostly concerned with mediating the events in the proper manner. Other useful types of routers include aggregating, splitting, resequencing and chaining routers.
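As a rough illustration of content-based routing, the sketch below evaluates a list of rules against the payload and returns every matching target, so a single event may go nowhere, to one component, or to several. The class names and the target names (`buyComponent`, `sellComponent`, `auditComponent`) are invented for this example; Mule's own routers are configured differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch, not Mule's router API.
final class ContentBasedRouter<T> {
    // One routing rule: if the condition holds, the event goes to the target.
    private static final class Rule<T> {
        final Predicate<T> condition;
        final String target;

        Rule(Predicate<T> condition, String target) {
            this.condition = condition;
            this.target = target;
        }
    }

    private final List<Rule<T>> rules = new ArrayList<>();

    ContentBasedRouter<T> when(Predicate<T> condition, String target) {
        rules.add(new Rule<>(condition, target));
        return this;
    }

    // Returns every matching target in rule order: possibly none, possibly many.
    List<String> route(T payload) {
        List<String> targets = new ArrayList<>();
        for (Rule<T> rule : rules) {
            if (rule.condition.test(payload)) {
                targets.add(rule.target);
            }
        }
        return targets;
    }
}

final class Order {
    final String side;
    final int quantity;

    Order(String side, int quantity) {
        this.side = side;
        this.quantity = quantity;
    }
}

final class RouterDemo {
    public static void main(String[] args) {
        ContentBasedRouter<Order> router = new ContentBasedRouter<Order>()
                .when(o -> o.side.equals("BUY"), "buyComponent")
                .when(o -> o.side.equals("SELL"), "sellComponent")
                .when(o -> o.quantity > 10_000, "auditComponent");

        System.out.println(router.route(new Order("SELL", 50_000))); // two targets match
        System.out.println(router.route(new Order("HOLD", 1)));      // no target matches
    }
}
```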

For the sake of completeness, there are also 'transformers', which can be plugged between an endpoint and a component to apply some transformation to the event payload, converting the data or adding information (enriching); and 'filters', which can be used to discard messages based on some criteria.
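Here is a small sketch of a filter and a transformer sitting in front of a component. It assumes a made-up text feed with lines of the form `SYMBOL,QUANTITY`; `InboundChain` and `CsvOrders` are hypothetical names and not part of Mule.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical sketch, not Mule's Filter/Transformer API.
final class InboundChain<I, O> {
    private final Predicate<I> filter;        // discards events that fail the test
    private final Function<I, O> transformer; // converts and/or enriches the payload

    InboundChain(Predicate<I> filter, Function<I, O> transformer) {
        this.filter = filter;
        this.transformer = transformer;
    }

    // An empty result means the event was filtered out and never reaches the component.
    Optional<O> process(I payload) {
        if (!filter.test(payload)) {
            return Optional.empty();
        }
        return Optional.of(transformer.apply(payload));
    }
}

final class CsvOrders {
    // Filter: accept only lines of the form "SYMBOL,QUANTITY".
    static boolean isWellFormed(String line) {
        return line.matches("[A-Z]+,\\d+");
    }

    // Transformer: convert the line to a map and enrich it with the source name.
    static Map<String, Object> toOrder(String line) {
        String[] parts = line.split(",");
        Map<String, Object> order = new LinkedHashMap<>();
        order.put("symbol", parts[0]);
        order.put("quantity", Integer.parseInt(parts[1]));
        order.put("source", "csv-feed"); // enrichment: not present in the raw line
        return order;
    }
}

final class ChainDemo {
    public static void main(String[] args) {
        InboundChain<String, Map<String, Object>> chain =
                new InboundChain<>(CsvOrders::isWellFormed, CsvOrders::toOrder);

        System.out.println(chain.process("ACME,100")); // passes the filter, gets transformed
        System.out.println(chain.process("garbage"));  // discarded by the filter
    }
}
```

The component behind this chain only ever sees well-formed, already converted orders, so it can stay focused on the business logic.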

Since the routers, filters and transformers allow us to conditionally specify the components and endpoints, we can choose whether we want to implement our logic inside the component or outside (in the ESB configuration).

Ultra-fine grained routing (anemic components)

When each component contains no control-flow statements (straight-line imperative code), all the application logic is implemented in the ESB configuration. Some would argue that a system implemented in this fashion is more flexible and allows for faster turnaround and better introspection. While these statements are true, we need to consider the other side as well.

First, you are mixing business logic and communication logic at the same level, which is bad since it forces the maintainers to understand both (ideally they should be separated as much as possible). Not only this, but your code becomes platform-dependent (the configuration is code and in this case the ESB becomes more of a platform and less of a glue). This in turn increases the chance that upgrades of the ESB software would break your stuff (especially if you use undocumented or experimental features). We also need to keep in mind that the domain complexity has not changed - we are just expressing it using a different language.

Second, chances are that you have more developers knowing Java than the ESB config language. Also, during development it's much easier to step through the code of a single component rather than trace a message as it goes through multiple queues and thread pools.

From an operations point of view, there are more knobs to turn, which increases the chance of turning the wrong ones. Monitoring can take advantage of the additional inspection points, but then you need to consider how useful this information is. Do you really care how many executions you got for sell orders and how many for buy orders?

Many vendors (IONA, TIBCO, Progress) offer proprietary process modeling and metadata management facilities that let you express your business rules, routing and transformations in graphical notation and enforce integrity based on metadata. These are expensive products, and well worth their money if your project is big enough and you are willing to accept the vendor lock-in. In that case, make sure you make the most of them and take the time to learn how to use them, instead of putting a half-assed simplification layer on top. Just an example: so far I've seen two wrappers around Spring (in different companies) aiming to make it 'easier' to use. Needless to say, neither of them had any documentation.

Ultra-coarse grained routing (fat components)

In this scenario we have only one component. Multiple inbound endpoints go in (possibly passing through transformers and filters), and a single stream of events goes out. We use a content-based router to dispatch each event from the output stream to one of the multiple outbound endpoints.

This is a code-centric approach (you won't make much use of a box-and-arrows editor). It allows one to use Java and standard Java tools and debuggers to implement, unit-test and debug the bulk of the application, while still abstracting the communication code and mundane stuff like transformations. If you want to expose application details for monitoring, you have to do it by manually registering your custom MBeans or by using the Spring JMX exporter.
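For the manual route, a minimal sketch using only the standard `java.lang.management` and `javax.management` APIs could look like the one below. The `OrderStats` counters, the `Monitoring` holder class and the `com.example.orders` object name are assumptions made up for the example; the MBean interface is declared public because the JMX introspector expects that.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.JMException;
import javax.management.ObjectName;

// Hypothetical monitoring helper for a fat component.
final class Monitoring {
    // Standard MBean convention: a public interface named <ImplementationClass>MBean.
    public interface OrderStatsMBean {
        long getBuyOrders();
        long getSellOrders();
    }

    public static final class OrderStats implements OrderStatsMBean {
        private final AtomicLong buys = new AtomicLong();
        private final AtomicLong sells = new AtomicLong();

        // Called by the component's business logic.
        void recordBuy() { buys.incrementAndGet(); }
        void recordSell() { sells.incrementAndGet(); }

        @Override public long getBuyOrders() { return buys.get(); }
        @Override public long getSellOrders() { return sells.get(); }
    }

    // Registers the bean with the platform MBean server so JMX consoles can see it.
    static ObjectName register(OrderStats stats, String name) {
        try {
            ObjectName objectName = new ObjectName(name);
            ManagementFactory.getPlatformMBeanServer().registerMBean(stats, objectName);
            return objectName;
        } catch (JMException e) {
            throw new IllegalStateException("Could not register " + name, e);
        }
    }

    // Reads an attribute back through JMX, the same way a monitoring console would.
    static Object read(ObjectName name, String attribute) {
        try {
            return ManagementFactory.getPlatformMBeanServer().getAttribute(name, attribute);
        } catch (JMException e) {
            throw new IllegalStateException("Could not read " + attribute, e);
        }
    }

    public static void main(String[] args) {
        OrderStats stats = new OrderStats();
        ObjectName name = register(stats, "com.example.orders:type=OrderStats");
        stats.recordBuy();
        stats.recordSell();
        stats.recordSell();
        System.out.println("buy=" + read(name, "BuyOrders") + " sell=" + read(name, "SellOrders"));
    }
}
```

The attribute names seen over JMX (`BuyOrders`, `SellOrders`) are derived from the getter names, so no extra descriptor is needed for a simple bean like this.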

One sign that you should consider splitting your component is if you start doing threading. This includes maintaining worker pools, doing half-sync/half-async dispatching, messing with locking, synchronization and notifications.

Sometimes there might be better abstractions for some pieces of code. For example, if your component is implementing a simple generic transformation and you are using Mule, consider extracting a Transformer. On the other hand, if the transformation is simple but you feel that it is not generic enough, then you have a choice: a) move it to a transformer and have your component focus on the business logic; or b) leave it there and reduce the number of classes you maintain.

The Fine Line between Fat and Voluptuous... Components

...this time it is not the J Lo's booty. Actually, as much as I've looked, I haven't found a good set of recommendations about how to structure components in an ESB. There's some stuff from the SOA guys, but they have some ill-conceived assumptions that every invocation between services has to be remote and marshaled through XML, which is not necessarily the case for a single application using an ESB as a platform.

I am in no way an authority on EAI, but here is my attempt at defining some guidelines (take them with a big lump of salt):

  • Multiple components make your application complex. A good starting point for a new application is sticking everything in one component and extracting components as necessary using the following guidelines. In general I've found it easier to split components than to merge them (Java's method-invocation semantics are less expressive than ESB routing).
  • If you need to checkpoint your processing, consider splitting the stages into different components and separating them with durable queues. The queues could be either JMS or Mule VM queues with persistence enabled. With proper use of transactions, this would allow you to survive an application crash or do a failover (you still need to take care of any non-transactional endpoints). An alternative is to keep using one component and checkpoint using a distributed transactional cache configured with redundancy.
  • If you need to control the resources allocated to a certain part of the processing, you can extract it into a separate component and change the number of workers using a SEDA approach (check also this article and the Scatter-Gather and Composed Message Processor patterns). Another valid approach is to use a compute grid like GridGain, which gives you more functionality but increases the total complexity of the application. We should also mention GigaSpaces, which provides a platform based on JINI and JavaSpaces with good integration capabilities.
  • Consider splitting when parts of the processing have different state lifecycles. This is best illustrated by example: we receive a stream of events on an inbound endpoint (let's call it the 'control stream'). Each event has a payload consisting of a condition and a 'data stream' endpoint address. On certain conditions, we want to start receiving and processing the events on the data stream in a way which requires independent state. The implementation would be to have one component that monitors the control stream and keeps track of the registered processors, creating new ones as necessary. The processors are concerned only with their own data stream of events, and each of them has separate state. A clumsy alternative (antipattern?) is to have a single component subscribed to all data streams and use a cache to look up the state based on some key derived from the incoming event. A possibly better alternative for this specific case is to embed an ESP processor like Esper.
  • If you want to extract a component in order to reuse it, consider extracting it as a domain object first. If it encapsulates processing and is ESB-aware, then perhaps it's better off as a transformer, router or agent (using Mule terminology here). So far I haven't had the need to reuse a business-logic component between projects.
  • If two components are tightly coupled and do not take advantage of any ESB-supplied functionality on the connection between them, consider hiding them behind a single Java facade. It's easier to unit test a POJO than to integration-test a component.
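The control stream / data stream guideline above is probably the hardest one to picture, so here is a rough sketch of it. The `Bus` class is a made-up in-memory stand-in for the ESB transport, the stream addresses are invented, and the per-stream state is just a running count and total; a real implementation would register processors against actual endpoints.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongConsumer;

// Hypothetical in-memory stand-in for the ESB transport.
final class Bus {
    private final Map<String, List<LongConsumer>> subscribers = new HashMap<>();

    void subscribe(String address, LongConsumer listener) {
        subscribers.computeIfAbsent(address, a -> new ArrayList<>()).add(listener);
    }

    void publish(String address, long value) {
        for (LongConsumer listener : subscribers.getOrDefault(address, Collections.emptyList())) {
            listener.accept(value);
        }
    }
}

// Owns the state for exactly one data stream.
final class StreamProcessor {
    private long count;
    private long total;

    void onData(long value) {
        count++;
        total += value;
    }

    long count() { return count; }
    long total() { return total; }
}

// Watches the control stream and starts one processor per requested data stream.
final class ControlComponent {
    private final Bus bus;
    private final Map<String, StreamProcessor> processors = new HashMap<>();

    ControlComponent(Bus bus) {
        this.bus = bus;
    }

    void onControl(boolean conditionMet, String dataStreamAddress) {
        if (!conditionMet || processors.containsKey(dataStreamAddress)) {
            return; // nothing to start, or this stream already has a processor
        }
        StreamProcessor processor = new StreamProcessor();
        bus.subscribe(dataStreamAddress, processor::onData);
        processors.put(dataStreamAddress, processor);
    }

    StreamProcessor processor(String address) {
        return processors.get(address);
    }
}

final class ControlStreamDemo {
    public static void main(String[] args) {
        Bus bus = new Bus();
        ControlComponent control = new ControlComponent(bus);

        control.onControl(true, "prices.ACME");
        control.onControl(false, "prices.INIT"); // condition not met: no processor is created
        bus.publish("prices.ACME", 10);
        bus.publish("prices.ACME", 5);
        bus.publish("prices.INIT", 7);           // nobody is listening on this stream

        System.out.println(control.processor("prices.ACME").total());
    }
}
```

Note that the processors never look their state up by key: each one is subscribed to its own stream and simply owns its fields, which is the difference from the single-component-plus-cache alternative.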

As always, any opinions are welcome.


Neil Ellis said...

Hey Dimitar, I've tried to take a little bit of a different view on integration. I found the ideas behind Mule great - but I also found that you'd end up with too much in XML configuration files for multiple products. So I started Einstein which is a 4GL in the works to provide a complete language for integration and co-ordination. It's an attempt to get beyond the huge hodge potch of DSLs, ESBs and the like to have a single language responsible for all aspects (state machine, service co-ordination, transactions, distributed execution, messaging, routing). Please take a look and let me know if it interests you. All the best Neil.

Mike Francis said...

Hi. If you are interested in doing SEDA or adding scale and resilience to your Mule implementation you may also want to take a look at the Infiniflow ESB Runtime Service and its open source foundations Mule4Newton.

Mike (Paremus)

Dimitar said...

Hi Neil, Mike,

I've actually been looking at Infiniflow/Newton for some time and I find it very powerful. I'm going to monitor the progress of Mule4Newton; the load-balanced routing especially sounds really useful.

Einstein looks promising, though I can't really imagine yet how well it would work for a real system. Perhaps providing a few examples (e.g. different implementations of Loan Broker) would help to clear this up?

In general I don't see why Einstein should be bound to Newton - IMHO it would make sense to be able to compile an Einstein system description to Mule config, Camel config, etc. (or even in the case of Mule have an alternative EinsteinConfigurationBuilder).

Creative Commons License This work is licensed under a Creative Commons Attribution 3.0 Unported License.