Dimitar's Blog: September 2008

Wednesday, September 10, 2008

On Scopes

Let's define component as a programming abstraction that exposes a number of interfaces and is instantiated and managed by a container. A component has implementation that sits behind the interface and provides the functionality, which could be a custom Java class, a proxy, a BPEL engine, a mock object, or even a mechanical turk solving captchas.

In the case of Java class components, they are usually composed of smaller Java classes like the ones in java.lang.* and java.util.* packages. All but the most simple Java objects are actually object graphs which are instantiated, wired togerher and orchestrated by the Java language and the JVM. The Java language specifies the composition structure and behavior, while JVM provides the means to invoke operations and share data. In contrast, the components are instantiated by a container and the container is responsible to wire them together (usually based on a blueprint in the form of a configuration or class annotations). The container can also provide a number of declarative services to its components: transaction management, access control, performance statistics and transparent distribution. Some popular containers are OSGi, Spring, EJB, ActiveX, MTS/COM+.

Even if a container can expose a local component over remote interface without requiring any code changes, practice has shown that taking a single-server application and distributing it in such fashion usually results in a disaster. This doesn't mean that the non-invasive distribution support is useless though. Given responsible usage, it can enable you to build distributed applications without having to deal with the transport API, using clean and reusable code (POJO). There are a number of new frameworks that are built around the concept and try to provide the right level of transparency and control (Mule ESB, Spring Remoting, Spring Integration are some popular choices).

Once you get too many components in the same contaner, autowiring starts doing most unexpected things, debugging becomes difficult and you start loosing track of your system. The solution is to add scoping. Spring supports limited scoping facilities in the form of parent-child application contexts. Although it solves the problem with configuration visibility, it does not provide facilities for controlling the public interface of the scope (explicitly exported objects). OSGi tackles this problem from a different direction using controlled classloaders, Spring OSGi is trying to be the best of both worlds, giving you scoped configuration and runtime autodiscoverable scoped components.

<interlude/>

I just finished reading Distributed Event Based Systems and they have an excellent chapter on utilizing scopes as means to structure a distributed event-driven application (chapter 6). They suggest a scoped architecture, where each component can belong to multiple scopes, scopes are linked in an arbitrary graph and notifications are governed by the different scope and component properties. The scopes described can declaratively provide a number of services (think AOP for messages). Some of these services are:

Containment - logical grouping of components and other scopes, allowing easy addressing for event dissemination and policy enforcement.
Scope Interface - defining which messages can cross the scope boundary and enforcing this policy
Transmission policy - determine which scope components will receive a certain message and/or which components have a right to publish/receive messages that cross the scope boundary.
Mappings - when a message crosses the scope boundary, enrich or transform it based on the source and destination scopes.

Scopes could be used to delineate groups with different QoS requirements, or to partition the application in geographical or security domains (or even both at the same time) and apply the appropriate policies. Integrating an external systems is also easier, because they are treated as yet another scope (as opposed to "enemy in the bee's nest") and you can provide rich access to the messages that the third party is entitled to without having to bolt on yet another authentication and authorization mechanism.

The book focuses in great detail and implementation strategies, analyzing different approaches ranging from simplistic (flooding) to very complex (integrated routing). Right now, the described architecture can not be implemented directly in any eventing system that I know of, though we can see some of the ideas in Jini (peer to peer with cooperative filtering), JMS (in implementation supporting topic hierarchies), OSGi (intra-JVM) or SCA (a lot of talk about scopes and composites, but I don't see them doing anything useful with them). The authors also propose a configuration language for managing scoped systems with the specified features. I can see AMQP as a flexible low-level specification that would allow to build something like this by implementing custom exchanges. We can use the scope configuration language as an intermediate representation, from which we can generate the custom exchange code.

Ultimately, I believe that in 10 years from now this scoping and routing facilities will become standard part of the messaging middleware (and perhaps hardware?) I'm looking in the general direction of AMQP, possibly SCA (despite their WS-* fetish) and new high level languages directly expressing architectural concerns like ACME and Einstein.

Tuesday, September 2, 2008

Dynamic components in Mule

Imagine that you are given to implement the following scenario:

A system is composed of a number of uniform entities, each entity publishes its status on a separate pub-sub topic and accepting instructions over separate control queue.
Your application receives a stream of instructions over a control channel, instructing it to monitor and control or stop controlling certain entity.

or you receive a stream of active orders and want to subscribe to the right marketdata to calculate a relative performance benchmark, or...

Usually I write about how to implement stuff with Mule, but this time I'm going to write why Mule might not be your best choice if your use-case is anything like above.

The most straight forward approach for implementing the system above is to create a simple POJO, encapsulating the management algorithm and have a service listen to the control channel and instantiate/dispose a new component for each managed entity.

The problem is that in Mule the components are somewhat heavyweight and it's not so nice when you start having hundreds (approaching thousands) of them. Another problem is that the code for manually creating and registering a componment along with the whole paraphernalia of routers, endpoints, filters and transformers is quite verbose.

The latter problem is easier to solve. There are a couple of JIRA issues that aim to simplify the programmatic configuration (MULE-2228, MULE-3495, MULE-3495). The Annotations project at Muleforge is also worth watching (right now there's no documentation, so you'd need to dig in Subversion).

The problem with the too many components is more fundamental. The main application domain for Mule is service integration. Though it also happens to be a fairly good application platform, it does fall short when it comes to managing a lot of components. One of the important things missing is the ability to create groups of components with shared thread pools. Another missing thing is better support for variable subscriptions (provide an API to easily add and remove subscriptions to the same component using some kind of template). Another notable improvement would be to elaborate the models into full blown scope-controllers, allowing to create composite applications with proper interfaces, routing policies, etc. All of these are addressable, but there is a long way to go.

There are also some improvements that can be done to the JMX agent - use JMX relation service to connect component to its endpoint and statistics, all components of certain type, user defined groups of components. Allow the component to exercise some control over the Mule-generated ObjectName so users can organize the different components in groups and ease the monitoring by EMS tools by allowing to query the parameters from a specific group.

Please share if you've had similar issues, how you dealt with them and whether you think {with the benefit of a hindsight) that the solution was worth the effort. Also I'd like to hear from users/developers of other integration frameworks or composite application containers (Newton?) how they handle these problems.

Monday, September 1, 2008

Transformers vs. Components - Take 2

In my previous article, I claimed that one of the main criteria for which Mule artifact to choose is the multiplicity of the inbound and outbound messages. I've been thinking about it and I want to add two more points:

Transformers are one instance per endpoint reference. This means that even if you use pooled components, all messages go through single transformer instance. In such case, you need to make sure that the transformer is threadsafe (which is not true if it is holding a JDBC connection or Hibernate session). The components do not have to be threadsafe, though if you are injecting some shared state you still need to properly synchronize the access to it.
Mule2 services allow you to configure different ways of instantiating the component (singleton, pooled, looked up from Spring/JNDI/OSGi/etc.). In contrast, all transformers defined using the mule-core namespace are prototype-scoped.
I actualy agree that direct database access belongs in a component. I still think that accessing a cache is kindof a gray area though.

Dimitar's Blog