Dimitar's Blog: The Shape of Complexity

Some things are complex and there's no way around it. Still, even if we can't remove the complexity, we can often shape it into different forms. Some applications have shallow and wide complexity - many simple things with relatively few dependencies, but the overall system has mind-boggling emergent behavior (my favorite example is 25 kLOC of Perl scripts). Other applications have narrow and deep complexity - a small code base, but one that uses so much infrastructure and metaprogramming that to understand what's going on you need to be an expert in the platform (think about Hello World written with EJB2, or any application using Ruby on Rails).

Horizontal complexity is pretty easy to deal with - find two similar things and create an abstraction for them, rinse and repeat. It is important to stop every now and then and look for similar abstractions and for ugly usages. Eliminate the similar abstractions by extracting the common functionality into helpers or superclasses; deal with the ugly usages by splitting the abstraction.
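A minimal sketch of the "find two similar things" step, in Java. All class and method names here are made up for illustration; the point is only that two near-duplicate methods collapse into one helper plus two thin callers.

```java
// Before: two methods that differ only in their prefix (hypothetical example).
final class DuplicatedLabels {
    static String orderLabel(int id) {
        return "ORDER-" + String.format("%05d", id);
    }

    static String invoiceLabel(int id) {
        return "INVOICE-" + String.format("%05d", id);
    }
}

// After: the shared formatting lives in one helper; the callers stay trivial.
final class Labels {
    // Extracted common functionality: zero-padded id behind a prefix.
    static String label(String prefix, int id) {
        return prefix + "-" + String.format("%05d", id);
    }

    static String orderLabel(int id) {
        return label("ORDER", id);
    }

    static String invoiceLabel(int id) {
        return label("INVOICE", id);
    }

    public static void main(String[] args) {
        System.out.println(orderLabel(42));
        System.out.println(invoiceLabel(7));
    }
}
```

The refactored version is one level "taller" (callers now go through `label`), which is exactly the trade-off discussed next.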

By definition, when we introduce abstractions, our complexity becomes 'narrower' and 'taller'. If we put too much stuff into the abstract classes or nest them too deeply, we might transform a bunch of simple classes that we couldn't understand when taken together into a somewhat smaller bunch of more complex classes that we can't understand even in isolation. If we go overboard in the other direction, e.g. by adding too many facades and convenience methods, the API becomes too big and does not provide enough benefit to justify learning it. A canonical example: I know a project that has a Strings class containing constants for the empty string, single digits, punctuation marks, single letters and so on. Apart from being pointless, more verbose and difficult to apply consistently, it also couples every class in that project to the util package.
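To make the Strings example concrete, here is a purely hypothetical reconstruction. The constant names and the CsvLine caller are invented for illustration and are not taken from the project in question.

```java
// Hypothetical "convenience" class: every constant is just a named literal.
final class Strings {
    static final String EMPTY = "";
    static final String ZERO = "0";
    static final String COMMA = ",";
    static final String A = "a";

    private Strings() {}
}

final class CsvLine {
    // With the constants: longer to read, and CsvLine now depends on Strings.
    static String joinWithConstants(String left, String right) {
        return left + Strings.COMMA + right;
    }

    // With a plain literal: same result, no extra dependency.
    static String joinWithLiterals(String left, String right) {
        return left + "," + right;
    }

    public static void main(String[] args) {
        System.out.println(joinWithConstants("a", "b"));
        System.out.println(joinWithLiterals("a", "b"));
    }
}
```

Both methods return the same string; the only thing the constant adds is an import and one more name to remember.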

So, how do we end up with a code base that is neither too tall nor too wide, but just the right shape?

Acknowledge that the right shape depends on the individual - some people can cope with more abstraction, while others can remember more facts. A metric for abstraction efficiency can be defined as delta-loc/n^{levels-of-indirection} (where n is a constant bigger than 2).
Keep in mind the choice of tools - for example IntelliJ IDEA excels at navigating well-factored code, while Vi people often prefer decoupled classes that can be changed with low risk of impacting other areas of code.
Consider the infrastructure maturity - using XA transactions with JMS is simple, debugging a buggy JMS implementation (cough.. ActiveMQ.. cough) is an entirely different issue.
The experience of the team is important - for some people JMS is the most obvious way to send a piece of data once and only once to another system; others treat JMS as black magic and resort to FTP and cunning rename+move schemes, involving multiple directories and recovery scenarios.
And finally, even if different pieces of infrastructure can provide similar functionality, sometimes they vary in the way they do it - in my previous article about component granularity, I mentioned the fat component vs fat Mule configuration scenarios. In this case, we can move the complexity between the Java code and the Mule configuration. Mule provides out-of-the-box abstractions for threading, routing and transforming, but it is not great as a general-purpose process definition language. Java is a general-purpose language, but it does not provide high-level primitives for threading and routing. Another example: compare running OLAP queries against an OLTP schema with running them against a dedicated warehouse schema - both work, but the difference in speed and CPU utilization can be orders of magnitude.
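The abstraction-efficiency metric from the first point above, delta-loc/n^{levels-of-indirection}, can be sketched in a few lines of Java. This is only one reading of it: it assumes delta-loc means the lines of code saved by introducing the abstraction, and n = 3 in the demo is an arbitrary pick that satisfies "bigger than 2". The class and method names are hypothetical.

```java
final class AbstractionEfficiency {
    private AbstractionEfficiency() {}

    /**
     * Lines saved divided by n raised to the number of indirection levels.
     * Assumption: linesSaved stands in for "delta-loc" in the formula.
     */
    static double score(int linesSaved, int levelsOfIndirection, double n) {
        if (n <= 2.0) {
            throw new IllegalArgumentException("n must be greater than 2");
        }
        if (levelsOfIndirection < 0) {
            throw new IllegalArgumentException("levels of indirection must not be negative");
        }
        return linesSaved / Math.pow(n, levelsOfIndirection);
    }

    public static void main(String[] args) {
        // Similar savings, different depth: each extra level divides the score by n.
        System.out.println(score(300, 1, 3.0));
        System.out.println(score(270, 3, 3.0));
    }
}
```

Because the denominator grows exponentially, an abstraction has to save a lot more code to justify each additional level of indirection, which matches the warning about nesting abstract classes too deeply.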

Tuesday, June 3, 2008
