Dimitar's Blog: June 2008

Monday, June 16, 2008

Build, Buy or Steal

This post is based on a text I wrote at my day job. It is edited to reflect my personal opinions and does not represent my employer’s view in any way.

The classic 'build vs. buy' dilemma is further complicated by the new wave of quality open source components.

Even if we assume that the open-source alternatives are generally inferior to their closed-source counterparts, their low price makes them an attractive choice for getting started, provided that there is an easy migration path to more powerful solutions. Additionally, many open-source tools are backed by commercial companies which can provide support, consulting and guidance on demand.

Open-source components are often (not always) of higher quality than internally developed libraries. This is because of the usually pretty good community QA (many users; no time pressure) and the small-vendor mentality of the backing firms (they need to be much better than the industry standard in order to convince a client). Also, a solid open-source project provides decent documentation and community support, often has books written about it and, in general, makes it easier to recruit people already familiar with it.

We do want to build components in-house when we need, and believe we can provide, more value than the open-source and commercial counterparts offer. We always need to take into account that building a good library is a continuous investment in new features, bug fixing and documentation. Providing a sub-optimal internal component is the worst of all worlds.

Another big question is when to contribute our changes back to an open-source project. Conventional wisdom says that we shouldn't give our work away for free. Still, given the previous paragraph, I would claim that it's in our best commercial interest to contribute back any changes to the external module's core. In such cases, the initial investment is small, the cost of maintaining the changes privately is high (as we need to reapply them with every release) and the business value of the code is low. The alternative is to fork the open-source component, which brings us to the problem in the previous paragraph.

We might also decide to contribute or open-source internally developed extensions of an open-source library, gaining free QA and possibly bug fixes. We should not open-source code that gives us [our product] a significant advantage over alternative solutions on the market. We must not share non-generic code that captures business processes, algorithms, site-specific logic, etc.

And now is the time for the mandatory list at the end of the post. I'm going to enumerate a few build vs. buy decision anti-patterns:

Not Invented Here (NIH) Syndrome - you know, when we write our own thing because we are too lazy to do our research or read the docs.
The Wrapping Party - every external library is wrapped in order to integrate it into the proprietary architectural framework. Though it might look like a good decision, it often leads to difficulties in debugging, an inability to apply best practices and tools, and generally inefficient use of the library.
Nobody Got Fired for Buying Expensive Stuff - when technical decisions are taken by managers without enough information (or understanding). Sometimes this is rationalized by arguing that all the products look the same on paper, so at least this one comes from a reputable company we can sue. The problem is that the very expensive do-it-all products often require a staff of rocket scientists (or vendor consultants) in order to deliver anything after that.
The First One is Free - some vendors promote their products as open source, while capturing your data and interfaces in proprietary formats and protocols and then selling services around them. The SOA RAD tools and BPM tools are particularly bad offenders here. The problem is that this limits the ways in which you can evolve your platform. The way to prevent this is to always be aware of which parts of the solution are platform-specific: is the configuration stored in a documented file format? Are the components POJOs or implementations of proprietary interfaces? Can you get the whole solution as a bunch of text files? What protocols are used for communication? What is the data storage? Can we plug in our custom infrastructure, and where?
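
To make the "POJO vs proprietary interface" check from the last item concrete, here is a minimal sketch. The VendorTask interface is a made-up stand-in for a vendor SDK type, and DiscountCalculator and DiscountTask are hypothetical names; none of them come from a real product. The idea is only that the business rule lives in a plain class, and a single thin class touches the vendor's types.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a vendor SDK interface; not a real API.
interface VendorTask {
    Map<String, Object> execute(Map<String, Object> context);
}

// Plain Java object: no vendor imports, so it can be unit-tested
// or moved to another platform unchanged.
final class DiscountCalculator {
    // Returns the price in cents after a whole-number percentage discount.
    long discountedCents(long priceCents, int percent) {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be in [0, 100]");
        }
        return priceCents - (priceCents * percent) / 100;
    }
}

// The only class that knows about the (assumed) vendor types.
final class DiscountTask implements VendorTask {
    private final DiscountCalculator calculator = new DiscountCalculator();

    @Override
    public Map<String, Object> execute(Map<String, Object> context) {
        long price = (Long) context.get("priceCents");
        int percent = (Integer) context.get("percent");
        Map<String, Object> result = new HashMap<>();
        result.put("discountedCents", calculator.discountedCents(price, percent));
        return result;
    }
}
```

This is not the blanket wrapping criticized above: the library itself is used directly, and only our own logic is kept free of its interfaces.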

Tuesday, June 3, 2008

The Shape of Complexity

Some things are complex and there's no way around it. Still, even if we can't remove the complexity, often we can shape it into different forms. Some applications have shallow and wide complexity - many simple things with relatively few dependencies, but the overall system has mind-boggling emergent behavior (my favorite 25 kLOC Perl scripts example). Other applications have narrow and deep complexity - a small code base, but using so much infrastructure and metaprogramming that, to understand what's going on, you need to be an expert in the platform (think about Hello World written with EJB2 or any application using Ruby on Rails).

The horizontal complexity is pretty easy to deal with - find two similar things and create an abstraction for them, rinse and repeat. It is important to stop and look every now and then for similar abstractions and for ugly usages. Eliminate the similar abstractions by extracting common functionality into helpers or superclasses; deal with the ugly usages by splitting the abstraction.
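
As a rough illustration of "extracting common functionality into superclasses", assume two report writers that used to duplicate the same loop. The class names are hypothetical and exist only for this sketch.

```java
import java.util.List;

// Shared skeleton pulled out of two similar (hypothetical) classes.
abstract class ReportWriter {
    // The loop that both writers used to duplicate.
    final String write(List<String> rows) {
        StringBuilder out = new StringBuilder(header());
        for (String row : rows) {
            out.append(formatRow(row)).append('\n');
        }
        return out.toString();
    }

    abstract String header();

    abstract String formatRow(String row);
}

// Quotes every value and doubles embedded quotes.
final class CsvReportWriter extends ReportWriter {
    @Override
    String header() {
        return "value\n";
    }

    @Override
    String formatRow(String row) {
        return "\"" + row.replace("\"", "\"\"") + "\"";
    }
}

// Emits a simple dash-prefixed list with no header.
final class PlainReportWriter extends ReportWriter {
    @Override
    String header() {
        return "";
    }

    @Override
    String formatRow(String row) {
        return "- " + row;
    }
}
```

Each subclass is now only a few lines, at the price of one extra level of indirection for anyone reading it.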

By definition, when we introduce abstractions, our complexity becomes 'narrower' and 'taller'. If we put too much stuff into the abstract classes or we nest them too much, we might transform the bunch of simple classes that we couldn't understand when taken together into a somewhat smaller bunch of more complex classes that we can't understand even in isolation. If we go overboard in the other direction, e.g. adding too many facades and convenience methods, the API becomes too big without providing enough benefit to justify learning it (canonical example - I know a project that has a Strings class, containing constants for the empty string, single digits, punctuation marks, single letters and others. Apart from being pointless, more verbose and difficult to apply consistently, this also couples all classes in that project to the util package).
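
A small reconstruction of what such a constants class might look like. Only the name Strings comes from the example above; the constants and the CsvLine class are guesses made up for illustration.

```java
// Hypothetical reconstruction of the constants class described above.
final class Strings {
    static final String EMPTY = "";
    static final String COMMA = ",";
    static final String ONE = "1";

    private Strings() {
    }
}

final class CsvLine {
    // With the constants class: longer to read, and the source now
    // depends on Strings.
    static String viaConstants(String a, String b) {
        return String.join(Strings.COMMA, a, b);
    }

    // With a plain literal: same result, no extra dependency.
    static String viaLiterals(String a, String b) {
        return String.join(",", a, b);
    }
}
```

Both methods return the same string; the first one just makes the reader look up what COMMA is.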

So, how do we end up with a code base that is neither too tall nor too wide, but just the right shape?

Acknowledge that the right shape depends on the individual - some people can cope with more abstraction, while others can remember more facts. A metric for abstraction efficiency can be defined as delta-loc/n^{levels-of-indirection} (where n is a constant bigger than 2).
Keep in mind the choice of tools - for example IntelliJ IDEA excels at navigating well-factored code, while Vi people often prefer decoupled classes that can be changed with low risk of impacting other areas of code.
Consider the infrastructure maturity - using XA transactions with JMS is simple, debugging buggy JMS (cough... ActiveMQ... cough) is an entirely different issue.
The experience of the team is important - for some people JMS is the most obvious way to send a piece of data once and only once to another system; others treat JMS as black magic and resort to FTP and cunning rename+move schemes, involving multiple directories and recovery scenarios.
And finally, even if different pieces of infrastructure can provide similar functionality, sometimes they vary in the way they do it - in my previous article about component granularity, I mentioned the fat component vs fat Mule configuration scenarios. In this case, we can move the complexity between the Java code and the Mule code. Mule provides out-of-the-box abstractions for threading, routing and transforming, but it is not great as a general-purpose process definition language. Java is a general-purpose language, but does not provide high-level primitives for threading and routing. Another example: consider running OLAP against an OLTP schema versus a dedicated warehouse schema - both work, but the difference in speed and CPU utilization can be orders of magnitude.
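
The abstraction-efficiency metric from the first point in the list can be tried out with a few lines of Java. This is only a sketch under assumptions: delta-loc is read as "lines of code saved by the abstraction", and n is set to 3 simply because it has to be some constant bigger than 2. The class and method names are made up.

```java
final class AbstractionEfficiency {
    // The post only says n is "a constant bigger than 2";
    // 3.0 is an arbitrary pick for illustration.
    static final double N = 3.0;

    // delta-loc / n^levels, with delta-loc read as lines saved (an assumption).
    static double score(int linesSaved, int levelsOfIndirection) {
        if (levelsOfIndirection < 0) {
            throw new IllegalArgumentException("levels must be non-negative");
        }
        return linesSaved / Math.pow(N, levelsOfIndirection);
    }

    public static void main(String[] args) {
        // One helper class that saves 300 lines.
        System.out.println(score(300, 1));
        // Three nested layers that together save 900 lines.
        System.out.println(score(900, 3));
    }
}
```

With these numbers the deeper design saves three times as many lines yet scores lower, because every extra level of indirection divides the benefit by n.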
