Saturday, November 3, 2007

Build Tools

I had started writing a post explaining how the Maven repository, artifact resolution and the build lifecycle work, but I figured that I'd just be repeating the Maven Book and the Bullet-point Guide. Instead, I think it would be more interesting to go down memory lane and talk about how my build tools have changed through the years.

Why Build?

I started programming on an Apple ][ in Basic, and at that time I stored my programs as source on a 5¼" diskette. Every time I wanted to run a program, I typed 'load MyProg <ENTER> run<ENTER>'. No compilation, no packaging, no complicated dependencies. It all Just Worked™.

Integrated Development Environments

Then new machines came about, with bigger keyboards, bigger hard drives, bigger screens and (oh, horror) no built-in Basic. At that time somebody told me that Real Programmers don't use Pascal - somehow I failed to get the tongue-in-cheek tone, and it all made a lot of sense to me. Finally, after I couldn't find a Fortran compiler for the IBM XT (actually a Pravetz 16), determined to become a Real Programmer, I settled for QuickBasic. QB was very powerful - it had functions (which I thought were something like GOSUB but with a name) and you didn't have to put in line numbers (still, I didn't trust it, so I usually typed them in just in case). It also had a number of new commands, and almost none of the Apple II ones. Some time around 1992 - I can't remember exactly what happened - I abandoned QuickBasic and joined the quiche-eating side of the Force: I switched to Turbo Pascal 4.0.

Turbo Pascal was a big jump for me - it had interesting new abstractions like units and scoping (the latter being, I thought, a useless feature that only stopped me from seeing my own variables), but one notable feature was that one program could be spread over multiple files. At that time text UIs were all the rage, so I had my own library for drawing animated windows, menus, etc. The whole thing was one file and I wrote a couple of toy apps, each of them having its own copy of the Library (notice the capital letter here). Every day I wanted to show my mom and dad "what the computer can do" and I tried very hard to convince them to start using my expense-tracking app (needless to say, my attempts were futile... My sister had much better success at getting them to eat her first cake).

At that time I didn't realize how much the IDE was doing for me - all I knew was that I pressed Ctrl+F9 and a couple of seconds later I got an EXE in the output directory. There was no packaging, and I couldn't figure out for the life of me why anybody would want to compile outside the IDE.

Make?

As time went by, the IDEs changed (Turbo Pascal 5-7, Turbo/Borland C++ 2-4, Visual C++ 5-6), but my attitude stayed the same. Come summer 1999, I was working part-time as a developer in a small company and all ~20 of us developers were happily building release binaries with Visual Studio. There was a lone guy who proposed using an obscure utility called make. It looked like you had to write yet another program that would do what the IDE does, but using obscure syntax, calling the "compiler" and "linker" directly, specifying every command-line parameter and listing filenames manually - it was a lot more work. The benefits he tried to put forward weren't very convincing either: "you can build from the command line!" - countered by "and why would you want to do that?" - or "people that don't use Visual Studio can build the project", retorted with "are you crazy? Everybody uses Visual Studio." Well, that guy was actually using vi... I think he didn't last very long there.

When I joined my next company I had to use Java. They were not using Visual Studio and in fact they didn't have a standard Java IDE. At that company they were building using make. Recursive make. Every directory had its own makefile, most of them containing only boilerplate code, including a top-level template (quite annoying when you have to debug a build issue and count the number of '../' in the include), and to make it more interesting, some of the makefiles didn't use the template at all, having their own goals, invoking OS commands, etc... Overall it worked (except when it didn't). Most of the problems we had were related to incorrectly set environment variables and missing external programs. It was difficult to reason about the build process as the build logic was spread all over the directory tree. In the end, one of the developers rewrote the whole build using Ant.

Ant

Ant worked. Much better than make. Looking back, I can say that this was because:

  • Ant is much less dependent on environment variables. In that case there was a build.properties file that everybody had to customize once and that was it.
  • Ant does not use OS commands. Everything an Ant build needs is either provided by the distribution or shipped with the source files (you don't have the habit of plopping random jars in your $ANT_HOME/lib directory, do you?)
  • Ant's syntax is much more restricted than make's. A syntactically invalid Ant script won't run; a syntactically invalid makefile can erase your hard drive.
  • Ant was designed for Java, handling many common tasks right out of the box.

Many detractors say that Ant is too verbose, and they are right. I personally don't have big problems with this, as my editor usually autocompletes the task names and attributes for me and warns me when I make a mistake. Modern Ant (1.6+) also allows you to factor your build fairly well by using includes, presetdefs and macrodefs. Ant's biggest problem is actually that it is Turing complete. The targets' dependency resolution, combined with the if and unless attributes, is often abused to simulate control-flow statements, which pollutes the target namespace and complicates the dependencies. Too often targets don't have good names because their only purpose is to hold a piece of code reused by other targets (this use case is served better by macrodefs, but many people still use targets). The assign-once, ignore-subsequent-assignments semantics of Ant properties is good for implementing overriding, but when Ant is used as a language, the lack of scoping means the namespace gets polluted really quickly and you can end up with strange interactions between unrelated targets.
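
To illustrate the macrodef point, here is a rough sketch of how a "code-holding" target can be turned into a macro instead (the names and paths are made up for illustration):

    <!-- a reusable macro instead of a target that only holds shared code -->
    <macrodef name="compile-module">
      <attribute name="module"/>
      <sequential>
        <mkdir dir="build/@{module}/classes"/>
        <javac srcdir="@{module}/src" destdir="build/@{module}/classes"/>
      </sequential>
    </macrodef>

    <target name="compile-core">
      <compile-module module="core"/>
    </target>

The macro takes its inputs as attributes instead of global properties, so it doesn't pollute either the target namespace or the property namespace.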

As an experiment I tried using Ant tasks from Jython. It works great - you have real variables, real control structures, the code is much more concise and you can use any other Java library you want. One downside is that for straightforward builds (compile a bunch of files, package them in a jar and zip them up with some scripts) Ant is arguably easier to read, as there are fewer things one needs to be aware of. But the real dealbreaker is that you don't get any tool support - no IDE autocompletion, no on-the-fly syntax checking, no integrated build runners, nothing!
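
For the curious, it looked roughly like this - a minimal sketch, assuming ant.jar is on Jython's classpath; the directory and jar names are placeholders:

    from java.io import File
    from org.apache.tools.ant import Project
    from org.apache.tools.ant.types import Path
    from org.apache.tools.ant.taskdefs import Javac, Jar

    project = Project()
    project.init()

    # compile src/ into build/classes using the <javac> task
    File("build/classes").mkdirs()
    javac = Javac()
    javac.setProject(project)
    javac.setSrcdir(Path(project, "src"))
    javac.setDestdir(File("build/classes"))
    javac.execute()

    # package the classes using the <jar> task
    jar = Jar()
    jar.setProject(project)
    jar.setDestFile(File("build/myapp.jar"))
    jar.setBasedir(File("build/classes"))
    jar.execute()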

One thing I didn't mention is that Ant is very easy to customize - extend a class, provide some getters and setters, write your logic in the execute method and you are done! To use your custom task, you ship your jar with the build script and add a one-line definition to your build. If you have more tasks, you can package them together with a simple descriptor and import them all at once using a namespace (this is called an antlib).
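
A custom task really can be that small - the sketch below is purely illustrative (the task and attribute names are invented), but it shows the pattern: one setter per attribute and the logic in execute:

    import org.apache.tools.ant.BuildException;
    import org.apache.tools.ant.Task;

    public class GreetTask extends Task {
        private String name;

        // Ant maps the name="..." attribute to this setter
        public void setName(String name) {
            this.name = name;
        }

        public void execute() throws BuildException {
            if (name == null) {
                throw new BuildException("the 'name' attribute is required");
            }
            log("Hello, " + name + "!");
        }
    }

Hooking it up is then a matter of a one-liner like <taskdef name="greet" classname="GreetTask" classpath="greet.jar"/>, after which any target can call <greet name="world"/>.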

Make!

A few years later, I was porting Java games for mobile phones from Doja to EzAppli and VSCL. I had common scripts for each of the platforms, and every time I started a new port, I just had to tweak a template script containing an import statement and a couple of properties. If I needed to port the same game to another platform, all I had to change was the import statement. That was nice.
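
A per-port build file was essentially just this (a sketch from memory - the property values and file names here are made up):

    <project name="my-game" default="package">
      <property name="game.name" value="my-game"/>
      <property name="game.vendor" value="ACME Mobile"/>
      <!-- switching platforms means pointing this at ezappli-common.xml or vscl-common.xml -->
      <import file="../common/doja-common.xml"/>
    </project>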

One day I got a port for a new system - it was called BREW and the API was in C. Initially I considered writing some Ant tasks to handle the native toolchain, but after some consideration and a couple of articles (see Recursive Make Considered Harmful), I decided to give make another try.

One of make's useful features is dependency inference rules. With them, you can say that a *.c file generates a *.o file. Then you just specify which *.o files your binary is composed of, and make will automatically infer the source files. If a source file is not newer than its object file, make is smart enough not to recompile it. (Ant's javac task does the same thing for Java out of the box when given wildcards.)
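
For the record, a minimal sketch of what this looks like in a GNU makefile - the file names and flags are illustrative:

    CC = gcc
    CFLAGS = -Wall -O2

    OBJS = main.o game.o screen.o

    game: $(OBJS)
        $(CC) -o $@ $(OBJS)

    # every .o comes from the .c file of the same name and is rebuilt
    # only when the source is newer than the object file
    %.o: %.c
        $(CC) $(CFLAGS) -c -o $@ $<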

Make does not care what commands you put in its rules - it never looks inside your source files - so out of the box it doesn't deal with transitive dependencies (e.g. the headers a .c file includes). To deal with this, most C compilers can generate a dependency listing in make format, which you can include in your makefile and regenerate when the dependency graph changes. In Java, the same thing can be achieved by using Ant's dependset task, and some IDEs (like IntelliJ IDEA) can track all your dependencies (including transitive ones) as you type and recompile all impacted files.
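
With gcc, for example, the usual trick looks roughly like this (a sketch assuming GNU make and gcc; the file names are illustrative):

    SRCS = main.c game.c screen.c
    DEPS = $(SRCS:.c=.d)

    # let the compiler emit a make-format dependency fragment per source file
    %.d: %.c
        $(CC) -MM -MF $@ $<

    # pull the generated fragments in, so header changes trigger recompilation
    -include $(DEPS)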

In the end, I had a pretty well factored build system using make, requiring minimal configuration (much like the Java one) and allowing cross-compilation for x86 and ARM with different toolchains and everything. If I compare the Ant/makefile approach with the IDE, I'd say that the build scripts take more time up front to pay off. If you work on a single project, your build is not complex and you don't need repeatable builds (because you work alone and your customer doesn't care), then the IDE might be a better proposition.

Shells, Perls and Pasta

Once again, I started a new job and it turned out that in my department nobody used a build tool. Everybody built in their IDE and copied straight to production, or used ad-hoc shell or Perl scripts to build from the sources directly on the production box (the latter was rationalized as "this way we can fix bugs faster").

In the end, all the scripts were simple compile+jar, sometimes even skipping the 'jar' step. They did get the job done and the business was happy. There were a number of things missing, like reproducible builds, reliable roll-back, etc., but it is a matter of tradeoffs whether one wants to spend the necessary time studying and implementing a build system, or spend the same time implementing new functionality and fixing application bugs. There's nothing wrong with either way.

Maven

After spending some time working on an application with an Ant Build from Hell, I was dreaming of a brave new world where each application would be laid out in modules and packages with controlled dependencies, each module's build script would be simple and clean, and one could focus on the actual application functionality.

Enter Maven (actually Maven2). Even after being burned by Maven1, I still thought that the ideas were good and that it was the actual implementation that sucked so badly. Maven2 is a new start, and a new chance to reinvent the wheel. The project developers have taken the working concepts from Maven1, pruned the ones that turned out to be a bad idea, and reimplemented everything from scratch. It's still not clear why they decided to use their own DI container and classloader management (instead of, say, Spring and OSGi), but it works.

Maven has the chance to hit the sweet spot between a build-scripting tool and an IDE-style, purely declarative build. At the core of Maven is the build lifecycle, which is just an abstract sequence of steps. Then, in your POM you define (or inherit) a packaging. The packaging defines a set of default plugins and executions. You can think of a plugin as a bunch of Ant-style tasks (or 'goals' in Maven-speak) that are versioned together. The executions define parameters for the actual goals and are bound to lifecycle phases.
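
For example, this is roughly how an extra goal gets bound to a phase in a POM (sketched from memory, using the antrun plugin; the echoed message is just a placeholder):

    <build>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-antrun-plugin</artifactId>
          <executions>
            <execution>
              <id>announce-package</id>
              <phase>package</phase>
              <goals>
                <goal>run</goal>
              </goals>
              <configuration>
                <tasks>
                  <echo message="packaging is done"/>
                </tasks>
              </configuration>
            </execution>
          </executions>
        </plugin>
      </plugins>
    </build>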

Most of the parameters of a goal are optional, with sensible defaults. The defaults are either constants or references to different parts of the POM. E.g. the compiler:compile goal gets the source directory from the POM reference ${pom.build.sourceDirectory} and uses the constant "false" for its fork parameter. All POMs (or their parent POMs) in Maven2 inherit from a common "Super-POM". That common POM specifies many defaults (e.g. the directory layout), so you don't need to, as long as you keep to the Maven conventions. An important part is complying as much as possible with the standard Maven directory layout - it makes your life much easier.
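
Which is why, if you follow the conventions, a whole module can build with a POM as small as this (the coordinates are made up, of course):

    <project>
      <modelVersion>4.0.0</modelVersion>
      <groupId>com.example</groupId>
      <artifactId>my-app</artifactId>
      <version>1.0-SNAPSHOT</version>
      <packaging>jar</packaging>
      <!-- sources in src/main/java, tests in src/test/java, output in target/ -->
    </project>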

There are some areas of Maven that are still rough. The release plugin is quite limited (although there is work under way to implement features like release staging and binary promotion). There are a couple of annoying bugs in the assembly plugin which are fixed on trunk, but haven't been released for more than a year. Some issues (like explicit support for aggregator plugins) are being postponed to Maven 2.1 (which will probably ship around Q2-Q3 of 2008). But overall, I think it is an improvement.

Conclusions

So, I'm planning to use Maven2 for the time being and perhaps write a plugin or two for some tasks that it does not handle well (right now I'm still cheating, using the antrun plugin). I'm still (ab)using Ant for common scripting tasks like restarting a remote server over an SSH connection, deleting files on a remote machine, setting up a database table or deploying configuration files to a remote environment. All these things do not fit in the build lifecycle and wouldn't benefit much from writing Maven plugins for them. The main benefit in this case is that I get a simple, completely cross-platform scripting language, providing many common commands lacking from the normal Unix environment (by the way, did I mention that expect sucks?)
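
As an example of that kind of (ab)use, restarting a remote server is a one-liner with Ant's optional sshexec task (it needs the jsch jar on Ant's classpath; the host, user and command below are placeholders):

    <sshexec host="app-server"
             username="deploy"
             keyfile="${user.home}/.ssh/id_rsa"
             command="/etc/init.d/myapp restart"
             trust="true"/>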

And finally, here are some more tools that I'm planning to check out:

  • scons and rake - build tools using Python and Ruby respectively, each of them leveraging the underlying language plus some clever code for the build-specific parts
  • buildr - another Ruby tool that builds on rake, allows you to use Ant tasks and is designed as a drop-in replacement for Maven2 (hopefully allowing for a mixed environment).

2 comments:

Cookie #3 said...

Now all we need is someone who will develop an awesome GUI for slapping together goals, packages and the lot in a visual way. Yeah I know, you can't really use that on a command line, but hey, it would help, if only to show the "new" guy on the team how your build scripts are structured in a visual way.

Dimitar said...

Gizmod, the main advantage of the Maven approach is that you *can't* slap together goals and tasks any way you see fit. What you specify is your static structure (which modules and libraries you depend on) and you use the predefined Maven build process.

For a newbie, the best thing would be good documentation (still lacking) and patience.

Right now, Maven provides tools to analyze your dependencies - there are plugins you can run from the command line, there are a couple of GUI tools and of course there are all the IDEs.

On the other hand, presenting something in a graphical way does not reduce the complexity, so I'm doubtful about how much we should invest in this direction.

I like the buildr approach, because it allows for better modularisation and overall is more concise, which means that you spend less time reading the same text over and over.


This work is licensed under a Creative Commons Attribution 3.0 Unported License.