Saturday, October 27, 2007

On Maven2

Maven is a tool with an interesting history dating back to 2001. In its first years it earned a deservedly bad reputation for being unstable, poorly documented and a more or less experimental piece of work. The release of version 2.0 in 2005 fixed many of the early quirks and set right many of the short-sighted design decisions. After some bad experience with Maven 1, I was wary of getting on the M2 bandwagon, but when I moved to a new job in 2006 I decided to give it a go. So far there have been ups and downs, but I'm fairly happy with it. I still haven't abandoned all my Ant and shell scripts, but I find that I'm using Maven as the primary build tool for most of my projects.

The core proposition of Maven is that one should be able to declare what they are building in some sort of manifest file, and the build tool should be able to figure out how to build it. The manifest should contain only the information that is specific to the project, and all the build procedures should be implemented as plugins. Each build should produce exactly one artifact of a certain type. The artifacts are stored in repositories (more about this later.)

In Maven parlance, the manifest file is called a POM (that stands for Project Object Model). If a project adheres to the predefined filesystem layout, the actual XML one has to write can be very small. Here is a minimal example:

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.acme.foo</groupId>
  <artifactId>foobar</artifactId>
  <version>3.14-alpha</version>
</project>
This configuration is enough to enable Maven to build, package, install, deploy and clean your project. It will even generate the reports site if you run mvn site. That's the way I usually start my projects. Later, you can tack on more elements as needed.
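To give an idea of what you get for free, these are the commands I typically run against such a minimal POM (a rough sketch - the exact plugins and report contents depend on your Maven installation, and deploy additionally needs a distributionManagement section):

mvn clean      # wipes the target directory
mvn package    # compiles, runs the unit tests and builds the JAR
mvn install    # copies the JAR into the local repository
mvn deploy     # uploads the JAR to a remote repository
mvn site       # generates the reports site under target/site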

The top-level element defines the default schema for the POM. The schema has a version which has to match the modelVersion element below. While the schema is not strictly necessary, it makes editing the POM much easier if you are using an XML-aware editor. It is well constrained and there are few extensibility points, so autocompletion works very well. It also features annotations for each element, which means that if you use IntelliJ IDEA you can just press Ctrl+Q on any element and get instant documentation. The schema design is a bit annoying, but it is very regular and easy to understand:

  • No attributes - everything is an element.
  • No mini-languages - everything is an element and any custom textual notations are avoided as much as possible (though in many places they use URLs). This allows for simple parsing and processing.
  • If an element can be repeated more than once, it is enclosed in a container element that can appear only once. This ensures that all elements of one type are textually next to each other. Each container can contain only elements of the same type (see the dependencies snippet below).
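To illustrate the container convention, this is how dependencies are declared in the POM - a single dependencies container holding any number of dependency elements (the two artifacts below are just examples):

<dependencies>
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>3.8.1</version>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>commons-lang</groupId>
    <artifactId>commons-lang</artifactId>
    <version>2.3</version>
  </dependency>
</dependencies>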
Overall, the POM bears a resemblance to an IDE configuration file, and they serve the same purpose. Both Maven and the IDEs run a predefined build process, parameterized by the information in the project file (or POM). One major difference is that Maven is designed to be run from the command line and to encapsulate all environment-specific factors in the POM and the settings.xml files. You can use Maven to generate project files for IDEA, Eclipse and NetBeans based on the information in the POM.
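For example, the IDEA and Eclipse project files can be generated straight from the POM (a sketch - I haven't shown NetBeans, which has its own Maven integration rather than a generation goal):

mvn idea:idea          # generates the .ipr/.iml/.iws files for IntelliJ IDEA
mvn eclipse:eclipse    # generates .project and .classpath for Eclipse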

The machine-specific and user-specific configuration is specified in the settings.xml files. There are two of them: the machine-specific settings are stored under the $M2_HOME/conf directory and apply to all the users on the machine. Usually the contents are standardized within the team (internal repositories, proxy settings, etc.) In our company this file is posted on the wiki, where everybody can download it. Alternatively, we could have built our own internal Maven distribution with the file pre-included. The second settings.xml resides under ~/.m2 and contains user-specific settings overriding the machine-specific ones. One can use the user-specific settings to keep login credentials, private keys, etc. On a Unix machine, this file should be readable only by the user.
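Here is a rough sketch of how the two files might be split (the repository URL and the 'internal' id are made up for the example):

<!-- $M2_HOME/conf/settings.xml - shared by the whole team -->
<settings>
  <mirrors>
    <mirror>
      <id>internal</id>
      <mirrorOf>*</mirrorOf>
      <url>http://repo.example.com/m2</url>
    </mirror>
  </mirrors>
</settings>

<!-- ~/.m2/settings.xml - per user, keeps the credentials -->
<settings>
  <servers>
    <server>
      <id>internal</id>
      <username>jdoe</username>
      <password>secret</password>
    </server>
  </servers>
</settings>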

Though the POM is very flexible and can be tweaked to accommodate a number of different scenarios, it is highly recommended to refrain from overriding the defaults and to follow the Maven conventions as much as possible. This way, new developers on the project can get up to speed faster and (important) it's much less likely that you get bitten by an untested plugin 'feature'.

One of the things that new users tend to dislike most is the standard directory layout. In brief, you have pom.xml in the root, and your files go under a directory called src. By default, all files (artifacts) generated during the build go under a directory called target, which makes for an easy cleanup. Note that there is no 'lib', as all the libraries reside in the local repository (more about this in another post.)

So far so good, but then under src we usually have main and possibly test, integration and site directories, under them java and resources, and only there do we put the actual source files. This means that we have at least 3 directory levels used for classification above our sources, and if you jump between them using Windows Explorer or bash it makes for a lot of clicking/typing. On the other hand, this is the price one pays for Maven's magic - each directory level means something to the plugins that build your project. E.g. the unit test goal knows that it should run the tests under test and not the ones under integration. All the files in the resources directory are copied into the final JAR, while the ones under java are not, and so on and so forth.
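For reference, this is roughly what the layout looks like for a simple project (the integration directory is our own convention, not a Maven default):

foobar/
  pom.xml
  src/
    main/
      java/         # production sources
      resources/    # files copied into the JAR as-is
    test/
      java/         # unit tests, executed by the surefire plugin
      resources/    # test-only resources, not packaged
    site/           # site descriptor and documentation sources
  target/           # everything generated by the build, wiped by mvn clean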

This post became rather long, so I'll finish it here. Next week I'm going to cover Maven's dependency management and repository organization. Again, I'll try to talk more about the "why's" and less about the "what's", which are already covered pretty well in the existing tutorials.

Friday, October 26, 2007

Ninjava Presentation

First Post!

A month ago I volunteered to give a presentation at Ninjava (a Java User Group in Tokyo). At first it looked easy - the topic was "Java Project Lifecycle" and that's something I've been doing for the last two years, so I should have a lot to say, right? Well, this actually turned out to be my first problem - after putting my thoughts on paper, I realized that there were too many things and there was no way I could fit the whole talk in one hour.

Since everything looked relevant to the main topic and everything looked important, I figured that I needed to tighten the scope. I decided to focus on a minimal process and on how we can use tools to automate chores like building, releasing and configuration management. In the end, I put together a plan to build a small application, starting with a domain layer (actually that might be too big a word for a class multiplying two BigDecimals), test it, add a command-line interface, release, add a simple GUI, release, show how the whole thing can be split into 3 modules and packaged in different configurations - GUI only, CLI only or both. Finally, I was going to add a feature on the head and backport it to a stable branch. I planned to use the following tools:

  • Subversion - a version control system similar to CVS.
  • Maven2 - a project automation tool from Apache. The tool is built around the POM (Project Object Model). The POM is an abstract model, but in practice it is usually an XML document which describes information about the project. The POM does not describe what actions can be performed on a project.
  • Artifactory - a repository management server for Maven2. Usually it's configured as a caching proxy in front of the internet repositories, syndicating their contents with the internal repo. Deployment can be done through WebDAV, HTTP PUT or a convenient web UI for manual uploads.
  • TeamCity (check out the EAP for the 3.0 release) - a continuous integration (CI) server from JetBrains. Apart from the usual watch-and-build functionality, it also features some interesting stuff like conditional commit (send the changes to the server, which will commit them only if the build passes), a duplicates analyzer and a static analysis tool working at the AST level, plugins for many different IDEs, etc.
  • Jira - an issue tracker. Actually, I should clarify - the best issue tracker I've used (especially with the GreenHopper plugin and the JIRA Client GUI).
  • Confluence - a damn good wiki. In our department it also doubles as a reporting tool.
  • Fisheye + Crucible - a server indexing your version control system, with a web interface for browsing changesets, files and revisions, filtering by committer and full-text search. Crucible is an add-on to Fisheye that lets you select a bunch of files and create a code review ticket, adding inline notes. Multiple people can collaborate, posting replies to the notes and proposing changes until the ticket is resolved.

The main factor in choosing these tools was that I am currently using them either at work or in the development of the Mule JMX Transport (BTW, all of them have free licenses for open source). The first rehearsal was one evening when I stayed after work and went through all the steps on my workstation - it took me about 40 minutes. Perhaps 10 of those minutes I spent looking up documentation on the internet, so I thought it was not too bad. On the presentation day I took a day off from work so I could have time to set up Daniela's laptop as my primary demonstration machine (as I had already installed all the server apps on my poor Kohjinsha), and everything was going fine until I started the dress rehearsal. After I chased Daniela out of the apartment, I went through the already familiar steps, this time talking as I would during the presentation. Oops... it turned out that 15 minutes were gone and I was still at the very beginning (and I hadn't even managed to explain in detail what I was doing). Attempt #2 - this time I consciously tried to say only what I was doing and not what I was achieving or why I was doing it. The speed was much better, but I found that I was getting these ~30 sec pauses when switching active files or searching for a symbol, which were ruining the whole flow.

A quick look in Sysinternals' Process Explorer and Process Monitor showed that the working set of java.exe was 300 MB and it was paging like crazy. I stopped some services, closed JIRA Client and all the other applications and tried again. This time it was better - but it was also clear that this was not going to work. I was at step 13 and already well past one hour into the presentation... The wall clock was showing 4pm... It was too late to change the scope, so the only thing that seemed to make sense was to just go out, start talking and try to provoke questions, which would help me focus on something.

I showed up at Cerego around 7:30 (after meeting Nikolay at Shibuya station), and while I was wondering what kind of excuse to use, Peter proposed that I could use one of the machines in the conference room. There was nothing to lose, so I set out to prepare an environment in the remaining 30 minutes - I downloaded Maven, IDEA, JIRA Client and the JDK, set up the environment variables and, just in case, checked out the JMX Connector. I tried to set up JIRA Client against the JIRA instance running on the Kohjinsha, but for some reason it wouldn't resolve the host name, so I decided not to bother. By the time I finished there were about 10-15 people in the room.

I started talking the usual stuff about how some tools work well in some situations and suck in others (except SourceSafe, which sucks in every situation); how the primary criterion in choosing a tool is that it should help us get the job done with less time and risk, and blah, blah... (more about this in a later post) and then I moved on to my current environment and to Maven. Then Curt had a question about the POM, then another and then just one more... well, I guess I have to thank him for this. Overall, we talked a lot about Maven, went quickly through TeamCity, just mentioned Crucible and didn't even touch most of the other stuff.

Now I realize that even my cut-down scope was huge and it was impossible to cover everything in one hour with enough detail. The other fundamental problem was that I started the presentation without having "something to sell". That's why the whole talk was lacking direction, and it only started taking shape when I began "selling" Maven. It would have been better if I had picked Maven from the very beginning, so I could have prepared more examples and not wasted time on general stuff. In the end, people said they liked it, and maybe some really did.

Yesterday it came to me that a blog would probably be a much better place for all the ramblings I wanted to share, so here we are. This is the first post; I'll try to publish at least one post weekly (hopefully this time I won't lose interest after the first few weeks).

About Me: check my blogger profile for details.

About You: you've been tracked by Google Analytics and Google Feed Burner and Statcounter. If you feel this violates your privacy, feel free to disable your JavaScript for this domain.

This work is licensed under a Creative Commons Attribution 3.0 Unported License.