Thursday, October 30, 2008

Mavenization - A real life scenario

The other day I posted a message on a certain vendor’s support forum asking whether they could include minimal POMs in their JARs (so they would be easier to mass-import into a local repository), and I got the following reply:

We want to move from ant to maven for a long time but never got a chance to do it. Our ant script is pretty big and complex so we are afraid it will take us a while to migrate to maven. Since we have other important things to work on, we keep delaying this kind of infrastructure things that have relatively less impact than bugs. If you know any way to migrate from ant to maven quickly, I would love to hear.

So I asked if I could get access to their Ant script, and they kindly sent me that 3000-line beast.

The Challenge

The product comprises several modules that have well-defined dependencies on one another and can be used in various configurations. Each configuration is described in a build.properties file by xxx-source-path, xxx-properties-path and xxx-output-path properties. The xxx-source-path contains both Java files and miscellaneous resources (there are no tests). The xxx-properties-path contains property files only.

In the end, for each module the build produces the following artifacts:

  • JAR files – depending on properties, these can be:
    • Obfuscated with Zelix KlassMaster
    • Debug builds (still obfuscated, but with line-numbers)
    • Postprocessed by Retroweaver for Java 1.4 compatibility
  • Zipped sources
  • Zipped source stubs – public and protected method definitions and javadoc (no method bodies)
  • Zipped Javadocs
  • User manual in PDF format

In addition to these primary artifacts, the project:

  • Generates a couple of demo applications packaged as single-click JARs and JNLP.
  • Filters all property and some source files, substituting placeholders with values defined in the build.properties.
  • Compiles on the fly a doclet and then uses it to generate the source stubs.
  • Creates a bunch of bundles, containing different combinations of the aforementioned artifacts.

The vendor has a policy of releasing a new demo build every day; each build expires after 28 days.

The First Step – Fake It

In fact, we, the customers, don’t care whether you are using Maven internally. All we care about is receiving a POM with correct dependencies and (if possible) being able to download the libraries from the official Maven repository (a.k.a. Repo1) or, failing that, from your private repository (e.g. http://repo.your-company.com/demo or http://repo.your-company.com/licensed ).

In order to achieve this:

  1. For each artifact, create a pom.properties file containing 3 properties: version, groupId and artifactId.
  2. For each artifact, create a minimal POM containing the artifact group, name, version and dependency information. Alternatively, you can put in placeholders and use Ant filtering to resolve the group/name/version from the pom.properties.
  3. Integrate the new files into your build process, treating them as yet another resource file and placing them under META-INF/maven/<groupId>/<artifactId>/ in each artifact JAR.
  4. If desired, use the artifact:deploy task (from the Maven Ant Tasks) to push the artifacts to a Maven repository.
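
As a rough sketch of step 2, a minimal POM with placeholders might look like the fragment below. The artifact name module2 is made up for illustration, and the @groupId@, @artifactId@ and @version@ tokens assume Ant's default @token@ filter syntax with values taken from pom.properties.

```xml
<!-- Hypothetical minimal POM, stored in the JAR next to pom.properties. -->
<!-- The @...@ tokens are assumed to be resolved by Ant filtering at build time. -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>@groupId@</groupId>
  <artifactId>@artifactId@</artifactId>
  <version>@version@</version>
  <packaging>jar</packaging>

  <dependencies>
    <!-- One entry per module this JAR needs; module2 is a placeholder name. -->
    <dependency>
      <groupId>@groupId@</groupId>
      <artifactId>module2</artifactId>
      <version>@version@</version>
    </dependency>
  </dependencies>
</project>
```

The matching pom.properties would then hold just the three keys version, groupId and artifactId.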

This is all you need to do for the ‘regular’ JARs. For the various other files, you would want to use the same artifact name and a classifier. I would suggest using:

  • sources – standard Maven convention. Since the full sources are not released to regular customers, it would make sense to have them either under a different classifier (e.g. fullsources) or under the same classifier, but in a private repo
  • javadoc – standard Maven convention
  • jdk14 – for binaries postprocessed by Retroweaver
  • debug – for obfuscated binaries with line numbers
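
From the customer's side, picking one of these variants would then be a matter of adding a classifier to the dependency. A small sketch, where the coordinates and version are assumptions for illustration:

```xml
<!-- Hypothetical coordinates; only the classifier element is the point here. -->
<dependency>
  <groupId>com.company.product.modules</groupId>
  <artifactId>module1</artifactId>
  <version>1.0.0</version>
  <!-- Selects the Java 1.4 compatible build instead of the default JAR. -->
  <classifier>jdk14</classifier>
</dependency>
```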

Then You Migrate

If we step back, the current build process is something like this:

  1. Preparation Phase
    1. Set variables
    2. Compile custom doclet
  2. Generate Sources
    1. Filter particular files to resolve values from properties
  3. Compile
  4. Copy resources
  5. Package binaries (create jar file)
  6. Obfuscate
  7. Retroweave
  8. Javadocs
  9. Source Stubs
  10. Assemble distribution archives
  11. Build the demos

Here is how it would look after we convert the build to Maven:

  1. All the path variables go away, replaced by the standard Maven directory layout (you should be able to reorganize the source trees in a few hours on a Friday afternoon). As a last resort, you can define custom directories in the build section.
  2. The company-name, version, etc. props go away, replaced by the respective POM elements.
  3. The rest of the settings are specified in profiles defined on project or user level.
  4. Resource filtering is a built-in feature of Maven, but if you need to pre-process Java files, you would need to attach your own plugin to the generate-sources lifecycle phase.
  5. Compile and Package are standard Maven lifecycle phases, so you don’t need to do anything.
  6. The ZKM obfuscation would be tricky because there is no Maven plugin for it (or at least I couldn’t find any). I suggest that you use the AntRun plugin as a temporary hack.
  7. The Javadoc plugin would automatically generate your Javadocs, but you need to explicitly specify a second execution for your custom doclet.
  8. The assembly plugin should help you organize your different distributions.
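
To make a few of these points more concrete, here is a sketch of a build section that turns on resource filtering and adds a separate Javadoc execution for a custom doclet. The doclet class name and its coordinates are invented for illustration, and the sketch assumes the doclet has been moved into its own module instead of being compiled on the fly.

```xml
<build>
  <resources>
    <resource>
      <directory>src/main/resources</directory>
      <!-- Substitutes ${...} placeholders in resource files at build time. -->
      <filtering>true</filtering>
    </resource>
  </resources>

  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-javadoc-plugin</artifactId>
      <executions>
        <execution>
          <!-- Extra run that produces the source stubs. -->
          <id>source-stubs</id>
          <phase>package</phase>
          <goals>
            <goal>javadoc</goal>
          </goals>
          <configuration>
            <!-- Hypothetical doclet class and artifact. -->
            <doclet>com.company.build.StubDoclet</doclet>
            <docletArtifact>
              <groupId>com.company.build</groupId>
              <artifactId>stub-doclet</artifactId>
              <version>1.0.0</version>
            </docletArtifact>
            <!-- A custom doclet usually does not understand the standard doclet options. -->
            <useStandardDocletOptions>false</useStandardDocletOptions>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```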

This leaves out a lot of details, but ultimately some of the decisions are yours to make. I would suggest using a script to reorganize your sources, so you can do a number of dry-runs and verify that the result is as expected before you change the script to perform the move in the version control system.

Project Layout Considerations

Each Maven POM has exactly one parent, but can be part of multiple reactors. Both parents and reactors are pom-packaged artifacts, but their roles are different. Most tutorials use the same POM for both purposes, but there is a benefit in separating the functions (more about this in a bit).

POM Inheritance

The role of the parent is to provide defaults for the child POMs that inherit from it. The parent POMs are resolved by the normal Maven dependency resolution mechanism, which means that if you want the child projects to reflect a change in a parent POM, you need to install/deploy the parent first. Alternatively, you can use the project/parent/relativePath element to explicitly specify the parent POM’s path in the file layout (in this case, the changes are picked up from the file system).
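
A child POM that points at its parent through the file system might look roughly like this (all coordinates are placeholders chosen for illustration):

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>

  <parent>
    <!-- Hypothetical parent coordinates. -->
    <groupId>com.company.product</groupId>
    <artifactId>modules-parent</artifactId>
    <version>1</version>
    <!-- Lets Maven pick up parent changes from disk, without installing the parent first. -->
    <relativePath>../pom.xml</relativePath>
  </parent>

  <!-- groupId is inherited from the parent unless overridden here. -->
  <artifactId>module1</artifactId>
  <version>1.0.0-SNAPSHOT</version>
</project>
```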

At the top-level parent you should use dependencyManagement to specify versions and transitive-dependency-exclusions for all artifacts used by the project. No child-artifacts should explicitly declare versions and exclusions.
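
As an illustration, the two sides could look roughly as follows. The library is only an example; any third-party dependency would be handled the same way.

```xml
<!-- In the top-level parent POM: the version and exclusions are fixed in one place. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>commons-httpclient</groupId>
      <artifactId>commons-httpclient</artifactId>
      <version>3.1</version>
      <exclusions>
        <exclusion>
          <groupId>commons-logging</groupId>
          <artifactId>commons-logging</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
  </dependencies>
</dependencyManagement>

<!-- In a child POM: no version and no exclusions, both come from the parent. -->
<dependencies>
  <dependency>
    <groupId>commons-httpclient</groupId>
    <artifactId>commons-httpclient</artifactId>
  </dependency>
</dependencies>
```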

You should use pluginManagement to fix the plugin versions and (optionally) specify common options for plugins. It’s OK (although not recommended) to override the plugin options in the child POMs, but you should not specify versions there. If you need the same overrides in multiple children, consider extracting them into an intermediary parent POM.
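
A sketch of such a pluginManagement section in the parent, with the plugin version and the compiler levels picked purely for illustration:

```xml
<build>
  <pluginManagement>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <!-- Illustrative version; pin whichever one you have tested. -->
        <version>2.0.2</version>
        <configuration>
          <!-- Common options inherited by every child that uses the plugin. -->
          <source>1.5</source>
          <target>1.5</target>
        </configuration>
      </plugin>
    </plugins>
  </pluginManagement>
</build>
```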

Finally, you should try to specify most of your profiles at the parent POM level. This, combined with pluginManagement, allows you to define the bulk of your build in the parent and leave the children DRY.

The children have to explicitly specify the master POM version, which makes it a bit of a pain to increment the version (on the other hand, the master POM usually changes much more slowly).

Building in Dependency Order

The normal way for Maven to resolve dependencies between projects is through the local repository. This means that if project A depends on project B and we make a change in B, in order to test A against the new change, we need to first install or deploy B and then build A, so it would pick up the new changes (this is assuming we’re using SNAPSHOT versions).

The reactor projects take care of such closely related artifacts and automatically rebuild them in the correct order. They also act as a grouping for the sub-projects (useful when you build assemblies). The reactors themselves are artifacts using pom packaging (same as parents), but they explicitly list the relative path to each constituent module, and this is the reason I recommend separating parents from reactors (see the comparison table further down).
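
A standalone reactor POM is little more than a list of modules. A minimal sketch, with made-up coordinates and module names:

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <!-- Hypothetical coordinates; the reactor has no parent of its own. -->
  <groupId>com.company.product</groupId>
  <artifactId>reactor1</artifactId>
  <version>1</version>
  <packaging>pom</packaging>

  <modules>
    <!-- Relative paths to the module directories; Maven works out the build order. -->
    <module>../module1</module>
    <module>../module2</module>
    <module>../module3</module>
  </modules>
</project>
```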

Creating Distributions

The standard way of packaging distributions in Maven is using the Assembly plugin. It allows you to describe your distribution package in terms of project resources, artifactIds and classifiers. A typical assembly descriptor would be:

  • Create tar and zip files with classifier ‘distro’.
    • Add all files from /src/main/assembly
      • excluding **.xml, and **.sh.
    • Add all files from /src/main/assembly
      • that match **.sh
      • and set the unix permissions to 551.
    • Add the maven artifact
      • with group-id=com.company.product.modules and artifact-id=module1 and classifier=obfuscated
      • put it under lib.
    • Add all the transitive dependencies of this project (assuming this descriptor is in a reactor project)
      • and put them under lib/ext
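
Written out as an actual descriptor, the outline above might look roughly like this. It is a sketch, not a tested configuration: the coordinates come from the example, the obfuscated artifact is assumed to be declared as a dependency of the project that owns the descriptor, and the exact include/exclude matching can differ between plugin versions.

```xml
<assembly>
  <!-- The id doubles as the classifier of the produced archives. -->
  <id>distro</id>
  <formats>
    <format>tar</format>
    <format>zip</format>
  </formats>

  <fileSets>
    <!-- Everything from src/main/assembly except XML files and shell scripts. -->
    <fileSet>
      <directory>src/main/assembly</directory>
      <outputDirectory>/</outputDirectory>
      <excludes>
        <exclude>**/*.xml</exclude>
        <exclude>**/*.sh</exclude>
      </excludes>
    </fileSet>
    <!-- The shell scripts, with explicit Unix permissions. -->
    <fileSet>
      <directory>src/main/assembly</directory>
      <outputDirectory>/</outputDirectory>
      <includes>
        <include>**/*.sh</include>
      </includes>
      <fileMode>0551</fileMode>
    </fileSet>
  </fileSets>

  <dependencySets>
    <!-- The obfuscated module JAR (groupId:artifactId:type:classifier) goes under lib. -->
    <dependencySet>
      <outputDirectory>lib</outputDirectory>
      <includes>
        <include>com.company.product.modules:module1:jar:obfuscated</include>
      </includes>
    </dependencySet>
    <!-- Everything else, including transitive dependencies, goes under lib/ext. -->
    <dependencySet>
      <outputDirectory>lib/ext</outputDirectory>
      <excludes>
        <exclude>com.company.product.modules:module1</exclude>
      </excludes>
    </dependencySet>
  </dependencySets>
</assembly>
```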

You can also unpack and consolidate JARs, specify manifest-entries and signing for single-JAR distributions and so on.

Each artifact can produce more than one assembly. If you want your assemblies to go into the repo, make sure you use the ‘attached’ goal. Most of the time, I use the reactor POMs for distributions, but sometimes I just define the assembly in the JAR module’s POM.

Parents and Children vs. Reactors and Modules

| Category | Parent/Child | Reactor/Module | Merged Parent+Reactor |
|---|---|---|---|
| Direction of Coupling | Children know the name of their parent; parents don’t know their children. | Reactors know the locations of their modules; modules don’t even know whether they are being built as part of a reactor. | Creates a circular dependency between children/modules and parent/reactor artifacts. |
| Multiplicity | Each child has exactly one parent, specified in the child POM. | A module can be part of more than one reactor. | You must keep it to one reactor. If you define an alternative reactor, the parent will still be resolved to the main parent POM (as specified in the child POMs), which can lead to a great deal of confusion (and undefined behaviour for some plugins). |
| Location | The parent is usually resolved through the repo (optionally through a relative path). | The modules are always resolved through a relative path. | You need to physically lay out your project so that the parent POM is accessible by relative path (which is not needed otherwise). |
| Change Drivers | The parent changes when your build process changes. | The reactor changes when you add a new module. | Because of the artificial coupling, your POM is subject to more change. This is particularly bad for the parent role, because you need to manually update the parent versions of the children. |

Given this table, I’d recommend a project layout along these lines:

-+-+- project-root/
 |
 +-+- modules/
 | |
 | +-+- module1/ 
 | | |
 | | +--- pom.xml       | inherit from modules parent; specify dependencies 
 | | |                  | and module details (name, version, etc.)
 | | +-+- src/
 | |   +-+- main/
 | |   | +--- java/
 | |   | +--- resources/
 | |   |
 | |   +-+- test/
 | |     +--- java/
 | |     +--- resources/
 | |
 | | 
 | +-+- module2/ 
 | | |
 | | ...
 | |
 | +-+- module3/
 | | |
 | | ...
 | |
 | +-+- reactor1/       | e.g. all modules - used for the CI build and pre-commit
 | | |
 | | +--- pom.xml       | no parent; specify the modules as ../module1, ../module2, etc.
 | |
 | +-+- reactor2/       | e.g. only module2+module3 - used by the module3 developers only
 | | |
 | | +--- pom.xml   
 | |
 | +--- pom.xml         | modules parent - inherits from the main parent pom and specifies 
 |                      | module-specific profiles and plugin settings
 |
 +-+- applications/
 | |
 | +-+- application1/ 
 | | |
 | | +--- pom.xml       | inherit from apps parent; specify dependencies, module details 
 | | |                  | and whatever custom steps are necessary for the application
 | | ...
 | | 
 | +-+- application2/ 
 | | |
 | | ...
 | |
 | +-+- reactor/        | e.g. all applications
 | | |
 | | +--- pom.xml       | no parent; specify the modules as ../application1, ../application2, etc.
 | |  
 | |
 | +--- pom.xml         | apps parent - inherits from the main parent pom and specifies 
 |                      | module-specific profiles and plugin settings
 |
 +--- main-reactor      | specify as modules ../applications/reactor and ../modules/reactor1
 |
 +--- pom.xml           | the main parent pom

A Few Tips

When You Get Stuck with a Bad Plugin

For all operations that you can’t easily achieve with the existing Maven plugins, you can fall back to the Antrun plugin and reuse snippets of your current build script. Yes, it’s a hack, but it gets the job done and you can fix it when it breaks.

Try to keep these hacks in the parent pom.
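
For the obfuscation step, such a hack might look roughly like the sketch below. The file obfuscate.xml, its obfuscate target and the input.jar property are hypothetical stand-ins for a snippet cut out of the existing Ant script. The tasks element is what the AntRun plugin uses at the time of writing; newer plugin versions use target instead.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-antrun-plugin</artifactId>
  <executions>
    <execution>
      <id>obfuscate</id>
      <!-- Runs in the package phase, after the JAR plugin has built the archive. -->
      <phase>package</phase>
      <goals>
        <goal>run</goal>
      </goals>
      <configuration>
        <tasks>
          <!-- Delegates to a target lifted from the old build script (hypothetical names). -->
          <ant antfile="${basedir}/obfuscate.xml" target="obfuscate">
            <property name="input.jar"
                      value="${project.build.directory}/${project.build.finalName}.jar"/>
          </ant>
        </tasks>
      </configuration>
    </execution>
  </executions>
</plugin>
```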

Dependency Management

When you define dependencies, take care to specify proper scopes (the default is ‘compile’, but you should also consider ‘provided’ and ‘test’). You should also mark all optional dependencies, like POI HSSF, as optional. For some artifacts you might need to specify dependency exclusions, so Maven doesn’t bundle half of the Apache Commons with your artifact.
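
A short sketch of what this might look like in a module POM. The libraries and versions are examples only, and the versions are written inline just to keep the fragment readable on its own:

```xml
<dependencies>
  <!-- Needed to compile, but supplied by the container at runtime. -->
  <dependency>
    <groupId>javax.servlet</groupId>
    <artifactId>servlet-api</artifactId>
    <version>2.5</version>
    <scope>provided</scope>
  </dependency>

  <!-- Only needed by customers who use the Excel-related features. -->
  <dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>3.2-FINAL</version>
    <optional>true</optional>
  </dependency>

  <!-- Only on the classpath while compiling and running tests. -->
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.4</version>
    <scope>test</scope>
  </dependency>
</dependencies>
```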

Evaluation Builds

You might want to release the daily evaluation builds as snapshots. Maven has special semantics for snapshots: it refreshes them based on their timestamp. By contrast, if you release an artifact my.group:module:1.0.0, Maven will download it once to the local repository and never look for another copy unless you delete it manually (that's why you should NEVER re-cut a release once it's been pushed out - just declare the version broken and increment).

If you use a snapshot version (e.g. my.group:module:1.0.0-eval-SNAPSHOT), Maven will check for updates once a day (unless forced with '-U') and automatically download any new version based on its timestamp. To achieve this, you might want to cut an eval branch from the tag of each release and change the version using a shell script or something similar.
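
On the customer side, consuming such evaluation snapshots could look roughly like the fragment below. The repository id is made up, and the URL is the placeholder demo repository mentioned earlier.

```xml
<repositories>
  <repository>
    <!-- Hypothetical id; the URL is a placeholder for the vendor's demo repository. -->
    <id>company-demo</id>
    <url>http://repo.your-company.com/demo</url>
    <releases>
      <enabled>false</enabled>
    </releases>
    <snapshots>
      <enabled>true</enabled>
      <!-- 'daily' is Maven's default; 'always' would check on every build. -->
      <updatePolicy>daily</updatePolicy>
    </snapshots>
  </repository>
</repositories>

<dependencies>
  <dependency>
    <groupId>my.group</groupId>
    <artifactId>module</artifactId>
    <version>1.0.0-eval-SNAPSHOT</version>
  </dependency>
</dependencies>
```

Running the build with -U forces an immediate check regardless of the update policy.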


Creative Commons License This work is licensed under a Creative Commons Attribution 3.0 Unported License.