
Thursday, October 30, 2008

Mavenization - A real-life scenario

The other day I posted a message on a certain vendor’s support forum asking about the possibility of including minimal POMs in their JARs (so they would be easier to mass-import into a local repository), and I got the following reply:

We want to move from ant to maven for a long time but never got a chance to do it. Our ant script is pretty big and complex so we are afraid it will take us a while to migrate to maven. Since we have other important things to work on, we keep delaying this kind of infrastructure things that have relatively less impact than bugs. If you know any way to migrate from ant to maven quickly, I would love to hear.

So I asked if I could get access to their Ant script, and they kindly sent me that 3000-line beast.

The Challenge

The product is composed of several modules which have well-defined dependencies on one another and can be used in various configurations. Each configuration is described in a build.properties file by xxx-source-path, xxx-properties-path and xxx-output-path properties. The source path contains both Java files and miscellaneous resources (there are no tests). The xxx-properties-path contains property files only.

In the end, for each module the build produces the following artifacts:

  • JAR files – depending on properties, these can be:
    • Obfuscated with Zelix KlassMaster
    • Debug builds (still obfuscated, but with line-numbers)
    • Postprocessed by Retroweaver for Java 1.4 compatibility
  • Zipped sources
  • Zipped source stubs – public and protected method definitions and javadoc (no method bodies)
  • Zipped Javadocs
  • User manual in PDF format

In addition to these primary artifacts, the project:

  • Generates a couple of demo applications packaged as single-click JARs and JNLP.
  • Filters all property and some source files, substituting placeholders with values defined in the build.properties.
  • Compiles a doclet on the fly and then uses it to generate the source stubs.
  • Creates a bunch of bundles, containing different combinations of the aforementioned artifacts.

The vendor has a policy of releasing a new demo build every day, which expires after 28 days.

The First Step – Fake It

In fact, we, the customers, don’t care whether you are using Maven internally. All we care about is receiving a POM with correct dependencies and (if possible) being able to download the libraries from the official Maven repository (a.k.a. Repo1) or, failing that, from your private repository (e.g. http://repo.your-company.com/demo or http://repo.your-company.com/licensed).

In order to achieve this:

  1. For each artifact, create a pom.properties file containing three properties: version, groupId and artifactId.
  2. For each artifact, create a minimal POM containing the artifact group, name, version and dependency information (see the sketch after this list). Alternatively, you can put placeholders and use Ant filtering to resolve the group/name/version from the pom.properties.
  3. Integrate the new files into your build process, treating them as yet another resource file and placing them under META-INF/maven/your/artifact/group in each artifact JAR.
  4. If desired, use the artifact:deploy task to push the artifacts to a Maven repository.
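To make step 2 concrete, here is a minimal sketch of such a POM (group, artifact and version values are invented for illustration, and could just as well be @placeholders@ resolved by your Ant filter):

<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <!-- illustrative coordinates; substitute your own or filter them from pom.properties -->
  <groupId>com.vendor.product</groupId>
  <artifactId>module1</artifactId>
  <version>1.2.3</version>
  <dependencies>
    <dependency>
      <groupId>com.vendor.product</groupId>
      <artifactId>module2</artifactId>
      <version>1.2.3</version>
    </dependency>
  </dependencies>
</project>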

This is all you need to do for the ‘regular’ JARs. For the various other files, you would want to use the same artifact name and a classifier (see the dependency sketch after this list). I would suggest using:

  • sources – the standard Maven convention; since the full sources are not released to regular customers, it would make sense to publish them either under a different classifier (e.g. fullsources) or under the same classifier, but from a private repo
  • javadocs – standard Maven convention
  • jdk14 – for retrozapped binaries
  • debug – for obfuscated binaries with line numbers
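From the customer’s side, picking one of these variants is then just a matter of adding the classifier to the dependency declaration. A rough sketch (coordinates are made up):

<dependency>
  <groupId>com.vendor.product</groupId>
  <artifactId>module1</artifactId>
  <version>1.2.3</version>
  <!-- pick the Retroweaver-processed build -->
  <classifier>jdk14</classifier>
</dependency>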

Then you Migrate

If we step back, the current build process is something like this:

  1. Preparation Phase
    1. Set variables
    2. Compile custom doclet
  2. Generate Sources
    1. Filter particular files to resolve values from properties
  3. Compile
  4. Copy resources
  5. Package binaries (create jar file)
  6. Obfuscate
  7. Retrozap
  8. Javadocs
  9. Source Stubs
  10. Assemble distribution archives
  11. Build the demos

Here is how it would look after we convert the build to Maven:

  1. All the path variables go away, replaced by the standard Maven directory layout (you should be able to reorganize the source trees in a few hours on a Friday afternoon). As a last resort you can define custom directories in the build section.
  2. The company-name, version, etc. properties go away, replaced by the respective POM elements.
  3. The rest of the settings are specified in profiles defined on the project or user level.
  4. Resource filtering is a built-in feature of Maven, but if you need to pre-process Java files, you would need to attach your own plugin to the generate-sources lifecycle phase.
  5. Compile and package are standard Maven lifecycle phases, so you don’t need to do anything.
  6. The ZKM obfuscation would be tricky, because there is no Maven plugin for it (or at least I couldn’t find any). I suggest that you use the AntRun plugin as a temporary hack (see the sketch after this list).
  7. The Javadoc plugin would automatically generate your Javadocs; you need to explicitly specify a second execution for your custom doclet.
  8. The assembly plugin should help you organize your different distributions.
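For item 6, the AntRun hack could be wired up roughly like this (the plugin version is omitted and the ZKM invocation itself is left as a comment; you would paste your existing Ant snippet there):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-antrun-plugin</artifactId>
  <executions>
    <execution>
      <id>obfuscate</id>
      <!-- run right after the JAR has been built -->
      <phase>package</phase>
      <goals>
        <goal>run</goal>
      </goals>
      <configuration>
        <tasks>
          <!-- reuse the existing ZKM snippet from the Ant build here -->
        </tasks>
      </configuration>
    </execution>
  </executions>
</plugin>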

This leaves out a lot of details, but ultimately some of the decisions are yours to make. I would suggest using a script to reorganize your sources, so you can do a number of dry-runs and verify that the result is as expected before you change the script to perform the move in the version control system.

Project Layout Considerations

Each Maven POM has exactly one parent, but can be part of multiple reactors. Both parents and reactors are pom-packaged artifacts, but their roles are different. Most tutorials use the same POM for both purposes, but there is benefit in separating the functions (more about this in a bit).

POM Inheritance

The role of the parent is to provide defaults for the child POMs that inherit from it. Parent POMs are resolved by the normal Maven dependency resolution mechanism, which means that if you want the child projects to reflect a change in a parent POM, you need to install/deploy the parent first. Alternatively, you can use the project/parent/relativePath element to explicitly specify the parent POM’s path in the file layout (in this case, the changes are picked up from the file system.)
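In a child POM this looks roughly as follows (coordinates are made up):

<parent>
  <groupId>com.vendor.product</groupId>
  <artifactId>product-parent</artifactId>
  <version>1.0.0</version>
  <!-- optional: pick up parent changes from the working copy instead of the repo -->
  <relativePath>../pom.xml</relativePath>
</parent>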

At the top-level parent you should use dependencyManagement to specify versions and transitive-dependency-exclusions for all artifacts used by the project. No child-artifacts should explicitly declare versions and exclusions.
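In the parent this looks roughly like the following (the coordinates and the exclusion are purely illustrative):

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>commons-httpclient</groupId>
      <artifactId>commons-httpclient</artifactId>
      <version>3.1</version>
      <exclusions>
        <!-- keep commons-logging out of every module that uses httpclient -->
        <exclusion>
          <groupId>commons-logging</groupId>
          <artifactId>commons-logging</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
  </dependencies>
</dependencyManagement>

A child that needs the library then declares only the groupId and artifactId; the version and the exclusions come from the parent.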

You should use pluginManagement to fix the plugin versions and (optionally) specify common options for plugins. It’s OK (although not recommended) to override the plugin options in the child POMs, but you should not specify versions there. If you need the same overrides in multiple children, consider extracting them into an intermediary parent POM.
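A sketch of the corresponding parent section (the plugin version is only an example):

<build>
  <pluginManagement>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>2.0.2</version>
        <configuration>
          <source>1.5</source>
          <target>1.5</target>
        </configuration>
      </plugin>
    </plugins>
  </pluginManagement>
</build>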

Finally, you should try to specify most of your profiles at the parent-POM level. This, combined with pluginManagement, allows you to define the bulk of your build in the parent and keep the children DRY.

The children have to explicitly specify the parent-POM version, which makes it a bit of a pain to increment the version (on the other hand, the parent POM usually changes much more slowly.)

Building in Dependency Order

The normal way for Maven to resolve dependencies between projects is through the local repository. This means that if project A depends on project B and we make a change in B, in order to test A against the new change, we need to first install or deploy B and then build A, so it would pick up the new changes (this is assuming we’re using SNAPSHOT versions).

The reactor projects take care of such closely related artifacts and automatically rebuild them in the correct order. They also act as a grouping for the sub-projects (useful when you build assemblies). The reactors themselves are artifacts using pom packaging (same as parents), but they explicitly list the relative path to each constituent module, and this is the reason I recommend separating parents from reactors (see the ‘Parents and Children vs. Reactors and Modules’ comparison below).
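A bare-bones reactor POM might look like this (names are made up):

<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.vendor.product</groupId>
  <artifactId>modules-reactor</artifactId>
  <version>1.0.0</version>
  <packaging>pom</packaging>
  <modules>
    <!-- relative paths to the modules built by this reactor, in any order -->
    <module>../module1</module>
    <module>../module2</module>
  </modules>
</project>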

Creating Distributions

The standard way of packaging distributions in Maven is the Assembly plugin. It allows you to describe your distribution package in terms of project resources, artifactIds and classifiers. A typical assembly descriptor would do something like this (sketched in XML after the list):

  • Create tar and zip files with the classifier ‘distro’.
    • Add all files from /src/main/assembly
      • excluding **.xml, and **.sh.
    • Add all files from /src/main/assembly
      • that match **.sh
      • and set the unix permissions to 551.
    • Add the maven artifact
      • with group-id=com.company.product.modules and artifact-id=module1 and classifier=obfuscated
      • put it under lib.
    • Add all the transitive dependencies of this project (assuming this descriptor is in a reactor project)
      • and put them under lib/ext
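Roughly translated into descriptor XML it might look like this (treat it as a sketch - in particular, the exact include syntax for classified artifacts depends on the assembly-plugin version):

<assembly>
  <id>distro</id>
  <formats>
    <format>zip</format>
    <format>tar.gz</format>
  </formats>
  <fileSets>
    <fileSet>
      <directory>src/main/assembly</directory>
      <excludes>
        <exclude>**/*.xml</exclude>
        <exclude>**/*.sh</exclude>
      </excludes>
    </fileSet>
    <fileSet>
      <directory>src/main/assembly</directory>
      <includes>
        <include>**/*.sh</include>
      </includes>
      <fileMode>0551</fileMode>
    </fileSet>
  </fileSets>
  <dependencySets>
    <dependencySet>
      <outputDirectory>lib</outputDirectory>
      <includes>
        <!-- the obfuscated variant of module1 -->
        <include>com.company.product.modules:module1</include>
      </includes>
    </dependencySet>
    <dependencySet>
      <!-- all transitive dependencies of the (reactor) project -->
      <outputDirectory>lib/ext</outputDirectory>
    </dependencySet>
  </dependencySets>
</assembly>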

You can also unpack and consolidate JARs, specify manifest-entries and signing for single-JAR distributions and so on.

Each artifact can produce more than one assembly. If you want your assemblies to go into the repo, make sure you use the ‘attached’ goal. Most of the time I use the reactor POMs for distributions, but sometimes I just define the assembly in the JAR module’s POM.

Parents and Children vs. Reactors and Modules
Direction of Coupling
  • Parent/Child: Children know the name of their parent; parents don’t know their children.
  • Reactor/Module: Reactors know the locations of their modules; modules don’t even know whether they are being built as part of a reactor.
  • Merged Parent+Reactor: Creates a circular dependency between the children/modules and the parent/reactor artifacts.

Multiplicity
  • Parent/Child: Each child has exactly one parent, specified in the child POM.
  • Reactor/Module: A module can be part of more than one reactor.
  • Merged Parent+Reactor: You must keep it to one reactor. If you define an alternative reactor, the parent will still be resolved to the main parent POM (as specified in the child POMs), which can lead to a great deal of confusion (and undefined behaviour for some plugins).

Location
  • Parent/Child: The parent is usually resolved through the repo (optionally through a relative path).
  • Reactor/Module: The modules are always resolved through relative paths.
  • Merged Parent+Reactor: You need to physically lay out your project so that the parent POM is accessible by relative path (which is not needed otherwise).

Change Drivers
  • Parent/Child: The parent changes when your build process changes.
  • Reactor/Module: The reactor changes when you add a new module.
  • Merged Parent+Reactor: Because of the artificial coupling, your POM is subject to more change. This is particularly bad for the parent role, because you need to manually update the parent versions in the children.

Given this table, I’d recommend a project layout along these lines:

-+-+- project-root/
 |
 +-+- modules/
 | |
 | +-+- module1/ 
 | | |
 | | +--- pom.xml       | inherit from modules parent; specify dependencies 
 | | |                  | and module details (name, version, etc.)
 | | +-+- src/
 | |   +-+- main/
 | |   | +--- java/
 | |   | +--- resources/
 | |   |
 | |   +-+- test/
 | |     +--- java/
 | |     +--- resources/
 | |
 | | 
 | +-+- module2/ 
 | | |
 | | ...
 | |
 | +-+- module3/
 | | |
 | | ...
 | |
 | +-+- reactor1/       | e.g. all modules - used for the CI build and pre-commit
 | | |
 | | +--- pom.xml       | no parent; specify the modules as ../module1, ../module2, etc.
 | |
 | +-+- reactor2/       | e.g. only module2+module3 - used by the module3 developers only
 | | |
 | | +--- pom.xml   
 | |
 | +--- pom.xml         | modules parent - inherits from the main parent pom and specifies 
 |                      | module-specific profiles and plugin settings
 |
 +-+- applications/
 | |
 | +-+- application1/ 
 | | |
 | | +--- pom.xml       | inherit from apps parent; specify dependencies, module details 
 | | |                  | and whatever custom steps necessary for the application
 | | ...
 | | 
 | +-+- application2/ 
 | | |
 | | ...
 | |
 | +-+- reactor/        | e.g. all applications
 | | |
 | | +--- pom.xml       | no parent; specify the modules as ../application1, ../application2, etc.
 | |  
 | |
 | +--- pom.xml         | apps parent - inherits from the main parent pom and specifies 
 |                      | module-specific profiles and plugin settings
 |
 +--- main-reactor      | specify as modules ../applications/reactor and ../modules/reactor1
 |
 +--- pom.xml           | the main parent pom

A few Tips

When You Get Stuck with a Bad Plugin

For all operations that you can’t easily achieve with the existing Maven plugins, you can fall back to the Antrun plugin and reuse snippets of your current build script. Yes, it’s a hack, but it gets the job done and you can fix it when it breaks.

Try to keep these hacks in the parent pom.

Dependency Management

When you define dependencies, take care to specify proper scopes (the default is ‘compile’; you should also consider ‘provided’ and ‘test’). You should also mark optional dependencies (like POI HSSF) as such. For some artifacts you might need to specify dependency exclusions, so Maven doesn’t bundle half of the Apache Commons with your product.
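A few illustrative declarations (the artifacts and versions are only examples):

<dependencies>
  <dependency>
    <!-- needed only if the customer actually uses the Excel export -->
    <groupId>poi</groupId>
    <artifactId>poi</artifactId>
    <version>3.0.1-FINAL</version>
    <optional>true</optional>
  </dependency>
  <dependency>
    <!-- supplied by the container, so don't bundle it -->
    <groupId>javax.servlet</groupId>
    <artifactId>servlet-api</artifactId>
    <version>2.4</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <!-- only needed to compile and run the tests -->
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>3.8.1</version>
    <scope>test</scope>
  </dependency>
</dependencies>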

Evaluation Builds

You might want to release the daily evaluation builds as snapshots. Maven has specific semantics for snapshots: it always refreshes them based on timestamps. For example, if you release an artifact my.group:module:1.0.0, Maven will download it once to the local repository and never look for another version unless you delete it manually (that's why you should NEVER re-cut a release once it's been pushed out - just declare the version broken and increment).

If you use a snapshot version (e.g. my.group:module:1.0.0-eval-SNAPSHOT), Maven will check once a day (unless forced with '-U') and automatically pick up any newer version based on its timestamp. To achieve this you might want to cut an eval branch from the tag of each release and change the version using a shell script or something.
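On the customer’s side, the evaluation repository would then be declared with snapshots enabled, roughly like this (reusing the hypothetical URL from above):

<repositories>
  <repository>
    <id>vendor-eval</id>
    <url>http://repo.your-company.com/demo</url>
    <releases>
      <enabled>false</enabled>
    </releases>
    <snapshots>
      <enabled>true</enabled>
      <!-- re-check for a newer timestamped build at most once a day -->
      <updatePolicy>daily</updatePolicy>
    </snapshots>
  </repository>
</repositories>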

Saturday, November 3, 2007

Build Tools

I had started writing a post explaining how the Maven repository, artifact resolution and build lifecycle work, but I figured that I was repeating the Maven Book and the Bullet-point Guide. Instead, I think it would be more interesting to go down memory lane and talk about how my build tools have changed through the years.

Why Build?

I started programming on an Apple ][ using Basic, and at that time I stored my programs as source on a 5¼" diskette. Every time I wanted to run a program, I typed 'load MyProg <ENTER> run <ENTER>'. No compilation, no packaging, no complicated dependencies. It all Just Worked™.

Integrated Development Environments

Then new machines came about, with bigger keyboards, bigger hard drives, bigger screens and (oh, horror) no built-in Basic. At that time somebody told me that Real Programmers don't use Pascal - somehow I failed to get the tongue-in-cheekness - it all made a lot of sense to me. Finally, after I couldn't find a Fortran compiler for the IBM XT (actually a Pravetz 16), determined to become a Real Programmer, I settled for QuickBasic. QB was very powerful - it had functions (which I thought were something like GOSUB but with a name) and you didn't have to put in line numbers (still, I didn't trust it, so I usually typed them in just in case). It also had a number of new commands, and almost none of the Apple II ones. Some time around 1992, I can't remember what happened, but I abandoned QBasic and joined the quiche-eating side of the Force - I switched to Turbo Pascal 4.0.

Turbo Pascal was a big jump for me - it had interesting new abstractions like units and scoping (the latter being a useless feature that only stops you from seeing your own variables), but one notable feature was that one program could be spread over multiple files. At that time text UIs were all the rage, so I had my own library for drawing animated windows, menus, etc. The whole thing was one file and I wrote a couple of toy apps, each of them having its own copy of the Library (notice the capital letter here). Every day I wanted to show my mom and dad "what the computer can do" and I tried very hard to convince them to start using my expense-tracking app (needless to say, my attempts were futile... My sister had much better success getting them to eat her first cake).

At that time I didn't realize how much the IDE was doing for me - all I knew was that I press Ctrl+F9 and a couple of seconds later I get an EXE in the output directory. There was no packaging, and I couldn't figure out for the life of me why anybody would want to compile outside of the IDE.

Make?

As time went by, the IDEs changed (Turbo Pascal 5-7, Turbo/Borland C++ 2-4, Visual C++ 5-6), but my attitude stayed the same. Come summer 1999, I was working part-time as a developer in a small company and all of us ~20 developers were happily building release binaries with Visual Studio. There was a lone guy who tried to propose using an obscure utility called make. It looked like you had to write yet another program that would do what the IDE does, but you needed to use obscure syntax, call the "compiler" and "linker" directly, specify every command-line parameter, and list filenames manually - it was a lot more work. The benefits he tried to put forward weren't very convincing either: "you can build from the command line!" - countered by "and why would you want to do that?", or "people that don't use Visual Studio can build the project", retorted by "are you crazy? Everybody uses Visual Studio." Well, that guy was actually using vi... I think he didn't last very long there.

When I joined my next company I had to use Java. They were not using Visual Studio and in fact they didn't have a standard Java IDE. At that company they were building using make. Recursive make. Every directory had its own makefile, most of them containing only boilerplate code, including a top-level template (quite annoying when you have to debug a build issue and count the number of '../' in the include), and to make it more interesting, some of the makefiles were not using the template, having their own goals, invoking OS commands, etc. Overall it worked (except when it didn't). Most of the problems we had were related to incorrectly set environment variables and missing external programs. It was difficult to reason about the build process, as the build files were spread all over the directory tree. In the end, one of the developers rewrote the whole build using Ant.

Ant

Ant worked. Much better than make. Looking back, I can say that this was because:

  • Ant is much less dependent on environment variables. In that case there was a build.properties file that everybody had to customize once and that was it.
  • Ant does not use OS commands. Everything an Ant build needs is either provided by the distribution or shipped with the source files (you don't have the habit of plopping random jars in your $ANT_HOME/lib directory, do you?)
  • Ant's syntax is much more restricted than make's. A syntactically invalid Ant script won't run; a syntactically invalid makefile can erase your hard drive.
  • Ant was designed for Java, handling many common tasks right out of the box.

Many detractors say that Ant is too verbose, and they are right. I personally don't have big problems with this, as my editor usually autocompletes the task names and attributes for me and warns me when I make a mistake. Modern Ant (1.6+) also allows you to factor your build fairly well by using includes, presetdefs and macrodefs. Actually, Ant's biggest problem is that it is Turing complete. The target dependency resolution, combined with the if and unless attributes, is often abused to simulate control-flow statements, which pollutes the target namespace and complicates the dependencies. Too often the targets don't have good names, because their only purpose is to hold a piece of code reused in some other targets (this use case is actually served better by macrodefs, but many people still use targets). The assign-once-ignore-following semantics of Ant properties is good for implementing overriding, but when we use Ant as a language, given the lack of scoping, the namespace gets polluted really quickly and you might end up with strange interactions between unrelated targets.

As an experiment I tried using Ant tasks from Jython. It works great - you have real variables, real control structures, the code is much more concise and you can use any other Java library you want. One downside is that for straightforward builds (compile a bunch of files, package them in a jar and zip them with some scripts) Ant is arguably easier to read, as there are fewer things one needs to be aware of. But the real dealbreaker is that you don't get any tool support - no IDE autocompletion, no on-the-fly syntax checking, no integrated build runners, nothing!

One thing I didn't mention is that Ant is very easy to customize - extend a class, provide some getters and setters, write your logic in the execute method and you are done! To use your custom task, you need to ship your jar with the build script and add a one-line definition to your build. If you have more tasks you can package them together with a simple descriptor and import them all at once using a namespace (this is called an antlib).
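For illustration, the descriptor is an antlib.xml placed in the package that matches the namespace URI, and the build imports it through that namespace (all names below are made up):

<!-- com/vendor/anttasks/antlib.xml, shipped inside the tasks jar -->
<antlib>
  <taskdef name="obfuscate" classname="com.vendor.anttasks.ObfuscateTask"/>
</antlib>

<!-- usage in build.xml -->
<project name="demo" xmlns:vendor="antlib:com.vendor.anttasks">
  <target name="obfuscate">
    <vendor:obfuscate/>
  </target>
</project>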

Make!

A few years later, I was porting Java games for mobile phones from Doja to EzAppli and VSCL. I had common scripts for each of the platforms, and every time I started a new port I just had to tweak a template script containing an import statement and a couple of properties. If I needed to port the same game to another platform, all I needed to change was the import statement. That was nice.

One day I got a port for a new system - it was called BREW and the API was in C. Initially I considered writing some Ant tasks to handle the native toolchain, but after some consideration, I read a couple of articles (see Recursive Make Considered Harmful) and decided to give make another try.

One of the useful make features is dependency inference rules. This way, you can say that a *.c file generates a *.o file. Then you just specify which *.o files your binary is comprised of, and make will automatically figure out your source files. If the source file is not newer than the object file, make is smart enough not to recompile it. The Java compiler does this by default (when using wildcards).

Make does not care about what commands you put in its goal definitions. That's why, out of the box, it doesn't deal with transitive dependencies. To deal with this, most C compilers can generate a dependency listing in make format, which you can include in your makefile and regenerate when the dependency graph changes. In Java, the same thing can be achieved by using Ant's dependset task, and some IDEs (like IntelliJ IDEA) can track all your dependencies (including transitive ones) as you type and recompile all impacted files.

In the end, I had a pretty well factored build system using make, requiring minimum configuration (much like the Java one), allowing for cross-compilation targeting x86 and ARM architectures, using different toolchains and everything. If I compare the Ant/makefile approach with the IDE, I'd say that the build scripts take more time to pay off. If you work on one project and your build is not complex and you don't need repeatable builds (because you work alone and your customer doesn't care), then the IDE might be a better proposition.

Shells, Perls and Pasta

Once again I started a new job, and it turned out that in my department nobody used a build tool. Everybody was usually building in their IDEs and copying straight to production, or using ad-hoc shell scripts or Perl to build from the sources directly on the production box (the latter was rationalized as "this way we can fix bugs faster").

In the end, all the scripts were simple compile+jar, sometimes even skipping the 'jar' step. They did get the job done and the business was happy. There were a number of things missing, like reproducible builds, reliable roll-back, etc., but it is a matter of tradeoff whether one wants to spend the necessary time studying and implementing a build system or spend the same time implementing new functionality or fixing application bugs. There's nothing wrong with either way.

Maven

After spending some time working on an application with an Ant Build from Hell, I was dreaming of a brave new world where each application would be laid out in modules and packages with controlled dependencies, each module's build script would be simple and clean, and one could focus on the actual application functionality.

Enter Maven (actually Maven2). Even after being burned by Maven1, I still thought that the ideas were good and it was the actual implementation that sucked so badly. Maven2 is a new start, and a new chance to reinvent the wheel. The project developers have taken the working concepts from Maven1, pruned the ones that turned out to be a bad idea, and reimplemented everything from scratch. It's still not clear why they decided to use their own DI container and classloader management (instead of, say, Spring and OSGi), but it works.

Maven has the chance to hit the sweet spot between a build-scripting tool and an IDE-style purely declarative build. At the core of Maven is the build lifecycle, which is just an abstract sequence of steps. Then, in your POM you define (or inherit) a packaging. The packaging defines a set of default plugins and executions. You can think of the plugins as a bunch of Ant-style tasks (or 'goals' in Maven-speak) that are versioned together. The executions define parameters for the actual goal and are bound to a lifecycle phase.

Most of the parameters of a goal are optional, using sensible defaults. The defaults are either sensible constants or references to different parts of the POM. E.g. the compiler:compile goal would get the source directory from the POM reference ${pom.build.sourceDirectory} and use the constant "false" for its fork parameter. All the POMs (or their parent POMs) in Maven2 inherit from a common "Super-POM". The common POM specifies many defaults (e.g. the directory layout), so you don't need to, if you keep to the Maven conventions. An important part is complying as much as possible with the standard Maven directory layout - it makes your life much easier.
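To make the phase/goal binding concrete, here is a sketch of an extra execution that attaches a sources JAR during the package phase (the plugin version is omitted on purpose):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-source-plugin</artifactId>
  <executions>
    <execution>
      <id>attach-sources</id>
      <!-- bind the source:jar goal to the package lifecycle phase -->
      <phase>package</phase>
      <goals>
        <goal>jar</goal>
      </goals>
    </execution>
  </executions>
</plugin>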

There are some areas of Maven that are still rough. The release plugin is still quite limited (although there is work under way to implement features like release staging and binary promotion). There are a couple of annoying bugs in the assembly plugin, which are fixed on the head but haven't been released for more than a year. Some issues (like explicit support for aggregator plugins) are being postponed for Maven 2.1 (which will probably ship around Q2-Q3 of 2008). But overall, I think it is an improvement.

Conclusions

So, I'm planning to use Maven2 for the time being and perhaps write a plugin or two for some tasks which it does not handle well (right now I'm still cheating, using the antrun plugin). I'm still (ab)using Ant for common scripting tasks like restarting a remote server through an SSH connection, deleting files on a remote machine, setting up a database table or deploying configuration files to a remote environment. All these things do not fit in the build lifecycle and wouldn't benefit much from writing Maven plugins for them. The main benefit in this case is that I get a simple, completely cross-platform scripting language, providing many common commands lacking from the normal Unix environment (btw, did I mention that expect sucks?)

And finally, here are some more tools that I'm planning to check out:

  • scons and rake - build tools using Python and Ruby respectively, each of them using the underlying platform and some clever code for doing build stuff
  • buildr - another Ruby tool that builds on rake, allows you to use Ant tasks, designed as a drop-in replacement for Maven2 (hopefully allowing for mixed environment).

Saturday, October 27, 2007

On Maven2

Maven is a tool with an interesting history dating back to 2001. In its first years it got a deservedly bad reputation for being unstable, poorly documented and a more or less experimental piece of work. The release of version 2.0 in 2005 fixed many of the early quirks and set right many of the short-sighted design decisions. After having some bad experience with Maven 1, I was wary of getting on the M2 bandwagon, but when I moved to a new job in 2006 I decided to give it a go. So far there have been ups and downs, but I'm fairly happy with it. I still haven't abandoned all my Ant and shell scripts, but I find that I'm using Maven as the primary build tool for most of my projects.

The core proposition of Maven is that one should be able to declare what they are building in some sort of manifest file and the build tool should be able to figure out how to build it. The manifest should contain only the information that is specific to the project, and all the build procedures should be implemented as plugins. Each build should be related to exactly one artifact of a certain type. The artifacts are stored in repositories (more about this later.)

In Maven parlance, the manifest file is called a POM (which stands for Project Object Model). If a project adheres to the predefined filesystem layout, the actual XML one has to write can be very small. Here is a minimal example:

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.acme.foo</groupId>
  <artifactId>foobar</artifactId>
  <version>3.14-alpha</version>
</project>
This configuration is enough to enable Maven to build, package, install, deploy and clean your project. It would even generate the reports site if you run mvn site. That's the way I usually start my projects. Later, you can tack on more elements as needed.

The top-level element declares the default schema for the POM. The schema has a version which has to match the modelVersion element below. While the schema declaration is not strictly necessary, it makes POM editing much easier if you are using an XML-aware editor. It is well constrained and there are few extensibility points, so autocompletion works very well. It also features annotations for each element, which means that if you use IntelliJ IDEA you can just press Ctrl+Q on any element and get instant documentation. The schema design is a bit annoying, but it is very regular and easy to understand:

  • No attributes - everything is an element.
  • No mini-languages - everything is an element and any custom textual notations are avoided as much as possible (though in many places they use URLs). This allows for simple parsing and processing.
  • If an element can be repeated more than once, it is enclosed in a container element that can appear only once. This ensures that all elements of one type are textually next to each other. Each container can contain only elements of the same type.
Overall, the POM bears a resemblance to an IDE configuration file, and they serve the same function. Both Maven and IDEs run a predefined build process, parameterized by the information in the project file (or POM). One major difference is that Maven is designed to be run from the command line and also to encapsulate all environment-specific factors into the POM and the settings.xml files. You can use Maven to generate project files for IDEA, Eclipse and NetBeans based on the information in the POM.

The machine-specific and user-specific configuration is specified in the settings.xml files. There are two of them: the machine-specific settings are stored under the $M2_HOME/conf directory and apply to all users on the machine. Usually the contents are standardized within the team (internal repositories, proxy settings, etc.). In our company this file is posted on the wiki where everybody can download it. Alternatively, we could have built our own internal Maven distribution with the file pre-included. The second settings.xml file resides under ~/.m2 and contains user-specific settings, overriding the machine-specific ones. One can use the user-specific settings to keep login credentials, private keys, etc. On a Unix machine, this file should be readable only by the user.
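As a sketch, the user-level file might carry little more than credentials for the team repositories (the ids and names are invented):

<settings>
  <servers>
    <server>
      <!-- must match the repository id used in the POM's distributionManagement -->
      <id>internal-releases</id>
      <username>jdoe</username>
      <password>secret</password>
    </server>
  </servers>
</settings>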

Though the POM is very flexible and can be tweaked to accommodate a number of different scenarios, it is highly recommended to refrain from overriding the defaults and to use the Maven conventions as much as possible. This way, new developers on the project can get up to speed faster and (importantly) it's much less likely that you get bitten by an untested plugin 'feature'.

One of the things that new users tend to dislike most is the standard directory layout. In brief, you have pom.xml in the root, and your files go under a directory called src. By default, all files (artifacts) generated during the build go under a directory called target, which makes for easy cleanup. Note that there is no 'lib', as all the libraries reside in the local repository (more about this in another post.)

So far so good, but then under src we usually have main and possibly test, integration and site directories, and then under them we have java and resources, and only there do we put the actual source files. This means that we have at least three directory levels used for classification above our sources, and if you jump between them using Windows Explorer or bash it makes for a lot of clicking/typing. On the other hand, this is the price one pays for Maven's magic - each directory level means something to the plugins that build your project. E.g. the unit-test goal knows that it should run the tests under test and not the ones under integration; all the files in the resources directory are copied into the final JAR, while the ones under java are not, and so on and so forth.

This post became rather long, so I'll finish it here. Next week I'm going to cover Maven's dependency management and repository organization. Again, I'll try to talk more about the "whys" and less about the "whats" that are already covered pretty well in the existing tutorials.
