Saturday, September 18, 2010

Concurrent development

Background Story

That certain project was in its 4th year and after a few successful deployments in the US/LATAM regions, now the company was trying to push it to EU and APAC. The original developers were all in New York and over the last few months have been frantically working to adapt the product to EU's requirements (Asia was supposed to follow.) The scope was defined as 'whatever it takes', the methodology was a mix between 'trial and error' and bullying the users to contort the requirements to fit the delivered functionality. At the time, the EU team had realized that they would better spend their time staying on top of the changes and making sure the end product meets a minimum standard, so they were not really doing much development.

Time went on, requirements grew, scope shrinked. The project slipped past two deadlines and finally the Asia managers decided they need to take things into their hands and hire a local development team to avoid the communication gap that plagued the EU rollout and regain some control over the schedule. It was first time to have more than one people touching the code the core team was structured in a way that each component has an owner and the owner could do whatever they want. If you have a problem or need a change - ask the owner. The problem was not only that we were in different geographical location, in inconvenient timezone, but we were working on the same code, implementing requirements specified by separate BA teams, chasing schedules devised by separate project-management teams, and it all eventually converged in a common program-steering committee. I could go on, but suffice to say it was quite a mess - the bottom line is that moving from centralized sequential to distributed concurrent development models impose huge burden and the best advice one can give you would be "don't do it!".

Probably the biggest issue was that many people in the core team, just refused to change their way of working in order to accommodate our existence. Every second morning the trunk would not compile, often changes were checked in that prevent servers from starting, our changes were overwritten routinely because somebody's local modifications conflicted and they were unwilling to merge - you name it, we have it. The management layer was protecting them as "due to the years of accumulated experience, the productivity of the core team was much higher that ours, and the productivity hit they would suffer by addressing our petty complaints can not be justified in business terms". Luckily, there were some sensible guys and gradually we got to improve this, still I consider it one of the biggest organizational faults that for a long time the management efforts were focused on suppressing our complaints, rather than backing our suggestions on fixing the environment.

As the first QA delivery was approaching and the trunk was not giving any signs of getting more stable, we tried to think what can we do to stabilize the codebase. Some people said we should branch, others were weary of the cost of merging. The EU team had branched few months ago and all EU implementation were done on the branch and eventually (read 'sometimes') merged to the trunk. When the product was released in EU, they ended up with the problem how do they merge to the trunk. From what I hear it had been a terrible experience, including a lot of functionality rewrites, introduced bugs and regressions.

Knowing the EU problems and knowing that on one hand the trunk was still changing rapidly, on the other hand our requirements were dependent on code that was supposed to be delivered by US, we decided to branch, but keep developing on the trunk. All merges would be in direction trunk-to-branch and this would save us from the dreaded criss-cross merge conflicts. Since most of our problems to that date were with work-in-progress checkins, which we eventually wanted, we decided that we can treat the branch as a stable release line and trunk as unstable bleeding-edge code.

Unstable trunk + Release branch

I was tacitly elected as 'merge-master' and quickly I found myself following the same routine:

  1. Every morning I would pull a list of all the unmerged commits and review them in a text editor. Then would move each commit into one of these categories:
    • WANTED - changes that are required or prerequisites for implementing our business functionality. These should be always merged.
    • BLOCKED - changes that we DO NOT want. These should be always marked as merged (no actual merging, just mark, so they will not appear in the list next time).
    • IRRELEVANT - changes that won't hurt us, but we don't strictly need them. We were merging these in the initial stages as keeping the branch close to trunk makes merging easier, as we got closer to the release, we flipped the policy to improve the stability.
  2. When I merge or mark as merged the WANTED/IRRELEVANT/BLOCKED groups, I would put the category as a first word int he commit message. This made it easier to pick out the changes that were done directly in the branch (which should be kept to minimum and if necesarry ported manually to trunk). I didn't bother separating the individual changes, since the branch was not meant as a merge-source - this was saving me some time. Overall it was taking between 1 and 3 hours a day.
  3. There would be a number of changes that didn't make it to any of the categories. For these I was contacting the comitter and following up. Often it was work in progress, sometimes after clarification they would be categorized on the next day. Usually I would post this communication as a tagged blog-entry in our wiki. There was a page displaying all the entries tagged in this way.

I found out that sorting the changes first by user and then by date simplifies the review significantly. Turned out that TextPad macros can be a very powerful tool for things like this.

The release branch worked well for some time, until a major feature for the next release was implemented on the trunk. We blocked it and ever since then, every commit that touched this dreaded component had to be hand-merged. Often, merging an one-line change resulted in tens of conflicts, so we resorted to rolling back the file in question and manually porting the change. The worst thing is that we tested the trunk extensively, but the change in our release-branch received only cursory examination until it reached QA.

Furthermore, once we reached the second phase of the Asia roll-out, our team split and started to work in parallel on three staged releases, which were supposed to deliver unrelated functionality within 2 months of each other startin 6 months from the date. This meant that we need better mechanism for dealing with divergent codebase and big changes in progress.

Exchange-trunk + Development & Release branches per stream

After taking a step back, we came up with a new branching scheme that satisfied all our requirements. For each pending project phase we would create two parallel branches - development and release (we would call the combination of two branches a 'stream'). In addition, we devised the following policies and procedures:

  1. Developers always commit their code changes in the dev-branches.
  2. Any code committed to the dev-branch MUST compile. If the branch is broken, people should not commit further unrelated changes until the CI says it's fixed.
  3. Each commit in the dev-branch should contain work for a single feature. If there is certain code pertaining to two features, we pick one of them as primary and mark the other one as dependent in the issue-tracking system. Then, all the shared code goes to the primary feature and we know that we can not release the dependent on its own. It is not necesarry that the whole feature is committed in one go or that the dev-branch committed code actually works.
  4. When we need some code from a different stream, we would wait until they publish it to trunk and only then we would merge from trunk to the dev-branch. Cross-stream merges are prohibited. We were calling this 'picking-up' the feature. Pick-up changesets should be marked in the commit message.
  5. Each time we pick-up a feature, after we do the minimum conflict resolution, so the code works, we would commit the changeset immediately (that's the pickup changeset). This way, any additional enhancements, fixes. etc. will be committed in separate changeset, so it will be easier to merge them back to trunk later.
  6. Once a feature is complete and dev-tested on the dev branch, all related changesets for that feature are merged as one consolidated changeset in the release branch. We call this 'feature-promotion'. This practice makes creating release notes relatively easy and allows us to do cool things such as rolling back the whole feature with one command.
  7. When we promote a feature that has been picked from trunk, we immediately mark-as-merged this rel-branch commit into trunk to prevent double-merge conflicts. We would look if we have made any fixes on our branch and consolidate them into a single enhancement/bugfix changeset that will be merged directly from dev-branch to trunk (as in the the rel-branch we consolidate the pickup and enhancement changesets).
  8. If QA finds that the feature did not work, we would add further bugfix changesets to the rel-branch, but we would strive to keep them to minimum.
  9. When a release has passed QA, we would merge each feature-level commit that originated from this stream from the release branch to trunk ('publishing'). There it will be ready for picking up by the other streams (which will merge it in their dev-branch, promote it to release, etc).
  10. For each release we would tag the release branch, since it was already stabilized. Bugfix releases were just further tags on that same branch. For urgent production changes, we would create a bugfix branch from the tag (happened only a few times).

Overall it worked well for us. Few months after we addopted this scheme I moved to another company, but I really hope the process is still useful and being improved. An interesting thing is that every time I explain this, the first reactions are along the lines of "does it have to be that complicated?" And while I can agree that complicated it is, I am still to find a simpler streategy that could work on this scale. Any ideas?

2 comments:

sicklittlemonkey said...

Finally found some time to read this ... I think you chose the best strategy - move to another company. ;-)

Seriously though, I'm surprised this worked well for as long as it did. I suppose if features are relatively constrained in the code they touch it might work ok in the long run, but core system-wide changes, though hopefully infrequent, must inevitably impose a hell of a lot of work.

Much better than no solution though, of course! I had a look at Mercurial a little while back. TortoiseHg isn't as nice as TSVN, but the concept is good.

Cheers,
Nick.

Dimitar said...

Hi Nick, glad you liked it :-) Yes, even with all these strategies and tactics, development was seriously constrained and trivial things were taking longer than they should have. Major platform changes are just avoided as it is easier to write a totally new component than refactor something on which so many people depend.

BTW, weren't you coming to Tokyo these days?

About Me: check my blogger profile for details.

About You: you've been tracked by Google Analytics and Google Feed Burner and Statcounter. If you feel this violates your privacy, feel free to disable your JavaScript for this domain.

Creative Commons License This work is licensed under a Creative Commons Attribution 3.0 Unported License.