Dimitar's Blog

Sunday, February 23, 2014

Is HTML 5 over-hyped?

For over 10 years I have been watching the efforts of many smart people to “fix” HTML and come up with a new version of this most successful information encoding format. Over this time my attitude has been evolving through:
  • Naïve enthusiasm – HTML4 sucks and we are going to fix it!
  • Feeling daunted by the scope of the undertaking.
  • Frustration by the lack of progress on XHTML 2.0 – it’s all just talk, talk, talk and ever-increasing complexity.
  • Questioning the assumptions, benefits and motivations for the change – is HTML 4 really broken, or just mildly annoying? Is it more annoying than many other things we live with just fine? Yes, WHATWG largely captured the frustration, but are they suggesting anything better?
  • Indifference. I could use HTML4, or 5 or 55 – it is the information and presentation that matter, not how you encode it.

Over the last month I have been implementing a proof-of-concept solution using a modern web-application stack, and I had to catch up with the current state of HTML, CSS and JavaScript. Let’s say that of all three, I find HTML5 to stand out in a weird way – there is a lot of noise on the Web, yet there is nothing so special about it…

In fact, it looks like HTML 5 and its cute logos have turned into an umbrella term for a set of initiatives, of which the actual evolution of the HTML markup language is one of the least interesting aspects.

So, what does HTML 5 bring over the “legacy” HTML 4 / XHTML 1.1? Let's see:

A bunch of new browser APIs and a formal specification for many existing ones. This is the most significant achievement of the WHATWG/W3C work, though I am not very clear on why it is billed as part of HTML 5. Even if I use HTML 4, I can still take advantage of these APIs on the JavaScript side.

Standardization and specification of layout rules and processing models, getting rid of the SGML grammar. Again a very significant improvement, and again not related to the changes in HTML The Language.

Improved semantics with new elements like <header>, <footer>, <nav>, <article> and <aside>. Too little, too late. The world has moved on, and the preferred way of expressing semantics is with classes. Classes also work well with the current crop of JavaScript tools and CSS. Adding the new semantic elements codifies a small part of all possible semantics in somewhat arbitrary ways (i.e. we’ve got <nav>, but what if we need to semantically distinguish between primary and secondary navigation? How is <nav> anything more than a thin layer of syntax sugar over <div class="nav"> or <ul class="nav">?)

I've heard people talk about how semantic markup is important for "serious publications". The fact is that HTML was never a first choice for serious publications: documents are typically mastered in specialized formats like LaTeX, DocBook, FrameMaker or DITA, which are compiled to whatever version of HTML is needed, and the new semantic elements only force these publishers to change the mappings in their compile scripts.

The bottom line is that while I will gladly make use of the new elements and I do find them nice, the impact they will have on my output cannot be compared with the impact of CSS3, D3.js or underscore.js (which get much less news exposure).

Native multimedia support with tags like <video> and <audio>. Again, a lot of noise about marginal improvements. We’ve been able to put media on the web since the early 2000s – yes, it relied on non-standard plugins, but that obviously didn’t stop YouTube from becoming a multi-billion-dollar business. Nice, needed, but not really innovative or game-changing.
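
For what it's worth, the new markup is about as simple as it gets (the file name is a placeholder):

<video src="clip.mp4" controls>
  Text shown by browsers that do not support the video element.
</video>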

Embedding of SVG and MathML. That is nice. One problem I see is that both SVG and MathML are monsters of specifications (good that they didn’t throw XForms into the mix), and we may never get to the point where they are fully supported by all major browsers. On the positive side, the situation is better than during the Browser Wars, as the specs are already published and we are not trying to standardize as we go. On a more pedantic note, while the embedding functionality is a feature of HTML, it is hardly an innovation: most of the actual work is done by the long-established SVG and MathML specs, so calling it a feature of HTML is a bit disingenuous.

So, to summarize – HTML5 brings us a lot of standardizing and specifying of what was already there (making it more robust and simpler, but not really game-changing). It is a major milestone for browser implementers, but for content developers it brings only minor improvements and a whole set of arbitrary changes. If we compare this with the improvements in CSS and the Cambrian explosion in the world of JavaScript, HTML5 in fact looks fairly dull and pedestrian.

Monday, December 9, 2013

Gamification

Humans love games – complex or simple, realistic or abstract, solitary, competitive or cooperative – games are what we do when we are not forced to do other stuff, like, you know… work?

Working is what we do in exchange for something we want, while games we enjoy for their own sake. The easiest way to turn a game into work is to introduce a high-stakes reward – something we desire more than the pure joy of playing. By shifting the motivation, we introduce performance stress, which has far-reaching implications for the satisfaction of performing and the overall quality of the outcome. In team games, high-stakes rewards complicate matters even further, as the reward needs to be distributed. This gives rise to politics and dissatisfaction and ultimately turns an egalitarian meritocracy into a competitive hierarchy.

A logical approach would be to try it the other way around – remove the reward component from work and try to get people to do it for free, or as Karl Marx eloquently proclaimed:

In a higher phase of communist society, after the enslaving subordination of the individual to the division of labor, and therewith also the antithesis between mental and physical labor, has vanished; after labor has become not only a means of life but life's prime want; after the productive forces have also increased with the all-around development of the individual, and all the springs of co-operative wealth flow more abundantly—only then can the narrow horizon of bourgeois right be crossed in its entirety and society inscribe on its banners: From each according to his ability, to each according to his needs!

– Karl Marx, “Critique of the Gotha Program”

We all know what followed, and it was hardly a success. So, can we do better? Of course we can – here is another source of inspiration:
“Hello, old chap, you got to work, hey?”

Tom wheeled suddenly and said:

“Why, it’s you, Ben! I warn’t noticing.”

“Say – I’m going in a-swimming, I am. Don’t you wish you could? But of course you’d ruther work – wouldn’t you? Course you would!”

Tom contemplated the boy a bit, and said:

“What do you call work?”

“Why, ain’t that work?”

Tom resumed his whitewashing, and answered carelessly:

“Well, maybe it is, and maybe it ain’t. All I know, is, it suits Tom Sawyer.”

“Oh come, now, you don’t mean to let on that you like it?”

The brush continued to move.

“Like it? Well, I don’t see why I oughtn’t to like it. Does a boy get a chance to whitewash a fence every day?”

That put the thing in a new light. Ben stopped nibbling his apple. Tom swept his brush daintily back and forth – stepped back to note the effect – added a touch here and there – criticised the effect again – Ben watching every move and getting more and more interested, more and more absorbed. Presently he said:

“Say, Tom, let me whitewash a little.”

Now, that’s definitely better – at the end of this story, on top of having the fence painted by the other boys, Tom gains an apple, a kite, a dead rat (along with a string to swing it with), twelve marbles, a couple of tadpoles, six firecrackers, and so on and so forth. In other words, the dream of every gamification consultant.

So, I hear you say, all we need to do is spin up the propaganda machine, pump up the image of work and induce a sense of purpose? Well, that has also been tried. The problem is that it is not sustainable: lacking the competitive dynamics and reward mechanisms to distinguish the good workers from the bad, the quality of the work is hit or miss.

As any Bulgarian railroad worker will tell you, the money spent on repairing all the sections built by volunteers in the 50s far exceeds what it would have taken to build them with paid labor.

The current trend is to use the latest neuropsychology research and try to structure work activities in ways that evoke gaming responses, without the work actually being presented or recognized as a game. This does in fact appear to work… in a few showcase examples.

The problem is that in order to get the best results, the manipulation should be subtle, honest and carefully targeted, while most of the time what actually gets implemented ends up crude, shallow and generic.

Consider the following gamification clichés:

  • Let’s change the meaning of a bunch of words and do code-speak, expressing trivial thoughts through exotic metaphors.
  • Let’s count some stuff (like badges and points) and applaud the one that gets the highest number.
  • Let’s do stuff at random and periodically check whether any of them has solved our problem.

While all of these may tickle our creativity, we can hardly call them an efficient way of solving a problem. All too often, our brain finds a way to shortcut the uninteresting work and skip straight to the entertainment (i.e. gaming the metrics).

Here is an anecdotal account of our attempt to improve software quality by gamifying our continuous integration:

A few years ago, on a project far, far away, we used the Jenkins CI game – you know, the one where you get points for not breaking the build, removing TODO items and adding tests (and lose points for the opposite).

While it was fun for a while, we also realized that the best way to get a high score was to avoid doing things that might trigger a penalty – things like taking initiative in cleaning up technical debt, improving the build script, improving the testing-framework code, adding new TODO comments to mark problems we couldn’t fix right away, or removing useless tests.

Of course, all these things needed doing, and being the responsible team we were, we did them – but then the very people who put the most effort into making the system better got the lowest scores. After a while, we just stopped caring about the game.

There is another related trend worth mentioning – mixing work with entertainment, so the work becomes less boring (i.e. improving engagement). It is not a new trend – everybody does it to some extent, but clearly there is a point where we tend to go overboard. In particular, I find that the propensity for seeking "fun" mirrors the Maslow hierarchy of needs – the higher the living standard, the more concerned people are with making their experience at work “fun”.

In a corporate environment this often manifests as managers, consultants and fat teams devising process innovations which bring no empirical benefits but comply with all the latest buzzwords and industry best practices, while the lowest level of support is barely scraping by, holding the world from falling apart.

Many methodologies and management fads explicitly seek to appeal to the playful nature of the human brain and define complex models and processes, exaggerating the actual benefits of esoteric (but never too difficult) activities – maximizing personal satisfaction, but not necessarily the desired outcomes. Some even go a step further and institute metrics measuring the “fun stuff”, claiming that this is somehow a proxy for productivity.

So, where does that leave us? Here is a list of unsubstantiated assertions you may want to consider:

  • Gamification can help with engagement and commitment, but in order to get sustainable results, it needs well-defined objectives and an overall strategy. It also needs to be supplemented with adequate, tangible rewards that actually matter to the participants (unless you are running a cult).
  • Setting a “fun” goal can do wonders to create commitment, but you either have to update the goal periodically, or tie it to something meaningful on a personal level in order to retain the effect in the long term.
  • Watching numbers go up is one of the easiest ways for creating engagement, but beware of unintended effects over time, as people rationalize the situation and start to game the metrics.
  • Gamification should be a part of a toolkit – its strength is in kick-starting behavioral change, but used on its own, it is unlikely to deliver tangible and long-lasting results.

...for some, gamification means presenting work as a game; for others, pretending the game is work.

Wednesday, September 14, 2011

Common scenarios for Java application packaging

It all started from an overgrown answer to a question on StackOverflow. I realized that I haven't seen an overview of best practices and patterns for packaging Java applications. There are platform-specific packaging formats, but they deal more with installation than with the layout of the packaged application. Below are a few common scenarios I've come across over the last few years.

Note: I've written another, somewhat related article on SO about common strategies for managing configuration files in Java.

A single uberjar with Main-Class specified in the manifest is a convenient distribution format for simple tool applications that are intended to be run on the command line. You can copy it around and plop it somewhere on the path or in your home directory. If you set PATHEXT and file associations on Windows, you can run such jars as if they were native executables. One thing to watch out for is bundled libraries with custom entries in their manifests (you might need to merge those manually), or multiple jars containing files with the same paths but different content (e.g. Spring namespace handlers, ServiceLoader implementations, LDAP SPI providers, etc.)
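
For illustration, the manifest of such a jar needs little more than this (the class name is a placeholder):

Manifest-Version: 1.0
Main-Class: com.example.tool.Main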

If for some reason you are bundling a library and all its dependencies as a single jar (a bad idea), you had better take care to relocate the packages (for example, see how the JDK has relocated Xerces and BCEL under com.sun.org...) Not doing so can cause subtle problems for your clients which are extremely annoying, to say the least (*cough*Weblogic*cough*). On the other hand, relocating classes causes problems with clever uses of reflection that build class names by string concatenation. Also, you will most likely need to filter all XML and properties files, replacing the package names with the relocated ones, and all files under META-INF/services will need to be renamed too. All in all, bundling dependencies in a library jar causes far more problems than it solves.
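
If you do have to relocate, the build tools can automate it. As a sketch, with the maven-shade-plugin the relevant part of the POM looks roughly like this (the package names are illustrative):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>org.apache.commons.lang</pattern>
            <shadedPattern>com.example.shaded.commons.lang</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>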

When your application has more than a few options, you may find you want a config file. One approach I like is to package a template config in the app and look in a predefined location for an override (e.g. in ~/.myapp) – you may provide instructions on how to extract the template config and place it under the user's home, or you can have the app do it automatically on first use. Which one you choose depends on your estimate of how often a user will need to tweak the configs (we don't want to pollute the user's home unnecessarily). Such a single-file app is usually installed somewhere in your home directory.
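
Here is a minimal Groovy sketch of the lookup, assuming a hypothetical app called 'myapp' with a template config.properties bundled as a classpath resource:

// Load the user override from ~/.myapp if present, otherwise fall back
// to the template bundled in the jar. All names here are placeholders.
def props = new Properties()
def userConfig = new File(System.getProperty('user.home'), '.myapp/config.properties')
if (userConfig.isFile()) {
  userConfig.withInputStream { props.load(it) }
} else {
  getClass().getResourceAsStream('/config.properties').withStream { props.load(it) }
}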

If your application has dependencies on external services or a special environment, you will need a startup script (shell or batch file) – in this case, the simplest deployment layout is a single directory with the uberjar and the startup script. As you now have a directory, it is a good idea to add a README file, and you may as well put the configs there. There are a number of variations on this theme. Usually you want to add this directory to the app classpath as well, so any file there can be looked up as a resource – that provides a convenient way to override things like Spring descriptors, patch single classes, etc. Such apps are usually installed in your home directory, /usr/local, or sometimes under /opt.
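
Such a single-directory layout might look like this (all names are illustrative):

myapp/
  myapp.jar  - uberjar with Main-Class in the manifest
  myapp.sh   - startup script setting up the environment
  myapp.conf - configuration; the directory itself is on the classpath
  README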

If your application bundles many dependencies, has complex configuration, startup scripts for multiple platforms, etc., then you will want a full-blown deployment structure along these lines (used for a big application deployed on multiple platforms):
  • /bin - startup scripts, rc3.d scripts, environment setup files, registry entries, etc.
  • /config - configuration files for the app; it should be possible to relocate them to /etc by setting an environment variable.
  • /logs - logs produced by the app; safe to delete; the config should allow us to move it to /var.
  • /cache - temporary data produced by the app; safe to delete; the config should allow us to move it to /var.
  • /data - data produced by the application; the config should allow us to move it to /var.
  • /lib - the main application jar and all dependencies.
  • /lib/win86 - platform-specific JNI libs for all supported platforms, selected by the start script and passed as -Djava.library.path=... (see the sketch after this list).
  • /lib/linux86
  • /lib/linux64
  • /lib/...
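
For illustration, here is a minimal Groovy sketch of the selection logic such a start script implements (the directory names match the layout above; the jar name is a placeholder):

// Pick the platform-specific JNI directory and build the launch command.
def os   = System.getProperty('os.name').toLowerCase()
def arch = System.getProperty('os.arch')
def platform
if (os.contains('windows'))   platform = 'win86'
else if (arch.contains('64')) platform = 'linux64'
else                          platform = 'linux86'
def libPath = new File("lib/$platform").canonicalPath
println "java -Djava.library.path=$libPath -jar lib/myapp.jar"
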
The main application jar should have a Main-Class attribute starting the app and a Class-Path attribute specifying the ordering of the dependencies (all located in the same dir). The Class-Path should start with the ../config path, which is used to load configurations and patch classes. This allows us to keep the startup scripts simple.
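
A sketch of such a manifest, with placeholder names:

Manifest-Version: 1.0
Main-Class: com.example.app.Main
Class-Path: ../config/ lib-one.jar lib-two.jar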

This last layout allows you to run the application from your home directory, run everything from /opt, or install the binaries in read-only /opt and put the configs and writable dirs under /etc and /var like a decent Unix application. The last option also allows you to run a few instances with separate configs and data from the same binaries.

Friday, October 22, 2010

Groovy Classpath Scanner

I just wanted a simple classpath scanner in Groovy – no library, no extra jars, no callback interfaces. I couldn't find any, so I wrote one. I'm posting it here in the hope that it will be useful to somebody. It is licensed under the established and permissive MIT License (if that precludes you from using it, let me know).

import java.util.zip.ZipFile
 
/**
 * <pre><code>
 * def cps = new GroovyClasspathScanner(packagePrefix: 'com.company.application')
 * cps.scanClasspathRoots(classLoader1) // optional
 * cps.scanClasspathRoots(classLoader2) // optional
 * ...
 * List<Class> classes = cps.scanClasses { Class it ->
 *    Event.isAssignableFrom(it)   ||
 *    Command.isAssignableFrom(it) ||
 *    it.isAnnotationPresent(MessageDescriptor)
 * }
 * </code></pre>
 */
class GroovyClasspathScanner {
  String packagePrefix = ''
  List<File> classpathRoots
 
  @SuppressWarnings("GroovyAssignabilityCheck")
  // default the parameter, so the no-argument call in scanClassNames() works
  List<File> scanClasspathRoots(ClassLoader classLoader = null) {
    if (!classLoader) classLoader = getClass().classLoader
 
    def prefixPath = packagePrefix.replace((char) '.', (char) '/') + '/'
 
    def List<URL> urls = []
    for (URLClassLoader cl = classLoader; cl; cl = cl.parent) {
      urls.addAll cl.URLs
    }
 
    return urls
      .each { assert it.protocol == 'file' }
      .collect { new File(it.path) }
      .each { File it -> if (it.isFile()) assert it.name =~ /.*\.(?:jar|zip)$/ }
      .findAll { File it ->
        (it.isDirectory() && new File(it, prefixPath).exists()) ||
        (it.isFile() && new ZipFile(it).entries().find { it.name == prefixPath})
      }
  }
 
  List<String> scanClassNames() {
    if (!classpathRoots) classpathRoots = scanClasspathRoots()
 
    def classNames = []
    def collect = { it, String pathProp ->
      def normalizedPath = it[pathProp].replaceAll('[\\\\/]', '.')
      def packageRegex = packagePrefix.replace('.', '\\.')
      // the leading dot may be absent for zip entries, hence the (?:^|\.)
      def classRegex = "(?:^|\\.)($packageRegex\\..+)\\.class\$"
 
      def match = normalizedPath =~ classRegex
      if (match) classNames << match[0][1]
    }
 
    classpathRoots.each {
      if (it.isDirectory()) {
        it.eachFileRecurse             { collect it, 'canonicalPath' }
      } else {
        new ZipFile(it).entries().each { collect it, 'name' }
      }
    }
 
    return classNames
  }
 
  List<Class> scanClasses(Closure predicate = { true } ) {
    return scanClassNames()
            .collect { try { Class.forName it } catch(Throwable e) { println "$it -> $e" } }
            .findAll { it }
            .findAll { predicate(it) }
  }
}
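
For example, to just list the class names under a prefix without loading the classes (the package name is a placeholder):

def cps = new GroovyClasspathScanner(packagePrefix: 'com.company.application')
cps.scanClassNames().each { println it }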

Saturday, September 18, 2010

Concurrent development

Background Story

That certain project was in its 4th year, and after a few successful deployments in the US/LATAM regions, the company was now trying to push it to EU and APAC. The original developers were all in New York and over the last few months had been frantically working to adapt the product to the EU's requirements (Asia was supposed to follow.) The scope was defined as 'whatever it takes'; the methodology was a mix of 'trial and error' and bullying the users into contorting their requirements to fit the delivered functionality. At the time, the EU team had realized that their time was better spent staying on top of the changes and making sure the end product met a minimum standard, so they were not really doing much development.

Time went on, requirements grew, the scope shrank. The project slipped past two deadlines, and finally the Asia managers decided they needed to take things into their own hands and hire a local development team, to avoid the communication gap that plagued the EU rollout and to regain some control over the schedule. It was the first time more than one team was touching the code; the core team was structured so that each component had an owner and the owner could do whatever they wanted. If you had a problem or needed a change, you asked the owner. The problem was not only that we were in a different geographical location, in an inconvenient timezone, but that we were working on the same code, implementing requirements specified by separate BA teams, chasing schedules devised by separate project-management teams, all of which eventually converged in a common program-steering committee. I could go on, but suffice to say it was quite a mess. The bottom line is that moving from a centralized, sequential to a distributed, concurrent development model imposes a huge burden, and the best advice one can give you is "don't do it!"

Probably the biggest issue was that many people in the core team simply refused to change their way of working to accommodate our existence. Every second morning the trunk would not compile; often changes were checked in that prevented servers from starting; our changes were routinely overwritten because somebody's local modifications conflicted and they were unwilling to merge – you name it, we had it. The management layer was protecting them, as "due to the years of accumulated experience, the productivity of the core team was much higher than ours, and the productivity hit they would suffer by addressing our petty complaints could not be justified in business terms". Luckily, there were some sensible guys and gradually we got to improve this; still, I consider it one of the biggest organizational faults that for a long time the management efforts were focused on suppressing our complaints rather than backing our suggestions for fixing the environment.

As the first QA delivery was approaching and the trunk showed no signs of getting more stable, we tried to think what we could do to stabilize the codebase. Some people said we should branch; others were wary of the cost of merging. The EU team had branched a few months earlier; all EU implementation work was done on the branch and eventually (read 'sometimes') merged to the trunk. When the product was released in EU, they ended up with the problem of how to merge back to the trunk. From what I hear it was a terrible experience, including a lot of functionality rewrites, introduced bugs and regressions.

Knowing the EU problems, and knowing that on one hand the trunk was still changing rapidly while on the other hand our requirements depended on code that was supposed to be delivered by the US, we decided to branch but keep developing on the trunk. All merges would be in the trunk-to-branch direction, which would save us from the dreaded criss-cross merge conflicts. Since most of our problems to date had been with work-in-progress check-ins, which we eventually wanted, we decided that we could treat the branch as a stable release line and the trunk as unstable, bleeding-edge code.

Unstable trunk + Release branch

I was tacitly elected 'merge-master', and I quickly found myself following the same routine:

  1. Every morning I would pull a list of all the unmerged commits and review them in a text editor, sorting each commit into one of these categories:
    • WANTED - changes that are required, or are prerequisites for implementing our business functionality. These should always be merged.
    • BLOCKED - changes that we DO NOT want. These should always be marked as merged (no actual merging, just the mark, so they will not appear in the list next time).
    • IRRELEVANT - changes that won't hurt us, but that we don't strictly need. We merged these in the initial stages, as keeping the branch close to the trunk makes merging easier; as we got closer to the release, we flipped the policy to improve stability.
  2. When I merged or marked as merged the WANTED/IRRELEVANT/BLOCKED groups, I would put the category as the first word in the commit message. This made it easier to pick out the changes that were done directly on the branch (which should be kept to a minimum and, if necessary, ported manually to the trunk). I didn't bother separating the individual changes, since the branch was not meant as a merge source - this saved me some time. Overall it took between 1 and 3 hours a day.
  3. There would be a number of changes that didn't fit any of the categories. For these I would contact the committer and follow up. Often it was work in progress; sometimes, after clarification, they would be categorized the next day. Usually I would post this communication as a tagged blog entry in our wiki, and there was a page displaying all the entries tagged in this way.

I found out that sorting the changes first by user and then by date simplifies the review significantly. It turned out that TextPad macros can be a very powerful tool for things like this.
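
As a sketch of what those macros were doing, assuming each line of the pulled list has the hypothetical shape 'rev|user|date|message':

// Sort the pending commits by user, then by date, to simplify the review.
def lines = new File('unmerged-commits.txt').readLines()
lines.sort { a, b ->
  def (revA, userA, dateA) = a.split(/\|/, 4).toList()
  def (revB, userB, dateB) = b.split(/\|/, 4).toList()
  userA <=> userB ?: dateA <=> dateB
}.each { println it }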

The release branch worked well for some time, until a major feature for the next release was implemented on the trunk. We blocked it, and ever since then every commit that touched the dreaded component had to be hand-merged. Often, merging a one-line change resulted in tens of conflicts, so we resorted to rolling back the file in question and manually porting the change. The worst thing was that we tested the trunk extensively, but the changes in our release branch received only cursory examination until they reached QA.

Furthermore, once we reached the second phase of the Asia rollout, our team split and started to work in parallel on three staged releases, which were supposed to deliver unrelated functionality within 2 months of each other, starting 6 months from that date. This meant that we needed a better mechanism for dealing with a divergent codebase and big changes in progress.

Exchange-trunk + Development & Release branches per stream

After taking a step back, we came up with a new branching scheme that satisfied all our requirements. For each pending project phase we would create two parallel branches – development and release (we called the combination of the two branches a 'stream'). In addition, we devised the following policies and procedures:

  1. Developers always commit their code changes in the dev-branches.
  2. Any code committed to the dev-branch MUST compile. If the branch is broken, people should not commit further unrelated changes until the CI says it's fixed.
  3. Each commit in the dev-branch should contain work for a single feature. If certain code pertains to two features, we pick one of them as primary and mark the other as dependent in the issue-tracking system. Then all the shared code goes to the primary feature, and we know that we cannot release the dependent one on its own. It is not necessary that the whole feature be committed in one go, or that the code committed to the dev-branch actually works.
  4. When we need some code from a different stream, we would wait until they publish it to trunk and only then we would merge from trunk to the dev-branch. Cross-stream merges are prohibited. We were calling this 'picking-up' the feature. Pick-up changesets should be marked in the commit message.
  5. Each time we pick up a feature, after doing the minimum conflict resolution so the code works, we commit the changeset immediately (that's the pickup changeset). This way, any additional enhancements, fixes, etc. are committed in separate changesets, so it is easier to merge them back to the trunk later.
  6. Once a feature is complete and dev-tested on the dev branch, all related changesets for that feature are merged as one consolidated changeset in the release branch. We call this 'feature-promotion'. This practice makes creating release notes relatively easy and allows us to do cool things such as rolling back the whole feature with one command.
  7. When we promote a feature that has been picked up from the trunk, we immediately mark this rel-branch commit as merged into the trunk, to prevent double-merge conflicts. We then check whether we have made any fixes on our branch and consolidate them into a single enhancement/bugfix changeset that is merged directly from the dev-branch to the trunk (since in the rel-branch we consolidate the pickup and enhancement changesets).
  8. If QA finds that the feature did not work, we would add further bugfix changesets to the rel-branch, but we would strive to keep them to minimum.
  9. When a release has passed QA, we would merge each feature-level commit that originated from this stream from the release branch to trunk ('publishing'). There it will be ready for picking up by the other streams (which will merge it in their dev-branch, promote it to release, etc).
  10. For each release we would tag the release branch, since it was already stabilized. Bugfix releases were just further tags on that same branch. For urgent production changes, we would create a bugfix branch from the tag (happened only a few times).

Overall it worked well for us. A few months after we adopted this scheme I moved to another company, but I really hope the process is still in use and being improved. An interesting thing is that every time I explain this, the first reaction is along the lines of "does it have to be that complicated?" And while I can agree that complicated it is, I have yet to find a simpler strategy that would work at this scale. Any ideas?

Friday, January 15, 2010

Setting up PuTTY to use public keys for authentication

I've looked on the internet for a quick step-by-step guide on how to get PuTTY to use public-key authentication with the OpenSSH daemon, and it took me some time to figure out. I'm posting these instructions in case anybody else has the same needs.

Prerequisites

Make sure that your OpenSSH configuration (usually /etc/ssh/sshd_config) contains the following line:

PubkeyAuthentication yes

In my case (CentOS 5.4) it was disabled by default. After changing the config, restart the daemon (e.g. service sshd restart) for the setting to take effect.

Also, you will need the full PuTTY suite, which can be downloaded from here (get putty.zip).

Generating the key

This is one way to generate the key, using PuTTY. Alternatively you can generate it with OpenSSH's ssh-keygen tool and convert it to PuTTY format.

  1. Start PUTTYGEN.EXE
  2. In the parameters box at the bottom of the window, choose the 'SSH-2 RSA' key type and set the bit size to 2048.
  3. Click the Generate button and move the mouse over the blank area until the progress bar fills up.
  4. Enter your notes in the comment line (this is displayed to you when you use the key; you can change it later).
  5. Enter a passphrase – make it long and complex, and write it down in a secure place, or print it and hide it somewhere in your freezer.
  6. Save the private key (*.ppk) in a reasonably secure filesystem location. Even if somebody gets access to your private key, they will still need your passphrase to use it.
  7. Copy the text from the box labelled 'Public key for pasting into OpenSSH authorized_keys file:' and paste it as a single line into a new file called authorized_keys (we'll use that later). The file should contain a single line terminated by a Unix-style newline, with no empty line after it (see the sample line after this list).
  8. Close PUTTYGEN.EXE
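
For reference, the pasted line has the following shape (the key material here is truncated and illustrative; the comment is whatever you typed in step 4):

ssh-rsa AAAAB3NzaC1yc2EAAAABJQAAAQEAn3... rsa-key-20100115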

Associating the key with your Unix account

  1. Login to your unix account
  2. Create a .ssh directory under your home if it does not exist
  3. Copy the authorized_keys file there
  4. Do chmod 700 ~/.ssh ; chmod 600 ~/.ssh/authorized_keys

This needs to be done for each machine you are connecting to. In this case it helps if your home is NFS-mounted.

Using the key directly

  1. Start PuTTY
  2. Specify user@host in the 'Session > Host Name' field.
  3. Specify the path to your private key file in the 'Connection > SSH > Auth > Private key file' box.
  4. Click the 'Open' button at the bottom of the PuTTY settings dialog.
  5. When prompted, enter your private-key pass-phrase and you will be logged in without entering your Unix password

Setting up Pageant to cache the decrypted private key

Let's look at what we have done. The good thing is that our password does not travel over the wire and is not susceptible to man-in-the-middle attacks. The bad thing is that where we used to enter the short and easy password of our Unix account, now we have to enter the long and difficult passphrase of our key every time we establish a new connection. To avoid this, we can use PuTTY's Pageant, which is an SSH authentication agent (the Unix equivalent is ssh-agent).

  1. Start PAGEANT.EXE
  2. Click on the computer-with-hat icon in your system tray.
  3. Choose the Add Key option and pick your private key (*.ppk)
  4. Enter your pass-phrase
  5. Close the pageant dialog

From now on, when establishing an SSH session, PuTTY will first try to use the decrypted keys from Pageant and then fall back to password authentication if none of the keys match.

You can create a shortcut that starts Pageant and passes the paths to your keys as arguments. This will start Pageant and load the keys in one step, but you will still need to enter the passphrase every time you do this (typically after a system restart).

Keep in mind that Pageant holds the private key in memory unencrypted. If anybody captures a heap dump of the process, they can get access to your private key without knowing the passphrase. That's why you might want to stop Pageant if you are not using it for a long time, or if you share the machine in a multi-user environment.

If using Pageant, you might also check the PuTTY option 'Connection > SSH > Auth agent forwarding', which will allow you to use your key from the remote machine you are logged on to.

How fast an SSD do you need?

If you need Intel X25-E 32G SSD for 70% of the cheapest listed price (shipping extra), please let me know.

For a long time I thought that the bottleneck of all builds was the HDD, so when I got my new notebook, the first thing I did was add a spiffy extra Intel X25-E SSD to it. As I expected, the builds went much faster. To my surprise, though, the disk throughput stayed fairly low during the build, which suggests that the benefit of SSDs kicks in early and buying higher-grade drives doesn't make much difference, as the bottleneck moves to the CPU quite quickly. All this makes sense, considering that a typical application has hundreds to thousands of files and a spinning HDD spends a lot of its time seeking rather than reading.

When I cleared the IntelliJ IDEA caches and opened the IDEA Community project (900 MB total, 66k files), one of the cores stayed pegged during indexing; the read throughput did not exceed 4 MB/s for 4 minutes, then rose for 1 minute to a peak of 25 MB/s, averaging, I guess, around 15 MB/s. Writes never exceeded 10 MB/s and stayed below 4 MB/s for the most part; the last minute was between 5 and 7 MB/s.

During the initial compilation, the 2 cores of the CPU (a 2.53 GHz T9400) were quite busy, staying above 80% the whole time, while disk reads stayed under 4 MB/s with occasional peaks at 6 MB/s. Write peaks were 5 MB/s, mostly below 1 MB/s.

At the end of the compilation, the index update took 50 seconds, with an average read throughput of ~10 MB/s, peaking at 30.5 MB/s; writes peaked at 7 MB/s. During that time the CPU utilization dropped to around 50%, which suggests that IDEA's indexing does not use both cores.

The bottom line is that it's not worth buying expensive SSDs for consumer usage – cheaper ones are just as good for the average home and software-development workload. Most applications do not transfer huge volumes of data, and the slowness of spinning-platter HDDs comes mostly from seek times and fragmentation. Expensive SSDs are warranted if you work extensively with media files or process huge amounts of data on disk.
