August 9th, 2007

My adventures with Bzr

A few months back, I decided to try out some of the fancy new version control systems. I'm currently using svn, and this is simply not going to change immediately, so the foremost requirement for another system is that it must be able to interoperate with svn. Given this, I found basically two choices: bzr (with bzr-svn plugin) and git (with git-svn plugin).

I first tried bzr, but back then (with bzr 0.16, I believe), it completely and utterly failed to work with my repository, spewing unintelligible tracebacks at me. I gave up and tried git. Git is very fast. Ultra Mega fast. After fixing a minor bug in the git-svn plugin, it managed to convert my repository (about 60k revs on the trunk) in about 9 hours. Pretty damn good. Unfortunately, I soon ran into serious roadblocks: it seems as if it's pretty much impossible to use branches in git, and still be able to commit to svn using git-svn. That seems to me to ruin the whole point of using git. So I left that alone.

Recently, I decided to give bzr another chance. I'm now using bzr trunk (0.19.0dev0).

So I ran a command: bzr branch svn+ssh://hostname/repository/trunk. Nifty that it's just like branching from a bzr repository, no different command or anything. So, good news: this time, I did end up converting my ~60k revision repository without crashing. I ran into a number of problems along the way, however:

There are three steps bzr goes through in the process of branching an svn repository. Each one had a problem:

  1. Initially loading in the metadata from svn.

    This leaks memory like a sieve, and ended up using 9GB by the time it was done. Luckily I have 12GB of memory in my machine.

  2. Analyzing the repository.

    This was going *realllllly* slow (didn't finish in 10 minutes). It turns out this is because bzr stores a sqlite database in ~/.bazaar, which for me is on NFS. Bad news. I symlinked that to a local directory, and it finished in < 30 seconds. The sqlite database was only 350MB; it could've stored the entire thing in RAM. Maybe if bzr used fewer transactions, it wouldn't be so slow?

  3. Loading revision data.

    I thought this was going really well. It finished about 30000 revisions in about an hour. However, by the end of the process, it was simply crawling, taking 25 minutes to finish the last 800 revisions. Additionally, this too leaked memory like a sieve. Partway through the process, I noticed that it was using up all of my memory, so I interrupted it and restarted.

    Interrupting a bzr branch operation is an interesting story in its own right. According to the official story, you can't resume. You can if you try hard enough, though, by manually adding a branch to your repository:
    python -c "import bzrlib.bzrdir; bzrlib.bzrdir.BzrDir.open('.').create_branch()"

    And then continuing the operation via "bzr pull svn+ssh://blahblah". YMMV, this isn't exactly a supported operation. :)
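On the "fewer transactions" guess from step 2, here is a minimal sketch of the difference between committing after every row and committing once for a whole batch. This is not bzr's or bzr-svn's actual code: the `revs` table, its columns, and the `insert_rows` helper are made up for illustration. Each commit forces sqlite to sync to disk, which is the part that tends to hurt on a slow filesystem like NFS.

```python
import os
import sqlite3
import tempfile
import time


def insert_rows(db_path, rows, batch):
    """Insert rows into a hypothetical 'revs' table.

    batch=False commits after every single row (one transaction per row).
    batch=True inserts everything and commits once (one transaction total).
    Returns (row count in the table, seconds spent inserting).
    """
    conn = sqlite3.connect(db_path)
    # Hypothetical schema, not the one bzr-svn really uses.
    conn.execute("CREATE TABLE IF NOT EXISTS revs (revno INTEGER, revid TEXT)")
    conn.commit()

    start = time.time()
    if batch:
        conn.executemany("INSERT INTO revs VALUES (?, ?)", rows)
        conn.commit()  # single sync for the whole batch
    else:
        for row in rows:
            conn.execute("INSERT INTO revs VALUES (?, ?)", row)
            conn.commit()  # one sync per row
    elapsed = time.time() - start

    count = conn.execute("SELECT COUNT(*) FROM revs").fetchone()[0]
    conn.close()
    return count, elapsed


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    rows = [(i, "rev-%d" % i) for i in range(100)]
    for batch in (False, True):
        name = "batched.db" if batch else "per_row.db"
        count, elapsed = insert_rows(os.path.join(workdir, name), rows, batch)
        print("batch=%s: %d rows in %.3fs" % (batch, count, elapsed))
```

Both runs end up with the same rows; only the number of commits differs. How big the gap is depends entirely on the filesystem underneath, so point `workdir` at the slow disk if you want to see the effect.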

Anyhow, yay! I've got a bzr branch of my svn repository.

Next up: trying to do something useful with it.