Hunch's switch to git

Over the last few weeks at Hunch, we’ve switched our version control system from Subversion to git. I’m not an expert on git by any means, but, since I had more knowledge than most and an interest in making the switch, I ended up with the responsibility of getting us across the divide with as few showstoppers as possible. A number of development teams seem to be making the same switch these days, so I figured I’d share my experience and lessons learned to ease the transition for others.

Existing Workflow

Hunch has a small development team (~15 active committers) with only a few (~20) production servers, so our deployment just involves syncing out the trunk of the repository. Easy peasy. What ends up in production is ultimately determined by the developers, which is fine with a small, trusted team. I imagine that as we have more people and projects, we’ll use a packaging system and automate deployment, but that level of complexity isn’t necessary at this point.

When it came to making the switch, my primary goal was to disrupt this workflow as little as possible. Using git allows you to use any number of arbitrarily complicated workflows, but changing too much at once can be really frustrating to developers. Now that we’re on git, we have the same workflow as we had before, but we have the option to add complexity as the team and project scope grow.

The Switch

Switching to git is best split into two components, development and production. The former involves changing how code is pushed to the repository, and the latter involves changing how that code is sent to production. On both ends, the true hero was git-svn.

To begin, the team was encouraged to learn the basics of git (if they hadn’t already) and to start using git-svn, a git interface to Subversion. It allows developers to make commits to a local git repository and rearrange them as if the entire system were using git, but the local git checkout actually pushes the commits to a Subversion repository. Using git-svn allows developers to get familiar with git independently, without affecting everyone else’s workflow.

Production

While everyone began to wrap their head around git, I worked on creating a centralized git repository. We looked at a number of hosted git options and ended up using github:fi. It’s basically github, a few months behind. It’s installed on a machine in our development network so we don’t have to send all of our code out to github. Getting it set up was challenging since it seems like github just threw its stack (and all associated components) over the wall. Support has been somewhat responsive, but if they’re charging $5,000/year for something, I’d expect it to be more solid than it is. That said, it does eventually work, and a huge advantage of github:fi is that the interface is github’s, which many people have used already. Having a familiar repository browser removes a hurdle for developers in switching to git.

Once github:fi was set up, converting the repository from Subversion to git was quite easy. To begin, I created an authors file to map Subversion usernames to their github:fi accounts. Then, I just checked out the Subversion repository with git-svn using this authors file, added a new remote (our github:fi repository), and pushed the master branch to this new remote. I kept the git repository read-only for everyone except myself, and all I did was push new commits from the Subversion repository to it to keep the two in sync.
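
For the curious, here is a rough Python sketch of that conversion. It is not the exact script I ran: the URLs, the "origin" remote name, the "repo" directory, and the authors.txt filename are all placeholders, and the helper names are made up for illustration. The authors file format (svn_name = Full Name <email>) and the git svn clone --authors-file, git svn rebase, git remote add, and git push commands are standard git; the -C flag needs a reasonably recent git.

```python
def format_authors(authors):
    """Render a git-svn authors file.

    `authors` maps a Subversion username to a (full name, email) pair.
    git-svn expects one line per user: svn_name = Full Name <email>
    """
    lines = [
        f"{svn_name} = {full_name} <{email}>"
        for svn_name, (full_name, email) in sorted(authors.items())
    ]
    return "\n".join(lines) + "\n"


def conversion_commands(svn_url, git_url, authors_path="authors.txt", workdir="repo"):
    """One-time conversion: clone from Subversion, add the new remote, push master.

    Returns argument lists suitable for subprocess.run(cmd, check=True).
    The remote name "origin" is an assumption.
    """
    return [
        ["git", "svn", "clone", f"--authors-file={authors_path}", svn_url, workdir],
        ["git", "-C", workdir, "remote", "add", "origin", git_url],
        ["git", "-C", workdir, "push", "origin", "master"],
    ]


def sync_commands(workdir="repo"):
    """Repeatable sync: pull new Subversion commits, then push them to git."""
    return [
        ["git", "-C", workdir, "svn", "rebase"],
        ["git", "-C", workdir, "push", "origin", "master"],
    ]
```

Keeping the commands as plain lists makes it easy to print them for a dry run before handing them to subprocess.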

Once we had a git repository, I worked on converting our deployment scripts. We’ve got a few scripts in production that pull from the repository, and I just changed them to read from git rather than Subversion. Since everything on that end is read-only, it was pretty straightforward. One major gotcha was that git doesn’t track empty folders. In Subversion, you can create an empty folder and add it to the repository, and wherever the repository is checked out, the empty folder(s) will be created. In git, you can’t check in an empty directory, so subsequent repository clones won’t create empty directories. There are a couple of ways around this, but the first time we made a deployment using the new scripts, part of the site exploded for a few minutes. Lesson learned!
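
One common workaround is to drop a placeholder file into each empty directory so git has something to track. The sketch below is a minimal take on that idea, not necessarily what we ended up doing. The .gitkeep name is just a convention (git gives it no special meaning), and add_placeholders is a made-up helper name. Run it over a working tree that still contains the empty directories, then commit the new files.

```python
import os


def add_placeholders(root, placeholder=".gitkeep"):
    """Create an empty placeholder file in every empty directory under root.

    Returns the list of placeholder paths that were created.
    """
    created = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Decide emptiness before pruning, so a directory that only
        # holds .git is not mistaken for an empty one.
        is_empty = not dirnames and not filenames
        if ".git" in dirnames:
            dirnames.remove(".git")  # never descend into git metadata
        if is_empty:
            path = os.path.join(dirpath, placeholder)
            open(path, "w").close()
            created.append(path)
    return created
```

The other obvious option is to have the deployment script create the directories it needs; which one is better depends on whether the directories are really part of the code or part of the environment.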

After the deployment scripts were good to go, I converted our other ancillary repository features. We have a little script that polls for commits and reads them to the entire office, which, thanks to the github API, was a piece of cake to convert. Setting up commit emails was more difficult. For whatever reason, the post-receive emails in github:fi just don’t work. Perhaps that’ll be fixed in a later version, but they’re definitely broken in our setup. We get emails from other parts of the site, and other post-receive hooks work, but for whatever reason, post-receive emails don’t. As a workaround, I wrote a service to poll for new commits (similar to the commit-reader). Not a big deal, but I wish the email hooks were implemented.
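
The polling workaround boils down to remembering the last commit you saw and announcing anything newer. Here is a minimal sketch of that loop body, assuming the commit list arrives newest first. How the commits are fetched and how the email is sent are deliberately left as the caller-supplied functions fetch_shas and notify, both hypothetical names, since those details depend on your setup.

```python
def new_commits(shas_newest_first, last_seen):
    """Return the commits newer than last_seen, oldest first."""
    fresh = []
    for sha in shas_newest_first:
        if sha == last_seen:
            break
        fresh.append(sha)
    fresh.reverse()  # announce in the order the commits were made
    return fresh


def poll_once(fetch_shas, notify, last_seen):
    """Run one polling pass and return the new last-seen commit.

    fetch_shas() returns commit ids, newest first.
    notify(sha) sends the email (or reads the commit aloud).
    On the very first pass (last_seen is None) nothing is announced,
    so the whole history is not mailed out at once.
    """
    shas = fetch_shas()
    if not shas:
        return last_seen
    if last_seen is not None:
        for sha in new_commits(shas, last_seen):
            notify(sha)
    return shas[0]
```

A real service would call poll_once on a timer and persist the returned value between runs, so a restart does not repeat or lose notifications.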

Development

With the production side of things flipped over to use git, all that remained was the developer side. We decided that a particular Monday at noon would be “G-Day”, where we’d shut down the Subversion repository and open up the git repository. Doing a clean switch meant that we didn’t have to merge changes that were committed to Subversion but not git (and vice versa). Setting a date and time meant that people knew to check in whatever they were working on or create patches to commit to the new repository. No surprises.

The switch went something like this:

Shut down the Subversion repository (no reads/writes allowed).
Do one final synchronization from Subversion to the new git repository using git-svn.
Give the development team read/write access to the new repository.
Have everyone blow away their old Subversion checkout and replace it with a git clone.

Since we did it at noon on a Monday, everyone was in the office, and we had all day to clean up any surprises. Amazingly, though, we had no snafus at all. As we eased into the new version control system, we all had questions, but everything went very smoothly! It’s been a week, and for the most part, everything just works. Nothing has failed miserably (yet), and the development workflow has been largely unaffected. And now, we’re using a much more powerful version control system that’ll allow us to adopt more intricate workflows as we grow, with minimal friction. Yahtzee!

We also have a few non-developers who occasionally commit code to the repository, and their workflows have remained largely unaffected, as well. For simple code changes, all that’s required is learning to use the git commands (commit, push, pull) instead of their Subversion equivalents (commit, update). For the more graphically-inclined, GitX is a fine replacement for Versions as far as we use it, but you’re best off with Brotherbard’s GitX fork, which allows the user to push/pull through the GUI.

Keepin’ It Clean

Regarding workflow, we had to make a couple of decisions as to how to handle our repository. Mimicking Subversion, we created just a single centralized repository that everyone reads from and writes to. But since we’re pushing around groups of commits and merging repository states, not individual commits, we need to do a little more work to keep our repository history clean. Github allows you to fork repositories and push changesets. In many collaborative environments, users fork a central repository, push to their own repository in github, and then make a “pull request” to merge their changes into the main repository. While this is powerful, it’s an added step to the workflow, so we decided to put off using forking and pull requests until we have enough projects and need gatekeepers to keep the repositories in good order.

Keeping our centralized repository clean has been a bit of a challenge and has definitely been the biggest pain point in switching to git. By default, pulling in git uses “merge” to combine remote changes with your own. Long story short, this creates nasty “merge commits” in the repository because it merges remote changes into your existing local changes. Instead, we all use rebase, which fetches the remote changes first and then replays your local commits on top of them, keeping the history linear and the repository nice and clean. It can be a pain to keep everyone’s commits in good order, a problem we didn’t have with Subversion.
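
To make the difference concrete, here is a toy Python model of the two strategies. It is only an illustration of the history shapes, not how git is implemented: commits are just names mapped to their parents, and the function names are invented for this example. In day-to-day use, the rebase behavior is what you get from git pull --rebase.

```python
def ancestors(graph, tip):
    """Return every commit reachable from tip, including tip itself."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph[commit])
    return seen


def pull_with_merge(graph, local_tip, remote_tip):
    """Join the two histories with a two-parent merge commit."""
    merged = f"merge({local_tip},{remote_tip})"
    graph[merged] = (local_tip, remote_tip)
    return merged


def pull_with_rebase(graph, local_tip, remote_tip):
    """Replay local-only commits on top of the remote tip.

    Assumes the two tips share some history and that the local
    commits form a simple chain.
    """
    shared = ancestors(graph, remote_tip)
    local_only = []
    commit = local_tip
    while commit not in shared:
        local_only.append(commit)
        commit = graph[commit][0]
    tip = remote_tip
    for commit in reversed(local_only):  # oldest local commit first
        replayed = commit + "'"          # rebased commits are new commits
        graph[replayed] = (tip,)
        tip = replayed
    return tip


def is_linear(graph, tip):
    """True if no commit reachable from tip has more than one parent."""
    return all(len(graph[c]) <= 1 for c in ancestors(graph, tip))
```

With a shared history A, B, one remote commit R1 and two local commits L1, L2, the merge version ends in a two-parent commit, while the rebase version ends in L2' sitting directly on a straight line through R1.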

Conclusion

Overall, Hunch’s switch to git has been excellent. I’m actually rather surprised as to how seamless the transition has been, though much credit goes to the development team for doing their homework and not being helpless when we finally pulled the trigger. I encourage everyone still on Subversion to make the move to git sooner rather than later. The more a team grows, the more of a pain switching becomes.