Migrating from Google Code to Github and keeping revision history

This was a tricky one, and I ran into a few problems trying to move DCX over from GoogleCode to Github.

  • Usernames on Google Code are email addresses. When you bring them over, your commit history is littered with potentially private email addresses that your committers don't want to share with the world (or, more to the point, with spambots).
  • Windows. If you're trying to do this on Windows, forget it. Save yourself the hassle and grab Linux Mint, Ubuntu or whichever flavour you prefer and run it in VirtualBox. It won't cost you anything and takes less than an hour to download and install.

The time spent setting up Linux will be well worth it compared to trying to finish ANY of this on Windows.

Seriously, don't fight Windows. It wasn't made to work there.

Setting up

Find your terminal and make yourself a folder to work with. This is where all your magic will happen.

mkdir code

cd code

We'll need to install a few things. Enter your password when prompted.

sudo apt-get install subversion

sudo apt-get install git

sudo apt-get install git-svn

sudo apt-get install tig

I don't think I need to explain why subversion and git are necessary.

git-svn is the "in-between" software which imports SVN data into a format git can understand. It also arranges the SVN branches into git branches.

The tool tig is VERY useful for checking your changes as they stand after pulling the repository into your computer (and before pushing it onto Github).

Generating a list of usernames

Check out a copy of your GoogleCode repository in a folder called "googlecode" (replacing DCX with whatever your project name is)

svn checkout https://dcx.googlecode.com/svn/ googlecode

cd googlecode

This one is straight from John Albin's bag of tricks and it's a real winner!

svn log -q | awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); print $2" = "$2" <"$2">"}' | sort -u > ../authors.txt

cd ..

It fetches all the usernames from the commit log, sorts them, removes duplicates and saves them to a file called "authors.txt".
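To see what that pipeline does without touching a real repository, here is a small sketch that feeds it a few made-up "svn log -q" entries. The revisions, dates, the second address and the function names extract_authors and sample_log are all illustrative assumptions; only the awk program itself is taken from the command above.

```shell
#!/bin/sh
# The awk pipeline from above, wrapped so it can read any input on stdin.
extract_authors() {
    awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); print $2" = "$2" <"$2">"}' | sort -u
}

# Hypothetical sample of what "svn log -q" prints (made-up revisions and users).
sample_log() {
    cat <<'EOF'
------------------------------------------------------------------------
r3 | twig@blogspot.com | 2012-01-03 10:00:00 +1100 (Tue, 03 Jan 2012)
------------------------------------------------------------------------
r2 | someone@example.com | 2012-01-02 09:30:00 +1100 (Mon, 02 Jan 2012)
------------------------------------------------------------------------
r1 | twig@blogspot.com | 2012-01-01 08:00:00 +1100 (Sun, 01 Jan 2012)
------------------------------------------------------------------------
EOF
}

# Three revisions by two users collapse into two lines, one per username.
sample_log | extract_authors
```

Only lines starting with "r" are used, the second "|"-separated field is trimmed, and "sort -u" drops the repeated username.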

Open up this file (using vim or a text editor of any sort) and you should see something like this:

twig@blogspot.com = twig@blogspot.com <twig@blogspot.com>

Keep the left column as is, but feel free to change the right column.

twig@blogspot.com = Twig Nguyen <twig@whatever.com>

The name and email on the right will be used in the git history. Once all the changes are finished, save and close.
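A slip while hand-editing is easy to make, so it may be worth checking the file before cloning. This is only a rough sketch: the function name check_authors and the pattern it uses are my own assumptions about what a well-formed line looks like ("username = Name <email>"), not a format check that git-svn itself provides.

```shell
#!/bin/sh
# Print (with line numbers) every line that is NOT of the form
#   svn-username = Display Name <email>
# and return non-zero if any such line was found.
# $1: path to the authors file
check_authors() {
    # grep -v selects the lines that do not match; it succeeds when it finds any.
    if grep -Evn '^[^ =]+ = [^<>]+ <[^<> ]+>$' "$1"; then
        return 1
    fi
    return 0
}
```

Run it as "check_authors authors.txt" from the "code" folder. No output means every line has the expected shape.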

Pulling the repository onto your computer

Because Google doesn't allow shell access to the SVN repository, you can't simply dump and copy the files out.

Instead, we use git-svn to pull the revisions out over SVN and save them in a git repository.

git svn clone https://dcx.googlecode.com/svn/ -A authors.txt --stdlayout gitsvn

Depending on how big your repository is, this may take a while.

Once it's done, take a look at how it's set up so far by typing:

cd gitsvn

git status

# On branch master
nothing to commit, working directory clean

The SVN "trunk" is now the "master" branch in git. This was your stable channel. Type "tig" to see what's going on and whether the usernames were migrated correctly. Select a commit with the arrow keys and press Enter to open its details.
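If you'd rather not scroll through every commit in tig, plain git can list each distinct author in one go. A minimal sketch, where the function name list_authors is made up for illustration:

```shell
#!/bin/sh
# List each distinct "Name <email>" author recorded in a repository's history.
# $1: path to the git repository (for example the gitsvn folder)
list_authors() {
    git -C "$1" log --all --format='%an <%ae>' | sort -u
}
```

From the "code" folder, "list_authors gitsvn" should print one line per committer. Any address you meant to hide that still shows up here points to a line in authors.txt that needs another look.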

If you're happy with how it is, time to push it into github! Make sure that your account has been set up properly by typing:

ssh -T git@github.com


If you run into errors, please follow the information on these two pages to help troubleshoot your woes.


Once it's up and running, you can link github to your local git repository.

git remote add origin git@github.com:youraccount/yourproject.git

git fetch origin

git merge origin/master

git push origin master

To summarise what just happened: we added a "remote" location pointing to your github project and called it "origin", as this is your new home. You fetch the current details about it and then merge the origin's master branch into your master branch, melding the gitsvn repository with the github repository. (If the github project is completely empty, there is no origin/master yet, so there is nothing to fetch or merge and you can go straight to the push.)

As soon as that's done, we push all our existing files from gitsvn into the empty repository at origin (github). This again should take some time.
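If you want to rehearse the remote steps before touching your real project, the same sequence works against a bare repository on your own disk. This is a sketch under assumptions: the temporary folders, the commit message and the local path standing in for git@github.com:youraccount/yourproject.git are all made up, and the fetch and merge are left out because the stand-in remote starts empty.

```shell
#!/bin/sh
set -e

work="$(mktemp -d)"

# A bare repository stands in for the empty github project (assumption).
git init -q --bare "$work/origin.git"

# A throwaway repository stands in for the gitsvn folder (assumption).
git init -q "$work/gitsvn"
cd "$work/gitsvn"
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git -c user.name='Twig Nguyen' -c user.email='twig@whatever.com' \
    commit -q --allow-empty -m 'Imported from SVN'

# Link the remote and push, as in the steps above.
git remote add origin "$work/origin.git"
git push -q origin master

# Both commands should print the same commit id.
git rev-parse master
git --git-dir="$work/origin.git" rev-parse master
```

Once both ids match, the remote holds exactly what your local master holds, which is the state you want on github too.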

For most people, this should be enough.

SVN branches aren't done yet!

If you use SVN branches, then they're not in github yet!

Now for the mind-bending bit: retraining your brain a little about how Subversion branches and git branches differ.

SVN branches.... yeah, it can get messy.

To switch to another branch such as "dcxutf":

git checkout dcxutf

tig (optional)

git push origin dcxutf

Switching the branch means that git will delete any files specific to "master" and revive any files from "dcxutf", along with any changes to files common to both branches. It's quick and easy.

Use tig again to check that the revision history and changes were migrated properly, then finally push the branch upstream into origin (github).

Repeat this process for any branches you wish to keep.
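Depending on your git version, git-svn may leave the SVN branches as remote-tracking refs (the ones "git branch -r" lists) rather than as local branches, in which case each one needs a local branch before it can be pushed by name. Here is a rough sketch for doing that in bulk. The function name promote_svn_branches is made up, and the ref prefix is an assumption you should check on your own clone (commonly refs/remotes or refs/remotes/origin).

```shell
#!/bin/sh
# Create a local branch for every git-svn branch ref found under a prefix.
# $1: ref prefix holding the git-svn branches (assumption: check "git branch -r")
promote_svn_branches() {
    prefix="$1"
    git for-each-ref --format='%(refname)' "$prefix" | while read -r ref; do
        name="${ref#"$prefix"/}"
        case "$name" in
            # trunk is already master; skip SVN tags, git-svn's "@rev" refs,
            # and anything fetched from the github remote.
            trunk|tags/*|*@*|origin/*) continue ;;
        esac
        # Leave branches that already exist locally alone.
        if ! git show-ref --verify --quiet "refs/heads/$name"; then
            git branch --no-track "$name" "$ref"
        fi
    done
}
```

After running something like "promote_svn_branches refs/remotes" inside gitsvn, "git branch" should list one local branch per SVN branch, ready for the checkout, tig and push steps above.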

Cleaning up

Make sure that everything has been pushed into github. Browse your project page a little and you can see things appear instantly.
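Browsing the project page works, but you can also ask git directly whether anything was left behind. A minimal sketch, with the function name unpushed_branches being my own invention: it compares every local branch tip with what the remote reports.

```shell
#!/bin/sh
# Print every local branch whose tip is missing from, or differs on, the remote.
# $1: remote name (for example origin)
unpushed_branches() {
    # Ask the remote once for its branch tips: "<sha><TAB>refs/heads/<name>" per line.
    remote_heads="$(git ls-remote --heads "$1")"
    git for-each-ref --format='%(objectname) %(refname)' refs/heads |
    while read -r sha ref; do
        wanted="$(printf '%s\t%s' "$sha" "$ref")"
        # An exact, whole-line match means the remote has this branch at this commit.
        printf '%s\n' "$remote_heads" | grep -qxF -- "$wanted" || echo "${ref#refs/heads/}"
    done
}
```

Running "unpushed_branches origin" inside gitsvn should print nothing once every branch you care about has been pushed.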

Once you're done, feel free to remove anything from the "code" folder you made in the very first step.


Copyright © Twig's Tech Tips