ROMS/TOMS Developers

Algorithms Update Web Log

kate - May 20, 2009 @ 18:00
Introduction to Git

This is a little something I cooked up for the ARSC HPC newsletter which I thought might be of interest to the adventuresome folks out in ROMS land:

A colleague and I used to argue the merits of RCS vs. SCCS for
maintaining source code. We then migrated to CVS, then SVN, and currently
have an SVN site at Rutgers. Each transition was a step forward.
But what about the future? Does SVN have any shortcomings that
have been bugging you?

I’ll tell you about one that’s been bugging me. My colleague and I
both have write permission on our SVN server, him on the trunk, me
on a branch. However, most of the people downloading our code see it
as a read-only site. This is great for software (such as svn) in which
the average user is not expected to change anything or contribute
patches. However, we are dealing with a complex ocean model in which
the average user will at least be changing a few files for setting cpp
choices. The above-average user might be changing quite a few files,
adding new capabilities, etc. If these people want to save their own
revision history, they can set up a private svn repository, but any
given sandbox directory can only have one parent repository – either
their own or the Rutgers one. If they aren’t saving their own
changes, they are subject to all kinds of unsafe surprises whenever
they do “svn update” from us. Even I keep two directories going, one
pointing to the trunk and the other pointing to my branch.

So how to solve this problem? Linus Torvalds got so fed up with
his options for developing the Linux kernel that he wrote a new
distributed version control system, called Git. It is available
online, along with a documentation page pointing to all
sorts of available resources. There’s even a YouTube video of Linus
himself ranting about these issues. The various tutorials listed are
perhaps more useful, plus I am enjoying “Pragmatic Version Control Using
Git” by Travis Swicegood. For the O’Reilly fans out there, a new bat
book is in the works, called “Version Control with Git” by Jon Loeliger.

How does Git solve your problems, you ask? Both CVS and SVN have
centralized repositories from which one or many people can check out
the code. Git has a new-to-me concept of distributed repositories,
something I’m sure I still don’t entirely comprehend. The repository
itself is a binary database in which the files are stored compressed. Each time
you do a checkout (git clone) from someone, you obtain a copy of the
entire repository, giving you the entire history right there at your
fingertips. Git was designed to be fast since the Linux kernel is quite
large. The Linux kernel also has many, many people working on it in a
cooperative manner; Git was designed to help them rather than hinder
them. SVN advertises that it makes branching easy – Git advertises that
it makes both branching and merging easy.
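
To make the "compressed binary database" idea a little more concrete, here is a minimal Python sketch of how, as I understand it, Git stores one file as a "loose object": a short header plus the file contents, named by its SHA1 and compressed with zlib. The function names encode_blob and decode_blob are my own, and real repositories also pack objects together in ways this does not show.

```python
import hashlib
import zlib
from typing import Tuple


def encode_blob(content: bytes) -> Tuple[str, bytes]:
    """Return (object id, compressed bytes) for file content stored as a blob."""
    # The stored form is a small header (type, size, NUL byte) plus the content.
    store = b"blob " + str(len(content)).encode("ascii") + b"\0" + content
    object_id = hashlib.sha1(store).hexdigest()
    return object_id, zlib.compress(store)


def decode_blob(compressed: bytes) -> bytes:
    """Undo encode_blob: decompress, then strip the header."""
    store = zlib.decompress(compressed)
    header, _, content = store.partition(b"\0")
    if not header.startswith(b"blob "):
        raise ValueError("not a blob object")
    return content


content = b"hello world\n"
object_id, compressed = encode_blob(content)
# Loose objects live under .git/objects/<first 2 hex digits>/<remaining 38>.
path = ".git/objects/{}/{}".format(object_id[:2], object_id[2:])
print(object_id)
print(path)
print(decode_blob(compressed) == content)
```

Because the name of each object is derived from its contents, two identical files are only ever stored once, no matter how many revisions contain them.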

For those of us with one foot still in the past, Git provides
interface tools for both SVN and CVS repositories. In other words,
my local Git sandbox can be pointed to an SVN server and do uploads
and downloads using the SVN protocol. It can then be pointed to a
different SVN server – from the same sandbox! Magic!

How can two people use Git to work on the same project? Each person
would have their own Git sandbox with the entire repository in it.
If person A makes a change, person B could do a “git fetch” or “git pull”
to get the changes. Person B then adjusts those changes and makes some new
ones. Person A then does a “git pull” from B’s repository. These could
be simply files on the same machine on which they both have accounts,
or there are ways to set up public read-only servers via http or the
git protocol. Changes that don’t work out can be kept in a branch,
but never “pushed”, or they can be made to disappear completely as
long as you haven’t yet shared them with anyone.
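
The back-and-forth between persons A and B can be sketched with a toy model in Python. This is emphatically not how Git is implemented: ToyRepo and its methods are hypothetical names of mine, there is only one branch, and "pull" here only handles the simple fast-forward case. It is meant to show just the idea that every sandbox holds the full history and that fetching copies over whatever commits are missing.

```python
import hashlib


class ToyRepo:
    """A toy stand-in for a Git repository: a commit store plus one branch tip."""

    def __init__(self):
        self.commits = {}  # commit id -> (parent id, message)
        self.head = None   # id of the newest commit on our single branch

    def commit(self, message):
        parent = self.head
        text = "{}\n{}".format(parent, message)
        commit_id = hashlib.sha1(text.encode("utf-8")).hexdigest()
        self.commits[commit_id] = (parent, message)
        self.head = commit_id
        return commit_id

    def fetch(self, other):
        """Copy commits we lack from `other`; our own branch is untouched."""
        for commit_id, data in other.commits.items():
            self.commits.setdefault(commit_id, data)
        return other.head

    def is_ancestor(self, ancestor, descendant):
        """True if `ancestor` is reachable from `descendant` via parent links."""
        current = descendant
        while current is not None:
            if current == ancestor:
                return True
            current = self.commits[current][0]
        return False

    def pull(self, other):
        """Fetch, then fast-forward our branch when that is possible."""
        their_head = self.fetch(other)
        if their_head is None:
            return
        if self.head is None or self.is_ancestor(self.head, their_head):
            self.head = their_head  # fast-forward to their newer history
        elif not self.is_ancestor(their_head, self.head):
            raise RuntimeError("histories diverged; a real merge would be needed")

    def log(self):
        """Messages from newest to oldest, following parent links from head."""
        messages = []
        current = self.head
        while current is not None:
            parent, message = self.commits[current]
            messages.append(message)
            current = parent
        return messages


a = ToyRepo()
a.commit("initial import")
b = ToyRepo()
b.pull(a)                       # B starts from a full copy of A's history
a.commit("A: tweak cpp options")
b.pull(a)                       # B gets A's change
b.commit("B: adjust A's change")
a.pull(b)                       # A pulls from B's repository
print(a.log())
```

Note that neither repository is "the server" here; each one is a complete copy, and either can pull from the other.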

Finally, git frees us from the “revision number” concept. It instead
identifies each commit with a “SHA1” string, a cryptographic hash of
that commit’s contents, including its author, time, and parent commits.
An example SHA1 is 152aee0b44a104b07d40f4401e5f1ea1ea2fe1b0,
which is unique among all git repositories for all practical purposes.
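
Here is a small Python sketch of why the time and owner end up "inside" the SHA1. It builds a commit-shaped record following Git's commit layout as I understand it and hashes it. The function name commit_id, the author, and the timestamps are all made up for illustration; the tree value is the well-known id of an empty tree, used only as a placeholder.

```python
import hashlib

# Id of the empty tree, used here only as a placeholder snapshot.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def commit_id(tree, parent, author, timestamp, message):
    """SHA1 of a commit-shaped record: header, metadata lines, then the message."""
    lines = ["tree {}".format(tree)]
    if parent is not None:
        lines.append("parent {}".format(parent))
    lines.append("author {} {} +0000".format(author, timestamp))
    lines.append("committer {} {} +0000".format(author, timestamp))
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode("utf-8")
    store = b"commit " + str(len(body)).encode("ascii") + b"\0" + body
    return hashlib.sha1(store).hexdigest()


# Hypothetical author and timestamps (seconds since the epoch).
author = "A. Developer <dev@example.org>"
first = commit_id(EMPTY_TREE, None, author, 1242842400, "initial import")
same = commit_id(EMPTY_TREE, None, author, 1242842400, "initial import")
later = commit_id(EMPTY_TREE, None, author, 1242842401, "initial import")
child = commit_id(EMPTY_TREE, first, author, 1242842500, "second commit")

print(first == same)   # identical inputs give the identical id
print(first == later)  # one second later gives a different id
print(child)
```

Since each commit's id covers its parent's id, changing anything in the history changes every id that comes after it, which is why no central server is needed to hand out revision numbers.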

I’d like to thank Brian Powell for showing me the way of the Git.