Version Control: What It Is and Why It’s a Good Idea

Part 1: Overview

Keeping project files current is a pain. Over the course of a typical document lifetime, which can be years in many instances, a file can undergo many transformations from one iteration to the next. Clients will have edits or completely new content that they want inserted into a document, whether it’s an InDesign, Word or HTML file, while keeping the same general format. In instances like these, the common practice is to duplicate the file, keep the old copy as a backup, and give the new one a different name, possibly with a date or some other way of distinguishing it as the latest and greatest.

This approach, while practical, can get out of hand over time as files multiply and we find ourselves scanning through long lists of files to find the correct one. Even worse, sometimes we choose the wrong file because we are unfamiliar with a project or have misunderstood a colleague’s personal file-naming scheme.

Enter the revision control system, or version control system (VCS). Version control systems have been around since the 1970s and were first developed primarily for computer programmers, because keeping track of code can be a huge challenge, particularly as the code base grows and development teams become larger. The first systems were designed so that the code would sit on a central server and users would “check out” a file to their local hard drives, like a library book, work on it, then check it back in when they were done. Instead of overwriting the file, the system would mark the checked-in copy as the latest and automatically keep the old one as a backup. That way, if a bug was introduced, a programmer could go back to the previous version and see what went wrong.
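For readers who think better in code, the library-book model described above can be sketched in a few lines of Python. This is a toy illustration only, not how any real VCS is implemented: the class name, method names and file name are all made up for the example, and the rule that only one person can hold a file at a time is an assumption taken from the library-book analogy.

```python
class CentralRepository:
    """Toy model of an early check-out/check-in system (hypothetical, for illustration)."""

    def __init__(self):
        self._versions = {}     # file name -> list of saved contents, oldest first
        self._checked_out = {}  # file name -> user currently holding the file

    def add(self, name, content):
        # Store the very first version of a file on the "server".
        self._versions[name] = [content]

    def check_out(self, name, user):
        # Assumption: like a library book, a file can be held by one user at a time.
        if name in self._checked_out:
            raise RuntimeError(f"{name} is already checked out by {self._checked_out[name]}")
        self._checked_out[name] = user
        return self._versions[name][-1]

    def check_in(self, name, user, content):
        if self._checked_out.get(name) != user:
            raise RuntimeError(f"{user} has not checked out {name}")
        # Append rather than overwrite, so the old version is kept as a backup.
        self._versions[name].append(content)
        del self._checked_out[name]

    def version(self, name, index):
        # index -1 is the latest version, -2 the one before it, and so on.
        return self._versions[name][index]


repo = CentralRepository()
repo.add("report.txt", "first draft")
draft = repo.check_out("report.txt", "alice")
repo.check_in("report.txt", "alice", draft + ", with client edits")

latest = repo.version("report.txt", -1)
previous = repo.version("report.txt", -2)
```

The point of the sketch is the `append` in `check_in`: the newest copy becomes the current one, and every earlier copy is still there to go back to.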

As VCS software evolved over the years, it became fairly robust and has now split into two categories: centralized VCSes (CVCSes) and distributed VCSes (DVCSes). A CVCS derives from the original model, where the project resides on a central server and users check out files and then check them back in. From what I’ve been able to glean over the course of my research, these systems can be slow, cumbersome and hard to use. A distributed VCS, on the other hand, gives each user a complete copy of the project and its history, so each user acts as his or her own manager. Optionally, we can establish a central repository so users can push their changes up to a server if they desire. These systems are faster and much easier to use (although it should be noted that they are primarily command-line tools rather than graphical applications).

Besides its obvious advantages as a cataloging apparatus, a VCS is also a reliable backup mechanism. When users are finished with a file, they “commit” the file and it is archived for future use. We no longer need to have multiple files residing within multiple directories. We only need one file, which will be the latest version, with the option to review the catalog for previous versions of the file, all carefully commented (“New content from client”) if we so choose. And with the exception of the necessary extra step of committing a changed file to the VCS, our workflows can remain essentially unchanged.
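To make the idea of a commented catalog concrete, here is a minimal Python sketch of a single file with a commit history. It is a toy model under stated assumptions, not the way Git or Mercurial actually store data: `FileHistory`, `Commit` and the sample brochure text are all hypothetical names invented for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    message: str  # the comment attached to this version
    content: str  # the full file content at this point in time


class FileHistory:
    """Toy catalog of one file's versions (hypothetical, for illustration)."""

    def __init__(self):
        self._commits = []

    def commit(self, content, message):
        # Archive the current content with a comment; return its revision number.
        self._commits.append(Commit(message, content))
        return len(self._commits) - 1

    def latest(self):
        # The one file we actually work with day to day.
        return self._commits[-1].content

    def log(self):
        # The catalog: every revision number with its comment, oldest first.
        return [(i, c.message) for i, c in enumerate(self._commits)]

    def at(self, revision):
        # Retrieve the file as it was at an earlier revision.
        return self._commits[revision].content


history = FileHistory()
history.commit("Welcome to our brochure.", "Initial layout")
history.commit("Welcome to our new brochure.", "New content from client")
```

Calling `history.log()` lists each revision alongside its comment, and `history.at(0)` brings back the original text, which is the job the dated duplicate files used to do by hand.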

The Candidates

The two major players in the DVCS arena are Git and Mercurial.

Git was developed by Linus (pronounced linn-us) Torvalds, notable for the development of the vast open-source project that is the Linux kernel. In fact, he wrote Git because he needed a version control system that would give him the features and performance needed to maintain over 22,000 files being worked on by hundreds of individuals all over the world. Git is fast, robust and feature rich. Its main advantages, besides those just mentioned, are its widespread adoption and plentiful references, including a lynda.com course.

Mercurial was developed by a Canadian with time on his hands, Matt Mackall, who felt that centralized VCSes like Subversion were big and slow, and prone to corruption and workflow stoppages if the central server went down or its hard disk went bad. He believed the redundancies of a distributed system were too valuable to ignore and set about writing Mercurial at, coincidentally, just about the same time that Torvalds was writing Git. I found a fantastic tutorial site for this software.

Both of these tools are great products, and are free. But Git’s widespread adoption, available reference materials and access to graphical interfaces make it the clear choice for us.

Next: Part 2: Branching, and why that makes Git better than Time Machine (which is awesome, but doesn’t do branching)