An introduction to version control

Based on a Wikipedia article

Last edited on October 11th, 2006 by Garrett Rooney and Guido Haarmans

Summary: a short overview of Version Control, based on a Wikipedia article but edited for Subversion users.

Version Control (also known as Revision Control) is the management of multiple versions of the same unit of information. It is most commonly used in engineering and software development to manage ongoing evolution of digital documents like source code, blueprints or electronic models and other critical information that may be worked on by a team of people. Changes to these documents are identified by incrementing an associated number or letter code, termed the "version number", "version level", or simply "version" and associated historically with the person making the change. A simple form of version control, for example, has the initial issue of a drawing assigned the version number "1". When the first change is made, the version number is incremented to "2" and so on.

Software tools for version control are increasingly recognized as being necessary for most software development projects.

Overview

Engineering version control developed from formalized processes based on tracking versions of early blueprints. Implicit in this control was the option to be able to return to any earlier state of the design, for cases in which an engineering dead-end was reached in iterating any particular engineering design. Likewise, in computer software engineering, version control is any practice which tracks and provides controls over changes to source code. Software developers sometimes use version control software to maintain documentation and configuration files as well as source code. In theory, version control can be applied to any type of information record. However, in practice, the more sophisticated techniques and tools for version control have rarely been used outside software development circles (though they could actually be of benefit in many other areas).

As software is developed and deployed, it is extremely common for multiple versions of the same software to be deployed at different sites, and for the software's developers to be working privately on updates. Bugs and other issues are often present only in certain versions, because some problems are fixed and others introduced as the program evolves. Therefore, to locate and fix bugs, it is vitally important for the person debugging to be able to retrieve and run different versions of the software and determine in which version(s) the problem occurs. It may also be necessary to develop two versions of the software concurrently (for instance, one version that receives bug fixes but no new features, and another in which new features are developed).

At the simplest level, developers can simply retain multiple copies of the different versions of the program, and number them appropriately. This simple approach has been used on many large software projects. Whilst this method can work, it is inefficient (as many near-identical copies of the program will be kept around), requires a lot of self-discipline on the part of developers, and often leads to mistakes. Consequently, systems to automate some or all of the version control process have been developed.

Traditionally, version control systems have used a centralized model, where all the version control functions are performed on a shared server. A few years ago, systems began using a model where developers work directly with their own local working copies and check in code only when needed. Two mechanisms ensure that developers do not overwrite each other's work when checking in code.

The Lock-Modify-Unlock Solution

In most software development projects, multiple developers work on the program at the same time. If two developers try to change the same file at the same time without some method of managing access, they may well end up overwriting each other's work. Most version control systems solve this in one of two ways.

Many version control systems use a lock-modify-unlock model to address the problem of many authors clobbering each other's work. In this model, the repository allows only one person to change a file at a time. This exclusivity policy is managed using locks. Harry must "lock" a file before he can begin making changes to it. If Harry has locked a file, then Sally cannot also lock it, and therefore cannot make any changes to that file. All she can do is read the file, and wait for Harry to finish his changes and release his lock. After Harry unlocks the file, Sally can take her turn by locking and editing the file.
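The lock-modify-unlock cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not the behavior of any particular version control system: the class LockingRepository, the exception LockError, and the method names are hypothetical and exist only for this example.

```python
class LockError(Exception):
    """Raised when a user acts on a file without holding its lock."""


class LockingRepository:
    """Toy repository that allows only one writer per file at a time."""

    def __init__(self):
        self.files = {}  # path -> content
        self.locks = {}  # path -> name of the user holding the lock

    def lock(self, path, user):
        holder = self.locks.get(path)
        if holder is not None and holder != user:
            raise LockError(f"{path} is locked by {holder}")
        self.locks[path] = user

    def write(self, path, user, content):
        # A change is accepted only from the current lock holder.
        if self.locks.get(path) != user:
            raise LockError(f"{user} must lock {path} before changing it")
        self.files[path] = content

    def unlock(self, path, user):
        if self.locks.get(path) != user:
            raise LockError(f"{user} does not hold the lock on {path}")
        del self.locks[path]

    def read(self, path):
        # Reading never requires a lock.
        return self.files.get(path, "")


repo = LockingRepository()
repo.lock("A", "harry")
try:
    repo.lock("A", "sally")  # Sally has to wait for Harry
except LockError as err:
    print(err)
repo.write("A", "harry", "Harry's edit")
repo.unlock("A", "harry")

repo.lock("A", "sally")  # now Sally can take her turn
repo.write("A", "sally", "Sally's edit")
repo.unlock("A", "sally")
```

Note that the sketch serializes all work on a file, even when Harry and Sally would have edited unrelated parts of it.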

The Copy-Modify-Merge Solution

Subversion and other version control systems additionally can use a copy-modify-merge model as an alternative to locking. In this model, each user's client contacts the project repository and creates a personal working copy—a local reflection of the repository's files and directories. Users then work in parallel, modifying their private copies. Finally, the private copies are merged together into a new, final version. The version control system often assists with the merging, but ultimately a human being is responsible for making it happen correctly.

Here's an example. Say that Harry and Sally each create working copies of the same project, copied from the repository. They work concurrently, and make changes to the same file A within their copies. Sally saves her changes to the repository first. When Harry attempts to save his changes later, the repository informs him that his file A is out-of-date. In other words, that file A in the repository has somehow changed since he last copied it. So Harry asks his client to merge any new changes from the repository into his working copy of file A. Chances are that Sally's changes don't overlap with his own; so once he has both sets of changes integrated, he saves his working copy back to the repository.
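One way a repository might detect that Harry's copy is out-of-date is to remember which revision each working copy was created from. The Python sketch below assumes that approach; Repository, WorkingCopy, and OutOfDateError are hypothetical names for this illustration, and the merge itself is stood in for by a manual edit.

```python
from dataclasses import dataclass


class OutOfDateError(Exception):
    """Raised when a commit is based on an older revision than the repository's."""


@dataclass
class WorkingCopy:
    base_revision: int  # repository revision this copy was taken from
    content: str


class Repository:
    def __init__(self, content):
        self.revision = 1
        self.content = content

    def checkout(self):
        return WorkingCopy(self.revision, self.content)

    def commit(self, wc):
        # Refuse the commit if someone else committed since this copy was made.
        if wc.base_revision != self.revision:
            raise OutOfDateError(
                f"working copy is at revision {wc.base_revision}, "
                f"repository is at revision {self.revision}"
            )
        self.revision += 1
        self.content = wc.content
        wc.base_revision = self.revision
        return self.revision


repo = Repository("line 1\nline 2")
harry = repo.checkout()
sally = repo.checkout()

sally.content = "line 1\nline 2 (Sally)"
repo.commit(sally)  # Sally saves first

harry.content = "line 1 (Harry)\nline 2"
try:
    repo.commit(harry)  # rejected: file A changed since Harry copied it
except OutOfDateError as err:
    print(err)

# Stand-in for the client's merge step: Harry integrates Sally's change
# into his copy, which is now based on the latest revision.
harry.content = "line 1 (Harry)\nline 2 (Sally)"
harry.base_revision = repo.revision
repo.commit(harry)
```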

But what if Sally's changes do overlap with Harry's changes? What then? This situation is called a conflict, and it's usually not much of a problem. When Harry asks his client to merge the latest repository changes into his working copy, his copy of file A is somehow flagged as being in a state of conflict: he'll be able to see both sets of conflicting changes, and manually choose between them. The copy-modify-merge model may sound a bit chaotic, but in practice, it runs extremely smoothly. Users can work in parallel, never waiting for one another. When they work on the same files, it turns out that most of their concurrent changes don't overlap at all; conflicts are infrequent. And the amount of time it takes to resolve conflicts is far less than the time lost by a locking system.
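The difference between a clean merge and a conflict can be illustrated with a deliberately simplified three-way merge. The sketch below assumes that edits only modify existing lines (no lines are inserted or deleted), which real merge tools do not require; the function name merge3 is hypothetical.

```python
def merge3(base, mine, theirs):
    """Merge two edited copies of `base`, line by line.

    Returns (merged_lines, conflicts), where conflicts maps a line index
    to the pair (my_line, their_line) that a human must choose between.
    Simplifying assumption: all three versions have the same line count.
    """
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")

    merged, conflicts = [], {}
    for index, (b, m, t) in enumerate(zip(base, mine, theirs)):
        if m == t:        # both sides agree (or neither changed the line)
            merged.append(m)
        elif m == b:      # only the other side changed this line
            merged.append(t)
        elif t == b:      # only my side changed this line
            merged.append(m)
        else:             # both changed the same line differently
            merged.append(b)
            conflicts[index] = (m, t)
    return merged, conflicts


base = ["alpha", "beta", "gamma"]

# Non-overlapping changes merge cleanly.
merged, conflicts = merge3(base, ["ALPHA", "beta", "gamma"], ["alpha", "beta", "GAMMA"])
print(merged, conflicts)

# Overlapping changes are flagged for manual resolution.
merged, conflicts = merge3(base, ["alpha", "Harry", "gamma"], ["alpha", "Sally", "gamma"])
print(merged, conflicts)
```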

Reviewers

Some systems attempt to manage who is allowed to make changes to different aspects of the program, for instance, allowing changes to a file to be checked by a designated reviewer before being added.

Delta Compression

Most version control software uses delta compression, which retains only the differences between successive versions of files. This allows more efficient storage of many different versions of files. Subversion has this capability.
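The idea of storing only differences can be sketched with Python's standard difflib module. This is an illustration of the principle only, not the delta format used by Subversion or any other tool; the functions make_delta and apply_delta are hypothetical names.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Describe `new` as copied ranges of `old` plus literal new lines."""
    delta = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))       # reuse old[i1:i2]
        elif tag in ("replace", "insert"):
            delta.append(("add", new[j1:j2]))    # store only the new lines
        # "delete": nothing needs to be stored
    return delta


def apply_delta(old, delta):
    """Rebuild the newer version from the older one and a delta."""
    result = []
    for op in delta:
        if op[0] == "copy":
            result.extend(old[op[1]:op[2]])
        else:
            result.extend(op[1])
    return result


v1 = ["line one", "line two", "line three"]
v2 = ["line one", "line 2", "line three", "line four"]

delta = make_delta(v1, v2)
assert apply_delta(v1, delta) == v2
```

Only the changed and added lines appear literally in the delta; unchanged lines are represented by index ranges into the previous version.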

Integration with other tools

Some of the more advanced version control tools offer many other facilities, allowing deeper integration with other tools and software engineering processes. Plugins are often available for IDEs such as Eclipse, the NetBeans IDE and Visual Studio. Version control systems are also often at the heart of Application Lifecycle Management solutions such as CollabNet Enterprise Edition.

Vocabulary

Atomic Commit: A collection of modifications either goes into the repository completely, or not at all. This allows developers to construct and commit changes as logical chunks, and prevents problems that can occur when only a portion of a set of changes is successfully sent to the repository.

Baseline: An approved version of a document or source file from which subsequent changes can be made.

Change: A change (or diff, or delta) represents a specific modification to a document under version control. The granularity of the modification considered a change varies between version control systems.

Change List: On many version control systems with atomic multi-change commits, a changelist (or change set) identifies the set of changes made in a single commit. This can also represent a sequential view on the source code, allowing source to be examined as of any particular changelist ID.

Check-Out: A check-out (or checkout or co) creates a local working copy from the repository. Either a version is specified, or the latest is used.

Commit: A commit occurs when the changes made to the working copy are written to the repository.

Conflict: A conflict occurs when two changes are made by different parties to the same document or place within a document. Since the software may not be intelligent enough to decide which change is 'correct', a user is required to resolve the conflict.

Directory Versioning: The ability of a modern version control system to version not only individual files but also whole directory trees, tracking changes to both over time.

Export: An export is similar to a check-out except that it creates a clean directory tree without the version control metadata used in a working copy. Often used prior to publishing the contents.

Import: An import is the action of copying a local directory tree (not a working copy) into the repository.

Merge / Integration: A merge or integration brings together (merges) concurrent changes into a unified version.

Resolve: The act of user intervention to address a conflict between different changes to the same document.

Repository: The repository is where the file data is stored, often on a server.

Version: A version (or revision) is one state in a chain of changes.

Versioned metadata: The ability to attach arbitrary key/value pairs to files and directories, and to track versions of these values over time.

Update: An update (or sync) copies the changes that were made to the repository (e.g. by other people) into the local working directory.

Working copy: The working copy is the local copy of files from a repository, at a specific time or version. All work done to the files in a repository is done on a working copy, hence the name.
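The all-or-nothing behavior described under Atomic Commit above can be sketched by applying a set of changes to a staged copy and publishing it only if every change succeeds. This is a minimal sketch under that assumption; AtomicRepository and CommitError are hypothetical names, and real systems use more elaborate transaction mechanisms.

```python
class CommitError(Exception):
    """Raised when any change in a commit cannot be applied."""


class AtomicRepository:
    def __init__(self):
        self.files = {}    # path -> content
        self.revision = 0

    def commit(self, changes):
        """Apply every change or none of them.

        `changes` maps a path to its new content; None means "delete".
        """
        staged = dict(self.files)  # work on a copy, not the live data
        for path, content in changes.items():
            if content is None:
                if path not in staged:
                    raise CommitError(f"cannot delete missing file {path}")
                del staged[path]
            else:
                staged[path] = content
        # Every change applied cleanly: publish them together.
        self.files = staged
        self.revision += 1
        return self.revision


repo = AtomicRepository()
repo.commit({"a.txt": "first", "b.txt": "second"})

try:
    # The second change is invalid, so the first must not be stored either.
    repo.commit({"a.txt": "changed", "missing.txt": None})
except CommitError as err:
    print(err)

assert repo.files["a.txt"] == "first"
```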

Most content from this article was derived from the Wikipedia article "Version Control", licensed under the GNU Free Documentation License. Additional content was derived from "Version Control with Subversion", licensed under the Creative Commons Attribution License.