What Is Subversion?

By: Jeremy Whitlock

November 28th, 2006

Summary: This article will tell you what Subversion is from the eyes of a Subversion tools developer and consultant.

Introduction

Depending on who you ask, Subversion can be many things to many people.  This article explains what Subversion is as I see it.  Along the way, I will step into the shoes of a few key users of Subversion to explain how each of them views it and how those views differ.  Before we get into the details, let's look at what Subversion is from a high-level perspective.  We will then fill in the detail by walking in the shoes of our hypothetical users.

Subversion At A Glance

Out of the box, and in its simplest form, Subversion is nothing more than an advanced, open source version control system.  Its sole purpose is to help you track changes to the files and directories under version control.  This isn't to say that Subversion cannot be the cornerstone of your build management, release management and continuous integration efforts, which we will discuss later, but out of the box, Subversion cares only about the files and directories whose changes it has been asked to track.

Subversion History Abridged

Back in 2000, CollabNet decided to create a replacement for CVS.  This decision came after running into the problems and limitations of CVS, not only in day-to-day development but also in the CVS integration in their flagship product, CollabNet Enterprise Edition, which is a collaboration and development platform for distributed development.  CollabNet reached out to Karl Fogel, author of Open Source Development with CVS, to ask if he would like to be involved.  Coincidentally, he and Jim Blandy had already started talking about a replacement of their own, and they agreed to take part.  Their plan was to create a tool that did not deviate too much from CVS's development/usage model but would fix the apparent problems of CVS.  To make a long story short, Subversion was born.

Subversion Features

Now that we know what Subversion is, from a high level, and how it came about, let's look at a few of its more impressive features to get a better understanding of what Subversion brings to the table.

Directory Versioning

Directory versioning is the idea of versioning a directory's structure just as you version the content of a file.  Subversion uses a virtual filesystem to make this possible, and the end result is that you can track changes to directory structures just as you can track changes to the contents of files.

True Version History

True versioning allows you to copy and rename resources so that the newly created resource has its own history and is seen as a new object.  Since copying and renaming resources are extremely common, true version history is a nice feature allowing you to view each object as its own entity regardless of whether the new entity was the result of a copy or rename.

Atomic Commits

An atomic commit is one that is either applied in its entirety or not applied at all.  With non-atomic commits, a failure partway through can leave a partial commit in the repository.  With atomic commits, Subversion can undo the whole commit transaction in the event that a problem arises.  This means that an interrupted commit operation does not leave the repository in a corrupt or inconsistent state.
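The all-or-nothing behavior can be sketched in a few lines of Python.  This is a toy in-memory model written for this article, not Subversion's implementation; the class name ToyRepository and the file paths are made up for illustration.

```python
class ToyRepository:
    """A toy in-memory repository; NOT how Subversion is implemented."""

    def __init__(self):
        self.files = {}     # path -> content at the latest revision
        self.revision = 0   # latest committed revision number

    def commit(self, changes):
        """Apply every change in `changes`, or none of them.

        `changes` maps a path to its new content, or to None to delete it.
        """
        staged = dict(self.files)  # work on a private copy
        for path, content in changes.items():
            if content is None:
                if path not in staged:
                    raise KeyError(f"cannot delete missing path: {path}")
                del staged[path]
            else:
                staged[path] = content
        # Only reached when every change applied cleanly.
        self.files = staged
        self.revision += 1
        return self.revision


repo = ToyRepository()
repo.commit({"trunk/a.txt": "hello", "trunk/b.txt": "world"})

try:
    # The second change is invalid, so the first must not be kept either.
    repo.commit({"trunk/a.txt": "changed", "trunk/missing.txt": None})
except KeyError:
    pass

print(repo.revision, repo.files["trunk/a.txt"])  # prints: 1 hello
```

The failed commit leaves both the revision number and the file contents exactly as they were, which is the property the paragraph above describes.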

Versioned Metadata

Versioned metadata is the ability to attach key-value pairs to a versioned object.  Each such pair is called a property, and properties are versioned just like the objects to which they are applied.
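To make the idea of a versioned property concrete, here is a minimal Python sketch.  It is a toy model with its own per-object revision counter, not Subversion's storage format; the class VersionedProperties and the property name review:status are hypothetical.

```python
class VersionedProperties:
    """Toy model of the properties on one object; NOT Subversion's format."""

    def __init__(self):
        # Index = toy revision number; revision 0 has no properties.
        self._history = [{}]

    def set(self, name, value):
        """Record a new revision in which `name` has `value`."""
        snapshot = dict(self._history[-1])
        snapshot[name] = value
        self._history.append(snapshot)
        return len(self._history) - 1

    def get(self, name, revision=-1):
        """Return the value of `name` at `revision` (default: latest)."""
        return self._history[revision].get(name)


props = VersionedProperties()
first = props.set("review:status", "pending")
second = props.set("review:status", "approved")

print(props.get("review:status", first), props.get("review:status"))
# prints: pending approved
```

Because every change to a property creates a new revision, the old value remains available in history, just as old file contents do.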

Choice of Network Layers

Subversion's access layer has been abstracted to allow for multiple ways of accessing a repository.  This abstraction lets you use an existing access method or develop your own.  This flexibility means that you can use what works instead of being forced into a particular access model.  Another layer of flexibility is Subversion's use of WebDAV, which allows repository interaction over HTTP/HTTPS and usually poses no problem when you are working behind a firewall and/or proxy.

Consistent Data Handling

Subversion uses a binary differencing algorithm when storing version history, and that algorithm works the same on text files and binary files.  This means three things: Subversion uses the same process for versioning text and binary files, it stores files and differences on the server in the same way regardless of file type, and it sends differences across the wire in the same way regardless of file type.

Efficient Branching and Tagging

Subversion's approach to branching and tagging makes the cost of a branch or tag independent of the size of the project being branched or tagged.  Subversion uses something similar to a hard link on the server side when the branch or tag is created.  This means that branching and tagging in Subversion take a very small amount of time and storage regardless of your project's size.
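The hard-link idea can be illustrated with a small Python sketch in which a branch shares the existing tree instead of duplicating it, and only the directories on the path to a later change are copied.  This is a toy model under simplifying assumptions (nested dictionaries, a flat branch name), not Subversion's filesystem code; the helper names copy_tree and write_file are made up.

```python
def copy_tree(root, src, dst):
    """Return a new root in which `dst` shares the subtree stored at `src`."""
    new_root = dict(root)      # one new top-level directory listing
    new_root[dst] = root[src]  # share the subtree; nothing is duplicated
    return new_root


def write_file(root, path, content):
    """Return a new root with `path` set, copying only directories on the path."""
    head, *rest = path
    new_root = dict(root)
    if rest:
        new_root[head] = write_file(root.get(head, {}), rest, content)
    else:
        new_root[head] = content
    return new_root


r1 = {"trunk": {"src": {"main.c": "int main(void) { return 0; }"}, "README": "v1"}}

# Creating the branch costs the same no matter how large trunk is.
r2 = copy_tree(r1, "trunk", "branch-1.0")
shared_at_creation = r2["branch-1.0"] is r2["trunk"]

# Changing a file on the branch leaves trunk untouched.
r3 = write_file(r2, ["branch-1.0", "README"], "v1 (maintenance)")

print(shared_at_creation)                               # prints: True
print(r3["trunk"]["README"])                            # prints: v1
print(r3["branch-1.0"]["src"] is r3["trunk"]["src"])    # prints: True
```

The last line shows that even after the branch diverges, the untouched src directory is still shared between trunk and the branch.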

Hackability

Subversion is its own project, built from the ground up with a well-defined C API.  This means that you can maintain, extend and integrate Subversion into other projects easily.  It is also worth noting that Subversion has bindings for many languages, such as Java, Perl and Python.

Subversion In Detail

The list of features above isn't fully comprehensive so I figured it would be a good idea to discuss Subversion in a little more detail to outline more Subversion functionality and concepts.

Automatable and Scriptable

Subversion's output is both human readable and parseable.  This means that those of you wanting to automate or script any part of Subversion should have no issues doing so.
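As one example of parseable output, the svn log command accepts an --xml option that emits log entries as XML.  The Python sketch below parses a small, made-up sample in that shape rather than calling svn itself, so the authors, dates and messages are illustrative only, and the function name parse_log is hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the shape produced by `svn log --xml`.
SAMPLE_LOG_XML = """
<log>
<logentry revision="3">
<author>alice</author>
<date>2006-11-27T18:02:11.000000Z</date>
<msg>Fix typo in README.</msg>
</logentry>
<logentry revision="2">
<author>bob</author>
<date>2006-11-26T09:15:40.000000Z</date>
<msg>Add build script.</msg>
</logentry>
</log>
"""


def parse_log(xml_text):
    """Turn log XML into a list of plain dictionaries, one per revision."""
    entries = []
    for entry in ET.fromstring(xml_text.strip()).iter("logentry"):
        entries.append({
            "revision": int(entry.get("revision")),
            "author": entry.findtext("author", default=""),
            "date": entry.findtext("date", default=""),
            "message": entry.findtext("msg", default=""),
        })
    return entries


entries = parse_log(SAMPLE_LOG_XML)
for e in entries:
    print(f"r{e['revision']} {e['author']}: {e['message']}")
# prints:
# r3 alice: Fix typo in README.
# r2 bob: Add build script.
```

In a real script you would feed parse_log the output captured from the svn client instead of the sample string.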

Change Sets

Subversion was built to be efficient over the wire and on the disk.  To put this in perspective, Subversion wants to send as little data across the wire and store as little information on the disk as possible.  Subversion does this via change sets.  Every time you create a commit, you create a change set.  Each change set contains the changes required to reproduce that commit.  Since Subversion doesn't do file-level versioning, change sets are Subversion's way of communicating the changes between revisions.  This is efficient over the wire and on disk because Subversion sends and stores only what is required to reproduce the commit that created the subsequent revision.  In the end, the costs are proportional to change size and not to file size.
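The claim that cost follows change size rather than file size can be demonstrated with a rough Python sketch.  It uses the standard library's line-based difflib as a stand-in for Subversion's binary differencing, so it only illustrates the principle; the file contents are generated for the example.

```python
import difflib

# A 1000-line "file" and a new version with a single edited line.
old = [f"line {i}\n" for i in range(1000)]
new = list(old)
new[500] = "line 500 (edited)\n"

# n=0 asks for no surrounding context lines in the diff.
delta = list(difflib.unified_diff(old, new, n=0))

# Keep only the removed/added content lines, skipping the ---/+++ headers.
changed = [
    line for line in delta
    if line[0] in "+-" and not line.startswith(("+++", "---"))
]

print(len(new), len(changed))  # prints: 1000 2
```

Only two lines (the old and new versions of line 500) need to be recorded to turn one version into the other, no matter how long the file is.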

Choice of Client

Since Subversion abstracts the access and interaction into well-defined APIs, you have your choice of using the particular Subversion client that fits your needs or environment.  You can even mix-and-match which clients you use depending on your interaction needs.

Choice of Parallel Development Model

Subversion lets you choose which parallel development model you want to use, and when.  If you want to use the Lock-Modify-Unlock model for your binary files, so be it.  If you want to use the Copy-Modify-Merge model for all non-binary files, that is great.  You can even mix and match depending on your specific preferences and needs.

Internationalization

Subversion was built for global consumption and this commitment is shown by its internationalized messages.

Global Revisioning

Subversion uses a global revision number as opposed to using file-level revision numbers.  The concept here is that each revision contains the state of the repository as it exists for that particular revision.  This allows for many of the necessary features that Subversion has implemented.
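A global revision number can be pictured with the following toy Python model, in which every commit produces one new repository-wide revision and the whole tree can be reconstructed for any revision number.  It is a sketch for illustration, not Subversion's storage design; the class name ToyRepo and the file names are made up.

```python
class ToyRepo:
    """Toy repository with one global revision counter; NOT Subversion's design."""

    def __init__(self):
        self.changesets = []  # changesets[i] produced revision i + 1

    def commit(self, changes):
        """Store a change set (path -> content, or None to delete)."""
        self.changesets.append(dict(changes))
        return len(self.changesets)  # the new global revision number

    def tree_at(self, revision):
        """Rebuild the whole tree as it existed at `revision`."""
        tree = {}
        for changes in self.changesets[:revision]:
            for path, content in changes.items():
                if content is None:
                    tree.pop(path, None)
                else:
                    tree[path] = content
        return tree


repo = ToyRepo()
repo.commit({"a.txt": "one", "b.txt": "one"})  # revision 1
repo.commit({"a.txt": "two"})                  # revision 2
repo.commit({"b.txt": None})                   # revision 3

print(repo.tree_at(2))  # prints: {'a.txt': 'two', 'b.txt': 'one'}
print(repo.tree_at(3))  # prints: {'a.txt': 'two'}
```

Note that b.txt did not change in revision 2, yet "b.txt at revision 2" is still a meaningful question, because the revision number names a state of the whole repository rather than of a single file.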

Historical Tracking

Subversion's built-in capabilities are not limited to versioning the files and directories you place under its control.  Subversion also comes with a complete toolkit for analyzing the history of those files and directories.  Change reports, release management and many other features are at your fingertips thanks to Subversion's built-in historical tracking capabilities.

Subversion In Use

We now know what Subversion is, but we still haven't really considered it through the eyes of its users.  This section does exactly that for five kinds of user: a product developer, a product manager, a release manager, a repository administrator and a network/systems administrator.  We will not write a book on each, but the idea is to look at Subversion from each user's point of view and to see how Subversion accommodates them.

The Product Developer

A product developer is concerned with Subversion only to the extent that it tracks the history of the files and directories the developer is working on.  Nothing more.  Developers need to be able to locate resources, compare differences between revisions of resources and work on multiple products/releases/efforts at the same time.  Subversion accommodates them in that it facilitates parallel development by design, and its simplicity of interaction lets the developer worry more about the product than about the intricacies of the version control tool.  To a product developer, the following are most important:

  • Simplicity: Each Subversion tool is extremely well documented and is designed to allow for the simplest migration path from another version control tool.  Subversion is also simple for developers because there are only a handful of Subversion features that a developer needs to understand to do day-to-day development.
  • Flexibility: Developers can use whichever client best fits their needs.  This means that you can choose the client that makes you the most efficient.  Clients are not the only level of flexibility in the eyes of a developer.  Subversion users can also pick and choose which development methodology they wish to use when interacting with a Subversion repository.  This allows development teams to build their own development processes.
  • Traceability: Beyond the typical interaction with the repository during development, developers also need to be able to do minor historical tracking.  Whether they need to know who added a particular line of code or who deleted a file, there is a real need to get historical data out of Subversion.  The good thing is that Subversion's built-in historical capabilities are more than enough to provide traceability for a development project.

Developers are probably the easiest to please with respect to Subversion.  With Subversion's efficiency over the wire, its simple and documented commands and its historical tracking capabilities, Subversion is an excellent candidate for a version control system in the eyes of a developer.

The Product Manager

While the product developer is mainly concerned with the simplicity of interaction with the repository, a product manager will probably want to do more historical tracking to be able to properly manage the team working on the product.  The manager will also be interested in the ability to work on multiple releases of the product in parallel.  (Think about working on the current release, bug fix release and a proof-of-concept release at the same time.)  To a product manager, the following are the most important:

  • Branching: To be able to facilitate parallel development, a requirement when working on multiple releases at the same time, a product manager would be interested in Subversion's branching capabilities.  Branching is the cornerstone of allowing parallel development on multiple efforts at the same time.
  • Traceability: Traceability is where developer and manager needs overlap slightly.  Developers need traceability to understand code changes, while managers need it for other reasons.  Managers manage developers, so when traceability comes to mind, I begin to think of code reviews, change reports, defect reports and release reports.  Subversion can accommodate all of these with its full-featured historical tracking.
  • Simplicity: Most managers want to be able to manage without having to fully understand the underlying tooling.  Subversion abstracts the access layer so that managers can use WebDAV clients, like Windows Web Folders, to simplify Subversion repository interaction.  This, coupled with highly documented commands, makes a manager's job easy when managing a project that uses Subversion as its version control system.

Product managers are extremely easy to please when it comes to Subversion.  They want an easy way to interact with the repository, an easy way to trace releases and developer contributions, and the ability to manage multiple releases at the same time.  Subversion makes a manager's job easy and I'm sure the manager would agree.

The Release Manager

A release manager is much like a product manager, but while a product manager manages the developers making the product, a release manager manages the releases of that product.  Release managers are concerned above all with being able to work on multiple releases in parallel and being able to trace changes between releases.  Here is how Subversion accommodates release managers:

  • Branching: As with product managers, release managers need to be able to make sure that multiple releases are being developed in parallel without cross-contaminating one release with the needs of another.  Since branching is the only real way to facilitate parallel development in isolation, branching is a hot topic for release managers.
  • Tagging: Release managers need to be able to archive releases and Subversion allows you to do this with tags.  A tag is basically a human-readable name given to a particular revision of a directory tree.  Where tagging makes life easier for a release manager is that release managers can locate the tags directory and identify which releases have shipped without having to memorize or document the underlying revision of the directory tree to locate a release point.  Releases are as simple as having a tag with the release name, like "Release 1.0".
  • Traceability: Release managers need traceability to identify what was added, removed or fixed from one release to another.  Subversion's historical tracking capabilities make this simple: you can create a change log between releases, you can create defect reports between releases (with the proper process in place to facilitate this) and you can even create other, more detailed reports from one release to another depending on your business needs.

We are beginning to see that Subversion's historical tracking can be extremely powerful and useful.  Beyond that, release managers' lives are made much easier by a few convenient mechanisms, like tagging, that Subversion provides.

The Repository Administrator

The repository administrator is chiefly concerned with repository layout and permissions.  Here are the areas that matter most to the repository administrator:

  • Flexibility: Subversion does not require or mandate any particular repository layout.  Subversion also allows you to change just about any aspect of your repository whenever you feel the need to.  Want to change from a single project repository to a multi-project repository?  Want to use a non-standard repository layout?  Subversion allows you to make the decisions and even allows you to change your mind easily with minimal downtime and effort.
  • Permissions: Depending on your server configuration, a Subversion repository administrator can integrate with many external authentication schemes for repository access.  Once access is granted, the administrator can even apply file-level access control, all via a simple text file.  No difficult configuration or administration is needed to create a secure Subversion repository.
  • Backup/Recovery: Subversion's backup and recovery tools are very simple to use.  Subversion's scriptability makes the backup process easy to set up and easy to automate.

Subversion was built to make things simple in all aspects, and repository administration is one of them.  Repository administrators have the flexibility to choose the best practice for repository layout for their projects and can even change the repository configuration at any time thanks to Subversion's design.
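To give a feel for the "simple text file" mentioned under Permissions, the Python sketch below reads a small sample written in the INI-like style of a Subversion path-based authorization file: a [groups] section, then one section per repository path with r or rw grants.  The group name, user names and paths are made up for illustration, and the sketch only lists the rules; it does not try to reproduce how the server evaluates them.

```python
import configparser

# Made-up sample in the style of a path-based authorization file.
SAMPLE_AUTHZ = """
[groups]
developers = alice, bob

[/]
* = r

[/trunk]
@developers = rw
"""

parser = configparser.ConfigParser()
parser.optionxform = str  # keep user and group names case-sensitive
parser.read_string(SAMPLE_AUTHZ)

# Expand the comma-separated group definition into a list of members.
groups = {
    name: [member.strip() for member in members.split(",")]
    for name, members in parser["groups"].items()
}

# Every other section names a path and lists who may do what there.
rules = {
    section: dict(parser[section])
    for section in parser.sections()
    if section != "groups"
}

print(groups)  # prints: {'developers': ['alice', 'bob']}
print(rules)   # prints: {'/': {'*': 'r'}, '/trunk': {'@developers': 'rw'}}
```

In this sample everyone may read the whole repository, while only members of the developers group may write under /trunk.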

The Network/Systems Administrator

Network/Systems administrators are concerned only with the security of the server and of the network to which the server is attached.  Subversion's access capabilities make their job a lot easier and here is how:

  • Unobtrusive: Subversion gives you the flexibility to choose the network layer through which you expose your repository.  With this flexibility comes the ability to expose a repository without having to involve network and systems administrators in most cases.  Since you can access a well-configured Subversion repository via HTTP/HTTPS, you can usually provide access to a Subversion repository from behind a corporate firewall and/or proxy without having to create access rules to open new ports and such.

Subversion can usually be installed without really needing to talk to a network or systems administrator thanks to its unobtrusive nature.  This makes it a lot easier to introduce Subversion into your corporation securely.

Summary

As you can see, Subversion has a lot to offer to a lot of people.  Out of the box, Subversion is a commercial-quality version control system, but Subversion's real value proposition is in the eye of the beholder.  Developers will enjoy Subversion's ease of use and flexibility.  Product managers will appreciate Subversion's ability to track multiple concurrent efforts.  Release managers will welcome the ease of tracing releases.  Repository administrators will welcome the flexibility Subversion gives them when providing access to the repository.

Regardless of how you use Subversion, there is a lot to be gained by using it.  Subversion was built around being simple, flexible, and powerful.  It provides many innovative features that give you the flexibility and power that you will need out of your version control system.