By: Jeremy Whitlock
November 28th, 2006
Summary: This article explains what Subversion is, as seen through the
eyes of a Subversion tools developer and consultant.
Introduction
Depending on who you ask, Subversion can be many things to many people.
This article explains what Subversion is from my point of view. Along
the way, I will step into the shoes of a few key users of Subversion to
explain how each of them sees it and how their views differ.
Before we get into the details, let's learn what Subversion is from a
high-level perspective, and then get more detailed information by
walking in the shoes of our theoretical users.
Subversion At A Glance
Out of the box, and in its simplest form, Subversion is
nothing more than an advanced, open source version control system.
Its sole purpose is to help you track changes to the directories and
files under version control. This isn't to say that Subversion cannot
be the cornerstone of your build management, release management and
continuous integration efforts, which we will discuss later, but out of
the box, Subversion cares only about the directories and files it has
been asked to track.
Subversion History Abridged
Back in 2000, CollabNet decided to create a replacement for CVS. This
decision came after running into problems and limitations of CVS, not
only in day-to-day development but also in the CVS integration in their
flagship product, CollabNet Enterprise Edition, which is a
collaboration and development platform for distributed development.
CollabNet reached out to Karl Fogel, author of Open Source Development
with CVS, to ask if he would like to be involved. Coincidentally, he
and Jim Blandy had already started talking about the same idea, and
they agreed to take part. Their plan was to create a tool that did not
deviate too much from CVS's development/usage model but would fix the
apparent problems of CVS. To make a long story short, Subversion was
born.
Subversion Features
Now that we know what Subversion is, from a high level, and how it came
about, let's look at a few of its more impressive features to get a
better understanding of what Subversion brings to the table.
Directory Versioning
Directory versioning is the idea of versioning a directory's structure
just as you version the contents of a file. Subversion uses a virtual
filesystem to make this possible, and the end result is that you can
track changes to directory structures just like you can track changes
to the contents of files.
True Version History
True versioning allows you to copy and rename resources so that the
newly created resource has its own history and is seen as a new object.
Since copying and renaming resources are extremely common,
true
version history is a nice feature allowing you to view each object as
its own entity regardless of whether the new entity was the result of a
copy or rename.
Atomic Commits
An atomic commit is either applied to the repository in its entirety or
not applied at all. Unlike non-atomic commits, which can leave a
partial commit behind, atomic commits allow Subversion to undo the
whole commit transaction if a problem arises. This means that an
interrupted commit cannot leave the repository in a corrupt or
inconsistent state.
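The all-or-nothing behavior can be illustrated with a toy in-memory
model. This is only a sketch of the idea and not how Subversion is
implemented: the ToyRepository class, its commit method and the rule
that None counts as an invalid change are all assumptions made up for
this example.

```python
import copy


class ToyRepository:
    """Hypothetical in-memory repository used only to illustrate atomicity."""

    def __init__(self):
        self.files = {}     # path -> content at the latest revision
        self.revision = 0   # number of successful commits so far

    def commit(self, changes):
        """Apply every change in `changes` (path -> content) or none of them.

        A content of None is treated as invalid here, purely so that a
        failing commit can be demonstrated.
        """
        staged = copy.deepcopy(self.files)   # work on a private copy
        for path, content in changes.items():
            if content is None:
                # Abort: the staged copy is dropped, self.files is untouched.
                raise ValueError(f"invalid content for {path}")
            staged[path] = content
        # Reached only if every change succeeded: publish in a single step.
        self.files = staged
        self.revision += 1
        return self.revision


repo = ToyRepository()
repo.commit({"a.txt": "one", "b.txt": "two"})

try:
    # The first change is fine, the second one fails.
    repo.commit({"a.txt": "changed", "c.txt": None})
except ValueError:
    pass

# The failed commit left no trace: still revision 1, a.txt is unchanged.
print(repo.revision, sorted(repo.files), repo.files["a.txt"])
```

Staging the changes on a copy and swapping it in at the end is one
simple way to get this guarantee; a real system has to provide the same
property on disk and across network failures.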
Versioned Metadata
Versioned metadata is the ability to attach key-value pairs to a
versioned object. Each such piece of metadata is called a property, and
properties are versioned just like the objects to which they are
applied.
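To make "versioned just like the objects" concrete, here is a small
sketch that keeps every past state of one object's properties. The
VersionedProperties class and its method names are hypothetical, and
the per-object revision counter is a simplification for illustration;
only the property name svn:mime-type is taken from Subversion itself.

```python
class VersionedProperties:
    """Hypothetical store that keeps every past state of an object's properties."""

    def __init__(self):
        # One snapshot per revision; index 0 is the initial, empty state.
        self._history = [{}]

    def set_property(self, name, value):
        """Record a property change as a new revision and return its number."""
        snapshot = dict(self._history[-1])   # copy, so older revisions stay intact
        snapshot[name] = value
        self._history.append(snapshot)
        return len(self._history) - 1

    def get_property(self, name, revision=None):
        """Return the value at `revision` (latest if omitted), or None if unset."""
        if revision is None:
            revision = len(self._history) - 1
        return self._history[revision].get(name)


props = VersionedProperties()
r1 = props.set_property("svn:mime-type", "text/plain")
r2 = props.set_property("svn:mime-type", "text/html")

print(props.get_property("svn:mime-type", r1))  # text/plain
print(props.get_property("svn:mime-type"))      # text/html
```

The older value stays retrievable after the property is changed, which
is the behavior the section describes.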
Choice of Network Layers
Subversion's access layer has been abstracted to allow for multiple
ways of accessing a repository. This abstraction lets you use an
existing access method or develop your own. This flexibility means that
you can use what works instead of being forced into a particular access
model. Another layer of flexibility is Subversion's use of WebDAV,
which allows repository interaction over HTTP/HTTPS and so usually
poses no problem when the repository is accessed from behind a firewall
and/or proxy.
Consistent Data Handling
Subversion uses a binary differencing algorithm when storing version
history, and it works the same way on text files and binary files.
This means that Subversion uses the same process for versioning both
kinds of file: it stores files and differences the same way on the
server, and it sends differences across the wire the same way,
regardless of file type.
Efficient Branching and Tagging
Subversion's approach makes the cost of branching and tagging
independent of the size of the project being branched or tagged.
Subversion uses something similar to a hard link on the server side
when the branch or tag is created. This means that branching and
tagging in Subversion take a very small amount of time and storage
regardless of your project's size.
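The hard-link analogy can be sketched with a tree of shared nodes. The
Node class and the cheap_copy function below are hypothetical names
invented for this illustration, and the model is far simpler than
Subversion's actual filesystem: the point is only that a copy adds one
new reference instead of duplicating the files underneath it.

```python
class Node:
    """Hypothetical tree node: a file (content) or a directory (children)."""

    def __init__(self, content=None, children=None):
        self.content = content
        self.children = dict(children or {})


def cheap_copy(root, src, dst):
    """Return a new root in which `dst` refers to the same node as `src`.

    Nothing below `src` is duplicated, so the work done does not depend
    on how many files the copied directory contains.
    """
    new_children = dict(root.children)        # shallow copy of the top level only
    new_children[dst] = root.children[src]    # share the existing subtree
    return Node(children=new_children)


# A "trunk" directory with 500 files of 1,000 characters each.
trunk = Node(children={f"file{i}.c": Node(content="x" * 1000) for i in range(500)})
root = Node(children={"trunk": trunk})

branched = cheap_copy(root, "trunk", "branch-1.0")

# The branch is the very same object as trunk, not a duplicate of it.
print(branched.children["branch-1.0"] is branched.children["trunk"])  # True
```

The original root is left untouched, so the tree as it was before the
branch remains available as well.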
Hackability
Subversion is its own project, built from the ground up with a
well-defined C API. This means that you can maintain, extend and
integrate Subversion into other projects easily. It is also worth
noting that Subversion has bindings for many languages, such as Java,
Perl and Python.
Subversion In Detail
The list of features above isn't fully comprehensive so
I
figured it would be a good idea to discuss Subversion in a little more
detail to outline more Subversion functionality and concepts.
Automatable and Scriptable
Subversion's output is both human-readable and parseable. This means
that those of you wanting to automate or script any part of Subversion
should have no issues doing so.
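As an illustration of scripting against Subversion's output, the sketch
below parses XML in the shape produced by "svn log --xml". The
parse_log function is a hypothetical helper, and the sample entries
(authors, dates and messages) are invented for this example rather than
taken from a real repository.

```python
import xml.etree.ElementTree as ET

# Invented sample in the shape of "svn log --xml" output.
SAMPLE_LOG_XML = """<log>
<logentry revision="3">
<author>alice</author>
<date>2006-11-27T10:15:00.000000Z</date>
<msg>Fix typo in README.</msg>
</logentry>
<logentry revision="2">
<author>bob</author>
<date>2006-11-26T09:00:00.000000Z</date>
<msg>Add build script.</msg>
</logentry>
</log>
"""


def parse_log(xml_text):
    """Turn log XML into a list of dicts, one per log entry."""
    entries = []
    for entry in ET.fromstring(xml_text).iter("logentry"):
        entries.append({
            "revision": int(entry.get("revision")),
            "author": entry.findtext("author", default=""),
            "message": entry.findtext("msg", default=""),
        })
    return entries


for item in parse_log(SAMPLE_LOG_XML):
    print(f"r{item['revision']} by {item['author']}: {item['message']}")
```

In a real script the XML text would come from running the svn client
instead of a string constant; that step is left out here so the example
runs without a repository.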
Change Sets
Subversion was built to be efficient over the wire and on the disk. To
put that statement in perspective, Subversion wants to send as little
data across the wire and to store as little information on the disk as
possible. Subversion does this via change sets.
Every time you create a commit, you create a change set. Each change
set contains the changes required to reproduce that commit. Since
Subversion doesn't do file-level versioning, change sets are
Subversion's way of communicating changes between revisions. This is
excellent for efficiency over the wire and on disk, because it allows
Subversion to send and store only what is required to reproduce the
commit that creates the subsequent revision. In the end, the costs are
proportional to change size and not to file size.
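The claim that cost follows change size rather than file size can be
checked with a small experiment. This sketch uses a line-based diff
from Python's difflib as a stand-in for Subversion's own binary
differencing, so it demonstrates the principle and not the actual
format; the file contents are generated for the example.

```python
import difflib

# A 1,000-line file and a second revision that edits a single line.
old = [f"line {i}\n" for i in range(1000)]
new = list(old)
new[500] = "line 500 (edited)\n"

# n=0 drops context lines, leaving only what is needed to describe the change.
delta = list(difflib.unified_diff(old, new, "rev1", "rev2", n=0))

full_size = sum(len(line) for line in new)
delta_size = sum(len(line) for line in delta)

print(f"full file: {full_size} characters, delta: {delta_size} characters")
```

Making the file ten times longer while still editing one line would
grow the full size roughly tenfold and leave the delta about the same.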
Choice of Client
Since Subversion abstracts the access and interaction
into
well-defined APIs, you have your choice of using the particular
Subversion client that fits your needs or environment. You
can
even mix-and-match which clients you use depending on your interaction
needs.
Choice of Parallel Development Model
Subversion lets you pick and choose which parallel development
methodology you want to use and when. This means that if you want to
use the Lock-Modify-Unlock model for your binary files, so be it. If
you want to use the Copy-Modify-Merge model for all non-binary files,
that is great. You can even mix and match depending on your specific
likes and needs.
Internationalization
Subversion was built for global consumption and this
commitment is shown by its internationalized messages.
Global Revisioning
Subversion uses a global revision number as opposed to file-level
revision numbers. The concept here is that each revision contains the
state of the whole repository as it exists for that particular
revision. Because one number identifies a consistent snapshot of the
entire tree, many of Subversion's other features build on it.
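A toy model helps show the difference from file-level numbering. The
GlobalRevisionRepo class below is hypothetical, and it stores a full
snapshot per revision purely to keep the example short, which is not
how Subversion stores data. What it does show is that one counter
covers the whole tree, so a file that did not change in a commit is
still part of the new revision.

```python
class GlobalRevisionRepo:
    """Hypothetical repository with one revision counter for the whole tree."""

    def __init__(self):
        # Index = global revision number; revision 0 is the empty tree.
        self._snapshots = [{}]

    def commit(self, changes):
        """Apply `changes` (path -> content) and return the new global revision."""
        tree = dict(self._snapshots[-1])   # start from the previous revision
        tree.update(changes)
        self._snapshots.append(tree)
        return len(self._snapshots) - 1

    def tree_at(self, revision):
        """Return the state of every path as of `revision`."""
        return self._snapshots[revision]


repo = GlobalRevisionRepo()
r1 = repo.commit({"a.c": "v1", "b.c": "v1"})   # two files, one revision
r2 = repo.commit({"a.c": "v2"})                # only a.c changes

print(r1, r2)            # 1 2
print(repo.tree_at(r2))  # {'a.c': 'v2', 'b.c': 'v1'}
```

Here b.c has no revision number of its own: asking for revision 2
returns it exactly as it stood when that revision was created.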
Historical Tracking
Subversion's built-in capabilities are not limited to versioning the
files and directories it is told to track. Subversion also comes with a
complete toolkit for analyzing the history of the files and directories
under version control. Change reports, release management and many
other features are at your fingertips thanks to Subversion's built-in
historical tracking capabilities.
Subversion In Use
We now know what Subversion is, but we still haven't really considered
it through the eyes of its users. This section looks at Subversion from
the point of view of five kinds of user: a product developer, a product
manager, a release manager, a repository administrator and a
network/systems administrator. We will not write a book on each, but
the idea is to see Subversion through their eyes and to figure out how
it best accommodates them.
The Product Developer
A product developer is concerned with Subversion only insofar as it
tracks the history of the files and directories the developer is
working on. Nothing more. Developers need to be able to locate
resources, compare differences between revisions of resources and work
on multiple products, releases and efforts at the same time. Subversion
accommodates them in that it facilitates parallel development by
design, and its simplicity of interaction allows the developer to worry
more about the product than about the intricacies of the version
control tool. To a product developer, the following are most important:
- Simplicity:
Each Subversion tool is extremely well documented and is designed to
allow for the simplest migration path from another version control
tool. Subversion is also simple for developers because there are only a
handful of features that a developer needs to understand to do
day-to-day development.
- Flexibility:
Developers can use whichever client best fits their needs. This means
that you can choose the client that makes you the most efficient.
Clients are not the only level of flexibility in the eyes of a
developer. Subversion users can also pick and choose which development
methodology they wish to use when interacting with a Subversion
repository. This allows development teams to build their own
development processes.
- Traceability:
Beyond the typical interaction with the repository during development,
developers also need to be able to do minor historical tracking.
Whether they need to know who added a particular line of code or who
deleted a file, there is a real need to be able to get historical data
out of Subversion. The good thing is that Subversion's built-in
historical capabilities are more than enough to provide traceability
for a development project.
Developers are probably the easiest to please with respect to
Subversion. With its efficiency over the wire, its simple and
well-documented commands and its historical tracking capabilities,
Subversion is an excellent candidate for a version control system in
the eyes of a developer.
The Product Manager
While the product developer is mainly concerned with the
simplicity of interaction with the repository, a product manager will
probably want to do more historical tracking to be able to properly
manage the team working on the product. The manager will also
be
interested in the ability to work on multiple releases of the product
in parallel. (Think about working on the current release, bug
fix
release and a proof-of-concept release at the same time.) To
a
product manager, the following are the most important:
- Branching:
To be
able to facilitate parallel development, a requirement when working on
multiple releases at the same time, a product manager would be
interested in Subversion's branching capabilities. Branching
is
the cornerstone of allowing parallel development on multiple efforts at
the same time.
- Traceability:
Traceability is where developer and manager needs slightly overlap.
Developers need traceability to understand code changes, while managers
need it for other reasons. Managers manage developers, so when
traceability comes to mind, I begin to think of code reviews, change
reports, defect reports and release reports. Subversion can accommodate
all of these with its full-featured historical tracking.
- Simplicity:
Most managers want to be able to manage without having to fully
understand the underlying tooling. Subversion abstracts the access
layer so that managers can use WebDAV clients, like Windows Web
Folders, to simplify Subversion repository interaction. This, coupled
with well-documented commands, makes a manager's job easy when managing
a project that uses Subversion as its version control system.
Product managers are extremely easy to please when it comes to
Subversion. They want an easy way to interact with the
repository, an easy way to trace releases and developer contributions
and would like to be able to manage multiple releases at the same time.
Subversion makes a manager's job easy and I'm sure the
manager
would agree.
The Release Manager
Think of the release manager as a counterpart to the product manager:
while a product manager manages the developers building the product, a
release manager manages the product's releases. Release managers are
solely concerned with being able to work on multiple releases in
parallel and being able to trace changes between releases.
Here is how Subversion accommodates release managers:
- Branching:
As with product managers, release managers need to be able to make sure
that multiple releases are being developed in parallel without
cross-contaminating one release with the needs of another. Since
branching is the only real way to facilitate parallel development in
isolation, branching is a hot topic for release managers.
- Tagging:
Release
managers need to be able to archive releases and Subversion allows you
to do this with tags. A tag is basically a human-readable
name
given to a particular revision of a directory tree. Where
tagging
makes life easier for a release manager is that release managers can
locate the tags directory and identify which releases have shipped
without having to memorize or document the underlying revision of the
directory tree to locate a release point. Releases are as
simple
as having a tag with the release name, like "Release 1.0".
- Traceability:
Release managers need traceability to identify what was added, removed
or fixed from one release to another. Subversion's historical tracking
capabilities make this simple: you can create a change log between
releases, you can create defect reports between releases (given a
process that supports this) and you can even create other, more
detailed reports from one release to another depending on your business
needs.
We are beginning to see that Subversion's historical tracking can be
extremely powerful and useful. Beyond that, Subversion makes release
managers' lives much easier with a few convenient mechanisms like
tagging.
The Repository Administrator
The repository administrator has two things on his/her mind: repository
layout and permissions. Here are the areas where the repository
administrator will be concerned:
- Flexibility:
Subversion does not require or mandate any particular repository
layout. Subversion also allows you to change just about any
aspect of your repository whenever you feel the need to. Want
to
change from a single project repository to a multi-project repository?
Want to use a non-standard repository layout?
Subversion
allows you to make the decisions and even allows you to change your
mind easily with minimal downtime and effort.
- Permissions:
Depending on your server configuration, a Subversion repository
administrator can integrate with many external authentication schemes
for repository access. Once access is granted, the administrator can
even apply file-level access control, all via a simple text file. No
difficult configuration or administrative overhead is needed to create
a fully secure Subversion repository.
- Backup/Recovery:
Subversion's backup and recovery tools are very simple to use.
Subversion's scriptability makes this process easy to automate and to
repeat.
Subversion was built to make things simple in all aspects and
repository administration was one of them. Repository
administrators have the flexibility to choose the best practice for
repository layout for their projects and can even change the repository
configuration at any time thanks to Subversion's design.
The Network/Systems Administrator
Network/systems administrators are concerned only with the security of
the server and of the network to which the server is attached.
Subversion's access capabilities make their job a lot easier, and here
is how:
- Unobtrusive:
Subversion gives you the flexibility to choose the network layer
through which to expose your repository. With this flexibility comes
the ability to expose a repository without having to involve network
and systems administrators in most cases. Since you can access a
well-configured Subversion repository via HTTP/HTTPS, you can usually
provide access to a Subversion repository from behind a corporate
firewall and/or proxy without having to create access rules to open new
ports and such.
Subversion can usually be installed without really needing to talk to a
network or system administrator thanks to its unobtrusive nature.
This makes it a lot easier to roll out Subversion securely within your
corporation.
Summary
As you can see, Subversion has a lot to offer to a lot of people.
Out of the box, Subversion is a commercial-quality version control
system, but Subversion's real value proposition is in the eye of the
beholder. Developers will enjoy Subversion's ease of use and
flexibility. Product managers will appreciate Subversion's ability to
handle multiple efforts being tracked concurrently. Release managers
will welcome the ease of tracing releases. Repository administrators
will welcome the flexibility Subversion gives them when providing
access to a repository.
Regardless of how you use Subversion, there is a lot to be gained by
using it. Subversion was built to be simple, flexible and powerful, and
it provides many innovative features that give you the flexibility and
power you need from your version control system.