By: Auke Jilderda, Bob Jenkins
October, 2006
When managing a Subversion server for a number of related projects, whether in an Open Source community or an enterprise, one needs to strike a useful balance between standardising the development environment to the extent needed for effective collaboration and leaving individual teams enough flexibility to work in a variety of ways. Individuals and projects regularly request particular features or customisations. This article discusses when to customize and how to customize, and recommends an approach to such requests.
When to customize
The first and most important question is when to customize and when not to.
Many software engineers and projects have a natural tendency to adapt, or
customize, their tools and environment to their likes, dislikes, and needs.
They often feel strongly about particular customisations and push for them without necessarily considering or quantifying the costs for themselves, their project, or the organisation or community at large. As long as a customisation impacts only an individual, one can arguably assert that it is his 'problem' and responsibility, but when it impacts multiple people, a project, or the organisation at large, the benefits have to be balanced against the costs.
The obvious cost of a customisation is the effort needed to create and maintain it: it has to be written, tested, and maintained as the environment evolves. In addition, a customisation raises the threshold for accessing data. It changes the behaviour of tools and the way they are used, making it more difficult for an outsider to understand, participate in, and contribute to a project. Also, each customisation needs to be supported when users have issues with it. In other words, every customisation makes the server more expensive to operate and maintain, raises the threshold for users to collaborate, and adds to the range of functionality that needs to be supported.
Therefore, there has to be either a tangible benefit to the vast majority of projects on the site or a large benefit to a few projects. Note that the former is arguably not so much a customisation as a request for (and early prototyping of) a feature users want. The latter is arguably a customisation that is too specific for the majority of users.
How to customize
There is a variety of ways to extend the functionality that Subversion already provides. The options can be classified into two flavours: client side and server side customisations.
Client side - Wrappers
Client side customisations are all solutions in which the server remains
unchanged and the customisation is done on the client side by wrapping either
the command-line client or client side API calls. Client side customisations
keep the burden of customisation on the individual or project requesting it,
forcing them to make a proper, honest cost-benefit analysis of whether they really want or need the customisation. Also, they scale very well in terms of the number of customisations that can be handled, because the number of people maintaining them grows with the number of customisations.
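As a minimal sketch of such a wrapper, the Python script below checks a hypothetical project convention (log messages must start with an issue key such as PROJ-123:) before handing the commit to the unmodified svn commit. The convention, the regular expression, and the wrapper name svn-commit-wrapper are illustrative assumptions; the point is that the rule lives entirely with the project that wants it.

```python
import re
import subprocess
import sys

# Hypothetical project convention: "PROJ-123: short description".
ISSUE_KEY = re.compile(r"^[A-Z]+-\d+: \S")


def check_message(message: str) -> bool:
    """Return True if the log message starts with an issue key."""
    return ISSUE_KEY.match(message) is not None


def build_commit_command(message: str, paths: list[str]) -> list[str]:
    """Arguments for the standard, unmodified svn command-line client."""
    return ["svn", "commit", "-m", message, *paths]


def main(argv: list[str]) -> int:
    if len(argv) < 2:
        sys.stderr.write("usage: svn-commit-wrapper MESSAGE [PATH...]\n")
        return 2
    message, paths = argv[1], argv[2:]
    if not check_message(message):
        sys.stderr.write("log message must start with an issue key, e.g. 'PROJ-123: ...'\n")
        return 1
    # Only a conforming message reaches the real client; its exit status is ours.
    return subprocess.call(build_commit_command(message, paths))
```

A team would put this on its own PATH and call main(sys.argv) from the usual if __name__ == "__main__" guard; nothing changes on the server, and teams that do not want the rule simply do not install it.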
An example of a client side customisation is svnmerge.py, a Python
script on top of the standard Subversion command-line client that allows users
to easily merge changes from and to a branch, automatically recording which
change sets have already been merged. It can display an up-to-date list of
changes yet to be merged and prevents merge mistakes such as merging the same
change twice. The svnmerge.py script is essentially an early prototype of the
merge tracking functionality that is currently being discussed, designed, and
implemented for a future release of Subversion.
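The bookkeeping behind such a tool can be sketched in a few lines of Python. This is not svnmerge.py's actual code: it assumes the merged revisions are recorded as a compact range string such as 1-5,8 (for instance in a versioned property on the branch), and shows how that record yields the list of changes still to be merged and refuses to merge a revision twice. The function names are hypothetical.

```python
"""Merge bookkeeping in the spirit of svnmerge.py (not its actual implementation)."""


def parse_ranges(text: str) -> set[int]:
    """Expand a record such as "1-5,8" into {1, 2, 3, 4, 5, 8}."""
    revs: set[int] = set()
    for part in filter(None, (p.strip() for p in text.split(","))):
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-", 1))
            revs.update(range(lo, hi + 1))
        else:
            revs.add(int(part))
    return revs


def format_ranges(revs: set[int]) -> str:
    """Compress {1, 2, 3, 8} back into "1-3,8"."""
    spans: list[list[int]] = []  # each span is [first, last]
    for rev in sorted(revs):
        if spans and rev == spans[-1][1] + 1:
            spans[-1][1] = rev  # extend the current contiguous span
        else:
            spans.append([rev, rev])
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in spans)


def available(source_revs: set[int], merged: str) -> list[int]:
    """Revisions on the source branch that have not been merged yet."""
    return sorted(source_revs - parse_ranges(merged))


def record_merge(merged: str, revs: set[int]) -> str:
    """Refuse to merge any revision twice, then return the updated record."""
    already = parse_ranges(merged)
    twice = already & revs
    if twice:
        raise ValueError(f"already merged: {format_ranges(twice)}")
    return format_ranges(already | revs)
```

For example, with revisions 1 to 10 on the source branch and 1-5,8 recorded as merged, available(set(range(1, 11)), "1-5,8") returns [6, 7, 9, 10], while record_merge("1-5", {3}) raises ValueError because revision 3 was already merged.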
Server side - Hook Scripts
Server side customisations are the solutions in which the server configuration is changed. They scale well when a customisation needs to be rolled out to all projects on the server, but they touch the day-to-day operation of the site and increase the effort and cost of operating the service. They can also affect the security, availability, and performance of the service.
The primary example of a server side customisation is the hook script. A hook, or hook script, is a program triggered by some repository event, such as the creation of a new revision or the modification of an unversioned property. Each hook is handed enough information to tell what that event is, what target(s) it operates on, and the username of the person who triggered the event. Depending on the hook's output or return status, the hook program may continue the action, stop it, or suspend it in some way. The Version Control with Subversion book describes this in more detail.
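To make the exit-status contract concrete, here is a minimal pre-commit hook sketch in Python. Subversion passes pre-commit the repository path and the name of the pending transaction; a zero exit status lets the commit proceed, while a non-zero status aborts it and returns the hook's standard error output to the committer. The rule enforced (a non-empty log message) is only an example, and the svnlook location is an assumption about the installation.

```python
import subprocess
import sys
from typing import Optional

# Assumed install path; hooks run with an empty environment, so use an absolute path.
SVNLOOK = "/usr/bin/svnlook"


def log_message(repos: str, txn: str) -> str:
    """Read the log message of the pending (uncommitted) transaction."""
    result = subprocess.run(
        [SVNLOOK, "log", "-t", txn, repos],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def check_log(message: str) -> Optional[str]:
    """Return an error text if the message is unacceptable, otherwise None."""
    if not message.strip():
        return "Commit rejected: please provide a log message."
    return None


def main(argv: list[str]) -> int:
    repos, txn = argv[1], argv[2]  # arguments Subversion passes to pre-commit
    error = check_log(log_message(repos, txn))
    if error is not None:
        sys.stderr.write(error + "\n")  # shown to the committer
        return 1  # non-zero: the commit is aborted
    return 0  # zero: the commit proceeds
```

Saved as an executable file named pre-commit in the repository's hooks directory, the script would end with sys.exit(main(sys.argv)); keeping that call behind a __main__ guard lets check_log be tested on its own.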
Subversion currently defines nine hooks. The start-commit hook is invoked before a transaction is created in the process of doing a commit; the pre-commit hook is invoked after a transaction has been created but before it is committed; the post-commit hook is invoked after a transaction is committed. The pre-revprop-change and post-revprop-change hooks are invoked before and after, respectively, a revision property is added, modified, or deleted. The pre-lock and post-lock hooks are invoked before and after, respectively, an exclusive lock on a path is created. The pre-unlock and post-unlock hooks are invoked before and after, respectively, an exclusive lock is destroyed.
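The nine hooks can be summarised as a small Python table that also records whether each hook can stop its action through a non-zero exit status. The Hook record and its field names are illustrative, not part of any Subversion API; the split reflects that start-commit and the pre- hooks run before the action, whereas the post- hooks run after it has already happened and can only report on it.

```python
from typing import NamedTuple


class Hook(NamedTuple):
    name: str
    runs: str        # when Subversion invokes the hook
    can_veto: bool   # True if a non-zero exit status stops the action


HOOKS = [
    Hook("start-commit", "before the commit transaction is created", True),
    Hook("pre-commit", "after the transaction is created, before it is committed", True),
    Hook("post-commit", "after the transaction is committed", False),
    Hook("pre-revprop-change", "before a revision property is added/modified/deleted", True),
    Hook("post-revprop-change", "after a revision property is added/modified/deleted", False),
    Hook("pre-lock", "before an exclusive lock on a path is created", True),
    Hook("post-lock", "after an exclusive lock on a path is created", False),
    Hook("pre-unlock", "before an exclusive lock is destroyed", True),
    Hook("post-unlock", "after an exclusive lock is destroyed", False),
]

# Hooks that can serve checks and blocking; the rest suit logging and notification.
VETO_HOOKS = [hook.name for hook in HOOKS if hook.can_veto]
```

VETO_HOOKS evaluates to start-commit, pre-commit, pre-revprop-change, pre-lock, and pre-unlock, which are the only places where a check or a block can take effect.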
Hooks are typically used for three kinds of functionality:
- First, to log or notify interested parties about an event. For example,
sending an e-mail message per commit, summarising key information about that
commit such as the author, date, commit message, and the change set.
- Second, to check a particular condition. For example, verifying whether the
code complies with coding guidelines or whether the user has the appropriate
access rights to the parts of the repository that he wants to commit to.
- Third, to block certain behaviour. For example, allowing a user to change a
log message but not the author and date of a revision (to maintain
traceability), preventing locks from being stolen, or allowing locking if and
only if the path has the svn:needs-lock property set.
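The third kind, blocking behaviour, can be sketched with a pre-revprop-change hook that lets users correct a log message but keeps svn:author and svn:date fixed. Subversion invokes pre-revprop-change with the repository path, revision, user name, property name, and an action code (A for added, M for modified, D for deleted); if the hook is absent or exits non-zero, the revision property change is refused. The helper name revprop_change_allowed is hypothetical.

```python
import sys


def revprop_change_allowed(propname: str, action: str) -> bool:
    """Permit only modification of an existing log message (svn:log)."""
    return propname == "svn:log" and action == "M"


def main(argv: list[str]) -> int:
    # Arguments from Subversion: REPOS-PATH REV USER PROPNAME ACTION
    _repos, rev, user, propname, action = argv[1:6]
    if revprop_change_allowed(propname, action):
        return 0  # the change is applied
    sys.stderr.write(
        f"{user}: changing {propname} (action {action}) on r{rev} is not allowed\n"
    )
    return 1  # the change is refused; svn:author and svn:date stay intact
```

Installed as hooks/pre-revprop-change, the script would finish by calling sys.exit(main(sys.argv)).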
Note that, at present, Subversion does not support hooks that perform pre- or post-processing, such as automatically reformatting code so that it complies with coding guidelines, because the server has no means of communicating such changes back to the client. In other words, whatever a hook does, it must not modify the transaction itself. Instead, it can check a condition and accept or reject the action.
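In practice, this check-don't-fix rule means a hook reports problems and leaves the correction to the committer. The minimal sketch below uses trailing whitespace as a stand-in for a coding guideline; the functions only report offending lines and never rewrite the file. In a real pre-commit hook the file contents would come from svnlook cat -t TXN REPOS PATH for each path listed by svnlook changed -t TXN REPOS; that wiring and the function names are assumptions left out of the sketch.

```python
# Trailing whitespace stands in for a real coding guideline (illustrative only).


def trailing_whitespace_lines(text: str) -> list[int]:
    """Return the 1-based numbers of lines that end in spaces or tabs."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if line != line.rstrip(" \t")
    ]


def verdict(path: str, text: str) -> tuple[bool, str]:
    """Accept or reject one file; never rewrite it, only explain the rejection."""
    bad = trailing_whitespace_lines(text)
    if not bad:
        return True, ""
    shown = ", ".join(str(n) for n in bad[:5])  # keep the message short
    return False, f"{path}: trailing whitespace on line(s) {shown}"
```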
Hooks are essentially a way of running arbitrary code on the server in response to actions by the version control client. Moreover, a hook runs with the same permissions as the server process (typically the web server) and can therefore affect other repositories on the same server. This mechanism is very powerful but has potential implications for the security, availability, and performance of the server. A hook can easily slow down or bring down your server or, even worse, corrupt the data in the repository.
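One modest precaution against the availability risk is to give any external program a hook calls a time budget, and to reject the action when the budget is exceeded rather than let the request hang. The sketch below assumes a hypothetical checker at /usr/local/bin/style-check and a ten-second budget; both are placeholders. Failing closed (rejecting) is a design choice: it keeps unchecked changes out at the cost of occasionally refusing a valid one.

```python
import subprocess
import sys

# Hypothetical checker and budget; both are placeholders for a real setup.
CHECK_COMMAND = ["/usr/local/bin/style-check"]
TIME_BUDGET_SECONDS = 10.0


def run_check(command: list[str], timeout: float = TIME_BUDGET_SECONDS) -> tuple[bool, str]:
    """Run an external checker within a time budget; return (accepted, message)."""
    try:
        result = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process when the timeout expires.
        return False, f"check did not finish within {timeout} seconds; rejecting"
    except OSError as exc:
        # For example, the checker is not installed or not executable.
        return False, f"check could not be started: {exc}"
    return result.returncode == 0, result.stderr


def main(argv: list[str]) -> int:
    # Forward the hook's own arguments (e.g. repository path and transaction name).
    accepted, message = run_check(CHECK_COMMAND + argv[1:])
    if not accepted:
        sys.stderr.write(message + "\n")
    return 0 if accepted else 1
```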
Findings & Recommendations
When managing a Subversion server for a number of projects, you need to strike a useful balance between standardising the environment to enable effective collaboration and efficient operation, and leaving projects enough flexibility to work in a variety of ways. Standardisation can bring a lot of benefits, such as a reduced time to learn the environment when switching projects and more effective collaboration between teams. However, a one-size-fits-all approach is neither feasible nor desirable given today's heterogeneity of individuals and project teams (in local culture, departmental culture, processes, and so on).
From a technical perspective, client and server side customisations differ
in what they can and cannot do. Client side customisations are suitable when a customisation affects only a single user or when automatic pre-processing (such as code formatting) is wanted. Server side customisations are suitable for functionality that should be standardised across the repository and that takes the form of a notification, a check, or the blocking of certain behaviour.
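Automatic pre-processing on the client side could look like the following sketch: a wrapper that strips trailing whitespace from the named files in the working copy and then calls the standard svn commit, so the committer commits exactly what was changed. It assumes explicit paths to UTF-8 text files (not directories); the cleanup rule and the function names are illustrative.

```python
import subprocess


def strip_trailing_whitespace(text: str) -> str:
    """Remove spaces and tabs at line ends while keeping each line ending."""
    return "".join(
        line.rstrip(" \t\r\n") + line[len(line.rstrip("\r\n")):]
        for line in text.splitlines(keepends=True)
    )


def tidy(paths: list[str]) -> list[str]:
    """Rewrite files in the working copy; return the ones that changed."""
    changed = []
    for name in paths:
        # Assumes each path is a UTF-8 text file; newline="" preserves line endings.
        with open(name, encoding="utf-8", newline="") as f:
            original = f.read()
        cleaned = strip_trailing_whitespace(original)
        if cleaned != original:
            with open(name, "w", encoding="utf-8", newline="") as f:
                f.write(cleaned)
            changed.append(name)
    return changed


def main(argv: list[str]) -> int:
    message, paths = argv[1], argv[2:]
    for name in tidy(paths):
        print(f"tidied {name}")
    # The cleaned files are committed by the standard svn client.
    return subprocess.call(["svn", "commit", "-m", message, *paths])
```

Because the change happens before the commit, the working copy and the repository stay in step, which is precisely what a server side hook cannot guarantee.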
From a cost-benefit perspective, try to keep customisations that are specific to a limited set of users on the client side. This puts the burden of customisation on the project, encouraging it to make a proper and honest cost-benefit analysis, and prevents the customisation from impacting others. Generic customisations that are relevant to and requested by a large percentage of the projects fit better on the server side. Server side customisations typically have a substantially higher cost, mainly because they potentially impact the performance, availability, and security of the entire server. They need rigorous testing, both upon creation and with each upgrade of the server. Especially when deploying hooks, we strongly recommend using only widely used hooks, both to mitigate the risks (the more widely a hook is used, the more thoroughly it is tested) and to strike a reasonable balance between standardisation and customisation: hooks are popular if and only if they contribute value to many people and are therefore worth the effort. Requests for esoteric customisations are likely not worth the effort of creating, testing, and maintaining them.