Enhancing a Subversion Server

By: Auke Jilderda, Bob Jenkins

October, 2006

When managing a Subversion server for a number of related projects, in an Open Source community or an enterprise, one needs to strike a useful balance between standardising the development environment to the extent needed for effective collaboration and leaving individual teams enough flexibility to work in a variety of ways. Individuals and projects will request particular features or customisations with some regularity. This article discusses when to customise and how to customise, and recommends an approach to such requests.

When to customise

The first and most important question is when to customise and when not to. Many software engineers and projects have a natural tendency to adapt, or customise, their tools and environment to their likes, dislikes, and needs. They often feel strongly about particular customisations and push for them, without necessarily considering or quantifying the costs for themselves, their project, or the organisation or community at large. As long as a customisation only impacts an individual, one can argue that it is that individual's 'problem' and responsibility, but when it impacts multiple people, a project, or the organisation at large, the benefits have to be balanced against the costs.

The obvious cost of a customisation is the effort needed to create and maintain it: it has to be written, tested, and maintained as the environment evolves. In addition, a customisation raises the threshold for accessing data. It changes the behaviour of the tool and the practices around it, making it more difficult for an outsider to understand, participate in, and contribute to a project. Also, each customisation needs to be supported when users have issues with it. In other words, every customisation makes it more expensive to operate and maintain the server, increases the threshold for users to collaborate, and adds to the variety of functionality that needs to be supported.

Therefore, there either has to be a tangible benefit to the vast majority of projects on the site or a large benefit to a few projects. Note that the former is arguably not so much a customisation but more a request for (and early prototyping of) a feature users want. The latter is arguably a customisation that is too specific for the majority of users.

How to customise

There is a variety of ways to extend the functionality that Subversion already provides. The options can be classified into two flavours: Client side or server side customisations.

Client side - Wrappers

Client side customisations are all solutions in which the server remains unchanged and the customisation is done on the client side by wrapping either the command-line client or client side API calls. Client side customisations keep the burden of customisation on the individual or project requesting it, forcing them to make a proper, honest cost-benefit analysis of whether they really want or need the customisation. Also, they scale very well in terms of the number of customisations that can be handled, because the number of people maintaining them grows with the number of customisations.

An example of a client side customisation is svnmerge.py, a Python script on top of the standard Subversion command-line client that allows users to easily merge changes from and to a branch, automatically recording which change sets have already been merged. It can display an always updated list of changes yet to be merged and prevents merge mistakes such as merging the same change twice. The svnmerge.py script is essentially an early prototype of the merge tracking functionality that is currently being discussed, designed, and implemented for a future release of Subversion.
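The wrapper idea can be illustrated with something far smaller than svnmerge.py. The following is a minimal sketch, not taken from svnmerge.py: it wraps `svn commit` and enforces a hypothetical project policy (a non-empty log message) before handing over to the unmodified command-line client. The script name `commit_wrapper.py`, the function names, and the policy itself are assumptions made for illustration.

```python
import subprocess
import sys


def build_commit_command(message, paths):
    """Return the argument list for `svn commit`, enforcing a local policy.

    The policy (a non-empty log message) is a hypothetical example.
    """
    if not message.strip():
        raise ValueError("refusing to commit with an empty log message")
    return ["svn", "commit", "-m", message] + list(paths)


def main(argv):
    """Entry point: argv[1] is the log message, the rest are paths."""
    if len(argv) < 2:
        print("usage: commit_wrapper.py MESSAGE [PATH...]", file=sys.stderr)
        return 2
    try:
        command = build_commit_command(argv[1], argv[2:])
    except ValueError as error:
        print(error, file=sys.stderr)
        return 1
    # Hand over to the unmodified command-line client.
    return subprocess.call(command)
```

To use it as a script, one would end the file with `sys.exit(main(sys.argv))`. The server is untouched, so the cost of maintaining the policy stays with the project that wants it.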

Server side - Hook Scripts

Server side customisations are the solutions where the server configuration is changed. Server side customisations scale in terms of rolling out a customisation to all projects on the server. They touch the day-to-day operation of the site and increase the effort and cost of operating the service. Also, they potentially impact the security, availability, and performance of the service.

The primary examples of server side customisations are hook scripts. A hook, or hook script, is a program triggered by some repository event, such as the creation of a new revision or the modification of an unversioned property. Each hook is handed enough information to tell what that event is, what target(s) it operates on, and the username of the person who triggered the event. Depending on the hook's output or return status, the hook program may continue the action, stop it, or suspend it in some way. The Version Control with Subversion book describes this in more detail.

Subversion currently defines nine hooks. The start-commit hook is invoked before a transaction is created in the process of doing a commit; the pre-commit hook is invoked after a transaction has been created but before it is committed; the post-commit hook is invoked after a transaction is committed. The pre-revprop-change and post-revprop-change hooks are invoked before and after, respectively, a revision property is added, modified, or deleted. The pre-lock and post-lock hooks are invoked before and after, respectively, an exclusive lock on a path is created. The pre-unlock and post-unlock hooks are invoked before and after, respectively, an exclusive lock is destroyed.
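As an illustration of how a hook receives its event information, here is a minimal sketch of a post-commit hook in Python. Subversion passes the repository path and the new revision number as arguments; the script then queries the repository with `svnlook`. The location of `svnlook`, the function names, and the summary layout are assumptions, and delivering the summary (for example by e-mail) is deliberately left out.

```python
import subprocess

# Hooks run with a minimal environment, so use an absolute path.
SVNLOOK = "/usr/bin/svnlook"  # assumed location; adjust for your server


def svnlook(subcommand, repos, revision):
    """Run `svnlook SUBCOMMAND REPOS -r REVISION` and return its output."""
    output = subprocess.check_output(
        [SVNLOOK, subcommand, repos, "-r", str(revision)]
    )
    return output.decode("utf-8", errors="replace").strip()


def format_notification(revision, author, log_message, changed_paths):
    """Build a plain-text summary of one commit."""
    lines = [
        f"Revision: {revision}",
        f"Author: {author}",
        "",
        log_message,
        "",
        "Changed paths:",
    ]
    lines.extend("  " + path for path in changed_paths)
    return "\n".join(lines)


def build_summary(repos, revision):
    """Collect author, log message, and changed paths for a revision."""
    return format_notification(
        revision,
        svnlook("author", repos, revision),
        svnlook("log", repos, revision),
        svnlook("changed", repos, revision).splitlines(),
    )
```

A hook script built on this would call `build_summary(sys.argv[1], sys.argv[2])` and pass the result to whatever delivery mechanism the site uses. Because the commit has already happened by the time post-commit runs, the hook can report on it but cannot undo it.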

Hooks are typically used for three kinds of functionality:

  • First, to log or notify interested parties about an event. For example, sending an e-mail message per commit, summarising key information about that commit such as the author, date, commit message, and the change set.
  • Second, to check a particular condition. For example, verify whether the code complies with coding guidelines or whether the user has the appropriate access rights to the parts of the repository that he wants to commit to.
  • Third, to block certain behaviour. For example, allow a user to change a log message but not the author and date of a revision (to maintain traceability), prevent locks from being stolen, or allow locking if and only if the path has the svn:needs-lock property set.
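The blocking case in the third item can be sketched as a pre-revprop-change hook. Subversion invokes this hook with the repository path, the revision, the user, the property name, and an action code (A for added, M for modified, D for deleted); a non-zero exit status rejects the change and the text written to standard error is passed back to the client. The policy table and function names below are illustrative assumptions.

```python
import sys

# Hypothetical policy: only modifying an existing log message is allowed.
ALLOWED_CHANGES = {("svn:log", "M")}


def revprop_change_allowed(propname, action):
    """Return True if this (property, action) pair is permitted."""
    return (propname, action) in ALLOWED_CHANGES


def main(argv):
    """argv: script, REPOS-PATH, REVISION, USER, PROPNAME, ACTION."""
    if len(argv) != 6:
        print("pre-revprop-change: unexpected arguments", file=sys.stderr)
        return 1
    propname, action = argv[4], argv[5]
    if revprop_change_allowed(propname, action):
        return 0  # accept the change
    # Reject: the message on stderr is shown to the user.
    print(f"changing {propname} is not allowed", file=sys.stderr)
    return 1
```

The hook script would end with `sys.exit(main(sys.argv))`. With this policy a user can correct a log message, while svn:author and svn:date stay untouched and traceability is preserved.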

Note that, at present, Subversion does not support a hook performing pre- or post-processing functionality, such as automatically ensuring the code complies with coding guidelines, because the server does not have a means to communicate such changes back to the client. In other words, whatever a hook does, it must not modify the transaction itself. Instead, it can check a condition and accept or reject the action.
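The accept-or-reject pattern can be sketched as a pre-commit hook that only inspects the transaction and never changes it. Subversion passes the repository path and the transaction name; `svnlook log -t` reads the proposed log message. The minimum length, the `svnlook` location, and the function names are assumptions chosen for illustration.

```python
import subprocess
import sys

SVNLOOK = "/usr/bin/svnlook"  # assumed location; adjust for your server
MIN_LENGTH = 10  # hypothetical policy value


def check_log_message(message):
    """Return None if the message is acceptable, else a reason string."""
    if len(message.strip()) < MIN_LENGTH:
        return f"log message must be at least {MIN_LENGTH} characters"
    return None


def main(argv):
    """argv: script, REPOS-PATH, TXN-NAME."""
    repos, txn = argv[1], argv[2]
    # Read-only inspection of the pending transaction.
    message = subprocess.check_output(
        [SVNLOOK, "log", "-t", txn, repos]
    ).decode("utf-8", errors="replace")
    problem = check_log_message(message)
    if problem is None:
        return 0  # accept the commit
    print(problem, file=sys.stderr)  # relayed to the client
    return 1  # reject the commit
```

Keeping the policy in a pure function such as `check_log_message` makes it easy to test the rule without a repository, which matters given how much a faulty hook can disrupt a shared server.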

Hooks are essentially a way of running arbitrary code on the server in response to actions by the version control client. Moreover, a hook will run with the same permissions as the web server in general and, with that, has the ability to affect other repositories on the same server. This mechanism is very powerful but has potential implications on the security, availability, and performance of the server. A hook can easily slow down or bring down your server or, even worse, corrupt the data in the repository.

Findings & Recommendations

When managing a Subversion server for a number of projects, you need to strike a useful balance between standardising the environment to enable effective collaboration and efficient operation, and leaving projects enough flexibility to work in a variety of ways. Standardisation can bring a lot of benefits, such as a reduced time to learn the environment when switching projects and more effective collaboration between teams. However, a one-size-fits-all approach is neither feasible nor desirable given today's heterogeneity (in local culture, departmental culture, processes, and so on) among individuals and project teams.

From a technical perspective, client and server side customisations differ in what they can and cannot do. Client side customisations are suitable when the customisation affects only a single user or when automatic pre-processing (such as code formatting) is wanted. Server side customisations are suitable when the customisation should be standardised across the repository and is a notification, a check, or the blocking of certain behaviour.

From a cost-benefit perspective, try to keep customisations that are specific to only a limited set of users on the client side to put the burden of customisation on the project, stimulating them to make a proper and honest cost-benefit analysis as well as preventing it from impacting others. Generic customisations that are relevant to and requested by a large percentage of the projects fit better on the server side. Server side customisations typically have a substantially higher cost, mainly because they potentially impact performance, availability, and security of the entire server. They need rigorous testing, both upon creation and with each upgrade of the server.

Especially when deploying hooks, we strongly recommend using only very commonly used hooks, both to mitigate the risks (the more a hook is used, the more it is tested) and to strike a reasonable balance between standardisation and customisation - hooks are popular if and only if they contribute value to many people and, with that, are worth the effort. Requests for esoteric customisations are likely not worth the effort of creating, testing, and maintaining them.