
Questions like “where are we?” and “where are we going?” only exist coherently within the context of “where have we come from?”. In the same way, understanding the history of a software project can help us to understand its trajectory and structure. 

Source code management (SCM) analysis can reveal patterns that point to issues in an engineering function, and can help explain how a system came to be.

Mining Owsy’s SCM History

I joined Owsy in Q3 2020, initially as a contractor. Before my official start date, I wanted to get as much context about the products as possible.

In addition to reading docs and tickets, running the source code and looking at the UI, I ran a mixture of off-the-shelf and hand-crafted tools to perform some broad analysis of the history of the main projects that make up the BindHQ product:

  1. Hand-crafted scripts
  2. GitHub’s analytics
  3. Hercules and Labours https://github.com/src-d/hercules 
    1. I used this to generate an “overwrite matrix” to see which users were overwriting each other’s code
  4. Git Hammer https://github.com/asharov/git-hammer
  5. Git of Theseus https://github.com/erikbern/git-of-theseus 
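As an illustration of the kind of hand-crafted script meant here (a minimal sketch, not the scripts actually used for this analysis), the Python below counts commits per author per month from `git log` output. The function names `parse_log` and `read_log` are hypothetical, and the tab-separated log format is an assumption chosen to keep parsing simple.

```python
import subprocess
from collections import Counter


def parse_log(log_text):
    """Count commits per (author, month).

    Expects one line per commit in the form 'author<TAB>YYYY-MM'.
    """
    counts = Counter()
    for line in log_text.splitlines():
        if not line.strip():
            continue
        # Split on the last tab so the month is always the final field.
        author, month = line.rsplit("\t", 1)
        counts[(author, month)] += 1
    return counts


def read_log(repo_path):
    """Run git log in repo_path and return 'author<TAB>YYYY-MM' lines."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log",
         "--pretty=format:%an%x09%ad", "--date=format:%Y-%m"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling `parse_log(read_log("."))` inside a repository returns a counter keyed by `(author, month)`, which can then be plotted or sorted to see who was active when. Counting commits is a crude proxy for contribution; the off-the-shelf tools above work at the level of lines instead.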

It’s fascinating to see authors contributing to projects over time and compare it with the test authorship:

Screen Shot 2021-03-23 at 4.41.25 PM

Screen Shot 2021-03-23 at 4.41.39 PM

We can see here, for example, that one person contributed a vast number of tests over time.

One of the most interesting plots was the overwrite plot, which shows who overwrites whose code:

Screen Shot 2021-03-23 at 4.42.00 PM

Another fun metric is the percentage of lines still present in code after n years, which is a feature that Git of Theseus offers:

Screen Shot 2021-03-23 at 4.42.19 PM
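To make the metric concrete, here is a simplified sketch of how a "lines still present after n years" figure could be computed. It is not how Git of Theseus implements its survival plot; the function name `surviving_fraction` and the input shape (one `(added, removed)` date pair per line of code) are assumptions for illustration only.

```python
from datetime import date


def surviving_fraction(lines, years, as_of):
    """Fraction of lines still present `years` years after being added.

    lines: iterable of (added, removed) dates; removed is None if the
    line is still present. Lines added less than `years` years before
    `as_of` are skipped, because they have not been observed long enough.
    Returns None if no line is old enough to count.
    """
    horizon_days = round(years * 365.25)
    eligible = 0
    survived = 0
    for added, removed in lines:
        if (as_of - added).days < horizon_days:
            continue  # too recent to say anything about
        eligible += 1
        if removed is None or (removed - added).days >= horizon_days:
            survived += 1
    return survived / eligible if eligible else None
```

Skipping lines that are too young matters: counting them as "survived" would flatter recent code simply because nobody has had time to rewrite it yet.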

It can be very interesting to compare these charts for different components of the application!

Seeing code removed from our accounting package was interesting:

Screen Shot 2021-03-23 at 4.42.28 PM

Was Mining Owsy’s SCM History Useful?

In isolation, it was interesting, but there weren’t any earth-shattering insights up front, except that some components seemed to have a single maintainer (a business risk that we have since addressed).
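A single-maintainer check of this sort can be approximated with very little code. The sketch below is an assumption about how one might do it, not the method used here: it takes one `(component, author)` pair per commit and flags components where the top author's share of commits meets a cut-off. The name `single_maintainer_components` and the 0.8 default threshold are hypothetical.

```python
from collections import Counter, defaultdict


def single_maintainer_components(commits, threshold=0.8):
    """Return {component: (top_author, share)} where one author dominates.

    commits: iterable of (component, author) pairs, one per commit that
    touched the component. threshold is an arbitrary illustrative cut-off.
    """
    by_component = defaultdict(Counter)
    for component, author in commits:
        by_component[component][author] += 1

    flagged = {}
    for component, authors in by_component.items():
        top_author, top_count = authors.most_common(1)[0]
        share = top_count / sum(authors.values())
        if share >= threshold:
            flagged[component] = (top_author, share)
    return flagged
```

A flagged component is a prompt for a conversation rather than a verdict: a high share may simply reflect a small, stable module that rarely needs changes.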

The deeper value was as a conversation starter; engineers were keen to talk about how the projects came to be, and it was really helpful to have conversations like “what led to that huge amount of code being removed from the accounting package in mid-2019?”. As an outsider coming in, this helped me to establish when features were added or removed, and under what circumstances.

What was the output?

Having used the data from the history as a jumping-off point for conversations, I delivered a development timeline of one of Owsy’s products. That was, in my view, a fairly valuable document because it contextualises a lot about that product that I’d initially found confusing.

Wrapping up

A rich and sophisticated understanding of the world benefits from philosophical, factual, scientific, theological, literary, physical, and historical insight. Likewise, one component of understanding a software project is getting to grips with its history, and there are some useful tools that help to put the current state of a software system in context.

SCM analysis is only one vector of source code analytics. It is never the “full story”, but it sometimes shows useful patterns, and it is a great conversation starter.

