Complexity Risk
Part Of
Reduced By Practices
- Automated Testing: Simplifies debugging by isolating issues to specific units.
- Configuration Management: Reduces complexity by managing system changes in a controlled and documented manner.
- Refactoring: Refactoring is aimed at making code more orthogonal, less duplicative and clearer to understand
- Review: Identifies and resolves potential complexities during the review process.
- Runtime Adoption: Reduces the amount of code you are responsible for.
Attendant To Practices
- Automated Testing: Managing a large suite of unit tests can add to the complexity.
- Automation: Introducing code adds to the complexity of a project
- Coding: Writing new code adds complexity to a project.
- Demand Management: Forecasting and planning demand can add complexity to project management.
- Documentation: Extensive documentation can sometimes add to complexity rather than simplifying it.
- Issue Management: Managing a large number of logged issues can add complexity.
- Measurement: Collecting and analyzing data can add to the complexity of the project.
- Monitoring: Implementing comprehensive monitoring solutions can add complexity.
- Performance Testing: Requires sophisticated tools and setup, adding complexity.
- Redundancy: Introducing redundancy can add complexity to the system.
- Regression Testing: Managing extensive regression tests can add complexity.
- Security Testing: Requires specialized skills and tools, adding complexity.
- Tool Adoption: Integrating multiple tools can add complexity to the development process.
Complexity Risk is the risk to your project due to its underlying "complexity". Here, we will break down exactly what we mean by complexity, look at where it can hide on a software project and discuss some ways in which we can manage this important risk.
Here we will:
- Look at two ways in which complexity is measured, via Kolmogorov Complexity and Graph-Connectivity.
- Define Complexity Risk, and the related risks of Codebase Risk (complexity in your codebase) and Dead-End Risk (risk of implementations getting "stuck").
- Discuss ways to think about complexity: as mass, technical debt and mess.
- Discuss ways to manage complexity risk, such as modularisation, hierarchy, use of languages and libraries and by avoiding feature creep.
- Discuss places where Complexity Risk manifests in computing.
Codebase Risk
Complexity arises in software projects in a number of different ways. We're going to start by looking at complexity in the code you write. Aside from bragging rights, no-one is interested in having a large codebase:
“Measuring programming progress by lines of code is like measuring aircraft building progress by weight.” - Bill Gates
The size of your codebase, the amount of code, the number of modules, the interconnectedness of the modules and how well-factored the code is all contribute to Codebase Risk: a specific type of Complexity Risk meaning the complexity of your codebase.
Before we look at the implications of this risk, let's look at some prior-art on how to measure this complexity.
Kolmogorov Complexity
The standard Computer-Science definition of complexity is Kolmogorov Complexity. This is:
"...the length of the shortest computer program (in a predetermined programming language) that produces the object as output." - Kolmogorov Complexity, Wikipedia
This is a fairly handy definition for us as it means that in writing software to solve a problem there is a lower bound on the size of the software we write. While in practice this is pretty much impossible to quantify, that doesn't really matter: here I want to focus on the techniques for moving towards that minimum.
Let's say we wanted to write a JavaScript program to output this string:
abcdabcdabcdabcdabcdabcdabcdabcdabcdabcd
We might choose this representation:
function out() { (7 )
return "abcdabcdabcdabcdabcdabcdabcdabcdabcdabcd" (45)
} (1 )
(total: 53)
The numbers in brackets on the right indicate how many symbols each line contains. In total, this code block contains 53 symbols if you count function
, out
and return
as one symbol each.
But, if we write it like this:
const ABCD="abcd"; (11)
function out() { (7 )
return ABCD+ABCD+ABCD+ABCD+ABCD+ABCD+ABCD+ (16)
ABCD+ABCD+ABCD; (6 )
} (1 )
(total: 41)
With this version, we now have 41 symbols (ABCD
is a single symbol - it's just a name). And with this version:
const ABCD="abcd"; (11)
function out() { (7 )
return ABCD.repeat(10) (7 )
} (1 )
(total: 26)
... we have 26 symbols.
Abstraction
What's happening here is that we're exploiting a pattern: we noticed that abcd
occurs several times, so we defined it a single time and then used it over and over, like a stamp. This is called abstraction.
By applying abstraction, we can improve in the direction of the Kolmogorov lower bound. By allowing ourselves to say that symbols (like out
and ABCD
) are worth one complexity point, we've allowed that we can be descriptive in naming function
and const
. Naming things is an important part of abstraction, because to use something, you have to be able to refer to it.
Generally, the more complex a piece of software is, the more difficulty users will have understanding it, and the more work developers will have changing it.
Although we should prefer the third version of our code over either the first or second (because of its brevity) we could go further down into Code Golf territory. The following javascript program plays FizzBuzz up to 100, but is less readable than you might hope.
for(i=0;i<100;)document.write(((++i%3?'':'Fizz')+
(i%5?'':'Buzz')||i)+"<br>")
(total: 62)
So there is at some point a trade-off to be made between Complexity Risk and Communication Risk. That is, after a certain point, reducing Kolmogorov Complexity further risks making the program less intelligible.
Refactoring
Abstraction is therefore a key tool in the battle against Complexity Risk: it allows us to jettison repetition. But, as the code-golf example shows, you can go too far. So an important part of software development is picking the right abstractions: ones that are useful, durable and pervasive.
Time spent replacing poor abstractions with better ones is called refactoring.
The above diagram demonstrates that a key practice in battling Codebase Risk is choosing a minimal set of useful abstractions. The attendant risk in doing that work (the downside) is the time spent doing it. That is, Schedule Risk.
Sometimes it is better to have an ok-ish abstraction now rather than a brilliant abstraction too late.
Languages and Dependencies
The above Javascript example also demonstrates a second way in which we can manage Codebase Risk.
In the third version of the program, we used the method .repeat()
, which allowed us to save a further 16 symbols.
.repeat()
is a recent addition to Javascript, added in ES6. What this shows is that the Kolmogorov complexity of a program is actually heavily dependent on the features of the programming language: using ES6-Javascript allows us to produce simpler programs than before.
So as the above diagram shows, we can also reduce Codebase Risk in our choice of languages and third party libraries. This doesn't come without a cost, though. We are trading-off our own Codebase Risk but increasing Dependency Risk and Boundary Risk instead.
Connectivity
A second useful measure of complexity comes from graph theory, and that is the connectivity of a graph:
"...the minimum number of elements (nodes or edges) that need to be removed to disconnect the remaining nodes from each other" - Connectivity, Wikipedia
To see this in action, have a look at the graph above. It has 10 vertices, labelled a to j, and it has 15 edges (or links) connecting the vertices together. If any single edge were removed from the diagram above, the 10 vertices would still be linked together. Because of this, we can say that the graph is 2-connected. That is, to disconnect any single vertex, you'd have to remove at least two edges.
As a slight aside, let's consider the Kolmogorov Complexity of this graph, by inventing a mini-language to describe graphs. It could look something like this:
<item> : [<item>,]* <item> # Indicates that the item
# before the colon
# has a connection to all
# the items after the colon
So our graph could be defined like this:
a: b,c,d
b: c,f,e
c: f,d
d: j
e: h,j
f: h
g: j
h: i
i: j
(total: 39)
In the graph above, I've removed 6 of the edges. Now, we're in a situation where if any single edge is removed, the graph becomes unconnected. That is, it's broken into distinct chunks. So, it is 1-connected.
The second graph is clearly simpler than the first. And, we can show this by looking at the Kolmogorov Complexity in our little language:
a: d,g
b: f
c: d,f
d: j
f: h
e: h
h: i
(total: 25)
So for defining our graphs, Connectivity is also Complexity.
This carries over into software too: heavily connected code is more complex than less-connected code. It's also harder to reason about and work with because changing one module potentially impacts many others. Let's dig into this further.
Hierarchies and Modularisation
In the second graph, I've arranged it as a hierarchy (which I can do trivially now that it's only 1-connected). For 10 vertices, we need 9 edges to connect everything up. It's always:
edges = vertices - 1
Note that I could pick any hierarchy here: I don't have to start at c (although it has the nice property that it has two roughly equally-sized sub-trees attached to it).
How does this help us? Imagine if a - j were modules of a software system and the edges of the graph showed communications between the different sub-systems. In the first graph, we're in a worse position:
- Who's in charge? What deals with what?
- Can I isolate a component and change it safely?
- What happens if one component disappears?
The second graph is easier to reason about because of the reduced number of connections and the new hierarchy of organisation.
On the down-side, perhaps our messages have farther to go now: in the original, i could send a message straight to j, but now we have to go all the way via c. But this is the basis of Modularisation and Hierarchy.
Secondly, it's not apparent to i that j even exists: we have hidden the modules from each other. For anything that i needed to send to j, i must be able to send to c instead.
The trade-off of modularisation/hierarchy is shown in the above diagram, and it's our third tool for battling Codebase Risk.
But we don't just see this in software, it's everywhere in our lives: societies, business, and living organisms all use this technique. For example in our bodies we have:
- Organelles - such as Mitochondria, contained in...
- Cells - such as blood cells, nerve cells, skin cells in the Human Body, inside...
- Organs - like hearts livers, brains etc, held within...
- Organisms - like you and me.
The great complexity-reducing mechanism of modularisation is that you only have to consider your local environment.
Analogies
So, we've looked at some measures of software structure complexity. We can say "this is more complex than this" for a given piece of code or structure. We've also looked at three ways to manage it: Abstraction and Modularisation and via Dependencies.
However, we've not really said why complexity entails Risk. So let's address that now by looking at three analogies, Mass, Technical Debt and Mess
Complexity is Mass
The first way to look at complexity is as Mass : a software project with more complexity has greater mass than one with less complexity.
Newton's Second Law states:
F = ma, ( Force = Mass x Acceleration )
That is, in order to move your project somewhere new and make it do new things, you need to give it a push. The more mass it has the more Force you'll need to move (accelerate) it.
You could stop here and say that the more lines of code a project contains, the greater its mass. And that makes sense because in order to get it to do something new you're likely to need to change more lines.
But there is actually some underlying sense in which this is true in the real, physical world too, as discussed in a Veritasium video. To paraphrase:
"Most of your mass you owe due to E=mc², you owe to the fact that your mass is packed with energy because of the interactions between these quarks and gluon fluctuations in the gluon field... what we think of as ordinarily empty space... that turns out to be the thing that gives us most of our mass." - Your Mass is NOT From the Higgs Boson, Veritasium
I'm not an expert in physics at all so there is every chance that I am pushing this analogy too hard. But, by substituting quarks and gluons for pieces of software we can (in a very handwaving-y way) say that more connected software has more interactions going on, and therefore has more mass than simple software.
If we want to move fast we need simple code-bases.
At a basic level, Complexity Risk heavily impacts on Schedule Risk: more complexity means you need more force to get things done, which takes longer.
Technical Debt
The most common way we talk about Complexity Risk in software is as Technical Debt:
"Shipping first time code is like going into debt. A little debt speeds development so long as it is paid back promptly with a rewrite... The danger occurs when the debt is not repaid. Every minute spent on not-quite-right code counts as interest on that debt. Entire engineering organisations can be brought to a stand-still under the debt load of an unconsolidated implementation, object-oriented or otherwise." - Ward Cunningham, 1992, Wikipedia, Technical Debt
Building a low-complexity first-time solution is often a waste: in the first version, we're usually interested in reducing Feature Risk as fast as possible. That is, putting working software in front of users to get feedback. We would rather carry Complexity Risk than take on more Schedule Risk.
So a quick-and-dirty, over-complex implementation mitigates the same Feature Risk and allows you to Meet Reality faster.
But having mitigated the Feature Risk this way, you are likely exposed to a higher level of Complexity Risk than would be desirable. This "carries forward" and means that in the future, you're going to be slower. As in the case of a real debt, "servicing" the debt incurs a steady, regular cost.
Kitchen Analogy
It’s often hard to make the case for minimising Technical Debt: it often feels that there are more important priorities, especially when technical debt can be “swept under the carpet” and forgotten about until later. (See Discounting.)
One helpful analogy I have found is to imagine your code-base is a kitchen. After preparing a meal (i.e. delivering the first implementation), you need to tidy up the kitchen. This is just something everyone does as a matter of basic sanitation.
Now of course, you could carry on with the messy kitchen. When tomorrow comes and you need to make another meal, you find yourself needing to wash up saucepans as you go, or working around the mess by using different surfaces to chop on.
It's not long before someone comes down with food poisoning.
We wouldn't tolerate this behaviour in a restaurant kitchen, so why put up with it in a software project? This state-of-affairs is illustrated in the above diagram. Not only does Complexity Risk slow down future development, it can be a cause of Operational Risks and Security Risks.
Feature Creep
In Brooks' essay "No Silver Bullet - Essence and Accident in Software Engineering", a distinction is made between:
- Essence: the difficulties inherent in the nature of the software.
- Accident: those difficulties that attend its production but are not inherent. - Fred Brooks, No Silver Bullet
The problem with this definition is that we are accepting features of our software as essential.
Applying Risk-First, if you want to mitigate some Feature Risk then you have to pick up Complexity Risk as a result. But, that's a choice you get to make.
Therefore, Feature Creep (or Gold Plating) is a failure to observe this basic equation: instead of considering this trade off, you're building every feature possible. This will impact on Complexity Risk.
Sometimes, feature-creep happens because either managers feel they need to keep their staff busy, or the staff decide on their own that they need to keep themselves busy. This is something we'll return to in Agency Risk.
Dead-End Risk
Dead-End Risk is where you take an action that you think is useful, only to find out later that actually it was a dead-end and your efforts were wasted. Here, we'll see that Complexity Risk is a big cause of this.
An Example
For example, imagine a complex software system composed of many sub-systems. Let's say that the Accounting sub-system needed password protection (so you built this). Then the team realised that you needed a way to change the password (so you built that). Then, you needed to have more than one user of the Accounting system so they would all need passwords (OK, fine).
Finally, the team realises that actually authentication would be something that all the sub-systems would need, and that it had already been implemented more thoroughly by the Approvals sub-system.
At this point, you realise you're in a Dead End:
- Option 1: Continue. You carry on making minor incremental improvements to the accounting authentication system (carrying the extra Complexity Risk of the duplicated functionality).
- Option 2: Merge. You rip out the accounting authentication system and merge in the Approvals authentication system, consuming lots of development time in the process, due to the difficulty in migrating users from the old to new way of working. There is Implementation Risk here.
- Option 3: Remove. You start again, trying to take into account both sets of requirements at the same time, again, possibly surfacing new hidden Complexity Risk due to the combined approach. Rewriting code can seem like a way to mitigate Complexity Risk but it usually doesn't work out too well. As Joel Spolsky says:
There’s a subtle reason that programmers always want to throw away the code and start over. The reason is that they think the old code is a mess. And here is the interesting observation: they are probably wrong. The reason that they think the old code is a mess is because of a cardinal, fundamental law of programming: It’s harder to read code than to write it. - Things You Should Never Do, Part 1, Joel Spolsky
Whichever option you choose, this is a Dead End because with hindsight, it would probably have been better to do authentication in a common way once. But it's hard to see these dead-ends up-front because of the complexity of the system in front of you.
Avoiding Dead Ends
Working in a complex environment makes it harder to see developmental dead-ends.
Sometimes, the path across the Risk Landscape will take you to dead ends, and the only benefit to be gained is experience. No one deliberately chooses a dead end - often you can take an action that doesn't pay off, but frequently the dead end appears from nowhere: it's a Hidden Risk. The source of a lot of this hidden risk is the complexity of the risk landscape.
Version Control Systems like Git are a useful mitigation of Dead-End Risk, because using them means that at least you can go back to the point where you made the bad decision and go a different way. Additionally, they provide you with backups against the often inadvertent Dead-End Risk of someone wiping the hard-disk.
Where Complexity Hides
So far, we've focused mainly on Codebase Risk, but this isn't the only place complexity appears in software. We're going to cover a few of these areas now, but be warned, this is not a complete list by any means:
- Algorithmic (Space and Time) Complexity
- Memory Management
- Protocols / Types
- Concurrency / Mutability
- Networks / Security
- The Environment
Space and Time Complexity
There is a whole branch of complexity theory devoted to how the software runs, namely Big O Complexity.
Once running, an algorithm or data structure will consume space or runtime dependent on its performance characteristics, which may well have an impact on the Operational Risk of the software. Using off-the-shelf data structures and algorithms helps, but you still need to know their performance characteristics.
The Big O Cheat Sheet is a wonderful resource to investigate this further.
Memory Management
Memory Management (and more generally, all resource management in software) is another place where Complexity Risk hides:
"Memory leaks are a common error in programming, especially when using languages that have no built in automatic garbage collection, such as C and C++." - Memory Leak, Wikipedia
Garbage Collectors (as found in Javascript or Java) offer you the deal that they will mitigate the Complexity Risk of you having to manage your own memory, but in return perhaps give you fewer guarantees about the performance of your software. Again, there are times when you can't accommodate this Operational Risk, but these are rare and usually only affect a small portion of an entire software-system.
Protocols And Types
As we saw in Communication Risk, whenever two components of a software system need to interact, they have to establish a protocol for doing so. As systems become more complex, and the connectedness increases, it becomes harder to manage the risk around versioning protocols. This becomes especially true when operating beyond the edge of the compiler's domain.
Although type-checking helps mitigate Protocol Risk, when software systems grow large it becomes hard to communicate intent and keep connectivity low. You can end up with "The Big Ball Of Mud":
"A big ball of mud is a software system that lacks a perceivable architecture. Although undesirable from a software engineering point of view, such systems are common in practice due to business pressures, developer turnover and code entropy. " - Big Ball Of Mud, Wikipedia
Concurrency / Mutability
Although modern languages include plenty of concurrency primitives (such as the java.util.concurrent libraries), concurrency is still hard to get right.
Race conditions and Deadlocks abound in over-complicated concurrency designs: complexity issues are magnified by concurrency concerns, and are also hard to test and debug.
Recently, languages such as Clojure have introduced persistent collections to alleviate concurrency issues. The basic premise is that any time you want to change the contents of a collection, you get given back a new collection. So, any collection instance is immutable once created. The trade-off is again speed to mitigate Complexity Risk.
An important lesson here is that choice of language can reduce complexity: and we'll come back to this in Software Dependency Risk.
Networking / Security
There are plenty of Complexity Risk perils in anything to do with networked code, chief amongst them being error handling and (again) protocol evolution.
In the case of security considerations, exploits thrive on the complexity of your code, and the weaknesses that occur because of it. In particular, Schneier's Law says, never implement your own cryptographic scheme:
"Anyone, from the most clueless amateur to the best cryptographer, can create an algorithm that he himself can't break. It's not even hard. What is hard is creating an algorithm that no one else can break, even after years of analysis." - Bruce Schneier, 1998
Luckily, most good languages include cryptographic libraries that you can include to mitigate these Complexity Risks from your own code-base.
This is a strong argument for the use of libraries. But when should you use a library and when should you code-your-own? This is covered further in the section on Software Dependency Risk.
The Environment
The complexity of software tends to reflect the complexity of the environment it runs in, and complex software environments are more difficult to reason about, and more susceptible to Operational Risk and Security-Risk.
In particular, when we talk about the environment, we are talking about the number of external dependencies that the software has, and the risks we face when relying on those dependencies.
So the next stop in the tour is a closer look at Dependency Risk.