Complexity Risk
Part Of
Reduced By Practices
- Automated Testing: Aids in refactoring by ensuring that functionality survives the change.
- Configuration Management: Reduces complexity by managing system changes in a controlled and documented manner.
- Dependency Adoption: Can reduce the amount of code you are responsible for, hence the amount of 'owned' complexity.
- Refactoring: Aims to make code more orthogonal, less duplicative and clearer to understand.
- Review: Identifies unnecessary complexity and communicates necessary complexity to the rest of the team.
Attendant To Practices
- Automated Testing: Managing a large suite of unit tests can add to the complexity.
- Automation: Introducing automation adds to the complexity of a project.
- Coding: Writing new code adds complexity to a project.
- Demand Management: Forecasting and planning demand can add complexity to project management.
- Documentation: Documentation is also a source of complexity on a project and can slow down change.
- Issue Management: Managing an excessive number of logged issues can add complexity.
- Measurement: Collecting and analyzing data can add to the complexity of the project.
- Monitoring: Implementing comprehensive monitoring solutions can add complexity.
- Performance Testing: Requires sophisticated tools and setup, adding complexity.
- Redundancy: Introducing redundancy can add complexity to the system.
- Regression Testing: Managing extensive regression tests can add complexity.
- Security Testing: Requires specialized skills and tools, adding complexity.
- Tool Adoption: Integrating multiple tools can add complexity to the development process.
Complexity Risk is the risk to your project due to its underlying "complexity" - the amount of code, documentation, issues, features, different types of user and so on and, crucially, the relationships between those elements.
Looking at the living world, society or software in general, we can see that, over time, complexity increases. There is a trade-off wherein we can capture more resources, more value or more user requirements (respectively) through increasing complexity. The downside is that complex systems are more brittle and harder to change.
Worked Example
It's the early 2000s: your Pokémon website is becoming really popular and profitable and has a large, enthusiastic customer base. But you're worried that you're carrying too much Operational Risk as the whole thing is run on a single server and database that you've rented from a hosting provider. What if it goes down? Or the disk crashes? The provider isn't interested in helping you, so you hire a second server and database and work out a process for load balancing between the two of them. You write scripts that keep the databases in sync and implement sticky sessions so that users only see their own version. There are lots of corner cases you have to work through and it is a major headache.
It's the early 2020s: your Pokémon website is becoming really popular and profitable but you're worried that you're carrying too much Operational Risk. You're able to turn on backup features and load balancing and increase the number of instances via the console provided by your Cloud Service Provider, handing off the Complexity Risk to them at some expense. As well as helping with Demand Management, CSPs have allowed software developers to shift a lot of Complexity Risk to them, the downsides being cost and lock-in.
Example Threats
1. Space and Time Complexity
There is a whole branch of complexity theory devoted to how much time and memory software consumes as it runs, namely Big O Complexity.
Threat: Once running, an algorithm or data structure will consume space or runtime dependent on its performance characteristics, which may well have an impact on the Operational Risk of the software. Using off-the-shelf data structures and algorithms helps, but you still need to know their performance characteristics.
The Big O Cheat Sheet is a wonderful resource to investigate this further.
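To make that concrete, here's a minimal Java sketch (the class name and the timing approach are illustrative only) contrasting two off-the-shelf data structures: the same contains() call is O(n) on an ArrayList but O(1) on average for a HashSet, and the difference becomes very visible at a million elements.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ContainsComparison {
    public static void main(String[] args) {
        int n = 1_000_000;
        List<Integer> list = new ArrayList<>();   // contains() is O(n)
        Set<Integer> set = new HashSet<>();       // contains() is O(1) on average
        for (int i = 0; i < n; i++) {
            list.add(i);
            set.add(i);
        }

        long start = System.nanoTime();
        list.contains(n - 1);                     // scans the whole list
        System.out.printf("ArrayList.contains: %d µs%n", (System.nanoTime() - start) / 1_000);

        start = System.nanoTime();
        set.contains(n - 1);                      // a single hash lookup
        System.out.printf("HashSet.contains:   %d µs%n", (System.nanoTime() - start) / 1_000);
    }
}
```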
2. Memory Management
Threat: Memory Management (and more generally, all resource management in software) is another place where Complexity Risk hides:
"Memory leaks are a common error in programming, especially when using languages that have no built in automatic garbage collection, such as C and C++." - Memory Leak, Wikipedia
Garbage Collectors (as found in JavaScript or Java) offer you a deal: they mitigate the Complexity Risk of managing your own memory, but in return give you fewer guarantees about the performance of your software. Again, there are times when you can't accommodate this Operational Risk, but these are rare and usually only affect a small portion of an entire software system.
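As a small, hypothetical Java sketch of how memory can still "leak" even under a garbage collector: an unbounded cache keeps a reference to every entry it has ever seen, so the collector can never reclaim them and the process grows until it falls over. The class and field names are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

public class SessionCache {
    // Entries are never evicted, so every session ever seen is retained forever.
    private static final Map<String, byte[]> CACHE = new HashMap<>();

    public static byte[] load(String sessionId) {
        // The GC cannot collect these arrays: the static map still references them.
        return CACHE.computeIfAbsent(sessionId, id -> new byte[1024 * 1024]);
    }
}
```

Bounding the cache, or holding the entries via weak references, fixes the leak, but that is more resource-management complexity you now own.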
3. Protocols And Types
As we saw in Communication Risk, whenever two components of a software system need to interact, they have to establish a protocol for doing so.
Threat: As systems become more complex, and the connectedness increases, it becomes harder to manage the risk around versioning protocols. This becomes especially true when operating beyond the edge of the compiler's domain.
Although type-checking helps mitigate Communication Risk, when software systems grow large it becomes hard to communicate intent and keep connectivity low. You can end up with "The Big Ball Of Mud":
"A big ball of mud is a software system that lacks a perceivable architecture. Although undesirable from a software engineering point of view, such systems are common in practice due to business pressures, developer turnover and code entropy. " - Big Ball Of Mud, Wikipedia
4. Concurrency / Mutability
Although modern languages include plenty of concurrency primitives (such as the java.util.concurrent libraries), concurrency is still hard to get right.
Threat: Race conditions and Deadlocks abound in over-complicated concurrency designs: complexity issues are magnified by concurrency concerns, and are also hard to test and debug.
Languages such as Clojure introduce persistent collections to alleviate concurrency issues. The basic premise is that any time you want to change the contents of a collection, you get back a new collection, so any collection instance is immutable once created. The trade-off is, again, some speed in exchange for mitigating Complexity Risk.
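In Java terms, a minimal (and deliberately naive) sketch of the same copy-on-change idea might look like the class below; real persistent collections share structure between versions rather than copying wholesale, but the guarantee to callers is the same: an instance never changes once created. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public final class ImmutableNames {
    private final List<String> names;

    public ImmutableNames(List<String> names) {
        this.names = List.copyOf(names);      // defensive, unmodifiable copy
    }

    // "Changing" the collection returns a new instance; the original is untouched,
    // so instances can be shared between threads without locks.
    public ImmutableNames add(String name) {
        List<String> next = new ArrayList<>(names);
        next.add(name);
        return new ImmutableNames(next);
    }

    public List<String> asList() {
        return names;
    }

    public static void main(String[] args) {
        ImmutableNames v1 = new ImmutableNames(List.of("pikachu"));
        ImmutableNames v2 = v1.add("eevee");
        System.out.println(v1.asList());   // [pikachu]
        System.out.println(v2.asList());   // [pikachu, eevee]
    }
}
```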
An important lesson here is that choice of language can reduce complexity, and we'll come back to this in On Software Dependencies.
5. Networking / Security
There are plenty of Complexity Risk perils in anything to do with networked code, chief amongst them being error handling and (again) protocol evolution.
Threat: In the case of security considerations, exploits thrive on the complexity of your code, and the weaknesses that occur because of it. In particular, Schneier's Law warns you never to implement your own cryptographic scheme:
"Anyone, from the most clueless amateur to the best cryptographer, can create an algorithm that he himself can't break. It's not even hard. What is hard is creating an algorithm that no one else can break, even after years of analysis." - Bruce Schneier, 1998
Luckily, most good languages include cryptographic libraries that you can use to keep these Complexity Risks out of your own code-base.
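For example, here is a short sketch of leaning on the platform rather than rolling your own: Java's built-in MessageDigest produces a SHA-256 digest in a few lines (this assumes Java 17+ for HexFormat; the class name is illustrative).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

public class Digests {
    public static String sha256Hex(String input) throws NoSuchAlgorithmException {
        // Delegate the hard cryptographic work to the platform's library.
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        byte[] hash = digest.digest(input.getBytes(StandardCharsets.UTF_8));
        return HexFormat.of().formatHex(hash);   // hex-encode the 32-byte digest
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        System.out.println(sha256Hex("hello world"));
    }
}
```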
This is a strong argument for the use of libraries. But when should you use a library and when should you code-your-own? This is covered further in On Software Dependencies.
6. The Pursuit Of Perfection
Complexity arises in software projects in a number of different ways. Sometimes, a quick-and-dirty solution is a better option than one that deals with every corner-case but adds excessively to the complexity budget. Aside from bragging rights, no-one is interested in having a large codebase:
“Measuring programming progress by lines of code is like measuring aircraft building progress by weight.” - Bill Gates
The size of your codebase, the amount of code, the number of modules, the interconnectedness of the modules and how well-factored the code is all contribute to Complexity Risk.
In 2012, Knight Capital Group lost $440 million in 45 minutes because of a software deployment error caused by legacy code and unnecessary complexity. Knight was deploying new trading software designed to handle their market-making operations and made use of Feature Flags to enable or disable versions of their code. However, during deployment, they neglected to properly unflag some of the old functionality on one of the servers.
The old code, known internally as “Power Peg” and unused for eight years, was essentially a ticking time bomb. Instead of simplifying the system by removing dead code, they had layered new features on top of it.
When the new deployment went live, it started buying and selling millions of shares at lightning speed, generating massive unintended trades. Within minutes, Knight's system had created enormous market disruptions, and by the time the error was caught, the firm had incurred $440 million in losses.
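Purely as an illustration (this is not Knight's actual code, and the flag and class names are hypothetical), the sketch below shows how a reused feature flag can silently route traffic into a legacy code path that should have been deleted years earlier.

```java
public class OrderRouter {

    private final boolean powerPegEnabled;   // hypothetical flag, repurposed across releases

    public OrderRouter(boolean powerPegEnabled) {
        this.powerPegEnabled = powerPegEnabled;
    }

    public void route(String order) {
        if (powerPegEnabled) {
            legacyPowerPeg(order);   // "dead" path, untested and unused for years
        } else {
            smartRouter(order);
        }
    }

    private void legacyPowerPeg(String order) {
        System.out.println("Legacy path handling " + order);
    }

    private void smartRouter(String order) {
        System.out.println("Current path handling " + order);
    }

    public static void main(String[] args) {
        // One mis-configured server keeps the old flag on and the old path live.
        new OrderRouter(true).route("BUY 100 XYZ");
    }
}
```

Removing dead code when it is no longer needed is cheaper than carrying it, precisely because situations like this stop being possible.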