Skip to main content

Hierarchies

"Hierarchies are the wheel we're happy to re-invent over and over and over again."

Have you noticed just how prevalent the idea of heirarchy is in software? By hierarchy, I mean a general pattern of connectedness, where a parent has some children, who, in turn, may each have some further children, and so on.

This is a really popular pattern. But what's odd about it is that we seem to re-invent it over and over again in software.

In this article, I am going to look at the two basic types of hierarchy, compositional and subsumptive, explaining what these are and giving examples of where we see them whether that's in programming languages, computing generally or other areas of life.

The chief claim of this article is that our inability to consistently generalise the pattern of hierarchy has been a lost opportunity. But nevertheless, we'll look at some attempts to abstract hierarchy in software systems, and talk about what is lost by not having a single abstraction for hierarchy.

So let's start with...

A Javascript Example

Here's some code from the ChartJS website I was looking at, just the other day:

<canvas id="myChart" width="400" height="400"></canvas>
<script>
var ctx = document.getElementById('myChart').getContext('2d');
var myChart = new Chart(ctx, {
type: 'bar',
data: {
labels: ['Red', 'Blue', 'Yellow', 'Green', 'Purple', 'Orange'],
datasets: [{
label: '# of Votes',
data: [12, 19, 3, 5, 2, 3],
backgroundColor: [
'rgba(255, 99, 132, 0.2)',
'rgba(54, 162, 235, 0.2)',
'rgba(255, 206, 86, 0.2)',
'rgba(75, 192, 192, 0.2)',
'rgba(153, 102, 255, 0.2)',
'rgba(255, 159, 64, 0.2)'
],
borderColor: [
'rgba(255, 99, 132, 1)',
'rgba(54, 162, 235, 1)',
'rgba(255, 206, 86, 1)',
'rgba(75, 192, 192, 1)',
'rgba(153, 102, 255, 1)',
'rgba(255, 159, 64, 1)'
],
borderWidth: 1
}]
}
});
</script>

We've got...

  • A hierarchy of tags such as <canvas>, <script> and so on.
  • We have attributes on the tags, like width and height.
  • Within the <script> tag, we have a hierarchy of statements, where we assign the variablesctx and myChart in order.
  • A hierarchy of method calls on objects, such as: document.getElementById('myChart').getContext('2d')
  • A nested hierarchy of maps (delimited by { and }) and arrays (delimited by [ and ]), starting with { type: 'bar'...
  • Within those arrays, we have strings, consisting of characters, delimited by ' here, such as 'rgba(153, 102, 255, 1)'.
  • But actually, that's another function call, which uses brackets to separate the name of the function rgba from it's parameters.
  • And those parameters are separated by ,s...
  • And the Chart constructor is defined in a different package, which is a collection of files in a repository (maybe npm or jsdelivr.
  • Some hierarchy is indicated using indentation of the code. Some isn't.

What is the take-away from this? We heavily use syntax to indicate different types of hierarchies within software. Just look at all the different ways it happens above: brackets, single-quotes, curly brackets, angle-brackets and tags, square brackets, double-quotes, commas and semi-colons.

A LISP Example

The above is an extreme example: there's a lot of different syntax going on in that ChartJS file! At the other extreme, LISP code looks a bit like this:

(defun fibonacci (N)
"Compute the N'th Fibonacci number."
(if (or (zerop N) (= N 1))
1
(+ (fibonacci (- N 1)) (fibonacci (- N 2)))))

Hierarchy is really managed by just three things here:

  • spaces, to split up children
  • quotation marks ("), to delineate strings and
  • Parentheses (( )) which are capable of creating nested hierarchies.

A Desktop Example

On the desktop, hierarchy is also re-invented multiple times.

Navigating Hierarchies

Have a look at the image above. Let's go through the different types of hierarchy here:

  • At the desktop level, I have several programs running. I can tab between them using CMD+Tab, or I can see all the options listed vertically on the right hand side.
  • One level down in the hierarchy, in Safari, Eclipse and VS-Code and Terminal, I have tabs, shown horizontally across the top of the window. Tabs look different in each of these, but navigating between those uses CTRL+Tab.
  • I might also have multiple windows in these applications (each with tabs inside). It's not clear how I navigate in this level of the hierarchy.
  • Further, I also have a directory trees in VS-Code and Eclipse, both of which display the files/directories hierarchy vertically with indents. Again, different navigation semantics for each.
  • Finally in Terminal, I can move up/down directory trees with the cd command.

So at the very least, we have a two hierarchies: applications/windows/tabs and directories/files. You could say that the take-away from this example is that we use looks and behaviour to indicate these different types of hierarchy.

Hierarchies Everywhere

Hierarchies are the wheel we're happy to re-invent over and over and over again! What about:

  • Classes in a Package
  • Packages in a namespace
  • Repos in a Project
  • Projects within a GitHub organisation
  • Servers in a Data Centre
  • Teams of People
  • Countries, Counties, Towns etc.

We build hierarchies not just into software, but all over our societies. They seem fundamental to how we understand things. As well as Family Trees and Organisation Charts, we use hierarchies everywhere. For example, biologists often break down the complexity of the human body like this:

  • Organelles - such as Mitochondria, contained in...
  • Cells - such as blood cells, nerve cells, skin cells in the Human Body, inside...
  • Organs - like hearts livers, brains etc, held within...
  • Organ Systems - like the circulatory system, the immune system, the respiratory system, contained in...
  • Organisms - like you and me.

Wikipedia calls this a compositional containment hierarchy:

"The compositional hierarchy that every person encounters at every moment is the hierarchy of life. Every person can be reduced to organ systems, which are composed of organs, which are composed of tissues, which are composed of cells, which are composed of molecules, which are composed of atoms. In fact, the last two levels apply to all matter, at least at the macroscopic scale. Moreover, each of these levels inherit all the properties of their children. " - Hierarchy, Wikipedia

The Curse Of Reinvention

Sadly in the world of computing, there is no common abstraction for all of these types of hierarchy, we have to learn different APIs for operating across them all. There's no way to observe that since files in a directories is a hierarchy, and tabs in a browser is also a hierarchy there should be a common way of navigating that structure.

Why do we keep re-inventing the same thing? Is there a reason for this lack of abstraction, or is it an accident?

Syntax Trees

One potential reason is syntactic sugar.

In a strongly-typed language like Java, for example we might have this:

public class Numbers {
public static void main(String[] args) {
System.out.print("This is a number: " + 4);
}
}

The Eclipse IDE has an AST View which you can install, which allows you to see the compositional hierarchy of this Java code as the Eclipse compiler understands it. However, this is an excerpt for the above program. The full hierarchical view is a lot larger (6107 lines, in fact).

> type binding: org.riskfirst.website.preprocessor.Numbers
BODY_DECLARATIONS (1)
> method binding: Numbers.main(String[])
BODY
STATEMENTS (1)
EXPRESSION
MethodInvocation [121+42]
> method binding: PrintStream.print(String)
EXPRESSION
QualifiedName [121+10]
> variable binding: System.out
ARGUMENTS (1)
InfixExpression [138+24]
> (Expression) type binding: java.lang.String
LEFT_OPERAND
StringLiteral [138+20]
ESCAPED_VALUE: '"This is a number: "'
OPERATOR: '+'
RIGHT_OPERAND
NumberLiteral [161+1]
TOKEN: '4'

So arguably, using all this different syntax helps us:

  • Keep track of where in the hierarchy we are.
  • Reduce the amount of space used to display the same information.

... however, just because you're doing that, doesn't mean that some common abstraction wouldn't still be useful.

Intent

A second reason for this multitude of hierarchies goes back to explaining the use to which they are put. Because of LISP's parsimony, you miss out on knowing what the different types of hierarchy are. You don't know whether you're looking at XML attributes or elements, or array structures, or maps, so it's harder to understand the purpose of the hierarchy. In the ChartJS example, assuming you know what you're looking at, the purpose of these different hierarchies is a lot clearer.

Also, you miss out on the huge number of )))))s that you get with LISP.

Without Compositional Hierarchy

Where would we be without compositional hierarchy in our software code? It's not impossible to imagine:

  • We could write code in a stack-less, goto-oriented way, but such programs are extremely hard to reason about, as discussed in E.W. Dijkstra's seminal paper Goto Considered Harmful.
  • Finite State Machines are a pretty useful tool in the toolbox, managing state transitions, but without hierarchy (although you could build a finite hierarchy within them).
  • Turing Machines and the BrainFuck language both manage without any kind of hierarchy, and are Turing Complete, meaning that you can do any kind of computing in them. Although, they're both very hard to reason about.
# Hello World, in BrainFuck

+[-->-[>>+>-----<<]<--<---]>-.>>>+.>>..+++[.>]<<<<.+++.------.<<-.>>>>+.

Subsumptive Hierarchies

"A subsumptive containment hierarchy is a classification of object classes from the general to the specific. " - Hierarchies, Wikipedia

The other type of hierarchy we come across both in software and everywhere else in the human experience is the classification hierarchy or subsumptive hierarchy. For example: people are a type of mammal which is a type of animal. This kind of thinking is embedded deeply in type systems and object-oriented programming.

Type Systems are built on the concept of subsumptive hierarchies. Unlike composition, with subsumptive hierarchies, you can classify things along many axes. For example, a cup might fit into the classifications "drinking receptacle", "kitchenware" and "Star-Wars memorabilia" all at the same time.

A lot of the power of Interfaces in programming languages comes from being able to do this. This leads to a really interesting point: whenever we "reach out" of the compositional hierarchy of a software program to call another piece of code (via calling a function or a variable) we fall back to using the classification hierarchy to determine whether that connection is a valid one.

public class Numbers {
public static void main(String[] args) {
System.out.print("This is a number: " + 4);
}
}

In the above snippet of Java code, there is a place where we leave the compositional hierarchy to call a static function from somewhere else: System.out.println. When this happens, we rely on the classification hierarchy of the Java Type System to determine whether that call is acceptable. System.out.println: takes a String as it's argument, and it's the job of the type checker to make sure that this call-outside-the-hierarchy will work.

This is a good example of compositional and subsumptive hierarchies working together.

Planets

Subsumptive hierarchies are difficult for a couple of reasons. The first being how to decide what's in and what's out of the category.

Eris, From Wikipedia (NASA, ESA, and A. Schaller (for STScI))

(Eris, from Wikipedia)

As an example of this, let's consider planets. The definition of a planet is quite bogus, and has changed over time:

  • The Greeks coined asteres planetai to be the class of objects in the sky that moved separately from the rest of the body of stars. Possibly including moons, comets and asteroids. 1.
  • However, after the Copernican Revolution made the moon a satellite of earth, the defintion of planets seemed to be bodies orbiting the sun, and there were just 9 of them: Mercury, Mars, Earth, Venus, Saturn, Jupiter, Uranus, Neptune and Pluto.
  • In 2005, The Discovery of Eris, a body larger than Pluto orbiting in a trans-Neptunian orbit meant that potentially hundreds of objects deserved the term planet.
  • In response, Pluto was demoted to being a dwarf planet. In order to do this, the definition of planet was changed to include the clause that it had "cleared its neighbourhood" of most other orbiting bodies. This excluded Kuiper-Belt objects such as Pluto, but is still problematic, as Alan Stern discusses below.

"I and many other planetary scientists — like the almost 400 that signed a petition against the IAU in 2006 — have a problem with the IAU definition because the implications of it are just nonsensical. Here's why. The IAU's "zone-clearing" criteria, when worked out mathematically, means that to qualify as a planet at larger and larger distances from the sun, a body has to have more and more mass than it would in a closer orbit. This is in part because the zones get larger (like distance cubed, or volume) as you go outward; it's also in part because orbital speeds are slower further out, so zone-clearing takes longer." - Alan Stern, Fighting for Pluto's Planet Title

So the problem comes down to the fact that, on one hand, we want a nice classification of the eight or nine largest objects orbiting our sun, rather than a messy classification of hundreds.

Generics

The second problem with subsumptive hierarchies is that the composition creeps back in: you can't do everything just with a subsumptive hierarchy. This means that we end up with type definitions like this:

public interface Map<K,V> {

...

V get(Object key);

...
}

This is saying that a Map is composed with a key type K and a value type V. Our subsumptive hierarchy of types ends up making use of compositional hierarchy!

Both Hierarchies Together

So to recap, there are two key types of hierarchy:

Compositional hierarchies are used everywhere in software development: files in disks in servers, methods and functions in packages and namespaces etc. Good programming languages attempt to capture as much of the program's complexity as possible within the containment hierarchy.

People understand compositional hierarchies because they're baked into (and invented by) our brains. When I look outside at a car, I can see that it is a compositional hierarchy of windows, wheels and metal panels. When I think about my house, I think about different objects being contained within different rooms within a structure of bricks. But none of that exists: it's all in my head. Everything is really just a bunch of atoms.

Subsumptive hierarchies are also used everywhere in software development: strings, numbers, records, classes, types, schemas. A key ability for a programmer is often to be able to abstract from multiple areas and say "this is like this". By noticing patterns of classification we can save on the amount of code we have to write.

Compositional hierarchies on a larger project:  methods, classes, packages, directories, projects

In Eclipse (my Java IDE) I can therefore view both these types of hierarchy. In the above screen-grab, you can see some more compositional hierarchy on the left: methods, classes, packages, directories and projects.

Classification hierarchy of the Resource class from Spring

Whereas in this screen grab, I can view the hierarchy of a class within Java (here the Resource class from Spring).

Compositional Hierarchy Implementations

Java

Because hierarchies are so trivial to implement in software systems, they tend to be implemented over and over again. In Java, I can build a hierarchy out of nested List objects, or nested Arrays, or I can explore the hierarchy of an XML document via nested Nodes. Each has a different set of methods for navigating them.

Nevertheless, there are some after-the-fact efforts to provide consistent interfaces across hierarchies, such as Spring Expression Language or JXPath provide a consistent hierarchical abstraction over these, so that I can write:

URLs

A second example to look at is the URL. URL paths have hierarchy built into them. I can navigate upwards from from a child to successive levels of parent like so:

https://github.com/robmoffat/kite9-visualization/actions # Actions on one of my github projects
https://github.com/robmoffat/kite9-visualization # One of my github projects
https://github.com/robmoffat # All of my projects
https://github.com # Top level of github

This usually works, but isn't guaranteed to (there might be some intermediate levels within the URL scheme which yield 404 - Not Found pages).

However, going the other way (from parent to child) isn't so easy: there is no corresponding standard to say "this URL has these children". Instead, I am presented with a screen of HTML to look at, but there is no guarantee that the links it contains will take me to child pages in a hierarchy.

HATEOAS

A poorly-adopted example of what this might look like is HATEOAS - Hypertext As The Engine Of Application State. When you do a REST request and get a HATEOAS response, you indeed get links to all child resources. Requesting the URL /accounts/12345/ would return a response like:

{
"account": {
"account_number": 12345,
"balance": {
"currency": "usd",
"value": 100.00
},
"links": {
"deposit": "/accounts/12345/deposit",
"withdraw": "/accounts/12345/withdraw",
"transfer": "/accounts/12345/transfer",
"close": "/accounts/12345/close"
}
}
}

(Example from Wikipedia)

But even HATEAOS isn't really representing a compositional hierarchy here: those links could point off to anywhere.

Why Bother?

So, what are we missing out on here? Why is this a big deal? In a word, it's about consistency, as with every abstraction. In many programming languages there is the concept of the namespace, which ensures we have a consistent subsumption hierarchy. In fact, this is what allows us to use different packages together in a project, and not end up having the types clash with each other.

But by lacking consistent compositional hierarchy, either within a language or across all software systems, we lose the ability to navigate the world's information easily. URLs go some way to achieving this, but not far enough!

In the next article, we'll look at Crystals And Code, and how the lack of consistent abstractions means that our Information Systems are perpetually broken.