Combinatorial Explosion of Versions 5 Years Later

Five years ago, I started working on the problem of combinatorial explosion of versions described in my earlier blog post. Today I would like to summarize the journey of these five years and where we are headed next in this research.


I Scope of Versioning Problem

Five years ago, I was very invested in the DevOps field and primarily interested in versioning of microservices. Today, I’m working within supply chain security as a whole and considering the versioning explosion far beyond just microservices.

This significantly increases the scope of the problem. Essentially, the core of the problem remains the same – multiple permutations of various component versions – however, dealing with multi-layered transitive dependencies vastly complicates an already complex situation.
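To make the scale concrete, here is a minimal sketch (component names and version counts are invented for illustration) showing how quickly permutations multiply once transitive dependencies are counted alongside the top-level components:

```python
from math import prod

# Hypothetical component tree: each component ships some number of versions,
# and some components are pulled in only transitively.
component_versions = {
    "service-a": 4,
    "service-b": 3,
    "lib-x": 5,   # transitive dependency of service-a
    "lib-y": 2,   # transitive dependency of service-b
}

# Every component's version can vary independently, so the number of
# distinct permutations is the product of the per-component version counts.
permutations = prod(component_versions.values())
print(permutations)  # 4 * 3 * 5 * 2 = 120
```

Adding a single new dependency with even two versions doubles the count, which is the explosion in a nutshell.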

I have also expanded my thoughts on versioning, as described here.

II From Functionality to Security

Initially, I was more interested in the functional behavior of the system.

For my initial blog post, I received several replies arguing that tracking version permutations was not needed in the first place, because the problem scope can be reduced by versioning at the API level.

The idea is that API versions should be much longer-lived than software component versions; therefore, if every component adheres to a certain API contract, the whole problem largely becomes irrelevant.

However, this line of thinking does not help at all from a security angle. Every unique permutation of components creates a specific security posture with its own risk factors. With that in mind, functional compatibility via API contracts does very little to manage the security of the application.

III Project Work

I started implementing various ideas related to release versioning in our Reliza Hub project. Many of those were then incorporated into our latest project – ReARM.

On top of that, I’m currently actively contributing to OWASP Transparency Exchange API, which benefits a lot from these ideas, although in a simplified form.

IV Data Model

Of my work on various related projects, the achievement I consider most significant is the data model, which is presented here (with some additions since then, which I will describe in later posts).

V How we Produce Technology Products

The key idea is that any packaged technology product (this applies to both software and hardware) is a specific permutation point in a multi-dimensional space of available components. Moreover, each of these components can be recursively viewed in the same way: as a separate packaged product.

Now, as product creators or users, we select which such permutation point to occupy. In the process, we face certain constraints – such as compatibility, security, and the option to add our own code (which, in the grand scheme, simply creates an additional dependency, that is, an additional dimension in the component space).
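The selection process above can be sketched in a few lines. The component names, versions, and the compatibility rule here are all invented for illustration; the point is only that a product is one feasible point among many in the space:

```python
from itertools import product

# Hypothetical available versions along each component dimension.
space = {
    "os":  ["1.0", "1.1", "2.0"],
    "db":  ["5.6", "8.0"],
    "app": ["3.2", "3.3"],
}

# A hypothetical compatibility constraint: app 3.3 requires db 8.0.
def compatible(point):
    return not (point["app"] == "3.3" and point["db"] != "8.0")

# Enumerate every permutation point, then keep only the feasible ones.
points = [dict(zip(space, combo)) for combo in product(*space.values())]
feasible = [p for p in points if compatible(p)]
print(len(points), len(feasible))  # 12 total points, 9 satisfy the constraint
```

A shipped product corresponds to picking exactly one element of `feasible`; every other feasible point is a product you could have shipped instead, each with its own security posture.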

VI Movement within Multi-dimensional Component Space

Software and hardware updates represent movement within the component space. Such movement is subject to its own constraints, for example, software is usually significantly easier to update than hardware.

With this in mind, it may be useful to view certain components as fixed while allowing other components to be updated. But such updates frequently come with their own constraints – for example, staying within a certain major version.
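As a minimal sketch of such movement rules (the component names and the fixed-dimension choice are assumptions for illustration), one could encode "some dimensions are frozen, others may move only within their major version" like this:

```python
# Sketch of an update constraint: treat some components as fixed
# (e.g. hardware or firmware) and let others move only within their major.

def parse(version):
    return tuple(int(part) for part in version.split("."))

def allowed_update(component, current, candidate, fixed=("firmware",)):
    if component in fixed:                    # fixed dimensions cannot move
        return candidate == current
    cur, cand = parse(current), parse(candidate)
    return cand[0] == cur[0] and cand >= cur  # same major, no downgrade

print(allowed_update("app", "3.2.1", "3.4.0"))       # True: same major
print(allowed_update("app", "3.2.1", "4.0.0"))       # False: crosses major
print(allowed_update("firmware", "1.0.0", "1.0.1"))  # False: fixed dimension
```

Real ecosystems express these rules through range syntaxes (semver ranges, version pins), but the underlying idea is the same: each constraint limits which directions of movement in the component space are permitted.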

VII Why This Matters

Again, five years ago I was preoccupied with the functional compatibility of the system. But today I strongly believe that security is a more significant concern, and a far less visible one.

If we are not aware of the exact composition of our product, it is virtually impossible to assess its security risks. Yet in reality it is equally impossible to have full confidence in the composition of the product we are dealing with.

Therefore, what we should be looking at is twofold:

  1. Security posture within the known composition of the product
  2. Level of entropy of what we know versus what is still unknown about the composition of the product
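The second item can be given a first, crude quantitative shape. The numbers below are invented, and the binary-entropy measure is just one possible choice of metric (it is zero when the composition is fully known and peaks when half of it is unknown):

```python
import math

# Hypothetical inventory: of the components we believe make up the product,
# how many have we actually identified (e.g. captured in our SBOM)?
estimated_total = 40   # best guess at how many components exist
identified = 31        # components actually identified by tooling

known_fraction = identified / estimated_total

# Binary entropy of the known/unknown split, in bits.
p = known_fraction
entropy = (0.0 if p in (0.0, 1.0)
           else -(p * math.log2(p) + (1 - p) * math.log2(1 - p)))

print(f"known: {known_fraction:.2f}, entropy: {entropy:.2f} bits")
```

Tracking how such a metric changes over time would tell us whether our visibility into the product's composition is actually improving.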

VIII Summary – Present vs Future

Right now, it seems that the best we can do is represent our products as accurately as possible. Various xBOM tools – such as cdxgen for generation, Dependency-Track for analysis, and ReARM for storage and representation – are helping with that.

Next, we should start gauging the entropy of what is still unknown. We will probably never be perfect at that, but we could start by imagining all the things that compose our product and then estimating how many of them we can identify using existing tools. Note that this work has already begun with frameworks like OWASP SCVS.

Then, we should move towards automated systems that produce better points in the multi-dimensional component space where our product should reside. Again, some work in that direction started a while back with tools like Dependabot and Renovate, but these are still far from what could eventually constitute a complete system.
