Microservices – Combinatorial Explosion of Versions

As the IT world transitions to microservices and tools like Kubernetes take off, there is one lingering issue that is slowly coming into full force: the combinatorial explosion of versions of various microservices. The community expectation is that this is much better than the dependency hell of the previous era. Nonetheless, versioning products built on microservices is a pretty hard problem. As proof of the point, articles like “Give Me Back My Monolith” immediately come to mind.

If you’re wondering what this is all about, let me explain. Suppose your product consists of 10 microservices. Now suppose each of those microservices gets 1 new version. Just 1 version – we can all agree that sounds pretty trivial and insignificant. Now look back at our product. With just 1 new version of each component we now have 2^10 – that is 1024 – permutations of how we can compose our product.

If this is not entirely clear, let me explain the math. We have 10 microservices, and each has one update. So there are 2 possible versions of each microservice (either the old one or the updated one). For each component we can use either of those 2 versions, which is equivalent to having a binary number with 10 digits. For example, let’s say 1’s are new versions and 0’s are old versions; then one possible permutation would be 1001000000, with the 1st and 4th components updated and all others not. From math we know that a binary number with 10 digits has 2^10, or 1024, variations. That is exactly the number we are dealing with here.

Now to continue this line of thinking – what happens if we have 100 microservices with 10 possible versions each? The whole thing gets pretty ugly – it’s now 10^100 permutations, which is an enormous number. To me, it’s good to state it like this, because now we’re not hiding behind words like “Kubernetes”, but facing this hard problem head on.

Why am I so captivated by this problem? Partly because, coming from the NLP / AI world, we were actively talking about the problem of combinatorial explosion in that field maybe 5-6 years ago. Just instead of versions we had different words, and instead of products we had sentences and paragraphs. While the NLP and AI problem remains largely unsolved, the fact of the matter is that substantial progress has been made recently (to me the progress could be faster if people were a little less obsessed with machine learning and a little more considerate of other techniques – but that would be off-topic).

Back to the DevOps world of containers and microservices. We have this enormous elephant of a problem in the room, and what I frequently hear is – just take Kubernetes and Helm and it’ll be fine. Guess what, it won’t be fine on its own. More so, a closed-form solution to this problem is not feasible. Like in NLP, we should first approach this problem by limiting the search space – that is, by pruning outdated permutations.

One of the things that helps – I mentioned last year on this blog the need to keep a minimal span of versions in production. It is also important to note that a good CI/CD process helps a lot in pruning variations. However, the current state of CI/CD is not enough without proper accounting, tracking and tooling to handle the actual permutations of components.

What we need is larger-scale integration-stage experiments where we could establish a risk factor per component, and have some automated process to upgrade different components and test them without human intervention to see what’s working and what’s not.

So the system could look like this:

  1. Developers writing tests (this is crucial – because otherwise there is no reference point, it’s like labeling data in ML)
  2. Every component (project) has its own well-defined CI pipeline – this process is well established by now and CI problem per-component is largely solved
  3. “Smart Integration Engine” sits on top of the various CI pipelines and assembles component projects into the final product, runs the tests, figures out the shortest path to completion of desired features given the present components, and computes risk factors. If upgrades are not possible, the engine alerts developers about the best possible candidates and where it thinks things are failing. Again, tests are crucial – the integration engine uses them as the reference point.
  4. CD pipeline then pulls data from Smart Integration Engine and performs the actual roll-out. This completes the cycle.

In summary, to me one of the biggest pains right now is the lack of an integration engine that would mix various components into a product and thus allow for proper traceability of how things actually work in the complete product. I would appreciate thoughts on this (spoiler alert – I’m currently working on Reliza to act as that “Smart Integration Engine”).

One final thing I want to mention – to me, a monolith is not an answer for any project of substantial size. So I would be very skeptical of any attempt to actually improve lead times and quality of deliveries by going back to a monolith. First, a monolith has a similar problem of dependency management between various libraries, but it’s largely hidden at development time. As a result, people can’t really make any changes in the monolith, so the whole process slows to a crawl.

Microservices make things better, but then they hit the versioning explosion at the integration stage. Yes, essentially, we moved the same problem from the dev stage to the integration stage. But, in my view, that is still better, and teams actually perform faster with microservices (likely just because of a smaller batch size). Still, the improvement we got so far by dismantling monoliths into microservices is not enough – the version explosion of components is a huge problem with a lot of potential to make things better.

Link to discuss on HN.

Japanese translation by IT News: マイクロサービスにおけるバージョンの組み合わせ爆発

Chinese translation by InfoQ: 微服务——版本组合爆炸!

Russian translation by me on Habr.com: Микросервисы — комбинаторный взрыв версий

Reliza Hub Tutorial Using Our Playground

Reliza Hub is a DevOps Metadata Management System. It helps manage software releases in the era of Kubernetes and microservices. The tutorial covers the following:

  • Projects and Products, and how to create releases for them
  • How to connect a CI script to generate new Project releases (we use GitHub Actions as an example)
  • How to send data from instances to Reliza Hub and how to request back data with details about target releases

Be sure to check Reliza Hub Playground and corresponding GitHub repository.

Reliza Hub Playground Tutorial

Automatic Version Increments With Reliza Hub: 2 Strategies

This article describes how to set up automated version increments for use in CI build pipelines. I will go over 2 possible strategies: for a simple CalVer workflow I will use the open-source Reliza Versioning tool; for a fully synchronized workflow I will use the Reliza Hub SaaS.

Update: Please check our tutorial for more advanced use cases of Reliza Hub.

I Choosing Versioning Schema

For a project architect, one of the necessary first steps is to choose a versioning schema. The two most popular conventional models today are SemVer and CalVer.

Both have their pros and cons. Discussing them in detail is out of scope for this article; however, I will highlight the differences very briefly.

The main benefit of SemVer is that it has a strict convention and allows you to estimate the amount of change between versions just by looking at the versions themselves.

For CalVer, the main benefit is that it allows you to quickly see a version’s relevance from today’s perspective (by establishing the difference between the version’s date and today’s date). This is essentially missing from SemVer, since SemVer versions tell nothing about when they were created.

However, the downside of CalVer is, predictably, a lack of difference semantics – for example, a year’s difference between CalVer versions may amount to a single line of code, and a CalVer version usually does not carry enough semantics to compensate. CalVer is also less conventionalized and actually represents a whole class of version schemas that share a common pattern (usually, year and month).

So with these and other considerations (e.g., certain tools may require a particular schema), it is necessary to pick a schema for the project.

II Simple workflow with Reliza Versioning OSS

A simple standalone workflow is usually based on automatic increments of the previous version, referenced somewhere in the source code or in the build process.

The Reliza Versioning open-source solution has a CLI that may be used in place for version auto-increments in such workflows.

Let’s say we are using Ubuntu-flavor CalVer (YY.0M.Micro). If we need to generate the first version, we would run:

 docker run --rm relizaio/versioning -s YY.0M.Micro 

This would produce a CalVer version based on today’s date. Since I’m writing this in February 2020, I’m currently getting 20.02.0.

Let’s now assume that I have an old version referenced that happens to be 20.01.3 and I need to do a CalVer-style bump on it. This means that if the date has changed, it will bump the date first, so if I run:

docker run --rm relizaio/versioning -s YY.0M.Micro -v 20.01.3 -a bump

I would get 20.02.0 (again, I’m writing this in February 2020).

Note that if we’re still in February 2020 and our previous version is 20.02.4, running a simple bump on it would produce 20.02.5, since only the micro component can be bumped.

Now, if I deliberately want to bump only the micro component and not the date, I can run:

 docker run --rm relizaio/versioning -s YY.0M.Micro -v 20.01.3 -a bumppatch 

This would in turn produce 20.01.4.

Simple enough? All that is left is to introduce this run command into the build pipeline.
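
For example, a minimal sketch of such a pipeline step could look like this (the PREV_VERSION variable and the myapp image name are just placeholders for illustration – adjust them to how your pipeline actually tracks the previous version):

# read the previous version from wherever your pipeline stores it, then bump it
PREV_VERSION=20.01.3
VERSION=$(docker run --rm relizaio/versioning -s YY.0M.Micro -v "$PREV_VERSION" -a bump)
# tag the build with the freshly bumped version
docker build -t myapp:"$VERSION" .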

Now, a similar strategy works with SemVer:

docker run --rm relizaio/versioning -s semver 

would initialize version at 0.1.0.

docker run --rm relizaio/versioning -s semver -v 3.8.2 -a bump

Would produce 3.8.3.

If we wanted to bump the minor component instead (and get 3.9.0), we would run:

 docker run --rm relizaio/versioning -s semver -v 3.8.2 -a bumpminor 

Or to bump major (and obtain 4.0.0):

docker run --rm relizaio/versioning -s semver -v 3.8.2 -a bumpmajor 

III Synchronized Automated Workflow using Reliza Hub

Reliza Hub is a deployment and release metadata management tool. Among other features, it keeps track of project version schemas and version pins in project branches.

What is a version pin in a branch? Suppose we have a SemVer schema, a release branch built in December 2019, and our regular master branch. To distinguish between those branches, it is good practice to keep them on different minor versions (or, in certain cases, on different major versions).

This means that the master branch may have a 1.3.x pin, while the release branch may have a 1.2.x pin. This way we can understand which branch a release belongs to just by looking at the major and minor components of the version.

A similar effect may be achieved with CalVer versioning – suppose we’re using Ubuntu-style CalVer as above (YY.0M.Micro). Then we may choose to pin some stable production branch to, say, 2019.11.Micro, while keeping our master branch on the latest (YY.0M.Micro) schema. Effectively, Reliza will bump the version according to the current date and resolve conflicts via increments of the Micro component. It is very similar to SemVer; the main difference is that the pin is usually set on the date and not on a major / minor combination. More details about the different version components can be found in the README of the Reliza Versioning repository on GitHub.

Let us now discuss how to set up a fully automated workflow on Reliza Hub (note: Reliza Hub is currently in public preview until mid-June 2020, after which there will be a free tier for individual developers – see more pricing details here).

First, navigate to https://relizahub.com , read the terms and, if you agree, either authenticate with GitHub or click OK. Then create your organization and navigate to Projects:

Projects in Reliza Hub

Then click on the plus-circle icon to create a new Project:

Add New Project – Select Project Version Menu in Reliza Hub

Enter the desired project name and select one of the provided version schema templates, or click Custom and enter your own custom project version schema (again, refer to the Reliza Versioning GitHub readme for details on the available components).

You may also enter the details of your VCS repository for this project or skip this step for now – after all, it is not required for the version synchronization workflow we are discussing.

Click “Submit”. Your project is now created.

Notice that the system has automatically created a “master” branch for you. If you click on it, you will see the releases of this branch registered in the system (predictably, there are none at this point). Also notice that the master branch’s version pin matches the project’s version schema exactly.

If you want, you may modify the version pin after clicking on the wrench icon above the releases, which expands the branch settings.

Branch View in Reliza Hub

Now, if you click on the plus-circle icon in the Branch column, you will be able to create a release manually, and the system will auto-assign the first version for you – 0.0.0 in our case. Every next click on the plus-circle (Add Release) calls the version synchronization logic and yields the next version (making sure that every version is returned only once).

However, what we really want here is to configure a programmatic approach. Here is how:

First of all, we need to generate our API key. For this, expand the project settings menu by clicking on the wrench icon in the Project column:

Reliza Hub – Wrench Icon To Open Project Settings

Then, in the project settings menu, click Generate Api Key; it is best to store the obtained id and key in your favorite vault solution. Note that subsequent clicks on “Generate Api Key” will re-generate the key and invalidate the old one.

Once you have the key, you may use the Reliza Client docker container to obtain a version assignment from the system. The call needs to be made as follows:

docker run --rm relizaio/reliza-go-client getversion -i $your_api_key_id -k $your_api_key -b master 

Notice here that the getversion keyword is a trigger to obtain version details from Reliza Hub. The -i parameter stands for your api key id and the -k parameter for the api key itself; the -b parameter is also required and denotes the branch.

The tool would return the next version for the branch in JSON format, such as {"version":"1.1.5"}.
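
In a pipeline, you could capture that version directly; here is a minimal sketch assuming the jq utility is available on the build agent:

# request the next version for the master branch and extract it from the JSON response
VERSION=$(docker run --rm relizaio/reliza-go-client getversion -i $your_api_key_id -k $your_api_key -b master | jq -r .version)
echo "Building version $VERSION"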

It is also possible to supply the optional --pin flag to the Reliza Go Client, which is required for new branches and which updates the version pin for existing branches. For example, if we want to create a new Feb2020 branch with the SemVer version pin 1.2.patch, we would issue the command as:

docker run --rm relizaio/reliza-go-client getversion -i $your_api_key_id -k $your_api_key -b Feb2020 --pin 1.2.patch

More details about Reliza Go Client are provided on its GitHub page.

Summary

We covered above:

1. A simple workflow to auto-increment versions in the build pipeline using the open-source Reliza Versioning tool.

2. A more advanced automated synchronization workflow using the Reliza Hub metadata management solution. Note that version synchronization is a small portion of Reliza Hub’s features, but discussing other functionality would be out of scope for this article.

2 thoughts today while snowboarding

Tried snowboarding for the 2nd time in my life (it was not too bad relative to the 1st time 😉 )
Had these 2 thoughts in the process:
1. Mountain skiing and snowboarding are really great sports for treating OCD: if you try for too much control, you can’t get speed – you stop and you fall; if you have no control whatsoever, you go too fast and, again, fall. So the idea is to find that optimal balance with some control but not too much (you can’t control everything, after all).
2. One thing the coronavirus story should reinforce is that a remote workforce is the only way to go in the modern world. How many times has it happened that somebody comes to work sick, then everybody in the office goes out sick, and the cycle repeats throughout the year? That is especially bad in crammed places, like call centers. Has anyone tried to estimate the loss of productivity due to sickness (not even mentioning other things such as quality of life or life expectancy)? It is absolutely ridiculous to force everybody to work from the same space when there is no real need for that.

Kubernetes – list all deployed images with sha256 hash

While there is official documentation on how to list all Kubernetes images here, it’s missing the imageID field that includes the sha256 hash. The sha256 digest is crucial for our use case at Reliza, so here is a working command to list all imageIDs:

# get all imageIDs (with sha256 hash digest)
kubectl get pods --all-namespaces -o jsonpath="{.items[*].status.containerStatuses[0].imageID}"

Notes:

  • Tried on kubernetes 1.17
  • Images returned are whitespace separated
  • Images are duplicated (one image per pod) – but that is trivial to de-duplicate (see the sketch below)
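
For example, a minimal de-duplication sketch using standard shell tools could look like this:

# list unique imageIDs, one per line
kubectl get pods --all-namespaces -o jsonpath="{.items[*].status.containerStatuses[0].imageID}" | tr ' ' '\n' | sort -u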

How to make microk8s work with helm 3

This is a quick note to self. When running microk8s and trying to wire up helm 3, I was getting “Error: Kubernetes cluster unreachable”. The workaround I found is the following:

mkdir /etc/microk8s
microk8s.config > /etc/microk8s/microk8s.conf
export KUBECONFIG=/etc/microk8s/microk8s.conf

The block above pretty much does the trick. Obviously, for production or near-production use it’s worth refreshing the config with cron and adding the export command to something like .bash_profile.
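A minimal sketch of that could look as follows (the hourly cron schedule is just an assumption – pick whatever refresh interval suits you):

# make KUBECONFIG persistent across shells
echo 'export KUBECONFIG=/etc/microk8s/microk8s.conf' >> ~/.bash_profile
# refresh the exported config every hour via cron (add with crontab -e)
# 0 * * * * microk8s.config > /etc/microk8s/microk8s.conf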
P.s. What helped me a lot was this discussion of a similar issue for k3s: https://github.com/rancher/k3s/issues/1126

Ford v Ferrari – best business movie since Moneyball

Finally watched Ford v Ferrari yesterday – I should have done it earlier, but was busy dealing with a bunch of issues. It’s a terrific movie overall, and very relevant today. Even though we’d like to see some things change since the 1960s, unfortunately that’s often not the case.

After discussing it with my wife, here are a few thoughts (no spoilers):

  • People are not rational agents and most decisions in life, including business, are based on emotions more than on anything else. (Known mantra but a nice reminder from the movie)
  • No matter where you are, there is always a “Leo Beebe” guy around (and sometimes more than one)
  • People live compartmentalized lives – meaning that there may be a lot of drama in one organization and a completely different type of drama in another, and those 2 organizations do not overlap at all. Therefore, the importance of each drama lies mostly within the group involved, and nobody else cares. So, being inside a drama, it is important to zoom out your perspective and make a conscious decision whether this fight matters to you. If you’re a professional racer and have a chance at winning Le Mans – then it probably does. On the other hand, if you’re struggling in a small, poorly managed company pitted against a “Leo Beebe” – then it probably does not. Life is short, and the only scenario where dealing with a “Leo Beebe” makes sense is when there is something really important to you at stake.

P.S. Other great movies and series I recommend can be found here.

No good way to verify public image sha256 in docker hub – security concern

This is a little crazy, but apparently we don’t have a good way to verify the sha256 digests of public images on Docker Hub.

The related thread is here: https://github.com/docker/hub-feedback/issues/1925 and this Stack Overflow question is also useful: https://stackoverflow.com/questions/57316115/get-manifest-of-a-public-docker-image-hosted-on-docker-hub-using-the-docker-regi.

The problems in a nutshell:

  1. Publicly displayed digests in the Docker Hub UI do not match those seen when pulling images locally
  2. Getting public image manifests is highly problematic (a very hacky work-around is involved)
  3. A public image may be re-pushed with the same tag -> forcing a new digest -> erasing the details about the previous image. How do we audit that things are still good?

Potentially, all of these present a serious security concern.
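
For reference, here is how one can at least see the digest of an image as pulled locally (a minimal sketch; nginx:latest is just an example image) – which is exactly the value that is hard to cross-check against the Hub UI:

# pull an image and print the repo digest recorded locally
docker pull nginx:latest
docker inspect --format='{{index .RepoDigests 0}}' nginx:latest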

DevOps, DataOps in 2020 – Tectonic Shift

2020 is a remarkable year because of how things are going in the DevOps and DataOps fields. Also, let me mention the DataOps challenges I listed a year ago here.

To see where we are now, let me remind you of DORA’s State of DevOps 2019 report (get your copy here if you haven’t done so yet). There have been a few other similar studies, which generally outline the trend that the share of high-performing software companies is in the range of ~5-15%, while low-performing companies are at ~20%. What is stunning is that there is an orders-of-magnitude difference between low and high performers.

As software becomes the core of every modern organization, this now translates into a life-or-death situation for businesses. And medium performers are not spared, as in many aspects they are also far behind high performers. A business can only survive if it manages to embrace high-performing software practices, starting with the mindset. This is what I’d like to call the “Tectonic DevOps Shift” (many people refer to this process as the 4th Industrial Revolution).

It is important to note that high performers are also “not done yet”, meaning that there is large room for improvement for most of them, as it will take some time for everybody to settle on best practices. But it looks like we’re getting close to at least a good understanding of what those best practices are.

So here are my thoughts about what is most important to embrace at this time and what immediate trends in DevOps and DataOps I see (this will be a bit technical in parts):

1. Continuous Integration (CI) must be fully containerized

Essentially, the whole build process must be described in a single Dockerfile and run with a single docker build command – thanks to multi-stage docker builds – for example, here is a sample of how a build-stage maven file could look. This simplifies CI scripts to a bare minimum and makes moving between CI platforms trivial.
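
As a rough illustration, a multi-stage Dockerfile for a Maven project could look something like this (the base image tags and the target/app.jar artifact path are assumptions – adjust them to your project):

# build stage: compile and package inside a Maven image
FROM maven:3.6-jdk-11 AS build
WORKDIR /app
COPY pom.xml .
COPY src ./src
RUN mvn -q -DskipTests package

# runtime stage: copy only the built artifact into a slim JRE image
FROM openjdk:11-jre-slim
COPY --from=build /app/target/app.jar /app.jar
ENTRYPOINT ["java", "-jar", "/app.jar"]

The whole build then runs with a single docker build -t myproject . command, regardless of which CI platform invokes it.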

2. Continuous Delivery (CD) must be fully containerized as well 

This is similar to the previous point. However, I moved it to a separate point, since this could be done later (or not at all for older projects nearing retirement). Essentially, if your CI is containerized, you could still build your artifacts inside containers and then extract them to legacy non-containerized environments.

3. Container orchestration as a key DevOps platform

Widespread adoption of Kubernetes, the popularity of Docker Compose for POCs and demos, and the use of Docker Swarm for smaller projects allow for quick switching between various hosting options and cloud providers once containerization is fully adopted. Arguably, this field still has a lot of room to grow due to the complexity of existing tools and the frequently messy mix of code and yaml – but the trend stays, and this should get sorted out eventually.

4. Security is bigger than ever

The focus on cyber-security is very strong, and the DevSecOps field is a huge part of DevOps these days. To me, the main pain point is still how we move secrets to environments and manage those secrets. A lot of improvement has been made recently with tools such as HashiCorp Vault and others, but there is still visible room for improvement.

On the other side, with Quantum Computing on the horizon there is an emerging demand to switch to quantum-safe encryption. Most people understandably prefer to ignore this challenge for now, but it is a huge and growing problem.

5. Templating 

Standard Dockerfiles, standard terraform scripts, standard cookbooks and playbooks – templates make it faster and easier to solve typical problems quickly. Several organizations are already working to make this easier and more organized, but there is still room to grow.

6. DataOps – Data is like software but more fluid

The main requirement for DataOps projects is an additional level of agility on top of traditional DevOps pipelining. Essentially, the idea above about the project build process living in a single Dockerfile serves that purpose of agility. As data is fluid, there should be an expectation of updating pipelines quickly and efficiently.

In this sense, I hear many consultants saying that it takes a large effort to build pipelines for the sake of ease of use and savings in the end. To me that doesn’t cut it any more – namely, it is missing the step of building the pipelines themselves for agility, meaning that if you need to change a pipeline later, you should be able to do so quickly. Achieving that level of excellence brings you closer to what a successful DataOps practice needs.

7. Embracing adult organizations – moving to remote asynchronous workforce

I have a strong belief that elementary-school-like organizations, which believe that everybody must be in the same office to perform, cannot really win the market when faced with strong adult competitors. In this sense, adult organizations need to build trust among their people so they can perform asynchronously and remotely, which leads to huge advantages in many senses. This process involves some degree of learning curve, but in my experience a remote workforce is already able to perform better in many instances.

8. Breaking silos by embracing transparency and clear communication lines 

This refers to the previous point about adult organizations – once we have an adult organization, it allows for transparency and a greater level of responsibility and commitment from every member of the organization. This finally leads to developers caring about the end-user experience and not only about the binary marking of a ticket as “fixed”. I highly recommend the “CEOs Should Tell It Like It Is” post by Ben Horowitz, which further describes this idea.

9. Embracing DevOps and DataOps as core

Since terms like Agile, DevOps and DataOps are so broad, some organizations trying to get better at releasing software, and at business in general, are lost when it comes to what to do and where to start. There are two extremes that I see: 1st – everything we do must be Agile – which sometimes leads to the comic situation where people lose common sense in the process; 2nd – DevOps is something to do with infrastructure, CI/CD, security (essentially, technical stuff) – and it’s not our core, so we’d better hire consultants to do all that and focus on our core.

Both of those extremes usually don’t work well. My take – always start with common sense; DevOps starts with the mindset, and the mindset is the core of any business. Good consultants are great for quickly bringing you up to speed and showing you best practices and the state of the art, but they can’t run the business for you.

At the same time, it is ok to outsource certain technical aspects (e.g., infrastructure monitoring and parts of incident management), as long as the core thinking about how everything is wired together stays in-house. Also, my advice is: if you don’t know where to start, start with lead times – measuring the time between a feature being dev-complete and deployed to production. If you have instances where such time is measured in months, your organization is in big need of change.

Summary thoughts

In summary, I’d like to point out that DevOps is half culture and half technical stuff – and that is roughly how my points above got split. I’d also say that culture is arguably more important, since the right culture eventually leads to the right technical decisions, but the opposite is not always true. So always think about culture and embrace that DevOps is at the core of any software organization – which is essentially any organization out there these days 😉