GitOps for Ephemerals is a Mess

I have been working with Ephemeral instances for quite some time now. At this point I believe they need a different solution than pure GitOps to operate efficiently. I explain why below.

I What are Ephemerals?

Currently in DevOps / SRE, we distinguish two types of Deployment instances: Persistent Instances and Ephemerals.

Persistent Instances are always available. They are traditionally used for progressive software delivery, for example Test, UAT, Staging and Production.

Ephemerals, on the other hand, only exist for a short period. For example, they may be used for a major testing effort where more test environments are needed. Or we may want to spin up an Ephemeral to test a specific Pull Request. Or we may want to quickly check how a particular piece of functionality behaved on an older version of our application. Again, an Ephemeral could be the most convenient way.

Using Ephemerals brings significant time and cost savings for organizations. They are quick to spin up and have a short life span, so they cost only a fraction of what Persistent Instances do.

II What is GitOps?

GitOps is the practice of putting all infrastructure operations in code. It mandates that every infrastructure element must be sourced from code. For more details, I recommend this post by Dmitri Lerko.

III GitOps and Ephemerals

Given all of the above, practicing GitOps with Ephemerals means that every Ephemeral must be defined in code. Going further, every lifecycle event of each Ephemeral (creation, update, teardown) becomes an operation on code.
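To make "defined in code" concrete, here is a minimal sketch of what a single Ephemeral definition might look like. The `EphemeralSpec` class, its fields and the `pr-1234` naming scheme are assumptions made for illustration only; they are not taken from any particular GitOps tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class EphemeralSpec:
    """Hypothetical record that would live as a file in the GitOps repository."""
    name: str             # e.g. "pr-1234" (illustrative naming scheme)
    owner: str            # who requested the instance
    versions: dict        # microservice name -> version to deploy
    created_at: datetime  # when the definition was committed
    ttl: timedelta        # how long the instance is allowed to live

    def expires_at(self) -> datetime:
        return self.created_at + self.ttl

    def is_expired(self, now: datetime) -> bool:
        # Under pure GitOps, tearing the instance down is yet another
        # commit: the one that deletes this record from the repository.
        return now >= self.expires_at()


spec = EphemeralSpec(
    name="pr-1234",
    owner="alice",
    versions={"api": "1.4.2", "web": "2.0.1"},
    created_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
    ttl=timedelta(hours=8),
)
print(spec.expires_at().isoformat())
```

Note that creating, extending and deleting such a record each require a commit, which is where the problems below come from.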

I see 3 main problems with this approach.

IV Problem 1: Bundling

Modern GitOps architectures usually assume a single Git repository for deployment (the GitOps repository) and one or more other repositories for the actual software code. Further, those other repositories frequently contain a number of microservices each.

Bundling these microservices is fundamentally complicated due to the combinatorial explosion of versions. On top of the inherent bundling complexities, the GitOps repository must also somehow be notified of new versions as they come in.
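To get a feel for the combinatorial explosion, here is a rough back-of-the-envelope sketch. It assumes each microservice version can be chosen independently of the others; the service names and version counts are made up for illustration.

```python
from math import prod
from typing import Mapping


def count_bundles(versions_per_service: Mapping[str, int]) -> int:
    """Number of distinct bundles if every service version is picked independently."""
    return prod(versions_per_service.values())


# Hypothetical example: four microservices with five candidate versions each.
candidates = {"api": 5, "web": 5, "billing": 5, "auth": 5}
print(count_bundles(candidates))  # 5 * 5 * 5 * 5 = 625
```

In reality not every combination is valid or interesting, but the number of bundles someone may want to launch as an Ephemeral still grows multiplicatively with every added service.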

When only Persistent Instances are used, this is feasible and may even be done manually, as there are only a handful of such instances. For Ephemerals, however, there is no clear way to stream changes from the software code repositories into the GitOps parent. In practice, complex CI/CD pipelines emerge, with triggers and links, and complex pipelines are usually the ones that do not work properly.

V Problem 2: Scale

Assume we have many Developers and Business people who want Ephemerals. On the other hand, we have only a single GitOps parent repository to accommodate them all. This means we would be getting a lot of commits into that repository.

Traditionally, in Trunk-Based Development (TBD), we would use the Pull Request pattern to handle commits at scale. However, this now means that launching an Ephemeral depends on a Pull Request approval, which does not make sense. After all, we want an Ephemeral to be provisioned quickly and efficiently.

VI Problem 3: Access Control

Developers and Business People need some sort of write access to the big GitOps parent to launch Ephemerals.

Generally speaking, this is a big no-no from the security standpoint, as it goes against the principle of Separation of Concerns.

VII Workarounds

The following workarounds that I know of are used in practice:

Lock the main branch of the GitOps repo and let Developers and Business people open Pull Requests. As described above, this puts load on the Ops team for approvals, and it still introduces a security risk of unwanted changes if an approval misses something. This can be further mitigated by only giving access to certain directories, but that becomes even messier.
Create a separate repository or repositories for Ephemerals. At this point there is no single source of truth between the main repository and the one for Ephemerals. This may be somewhat mitigated by introducing a link between all those repositories, which however runs into the previous security point, as the access is not isolated any more.
Create a restricted interface for Developers to communicate with the GitOps repository. Essentially, this is the approach that currently makes sense to me. I elaborate on my current thoughts in a separate section below.

VIII Restricted Interface

It seems to me that the industry is moving towards creating these restricted interfaces for non-Ops stakeholders to operate Deployment instances.

I believe that this approach, if done correctly, would eventually check all the boxes: bundling, scalability and access control.

The issue here, and I would like to reiterate the title of this post, is that this is not pure GitOps any more. It is more like using Git as a database for some other Restricted Interface service that is now doing the work.

That is because, essentially, no non-admin human is now able to commit anything directly to the GitOps repository. In other words, committing directly to the GitOps repository becomes like using SSH to do something on the Production instance (something that everyone recommends avoiding but Ops people still have to do at times).
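The restricted interface idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the `LaunchRequest` type, the `plan_commit` function, the allowed roles, the TTL cap and the `ephemerals/` directory layout are all hypothetical, and the sketch stops at computing the file change rather than talking to a real Git server.

```python
import re
from dataclasses import dataclass

ALLOWED_ROLES = {"developer", "business"}  # assumption: who may request Ephemerals
NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9-]{0,30}")
MAX_TTL_HOURS = 72                         # assumption: hard cap on lifetime


@dataclass
class LaunchRequest:
    requester: str
    role: str
    name: str
    ttl_hours: int
    versions: dict  # microservice name -> version


class RequestRejected(Exception):
    pass


def plan_commit(request: LaunchRequest) -> dict:
    """Validate a request and return the single file change the service would commit.

    The requester never gets write access to the repository; only the
    service account behind this function would.
    """
    if request.role not in ALLOWED_ROLES:
        raise RequestRejected(f"role {request.role!r} may not launch Ephemerals")
    if not NAME_PATTERN.fullmatch(request.name):
        raise RequestRejected(f"invalid instance name {request.name!r}")
    if not 1 <= request.ttl_hours <= MAX_TTL_HOURS:
        raise RequestRejected("ttl_hours out of range")
    if not request.versions:
        raise RequestRejected("at least one service version is required")

    lines = [
        f"owner: {request.requester}",
        f"ttl_hours: {request.ttl_hours}",
        "versions:",
    ]
    lines += [f'  {svc}: "{ver}"' for svc, ver in sorted(request.versions.items())]
    return {
        # The service only ever writes inside this one directory.
        "path": f"ephemerals/{request.name}.yaml",
        "content": "\n".join(lines) + "\n",
        "message": f"Launch ephemeral {request.name} for {request.requester}",
    }
```

In this sketch, Git ends up storing whatever the service decides to write, which is exactly the "Git as a database" situation described above.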

From here we can take this a step further and ask whether Git is actually the most appropriate database for such a use case, but I will leave that conversation for another day.