Wrote a Medium article on how we are using Reliza Hub with GitHub Actions and ArgoCD

This is about using Reliza Hub for a Kubernetes CI/CD pipeline. The key problem we are trying to help with is managing different versions, and permutations of versions, of microservices.

The article can be found here: https://medium.com/@taleodor/building-kubernetes-cicd-pipeline-with-github-actions-argocd-and-reliza-hub-e7120b9be870

Take The Fear Out Of Git Push

(Git) Push To Reset The World
Photo by Tom Grünbauer on Unsplash

Remember “The Lean Startup” where they asked engineers to push a change to production on their first day of work? Then the quote to justify it was: “If our production process is so fragile that you can break it on your very first day of work, shame on us for making it so easy to do so”.

“The Lean Startup” was published almost 10 years ago, in 2011. Unfortunately, this simple idea described above still does not resonate with many organizations. On the contrary, putting more pressure on developers committing and pushing code is quite common.

In my experience, the #1 reason for the loss of Software Development productivity is this exact fear of pushing a change to the Version Control Repository.

Bureaucracy Always Wants To Slow Down Git Push

Developers are frequently told all sorts of things before they push changes: that they need to branch first, or they need to merge to several branches, or they need pull requests first, or they must test locally first, or they need peer reviews, and on and on it goes. Frequently it turns into a strict, ever-changing checklist of what a developer needs to do before pushing a simple change. Organizational bureaucracy is thus trying to protect itself from the negative consequences of botched releases.

But the truth is that if developers are afraid of pushing changes, all sorts of bad things will naturally happen:

  1. Developers would hoard changes to push at the very last moment, creating a snowball effect, increasing the batch size and making it very hard to make sense of what is pushed.
  2. Conflicts are now everywhere, which leads to a proliferation of merges, which leads to lots of bugs and makes the already large snowball of changes even bigger.
  3. Every process is synchronized, and everybody spends a lot of time waiting for something – for approvals, for branches being cut, for changes being pushed, for meetings being held – which results in plain downtime for a lot of people across the organization, while they sit waiting or redoing their work due to merge conflicts.
  4. Finally, since pushes happen at the last possible moment, deadlines will be missed, releases truly will be botched and the customers won’t be happy.

Making It Right

So as DevOps / GoalOps people, what can we do to make it right? First and foremost, let developers push their changes whenever they feel changes are ready. For this to work correctly, the CI process on push must become asynchronous.

This means that the CI process itself must be decoupled from downstream approvals and deployments. With such an approach, CI is guaranteed to run on every push and provide the needed level of feedback to developers. At the same time, untested and unapproved CI artifacts should not clog any downstream CI/CD components. Developers may then do their work freely, while downstream teams do their work freely on other artifacts at the very same time. Sure enough, they need to communicate, but the process must prevent teams from blocking each other.

A reasonable developer or development team getting continuous feedback from such a CI process would gradually improve and, over time, balance the amount of code and testing needed for builds to roll accurately. This is what we call an asynchronous process, where each team receives the needed feedback quickly while not clogging processes for other teams.

Surely, we need the right tooling to make this work. A one-size-fits-all pipeline in Jenkins is unlikely to support an asynchronous workflow. Instead, another approach would be more suitable: asynchronous CI goes first, then asynchronous Release Assembly, and then asynchronous CD. This is the kind of approach we are building with Reliza Hub. The key is decoupling release phases and removing hard dependencies between teams for the sake of faster feedback.

Summary

Removing fear from git push is one thing that can increase Software Development productivity dramatically. Getting this right is also an indicator of the maturity of GoalOps culture. Both technical and business people will appreciate that.

From DevOps to GoalOps

By now you are probably tired of all the different names we give to Software Operations. ITIL, Agile, DevOps, DevSecOps – and then quite a few more recent ones like NoOps, AllOps, AIOps, MLOps, GitOps, DataOps… I know I missed a few.

So why on earth do we need “GoalOps”?

Surprisingly, modern software development has deviated quite a bit from the spirit of “The Goal” written by Eliyahu M. Goldratt. Yes, we have the whole DevOps movement in technology popularized by Gene Kim’s “The Phoenix Project” and most recently “The Unicorn Project”. And on the other side – the business side – we have Agile and its various methodologies.

However, it turned out that DevOps appeared to be too geeky, while Agile appeared to look too much like some marketing trick. The key problem here is the huge rift between technical people and business people: they don’t understand each other, to the point that they tend to use different terms for the same thing.

I recently heard quite a bit from product marketing people that they don’t really understand their tech teams, and it is like they go in different directions. I frequently heard even stronger words towards “business people” coming from the tech teams. As a result we get poor products and broken organizations.

The time has now come to do things differently. It is time to show to both business and technical people that they are part of the same organization and they need to work towards a common Goal.

For this to happen – their optics must be aligned, their culture must be aligned, their perception of success must be aligned and The Common Goal must be defined clearly.

That is why we need GoalOps – a process of the business and technology parts of an organization working together towards The Common Goal. A new perspective that can finally be understood and agreed on by both sides.

Software development, Robotics, HealthTech, AI – all those are just too important to let the old rift between Technology and Business continue. That is why starting from the Goal, then applying it to modern realities and joining forces across the whole organization is so important. That is why GoalOps is the future.

Approvals for any CI CD in Reliza Hub – Demo Video

As we are working with Reliza Hub to make it useful to both Technical and Business teams, we have just launched a new feature: approvals for any CI/CD DevOps or DevSecOps pipeline out there. Here is a demo video that I recorded.

Would very much appreciate any feedback on YouTube:

Reliza Hub Approvals For Any CI CD pipeline – Feature Demo

3 Problems With GitOps

GitOps commits
Photo by Yancy Min on Unsplash

Even though this article is about issues experienced with GitOps, I must start by saying that GitOps is certainly a big improvement relative to CIOps. If you’re not very familiar with GitOps, a lot of key details can be found in this post by Dmitri Lerko.

Let’s now jump straight into the issues:

1. Default GitOps pattern is not fully auditable

Auditability is frequently named as one of the key advantages of GitOps. However, contrary to popular belief, the default GitOps process is not fully auditable. The key reason for that is force push, which essentially allows removing any unwanted blocks of commits from the Git history on the central repository.

To solve this issue, many organizations disable force push (either for the whole repository or for specific branches). However, for some auditing requirements this still may not be good enough, since force push is a Git feature and may be turned back on at any time.
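As a minimal sketch of what disabling force push can look like on a self-hosted bare repository, the function below sets two standard Git server-side options. The function name and the repository path in the usage comment are hypothetical; hosted platforms usually expose the same idea through branch protection settings instead.

```shell
#!/usr/bin/env bash
# Sketch: reject history rewrites on a self-hosted bare Git repository.
# "protect_repo" and the example path are hypothetical names for illustration.
protect_repo() {
  local repo="$1"
  # Refuse pushes that are not fast-forwards (this is what blocks force push).
  git -C "$repo" config receive.denyNonFastForwards true
  # Refuse ref deletions, which would otherwise allow delete-and-recreate.
  git -C "$repo" config receive.denyDeletes true
}

# Usage (hypothetical path):
#   protect_repo /srv/git/deployments.git
```

Note that these are plain configuration values: anyone with access to the repository's config can flip them back, which is exactly the auditing gap described above.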

This leads to 2 more options to ensure proper auditability:

a) Frequent signed and dated back-ups of the central Git repository. Yet, for the purposes of GitOps specifically, when several deployments may happen within seconds, the interval between such back-ups would not be short enough to cover the pace of changes.

b) Using production logs of the central Git repository server as an audit trail, and signing, dating, encrypting and streaming those logs continuously to another location. While this is a fine solution, parsing these logs would be the complicated part.

Bottom line – it is possible to build fully auditable logic around GitOps, but it is certainly not there from day one. Let me repeat here that auditability requirements may differ. In particular, what a basic GitOps implementation offers would be perfectly reasonable for many organizations. But there are other organizations that need all those additional workarounds.

2. Business Approvals

The typical approval model used in GitOps is via Pull Requests (PRs). Developers make changes and create a PR; an approver may then accept the PR, which is recorded, and the change is deployed. Such a PR-based approval model is based on the premise that everybody already uses Git and knows what a PR is.

All this is great up to the point where we need business approvals in the mix. Business people may very well not know Git, not know what a PR is and not understand what is happening.

So if we have business approvals, we have to go back and build a process on top of the GitOps process, which shoots our lead times through the roof.

3. Versioning and Configuration Management

I recently blogged about the underlying issue of versioning in microservices. This issue is very much in full bloom when we talk about GitOps.

The key problem with GitOps is that we need to supply versions of components into our YAML files under Git version control, and those versions are then picked up by our CD system of choice (such as Flux or Argo).
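To make "supplying versions into YAML files" concrete, here is a minimal sketch of pinning one component's image tag in a manifest before committing it. The function name, manifest layout, image name and tag are all hypothetical, and a plain sed substitution is only one of several ways to do this.

```shell
#!/usr/bin/env bash
# Sketch: pin a container image tag in a manifest that lives in Git.
# "pin_image_tag", the manifest path and the image name are hypothetical.
pin_image_tag() {
  local manifest="$1" image="$2" tag="$3"
  local tmp
  tmp="$(mktemp)"
  # Replace whatever tag currently follows "image: <image>:" with the new one.
  sed -E "s|(image:[[:space:]]*${image}):[^[:space:]]+|\1:${tag}|" "$manifest" > "$tmp"
  mv "$tmp" "$manifest"
}

# Usage (hypothetical values); the commit is what the CD system then picks up:
#   pin_image_tag deploy/api.yaml registry.example.com/api 1.5.0
#   git commit -am "api: deploy 1.5.0"
```

The mechanics are simple; the hard part discussed below is deciding which tag belongs in which environment in the first place.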

If an organization allows always picking the latest version of every component and then resolving differences in production, this is not a real problem for them. But for many organizations this would not be an option. They need to be strict about what changes went in and what production version they have at what point (this partly reflects the business approval problem above).

Feature flags may help only somewhat here, since there must be a high level of tolerance to the fact that each feature flag implementation may actually contain breaking changes too.

Now, if our organization is strict about its versioning policy and may not use the latest and greatest all the time, it becomes very hard to keep track of all the possible permutations of components. If we are doing GitOps, the question now becomes “how do we assign components to environments?” Or in other words, “how do we know what versions of what components we should actually commit to Git?”

Speaking about configuration management, we are facing an issue that is very similar to versioning. If we need to deal with configuration changes, we need to know which ones we may safely submit via our GitOps tool of choice. And keeping track of those changes is very much non-obvious.

Summary

GitOps is a great approach and has earned the right to be seen as a big improvement over the previous generation of CIOps. However, it does not solve all the questions in the world of Continuous Delivery and there are at least 3 significant issues still present with GitOps. Namely: Auditability, Business Approvals, and Versioning and Configuration Management.

Lego World – No One DevOps Solution Fits All

DevOps Software Lego World
Photo by James Pond on Unsplash

I was entertaining the idea of Lego-style DevOps software for quite a while now. I wanted to write about this pretty much after success of my previous post about microservices, but then the whole situation with the virus started to unfold – which led me to a bit of paralysis. Today I’m trying to break out of this paralysis and write this down as a sort of return to normalcy 😉

Let me start with software in general and then move to DevOps specifically. This topic is really general, but as I’m building a startup in the DevOps field, I choose DevOps as being “close to home”.

The key narrative here is that, since the start of the computer era, it has progressively become easier to install a piece of software. For end users, app stores were the culmination of how easy things are – you click the install button in the store, and voila – you now have the software up and running. How cool is that!

But there is always a gap between end users and businesses in how we deal with IT and software.

For businesses – small, medium, enterprises – we still have this old IT Ops narrative that things are complex and therefore must be hard. Fortunately, even with all that resistance things get easier on different fronts with tools like Apt, RubyGems, npm, pre-built AMIs from vendors, and most recently – you guessed it – containers, compose files and Helm charts. I’m absolutely positive the puck doesn’t stop where we are now – as many things still have a lot of room to grow and API integration is still a major pain point.

However, the trend is clearly to install software that you need when you need it, quickly and easily. Businesses are made of people – and those people want the ease of an application store to suit their needs just like all other people. Who wants to be stuck with 10-year-old tooling just because it cost a fortune back then? Yet questions like this one on StackExchange make me doubt whether IT departments actually get it.

So, as a business professional, I should have access to the tooling that suits my job best as fast as possible (ideally – immediately). Does this sound right to you too?

Now, let’s transition this thinking to the DevOps world. If I like EC2 on AWS but I’m positive that Azure Container Registry is a better product than ECR, I shouldn’t have an issue mixing and matching the two. If I want to use GitHub for code storage but CircleCI for integration – that’s perfectly fine. These are examples of tools that are relatively easy to mix and match.

However, there is also an opposite direction in the market – namely to try to lock people down in a DevOps vertical and thus create a moat around all your products.

What is a DevOps vertical? It means you simply have your VCS Repository, your CI, your artifact storage, your CD, your infrastructure management and your actual cloud – all from the same vendor! How does that sound?

This certainly sounds great from the perspective of that vendor. But as a user, I don’t want to be forced into ECR just because I’m using ECS if there is a better product on the market. Luckily, it works fine for the ECR / ECS pair – I can mix and match and replace ECR with, say, Docker Hub easily.

But other vertical solutions are trying to actually lock users into pairs of products that cannot simply be mixed and matched with alternatives. Even though alternatives exist, they are not connected or not compatible for subtle reasons. E.g., try to assign your second-level domain name to an AWS Load Balancer without using Route 53 for DNS. Possible, but very difficult (because AWS only gives you a CNAME for the load balancer and not actual IP addresses, and you can’t add a CNAME record at a second-level domain).

Now, if I were to switch at this moment to a lock-in solution, what would happen in the future? Does it now mean I’m fully at the mercy of the vendor for the time being? Can they raise prices any time they want like Google just did with GKE? Plus, as their vertical grows and they add more components to it, can it get worse indefinitely? Does it remind you of “old” Microsoft?

My answer to all that is: always plan for a switch to another product. When choosing a product, consider ease of transition in and out of this product as one of the key priorities. Stay away from those that force you onto the same vertical. Not only because of today’s considerations but also because of the future and the vendor’s philosophy. It’s a Lego-world. The moat of a product should be its ability to inter-operate with other software, not a lock placed on its users making them unable to get out.

Simple Card Shuffler For Mafia Game

Due to self-isolation we switched our weekly mafia (werewolf) game from offline to online (via Zoom). But we needed a card shuffling mechanism, so I wrote this one over the weekend: https://mafia.brolia.com

Source code on GitHub:
Back-end – https://github.com/taleodor/mafia-express
UI – https://github.com/taleodor/mafia-vue

Note that this assumes classic rules, namely only 4 roles: villager, mafia, godfather, sheriff. Sample rules can be found here.

Get sha256 hash on a directory

Update (2020-03-06): Following this conversation on reddit, with the issue raised by u/atoponce, I updated the result to account for file renames and moves and added the LC_ALL=C section.

Today I started building a new use case for Reliza Hub where we would match file system digest of the deployed directory to what we have in our metadata. We do such matching via sha256 hashes.

Previously we were mostly covering docker images or archive files where digest extraction was trivial. But this time around it’s a file system, and the sha256sum utility in Linux does not have a built-in option to compute a digest on a directory.

I first encountered this problem some time ago when we were building Reliza Hub Playground and the corresponding monorepo sample repository. The use case was to integrate this command into a GitHub Actions CI script so it would create releases of sub-projects in the monorepo only if those projects actually changed.

To do so, at CI run the script would call Reliza Hub to check if this sha256 was already registered, and only if it was not would we create a new release. So to get sha256 on a directory back then I just did a quick DuckDuckGo search, which brought me to this superuser post and this askubuntu post. Switching to sha256sum from md5sum and sha1sum brought me to:

find /path/to/dir/ -type f -exec sha256sum {} \; | sha256sum

And this is what I initially used for Reliza Hub Playground Helper project. And this worked perfectly.
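The check-then-release logic described above can be sketched roughly as follows. This is only an illustration of the control flow: a local text file stands in for the registry of known digests, whereas the real workflow asks Reliza Hub, whose API is not shown in this post. The function and file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: create a release only when a directory digest has not been seen before.
# ASSUMPTION: a flat file of digests stands in for the remote lookup;
# "release_if_new" and the registry file are hypothetical.
release_if_new() {
  local digest="$1" registry="$2"
  touch "$registry"
  # -x: match the whole line, -F: treat the digest as a fixed string.
  if grep -qxF "$digest" "$registry"; then
    echo "unchanged"
  else
    echo "$digest" >> "$registry"
    echo "released"
  fi
}

# Usage (hypothetical):
#   digest="$(find ./sub-project -type f -exec sha256sum {} \; | sha256sum | cut -d ' ' -f 1)"
#   release_if_new "$digest" known-digests.txt
```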

However, when I started step 2 of the same workflow – where we promote the same file system to the instance and need to match it from the instance to the sha256 recorded on the Reliza Hub side – I realized with disappointment that the sha256 digest suddenly didn’t match when I executed the command above on the target instance. In other words, in the CI build and on the target instance I got 2 different sha256 values for the same git code base.

Why? After quick debugging I realized that the output of the first find command included file paths, and those were unsurprisingly different. To deal with that I used awk to keep only the digests of files, then called sha256sum as follows:

find /path/to/dir/ -type f -exec sha256sum {} \; | awk '{print $1}' | sha256sum

Looks good so far – but it still did not match! Another round of debugging – and I realized that on different machines the find command would return files in a different order.

My next try to fix this was based on this superuser post trying to sort by date. But it quickly turned out that since we were using git clone frequently, dates on files were not matching either and sorting order was not universal.

The next idea I came up with was to change sort to use file names and dictionary sort instead of dates. Surprisingly, that was also inconsistent across different Linux boxes (slightly, but enough to not get digests right). After further research, I discovered that LC_ALL=C comes to the rescue here.
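A small illustration of why the locale matters: with LC_ALL=C, sort compares raw bytes, so the order is the same on every machine. The file names below are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Sketch: byte-order sorting gives one fixed order regardless of machine locale.
# The sample names are hypothetical.
sample_names() { printf '%s\n' b.txt A.txt a.txt B.txt; }

# In the C locale, uppercase ASCII letters always sort before lowercase ones.
sample_names | LC_ALL=C sort
# A.txt
# B.txt
# a.txt
# b.txt
```

Without LC_ALL=C, the result depends on the locale configured on each box, and many locales interleave upper and lower case, which is enough to change the final digest.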

So in the end, after a couple of hours overall, I came up with the following solution:

  1. Find all files in the directory and subdirectories using find -type f
  2. Execute sha256sum on these files to get digests
  3. Use awk to only take digests from the previous command
  4. Sort those digests in alphabetical order
  5. Only now compute final sha256sum on sorted digests

This worked and finally provided me with a universal way to compute a sha256 hash on a directory across different platforms. Here is the same thing in code:

find /path/to/dir/ -type f -exec sha256sum {} \; |  awk '{print $1}' | sort -d | sha256sum | cut -d ' ' -f 1

Happy, I posted this on r/bash, but as mentioned above, u/atoponce luckily and correctly pointed out that this solution would ignore file renames or moves within the repository. He suggested a great solution:

dir=<mydir>; (find "$dir" -type f -exec sha256sum {} +; find "$dir" -type d) | LC_ALL=C sort | sha256sum
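To see the rename problem in action, here is a small self-contained sketch that hashes a temporary directory before and after renaming a file, once with the digest-only pipeline and once with a pipeline that keeps paths. The function names are hypothetical, and the digest-only variant uses LC_ALL=C sort rather than sort -d so that the demo itself is locale-independent.

```shell
#!/usr/bin/env bash
# Sketch: a digest-only directory hash cannot see renames; keeping paths can.
# Function names are hypothetical.

# Earlier attempt: file paths are discarded before the final hash.
hash_digests_only() {
  find "$1" -type f -exec sha256sum {} \; | awk '{print $1}' \
    | LC_ALL=C sort | sha256sum | cut -d ' ' -f 1
}

# Suggested approach: per-file lines keep their paths, directories are listed too.
hash_with_paths() {
  (find "$1" -type f -exec sha256sum {} +; find "$1" -type d) \
    | LC_ALL=C sort | sha256sum | cut -d ' ' -f 1
}

demo="$(mktemp -d)"
printf 'hello\n' > "$demo/a.txt"
before_digests="$(hash_digests_only "$demo")"
before_paths="$(hash_with_paths "$demo")"

mv "$demo/a.txt" "$demo/b.txt"   # same content, different name
after_digests="$(hash_digests_only "$demo")"
after_paths="$(hash_with_paths "$demo")"

[ "$before_digests" = "$after_digests" ] && echo "digest-only: rename not detected"
[ "$before_paths" != "$after_paths" ] && echo "with paths: rename detected"
rm -rf "$demo"
```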

That is great, but we still have the issue of absolute versus relative paths, with digests computed differently depending on which is used. E.g., dir=/home/myuser/path/to and cd /home/myuser && dir=path/to produce different sha256 hashes. To solve this I decided to use sed with a regex, following this stackoverflow post. And the final-final solution I have at the moment is:

dir=<mydir>; find "$dir" -type f -exec sha256sum {} \; | sed "s~$dir~~g" | LC_ALL=C sort -d | sha256sum
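One caveat with the sed approach is that "$dir" is interpreted as a regular expression and, with the g flag, every occurrence on a line is removed. An alternative sketch, not the method used above, is to change into the directory first so that find only ever prints ./relative paths. The function name is hypothetical, and the resulting digests are not interchangeable with those produced by the sed-based command, since the path strings being hashed differ. Like the command above, it does not account for empty directories.

```shell
#!/usr/bin/env bash
# Sketch: hash a directory using paths relative to the directory itself.
# "dir_sha256" is a hypothetical name; this is an alternative to the sed approach.
dir_sha256() {
  (
    cd "$1" || exit 1
    # "find ." prints ./relative paths, so it no longer matters whether the
    # caller passed an absolute or a relative directory.
    find . -type f -exec sha256sum {} + | LC_ALL=C sort | sha256sum | cut -d ' ' -f 1
  )
}

# Demo: the same tree addressed by absolute and by relative path.
work="$(mktemp -d)"
mkdir -p "$work/path/to/sub"
printf 'one\n' > "$work/path/to/file1"
printf 'two\n' > "$work/path/to/sub/file2"

abs="$(dir_sha256 "$work/path/to")"
rel="$(cd "$work" && dir_sha256 path/to)"
[ "$abs" = "$rel" ] && echo "same digest for absolute and relative invocation"
rm -rf "$work"
```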

That’s it! I also published details about the actual use case on Medium.