Lately, after a webinar, I was involved in a discussion with some developers about whether to use Spring Boot or Quarkus.
They all accepted that Spring Boot is at the moment the richer / more mature framework, with integrations for nearly every imaginable framework, and yes, this is its biggest strength and also its biggest weakness. I understand why people would prefer Quarkus; it is neat and efficient. If you want a comparison, it is like Maven and Gradle, with Maven being Spring Boot and Gradle being Quarkus. Maven is heavy compared to Gradle, but more people have expertise in it and there is currently more tooling for it. Gradle, while I would say it is 30-40% more efficient than Maven, is adopted slowly because people are not familiar with it and are afraid they will not find every tool they need for it.
As I mentioned, while they agree that programming with Spring Boot is at the moment easier, people prefer Quarkus mainly because of Quarkus's advantage in startup time and slight advantages during runtime. Startup speed is a really important point for Kubernetes / cloud development: if our application can run with 2 instances in off-peak hours but needs 20 instances in peak hours, we should be able to scale up quickly. If our application needs 30s to start up, that is far from ideal; at the end of the day, why should we constantly pay for 18 extra instances because of this startup delay?
As the following post confirms, it seems Quarkus is in front in every imaginable category (in some only slightly, but still in front), so Quarkus should be our choice, shouldn't it?
Well, the tests in the above post were made with out-of-the-box versions of Spring Boot and Quarkus. For Spring Boot there are some small tricks you can use to speed up the startup (as mentioned before, the biggest strength of Spring Boot is its integration with every possible framework you can think of, but it is also its Achilles heel), and most of these integrations are not used here, so they are unnecessary. One of the biggest optimisations we can make is turning off the integrations of the frameworks we don't need.
Secondly, the native performance of Spring Boot was evaluated in that article with Spring Boot 2.7.x, in which native image support was still in a BETA phase. With Spring Boot 3.x, native images are officially supported, and it seems the Spring team invested a lot of time to optimise this feature, so we will also compare Spring Boot 3.1.0 native performance.
To test these theories, I created the 'Getting Started' application of Quarkus and a similar one with Spring Boot: a REST service that just returns a string. This is how the results look (unfortunately WordPress reduces the resolution of the images, so if you would like to see higher resolutions, please click the images).
Spring Boot 3.1.0 application with JRE
Quarkus with JRE
As you can see, Spring Boot on the JRE needs 0.783s to start and Quarkus 0.529s, so even with my optimisations Spring Boot starts about 250ms slower. If you are planning to use Spring Boot or Quarkus on the JRE, you should decide for yourself whether those 250ms are worth giving up all the integration capabilities of Spring Boot (Spring Boot reports two startup durations, one for the application and one for the process, but Quarkus reports only one, so I am not sure what we are comparing here; if you prefer to take the process time, then Spring Boot needs 1s to start).
Now let's look at how things are looking for native images.
Spring Boot 3.1.0 Native Image.
Quarkus Native Image
Now the Spring Boot native image needs 0.018s to start (or 0.028s if you prefer the process time); it seems Spring Boot made a quantum leap for native images with the 3.x version.
Again, if you need absolute performance (a -0.002s or +0.008s difference), then choose Quarkus, but I am asking myself: for gaining those 0.008s in the production runtime environment, how much speed will I lose in development with all the missing Spring integrations for all the frameworks you can think of?
Don't get me wrong, Spring has a 10+ year history and that is the reason for that amount of integration. Compared to Quarkus's young history, I am sure they will catch up in time (and to be fair, Quarkus Reactive REST development is more streamlined than Spring Boot's, which has to carry a lot of old baggage), and in the future they will have more arguments than startup speed, but personally I will stick with Spring Boot unless somebody tells me we have to go for absolute performance.
To be fair, my test methodology is rather primitive compared to the previous article, but I just want to show here that, for the JRE, there are ways to optimise Spring Boot startup time compared to the out-of-the-box configuration, and for native images I guess we can fairly say both are at the same performance level.
There are several reasons why Spring might never close the gap for JRE-based development.
Quarkus, compared to Spring, does no component scanning at runtime to identify the dependency injections from dependency libraries / modules; it deals with them via extensions like Jandex (follow the white rabbit), which creates index files to identify the potential injectable beans at compile time, as explained here.
Quarkus does not use mechanisms like Spring's JDK Dynamic Proxying or CGLIB Proxying (follow the white rabbit), which are realised at runtime; the Quarkus mechanism happens at compile time to save time at startup.
Unless Spring develops some counter-solutions, these factors will prevent equal performance on the JRE, but the sacrifice you have to make to get these benefits is that your daily development gets more complicated. Yes, you will gain speed while starting your application, but depending on your development team's experience with Quarkus, there will be some pain attached to your development speed.
And here are my Spring Boot optimisations, if anybody needs them. This information might also amaze or horrify you, seeing how much work Spring is doing for us behind the scenes. Keep in mind that these settings will have more effect in JRE mode than for native images, adding some more performance on top of the Spring AOT.
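My exact list is in the material below, but as a rough, illustrative 'application.yaml' sketch of the kind of switches I mean (the excluded auto-configuration class is just an example; which ones you can safely turn off depends on what your service really uses):

spring:
  main:
    lazy-initialization: true   # create beans on first use instead of at startup
    banner-mode: off            # skip printing the startup banner
  jmx:
    enabled: false              # skip registering JMX MBeans
  autoconfigure:
    exclude:
      # turn off integrations this service does not use (example entry only)
      - org.springframework.boot.autoconfigure.jdbc.DataSourceAutoConfiguration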
In this blog I will try to demonstrate how an ideal Continuous Integration / Continuous Deployment with GitOps (https://opengitops.dev/) to Kubernetes should look, using Github Actions, Gradle, Docker, JiB, Helm, Helmfile, Terraform and ArgoCD for Services, applying the principles of the Twelve Factor App. What I demonstrate here is based on the experience I collected from previous Kubernetes projects I was involved in, combined with best practices compiled from the lots of documentation that exists on the internet about this subject.
During my career, I saw too many projects that didn't invest enough in their deployment pipelines at the start, with the assumption that they would come back later and fix them, which, as you know, does not happen most of the time: you will always be under pressure to develop more business features and will not really get a chance to do your housecleaning and pay off your technical debt. So it is better to start the correct way than to compromise.
The solution that I explain here I didn't invent myself; it was out there as puzzle pieces from different sources, but it was never fully explained / demonstrated as a complete solution. So my purpose here is to give a blueprint that you can use to adapt your pipelines / workflows with minimal changes for startup projects, or better, to fix your existing ones.
You can find my proposal here. It is overkill for small projects (you can of course streamline the process explained here for your needs) and you may ignore my warnings, but a word of caution: if you are just starting with one Service (because it is a startup project or you don't think you will reach the complexity levels that would require what I explain here), please remember that it would be very costly to change your workflows later to adopt these ideas. Even if you are starting small, these ideas will be useful to you in the long run and, as you will see in further chapters, what we implement here is reusable in other projects of yours with minimal cost.
Let me give you a short summary of what to expect in this blog.
Pipelines
I will explain why it is a good idea to separate our DevOps pipelines into Service and Environment Pipelines. I will also show you how to build those with the help of GitHub Actions in this blog, and in future ones with MS Azure DevOps Pipelines, Gitlab, AWS Code Pipeline and CI / CD with Google Cloud.
You can see:
the Azure DevOps Pipelines implementation of these concepts in this follow-up blog.
the Gitlab Pipelines in Google Cloud implementation of these concepts in this follow-up blog.
After implementing the same scenarios with Azure DevOps Pipelines and Gitlab Pipelines, I should say that my favourite is Github Actions. If you can't use 'github.com' because of company policies, I really advise you to check the self-hosted / on-premise options of Github Enterprise Server.
Gitversion
I will also introduce a very important tool for your GitFlow and GitHubFlow projects, which will solve all of your versioning problems.
How to install / configure it for several different pipeline environments like Github Actions / MS Azure / Gitlab / etc…
And how to operate Gitversion for day-to-day tasks, like how to deal with feature branches, hotfixes, release processes, etc.
Terraform
I always found it problematic, especially for development, test and staging environments, that they run idle when nobody is using them. Let's say people are active in the staging environment one month every year; you are then practically paying for the other 11 months unnecessarily, so why not tear down the staging environment when it is not needed and build it again in 5 minutes with Terraform when it is? Or take this one step further: most probably your workforce is working in the 07:00 – 18:00 time slot, so why not tear down the test environment at 19:00 and recreate it with Terraform and ArgoCD at 06:00 in the morning? These are the points where you will achieve real cost savings in Kubernetes at different cloud providers. For a how-to, please look into the Terraform chapter.
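As a taste of what the Terraform chapter shows, here is a hedged sketch of a scheduled GitHub Actions workflow for the evening teardown (the './terraform/test-environment' path and the omitted cloud credential setup are assumptions; a mirror workflow with a morning cron would run 'terraform apply' to recreate the environment):

name: Evening Test Environment Teardown
on:
  schedule:
    - cron: '0 19 * * 1-5'   # 19:00 on weekdays
jobs:
  teardown:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: hashicorp/setup-terraform@v2
      # cloud provider authentication steps omitted here
      - name: Destroy the test cluster
        working-directory: ./terraform/test-environment
        run: |
          terraform init
          terraform destroy -auto-approve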
ArgoCD
The final piece of the puzzle, ArgoCD, is an automation tool to realise true GitOps. I will show you how your Kubernetes manifests are taken from Git and automatically deployed to a Kubernetes cluster. I will also explain the basic concepts and how to configure and operate it.
The Plan
To be able to demonstrate these concepts, we require an actual project in Git, and for that I will use another Proof of Concept study of mine, which I explained in this blog. The original blog has a really naive deployment strategy, as its main focus was to demonstrate the integration of several technologies for an Event Sourcing application with Apache Kafka, Apache Cassandra and Elasticsearch. This blog will show you how to go from that naive approach to a full-fledged solution.
To convince you that these ideas work, we will demonstrate them on a Google Cloud Platform (GCP) free test account.
For starters let’s see our battle plan.
PS. WordPress unfortunately reduces the quality of the images; please click the images to see high-resolution versions of them.
Now let's look at the diagram above. You should take two quick points from it: we will have two pipelines, one Service Pipeline and one Environment Pipeline. It makes sense to have this distinction, as a Service Pipeline will most probably trigger much more frequently than an Environment Pipeline, since we will have many more software changes than environment changes.
The Service Pipeline will be responsible for building executables and Docker images and uploading those to Docker and Helm repositories.
The Environment Pipeline will be responsible for maintaining and delivering Helm Charts for the infrastructure (in this case Kafka, Cassandra, Elasticsearch) that is necessary to initialise the Development (Feature / Bugfix / Integration / Release), Test, Staging and Production environments.
You can also see from the above diagram that the Development, Test, Staging and Production environments are served via ArgoCD from separate Git repositories. The main reasons for this solution are security and quality concerns for GitOps; it is better to separate those.
Your software should first reach a maturity level in the Development Environment Git repository to be promoted to the Test Environment Git repository, where far fewer people have the right to change anything. The chances of unwanted things happening are thus reduced, and the quality increases, because things that can cause problems in the Development Environment are prevented from causing any havoc in the Test Environment.
The same idea applies to the Staging Environment repository, where even fewer people can modify things, and for Production fewer still, in a fully automated GitOps process. These sorts of precautions can save you lots of headaches later on.
At this point, I can see you scratching your head, thinking: wouldn't it be better to have a branch per environment? That is actually an anti-pattern, as the great Martin Fowler explains. We don't want an environment version of our services to be a different commit; we want our service transferred as the same binary / version between all environments. That is the reason we will follow a Git repository per environment; after all, there is a reason why you don't see an 'environment' branch in GitFlow.
But let me be clear about one thing: when we promote, for example, the Test Environment to the Staging Environment, we only mean transferring the configuration information of our software, not the source code. When your Services are built with automated workflows, their Docker images and Helm Charts are deployed to the Docker / Helm repositories. Between Environment Git repositories we only transfer which versions of these Docker images and Helm Charts should be deployed, together with their configuration information (like which database they connect to, how many instances should be up and so on); that means promoting the version numbers of Docker images and Helm Charts, connection strings, etc., and nothing more.
Naturally our story starts with the Services. We will follow the approach of one Git repository per Service; my PoC application will use the following Git repositories.
Apart from the 'Four Eyes FSM Pekko Application' (which is a full-fledged Event Sourcing application with Akka / Pekko, Kafka, Cassandra and Elasticsearch) and the 'Customer Relationship Adapter Service' (which uses the Spring Boot 3.0 'native-image' feature with GraalVM, covered in more detail in the Appendix), the remaining ones are boring Spring Boot applications (only some primitive REST service implementations) simulating partner systems for 'Four Eyes FSM Pekko', since our focus lies on the deployment features and not the software technologies.
To represent our whole software system, we will have a Helm Umbrella Chart containing all of our Services (the App of Apps pattern in some sources); this will be our main deployment unit.
You will also see that we have a clear distinction between Infrastructure Deployments and Service Deployments. It is true that we could place infrastructure components like Kafka, Cassandra, Elasticsearch, etc., in the same Helm Umbrella Chart, but considering there will be many more changes / commits in Git for our Services than for our infrastructure, it does not make much sense to also deploy the infrastructure components on every commit to a Service repository. We will have a separate Infrastructure Deployment pipeline and a Helm Umbrella Chart for the infrastructure components, which is also really important for initialising new environments in the k8s Dev Cluster on the fly with our Github Actions workflows for Feature / Bugfix / Integration / Release branches.
A word of caution here: if you are bringing your 20-year-old project to Kubernetes in the hope of cost savings, the bad news is that you probably will not save much money in production; you will probably need the same amount of CPU / Memory / Storage resources. The real cost-saving potential lies in the Development (including Feature / Bugfix / Integration / Release), Test and Staging environments. With the solutions presented in this blog, you can turn these environments off when you don't need them instead of letting them idle and cost you money. With the Infrastructure / Service Deployment pipelines that I will demonstrate, you can start these environments inside of 5 minutes, instead of having them idle for months (while you are still paying for CPU / Memory / Storage resources). In the Appendix section, you can find a demonstration of how to run a Kubernetes test environment only during your office hours, by creating and destroying it with the help of Terraform configurations. After all, if nobody is testing your application between 18:00 and 06:00, why should you pay for the resources?
So now, back to the main topic.
One final point: once we have our Helm Umbrella Chart, we could actually install our whole system directly from this Chart with Helm commands, but I will follow another pattern. Since we want to do GitOps, the deployment of software should be auditable / manageable over Git (we should be able to track who installed what, when and why, so that in the case of an emergency we can roll back those changes). To achieve this auditability / manageability we will use other awesome tools called Helmfile and ArgoCD.
The promise of this blog is that you will be able to deploy your infrastructure and your services with the help of these pipelines.
Now let's look at our Service and Infrastructure Pipelines.
Service Pipelines:
JiB / Docker
Now, what is really interesting about the Service projects is in their Gradle configuration, the 'build.gradle' file. Normally, people who code pipeline configurations do the Docker image generation and Helm Chart deployments in these pipelines. I am not a big fan of this, because if a developer wants to test a quick bugfix (a 2-line code change) in their 'minikube' (we can use the same Helm Umbrella Charts for Services and Infrastructure to create local environments), he / she should not have to wait for the whole turnaround of the build pipeline (let's say 10 minutes).
I prefer that developers should be able to create Docker images locally. For this, Google has an awesome tool called JiB (you can look here for the full configuration options). JiB can create Docker images without a Docker daemon and is also smart enough to optimise the Docker image layering, for example by placing concrete dependencies (which have a lower chance of changing) in one layer, SNAPSHOT dependencies in another layer and application code in yet another. So if the concrete dependencies (like your Spring Framework libraries) don't change, that layer is not built over and over and pushed to Docker registries, which saves quite a lot of time.
You can see these optimisations explained in detail with the Dive tool in my other blog, if you click the link 🙂
The configuration of JiB is quite simple, as you can see from the 'Customer Relationship Adapter' Service's 'build.gradle'.
As you can see, the configuration of JiB is extremely easy: just define the base image, the repository to which the image will be uploaded, and the tags, and then you can easily build a Docker image without a Docker daemon on your workstation.
Helm
Now the second interesting part, the deployment of the Helm Charts. Personally, I don't like running Helm shell commands in build pipelines as long as some Gradle plugin can do it for me.
As you see, it is really simple to configure the repository to which the Helm Chart should be uploaded, and the plugin does really nice things for us, placing the 'image repository', 'image tag' and 'appVersion' in the Helm Chart.
So this part of the pipeline is nothing more than calling './gradlew helmPackage', and the first part of the CI / CD chain is complete.
Now we have five Services that we have to deploy to our Kubernetes Cluster; it is most logical to organise the several Helm Charts of the Services under an Umbrella Helm Chart (following the App of Apps pattern), which you can see here.
This of course will work; the problem is auditability and traceability. Although it is possible, it is not that easy to figure out what we deployed to our Kubernetes Cluster, which Docker images we used, which configuration parameters were active or, worst of all, what the delta is from the previous release of our application.
The easiest way to find the answers to these questions is to apply the principles of GitOps, with which we can see exactly what we deployed; better yet, if we are working with Pull / Merge Requests, we can see what the difference will be between the previous deployment and the next one.
ArgoCD
To reach this goal, we will use an awesome tool called ArgoCD. When we reach the end goal, this is how things will look in our Kubernetes Cluster and in the ArgoCD UI.
Now that we have laid the foundation with ArgoCD, let's continue with our pipelines.
GitVersion
Before I continue with further topics, I would like to mention another awesome tool called GitVersion, which you can use in your pipelines to define the Semantic Version of your application in Git. This is a topic most development teams ignore; they just use hash codes created by Git as version numbers, which are of course unique but not really human-readable. Versioning is a really important topic for Continuous Deployment, as it is important to identify which version we are deploying to which environment. This tool can follow GitFlow or GithubFlow concepts, whichever fits you.
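To give a first impression before that demonstration, a minimal 'GitVersion.yml' sketch in GitVersion 5.x syntax might look like this (the defaults already behave similarly for GitFlow; treat the values as illustrative):

mode: ContinuousDeployment
branches:
  develop:
    tag: alpha          # commits on 'development' produce e.g. 1.3.0-alpha.7
  release:
    tag: rc             # commits on 'release/x' produce e.g. 1.3.0-rc.2
  feature:
    tag: useBranchName  # feature branches carry their branch name as pre-release tag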
In the preparations chapter, I will give a small demonstration of how to use this tool and integrate it with Github Actions, which can also serve as a demonstration for our Service and Environment Pipelines (I do this with Github Actions because my sample code lies in Github, but it can be adapted with minimal changes to Gitlab / MS Azure / etc. and other pipeline tools).
Environment Pipelines
At the beginning of the blog, I mentioned that we will have separate Service Pipelines and Environment Pipelines. Until now we examined the Service Pipelines; now let's look at the Environment Pipelines.
The Proof of Concept that we use to demonstrate the story explained here depends on infrastructure components like Apache Kafka, Apache Cassandra and Elasticsearch. In every environment where we want to run our Services, these infrastructure components must be present.
As you will read further in the blog, we will create isolated environments to test our 'features / epics / integrations / releases', to ensure that these environments do not negatively affect each other; these environments will therefore also need their own infrastructure components (a data state created in Cassandra for 'feature/usecase-1' should not negatively affect development / tests in 'feature/usecase-2'). Before the Kubernetes days, this level of isolation was not feasible: nobody would install a completely new instance of Apache Cassandra on a physical machine just to test a 'feature'. The costs for it could never be justified.
With Kubernetes, creating new instances of Apache Kafka, Apache Cassandra and Elasticsearch is a matter of 5 minutes (and another 5 minutes to tear the environment down when you are done with development / testing of the feature), realised via automation. It is a powerful feature and should actually be the main motivation for you to switch from your 20-year-old monolith to Kubernetes environments; it should be the main driving force, the one that increases your quality and reduces your cost.
Nowadays the 'State of the Art' methodology for installing this infrastructure in Kubernetes environments is via Kubernetes Operators; every modern infrastructure component has an Operator these days, like Strimzi for Apache Kafka, k8ssandra for Apache Cassandra and ECK for Elasticsearch.
I will also follow this path to configure the Infrastructure for this blog.
Helm Umbrella Chart for Environment
As you might see, once the operators have been installed beforehand in your Kubernetes Cluster (you can find the instructions at the links above), the configuration of our infrastructure is quite simple.
Above you see the configuration for the Strimzi Kafka Operator; there is nothing fancy here. We configure the number of Kafka instances and how much memory and CPU these Kafka instances will get. The only critical point: if you are going to use these Charts as a sample, please change the Kafka image; this one was specifically chosen for my M1 Mac notebook and its performance in your environment will probably not be good.
The Strimzi Operator's Custom Resource Definitions also give us the possibility of deploying the necessary Kafka Topics, even letting us configure 'replication-factors' and 'partitions' depending on the environment, via the 'values-xxx' files of the Helm Charts.
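As a hedged, trimmed-down sketch (names and sizes are illustrative; in the real Charts these values come from the 'values-xxx' files), such Strimzi resources look roughly like this:

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: fsm-akka-kafka
spec:
  kafka:
    replicas: 1              # number of Kafka instances
    resources:
      requests:
        memory: 1Gi
        cpu: 500m
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
    storage:
      type: ephemeral        # fine for throwaway feature environments
  zookeeper:
    replicas: 1
    storage:
      type: ephemeral
---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: credit-score-topic   # illustrative topic name
  labels:
    strimzi.io/cluster: fsm-akka-kafka
spec:
  partitions: 3              # per environment via the values files
  replicas: 1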
It looks simple to start an Apache Cassandra cluster with a Custom Resource Definition, doesn't it? It really is.
As I mentioned previously, it can be really critical for different development teams to have a certain data state in their environments. For this, Cassandra has a really cool tool called Medusa (see the k8ssandra Medusa Operator, Cassandra Restore1, Cassandra Restore 2), so you can arrange your workflow so that when you create a pull request, you actually restore a data state to be able to proceed with your evaluations.
One minor note: since Cassandra has to run on GKE, we have to use a special 'storage-class', 'standard-rwo'; other than that, the base configuration of Cassandra is really simple.
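A hedged sketch of such a k8ssandra Custom Resource (sizes and the server version are illustrative), using the 'standard-rwo' storage class just mentioned:

apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: fsm-akka-cassandra
spec:
  cassandra:
    serverVersion: "4.0.4"
    datacenters:
      - metadata:
          name: dc1
        size: 1                            # a single node is enough for a feature environment
        storageConfig:
          cassandraDataVolumeClaimSpec:
            storageClassName: standard-rwo # the GKE storage class mentioned above
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 5Gi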
As you can see, initialising an Elasticsearch cluster in Kubernetes is just as simple with Custom Resource Definitions. The only special thing is that an Elasticsearch cluster needs nodes with the roles 'master', 'ingest' and 'data' (at least one 'master' node; but this is no production setting, please pay attention to that).
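Again as a hedged sketch (the version is illustrative), an ECK resource with a single node carrying all roles could look like this:

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: fsm-akka-elasticsearch
spec:
  version: 8.7.0
  nodeSets:
    - name: all-roles
      count: 1
      config:
        # one node with all roles: fine for dev, NOT a production setting
        node.roles: ["master", "data", "ingest"]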
Deployment of the Infrastructure
As you will see below in the workflows, we will deploy the Helm Chart for the infrastructure with 'helm install', while we install the Services via ArgoCD. The main reason for this: the development / changes of the Services are under our control and their frequency of change is much higher than that of the infrastructure (let's say a change / commit to the 'development' branch of a Service every 10 minutes versus a change / commit to the infrastructure once a month), and the changes originating from the framework developers are not under our control, so I prefer to install it via Helm. But if you prefer differently, you can modify the workflows to use ArgoCD for the deployment of the infrastructure components too.
One more point I would like to make: it is quite popular in today's IT world to buy services from your cloud provider, like a PostgreSQL instance from MS Azure, because you don't want to deal with / administrate it yourself, and to use an 'infrastructure as code' tool like Terraform to automate the creation of these infrastructure components. You can adapt the workflows that I demonstrate here to execute these Terraform configurations even for our Dev Kubernetes Clusters to create 'feature / integration / release' environments, but in my opinion that is overkill (if you really want to see how it is done, here is the link to the chapter). I think it is much more logical to use the PostgreSQL Operator (or similar tools for other infrastructure components) to instantiate a PostgreSQL instance in your Kubernetes Cluster for these environments, but of course the choice is yours.
Environment Promotion
As you can see from the first diagram of this blog, one of the most important ideas here is Environment Promotion. When your application is deployed to the 'Integration Environment' in the Dev Cluster for sanity checks and it is verified that certain quality benchmarks are reached, your decision committee will say this software state is mature enough to be promoted to the 'Test Environment', where it will be tested by your test team. If they are satisfied with the results, a decision will be taken to promote the 'Test Environment' to the 'Staging Environment' and finally to the 'Production Environment'.
So how do we do that? Basically, after we identify a software state as stable, we set concrete version numbers for the Helm Umbrella Charts for Services and Infrastructure in the appropriate Environment Git repositories.
Probably when you first saw the diagram at the beginning of the blog, you were really sceptical about transferring information between Environment Git repositories for Environment Promotion. At this point in the blog, as you can see, it is nothing more than committing the version of our Umbrella Charts and placing configuration data specific to the environment.
Github Actions / Workflows
As previously mentioned, I will demonstrate our Service Pipeline with Github Actions, but you can transfer the basic idea to any tool like Gitlab, MS Azure Pipelines, CI / CD with Google Cloud or AWS Code Pipeline.
You can see the Azure DevOps Pipelines implementation of these concepts in this follow-up blog.
Before I go into details: as you can see from the start of this blog, there are going to be several Service Git repositories, and they will basically use the same workflows, so we need a central Git repository to maintain these. The Service Git repositories will only contain triggers / entry points that reuse these workflows.
Use Case 1: Continuous Integration and Deployment
Trigger Action: Commit to ‘credit-score‘ repository ‘development‘ branch
Let's start with one of the easiest workflows: building the Java code, executing tests, building the Docker image and uploading it to our Docker image repository, having Helm package the Helm Chart and uploading it to our Helm repository, and finally deploying those to our Dev Environment under the 'development' namespace.
>>credit-score Service Repository
Now let's look at 'build-with-reusable.yaml', which defines our pipeline.
First we are calculating our Version.
name: Java / Gradle CI Caller
run-name: Building with Gradle triggered via ${{ github.event_name }} by ${{ github.actor }}
on:
  push:
    branches:
      - 'development'
      - 'release/**'
      - 'feature/**'
      - 'hotfix/**'
      - 'pull/**'
      - 'pull-requests/**'
      - 'pr/**'
    paths-ignore:
      - '.github/**'
jobs:
  call-build-workflow:
    uses: mehmetsalgar/fsm-akka-github-workflows/.github/workflows/build.yaml@master
    with:
      native: true
      chart-name: "customer-relationship-adapter-application"
    secrets: inherit
We define a name for our workflow and the conditions under which it will trigger, in this case a push to the Github repository. As you can see, we want this workflow to trigger only for certain branches: for GitFlow, all branches except 'master', for reasons we will explain shortly.
This use case is standard and will be used by all Service repositories, so we place it in 'fsm-akka-github-workflows', where it looks like the following.
The first part of our workflow gets the version for us.
Here there is a provided Github Action, 'gitversion', which has two phases, first 'setup' and second 'execute', which will identify the version for our build with the GitVersion tool that I introduced before.
Now we need this version number for our Gradle build, so we need it as an output value; for this we defined '${{ steps.gitversion.outputs.semVer }}' to be passed to the next part of the job.
The first thing the second part of the job does is express that it depends on 'calculate-version' with the 'needs' keyword, which is important because this way we are able to access the output variable from the previous job. This part continues by preparing the Java environment (which Java version, which distribution to use, etc.) to be able to run the Gradle build.
Finally we pass the version to the Gradle build via 'ORG_GRADLE_PROJECT_version: ${{ env.SEMVER }}', which is the implicit way of passing project properties to Gradle over the environment, alongside the credentials that come from Github Repository Secrets (which you have to configure for every Service repository).
The final instruction of the workflow starts the Gradle build with the arguments 'build --no-daemon'.
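The reusable 'build.yaml' itself is not printed in full here, but a hedged sketch of its shape, based on the description above (the action versions and the Java setup details are assumptions), would be:

jobs:
  calculate-version:
    runs-on: ubuntu-latest
    outputs:
      semVer: ${{ steps.gitversion.outputs.semVer }}
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0                         # GitVersion needs the full history
      - uses: gittools/actions/gitversion/setup@v0.9.7
        with:
          versionSpec: '5.x'
      - id: gitversion
        uses: gittools/actions/gitversion/execute@v0.9.7
  build:
    needs: calculate-version                     # gives us access to the version output
    runs-on: ubuntu-latest
    env:
      SEMVER: ${{ needs.calculate-version.outputs.semVer }}
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-java@v3
        with:
          distribution: 'temurin'
          java-version: '17'
      - name: Build with Gradle
        env:
          ORG_GRADLE_PROJECT_version: ${{ env.SEMVER }}  # implicit Gradle 'version' property
        run: ./gradlew build --no-daemon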
Everything you have observed until now is the setup for a usual Docker Registry / Helm repository like Nexus or Artifactory, but lately cloud registries like MS Azure's and Google Cloud's have started using the OCI protocol for Helm repositories too. Unfortunately, the Gradle plugin that we are using for the Helm functionality is not able to use OCI registries, and since I push these Helm Charts to Google Cloud Artifact Registry, I have to deal with the Helm push in the pipeline as well.
After this part of the workflow is complete, a second part triggers that continuously deploys our development state to our 'Dev Kubernetes Cluster', starting with 'continuous-deployment-development-with-resuable.yaml'.
name: Continuous Deployment Caller - Development
run-name: Continuous Deployment for Development triggered via ${{ github.event_name }} by ${{ github.actor }}
on:
  workflow_run:
    workflows: [Java / Gradle CI Caller]
    branches: [development]
    types: [completed]
jobs:
  call-continuous-deployment-workflow:
    uses: mehmetsalgar/fsm-akka-github-workflows/.github/workflows/continuous-deployment-development.yaml@master
    with:
      repo-name: "customer-relationship-adapter"
      branch-name: "${{ github.event.workflow_run.head_branch }}"
    secrets: inherit
This workflow triggers upon the completion of the 'Java / Gradle CI Caller' workflow on the 'development' branch via the '[completed]' condition; then the reusable workflow 'continuous-deployment-development.yaml' takes over to complete the rest.
The first thing you have to know is that a reusable workflow needs an 'on' -> 'workflow_call' trigger, and this workflow also needs the originating Git repository name as an input parameter, as well as the branch name on which we want it to run (remember that this workflow is triggered with an 'on' -> 'workflow_run' trigger from the Service repository; this trigger type does not inherently pass the 'branch-name' to reusable workflows, so we have to pass it explicitly).
This functionality will be reused by every Service Pipeline that we have, so it is placed in a central repository.
We need this to be able to tell the Helm Umbrella Chart which development version of the service it should deploy to the 'Dev Environment'; after we identify that, we pass this information further along the workflow.
The next part of the workflow concentrates on the deployment to the Dev Environment and first delegates the continuation of the workflow to the Helm Chart repository, with the 'dispatch' functionality of GitHub Actions, which we see here for the first time.
To realise that, we first have to define that we want to use the 'aurelien-baudet/workflow-dispatch@v2' action, which needs to know to which repository we are dispatching (in this case 'mehmetsalgar/fsm-akka-helm-umbrella-chart'), which workflow there should be triggered (which would be 'helm-publish-with-reuse.yaml') and on which branch in the target repository this workflow should be triggered (which would be 'development').
Of course, triggering a workflow in another repository is a security-relevant operation; for this reason we have to pass our GitHub token to this action, which we already placed as a GitHub Repository Secret.
This 'dispatch' action has some other nice features, like waiting for the completion of the triggered workflow. Normally a dispatched workflow has a 'fire & forget' nature, so the next step in the workflow would not wait for the termination of the dispatched workflow but start directly. We don't want that, so we set the 'wait-for-completion' parameter to true. Further parameters control how long this workflow waits for the 'successful' completion of the dispatched workflow before marking it as a failure, and how often this status check should occur (a word of caution here: GitHub rate-limits these calls, so if you set this parameter to 1s and you have too many Service repositories, you can lock yourself out of the GitHub API).
name: Helm Publish with Gradle reuse
run-name: Helm Publish with Gradle triggered via ${{ github.event_name }} by ${{ github.actor }}
on:
  workflow_dispatch:
jobs:
  call-helm-publish:
    uses: mehmetsalgar/fsm-akka-github-workflows/.github/workflows/helm-publish.yaml@master
    secrets: inherit
Dispatched workflows must have an 'on' -> 'workflow_dispatch' trigger, and since we have the 'helm publish' scenario in lots of use cases, I have placed it in 'fsm-akka-github-workflows'. A final interesting point: since this workflow will be used from several use cases / repositories, it should receive the Repository Secrets from the originating workflow; for this, we are using the parameter 'secrets: inherit'.
The first part of the workflow tries to identify the version of the 'development' branch (the workflow-dispatch action defines this branch) in the 'fsm-akka-umbrella-chart' repository, so it can use a specific version while publishing the Helm Umbrella Chart.
At this point we have to look at the 'Chart.yaml' of the Helm Umbrella Chart in the 'development' branch.
Each of our Services is represented as a dependency in our Umbrella Helm Chart; which version of a Service will be included in the Helm Chart is configured in 'gradle.properties'.
For the development branch, we want Helm to deploy a range of active development versions of the Services, as defined by Semantic Versioning concepts, hence the notation '<=1.3.0-beta': as long as major/minor/patch does not change, our workflow deploys the '-alpha.x' versions of 1.3.0 for the 'credit-score' service. As your development train moves on, you have to change these, for example to '<=2.1.0-beta', etc.
The 'master' branch, in contrast, will have concrete versions of our Services; that is one of the reasons why the 'master' branch is treated differently in the workflows.
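To make that concrete, a hedged sketch of such a 'Chart.yaml' on the 'development' branch (the service names match this blog, the repository URL is a placeholder, and in reality the version values are filtered in from 'gradle.properties'):

apiVersion: v2
name: fsm-akka
version: 1.2.0
dependencies:
  - name: credit-score
    version: "<=1.3.0-beta"   # development: the highest -alpha.x of 1.3.0 wins
    repository: "oci://example-registry/fsm-akka-helm"   # placeholder
  - name: customer-relationship-adapter
    version: "<=1.3.0-beta"
    repository: "oci://example-registry/fsm-akka-helm"

On 'master' the same entries would pin concrete versions like '1.2.3' instead of ranges.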
The last part shows you the final piece of the puzzle: how we use the Gradle Helm plugin's filtering capability in 'build.gradle' to place the actual version values, provided via 'gradle.properties', into the Helm Chart.
First I would like to draw your attention to the 'needs' keyword in the workflow; this is the mechanism in Github Actions that defines in which order the jobs run. Since we need to calculate the version of the Helm Chart, that part of the workflow has to run first.
Later, in the 'helm-publish' job, we access this version number via the '${{ needs.calculate-version.outputs.semVer }}' notation, then we execute the Helm Gradle plugin's 'helmPublish' task with the version number and the Git Repository Secrets.
Now our Helm Umbrella Chart is uploaded to the Helm repository with this concrete version number, so we can use it in 'fsm-akka-dev-environment' to release the new version of our 'credit-score' Service via ArgoCD.
>>fsm-akka-dev-environment Repository
Before we start looking at the workflows in this repository in detail, let me first say a few words about its reason for existence. With this repository we synchronise the state we have in Git with our Kubernetes Cluster via ArgoCD.
Now, I mentioned before that we can actually easily deploy our application with the following command:
> helm install fsm-akka fsm-akka/fsm-akka --version=1.2.0.alpha.1 -n development
The problem with it is auditability…
as you saw, we are using 'alpha' versions of our services for the development environment, so what exactly did we deploy?
what is the delta between the current and the last deployment of our system?
who made the changes?
what caused the changes (requirement, feature, etc.)?
Even with 'helm install' it is possible to find the answers to these questions, but it is quite hard compared to just making a 'git diff' between a pull request and the current state, looking at the commit history, etc., or, in the worst case, rolling back the changes.
So how can we do GitOps with Helm? At this point an awesome tool called Helmfile comes to the rescue.
To be able to use Helmfile, we need a 'helmfile.yaml' configuration file.
As you can see, it is super simple: we first have to tell Helmfile from which Helm repositories our Helm Umbrella Chart should be deployed. In the 'releases' part of the configuration, we specify the name of the Helm Chart and, most importantly, which version we would like to deploy; since we are on the 'development' branch, we again work with Helm/Helmfile's version-range concept. This configuration deploys the highest '-alpha.x' of 1.2.0. If you need to change the major/minor/patch version, you have to change this file and commit it to the 'development' branch.
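A hedged sketch of such a 'helmfile.yaml' (the registry URL is a placeholder and the version constraint is illustrative):

repositories:
  - name: fsm-akka
    url: example-registry/fsm-akka-helm   # placeholder OCI registry, no scheme needed
    oci: true
releases:
  - name: fsm-akka
    namespace: development
    chart: fsm-akka/fsm-akka
    version: "~1.2.0-alpha"   # illustrative: resolves to the highest -alpha.x of 1.2.0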
This workflow receives the input parameters 'source-repo', to know from which origin repository this workflow was triggered, and 'version', the version number of the Helm Umbrella Chart that we want to deploy.
The workflow first deletes the directory './gitops/fsmakka' (the directory to which Helmfile writes its output).
This workflow deals with the deployment of the 'development' branch, so it checks out this branch, uses the 'mamezou-tech/setup-helmfile@v1.2.0' GitHub Action to prepare the Helm environment and executes the command 'helmfile template', which creates the following directory structure under './gitops/github/fsmakka'.
From these files, we can know exactly what we are trying to deliver to the Kubernetes Clusters.
The last step to ensure this is to commit these files to the Git repository and tag this commit with the pattern '${{ inputs.source-repo }}-${{ inputs.version }} --force', so we know exactly from which Service repository these changes are coming and which version of our Service caused them.
>> fsm-akka-4eyes-argocd
Now we need an ArgoCD Application Custom Resource Definition that will continuously deploy the development versions (-alpha versions); since this process has to run continuously, we only have to install it once.
As you can see, ArgoCD deployed the 'development' version of the application.
Use Case 2: Prepare Environment for Pull Request
Trigger Action: Creation of Pull Request for ‘feature/x‘ branch or Commits to ‘feature/x‘ branch
The second workflow is much more challenging than the first one. This one assumes that you are working on a 'feature' branch and you are ready to create a pull request to merge it into the 'development' branch. To be able to do that, somebody has to assess the quality of the software in the 'feature' branch. To make this possible, we will create a Namespace based on the name of the 'feature' branch on our 'Dev Kubernetes Cluster', so we can test the feature.
To be able to work in isolation (our feature branch should not disturb other development efforts), we will create a new branch in the Helm Umbrella Chart Git repository, place the version of our service in this branch, and then publish the Umbrella Helm Chart with the actual version from the Umbrella Chart repository.
In the next step, we need a branch in the Dev Environment repository to generate our Kubernetes manifests with the help of Helmfile, so ArgoCD can read those to create a new environment for us on the Dev Kubernetes Cluster.
The final step is to configure ArgoCD to pick up this new branch from the Dev Environment repository and deploy it to our Kubernetes Cluster, using the feature branch name as a Namespace.
Since this workflow is going to be reused by all Service repositories, we place it in our central workflow repository.
The trigger of the workflow is the creation of a pull request in Github.
This workflow only triggers if a 'pull-request' is opened against the 'development' branch, which in my opinion signifies that you have reached a certain level of maturity with your feature branch and you want to show / test your progress in the feature development (of course this is my interpretation of the development process; with minimal changes you can make this workflow trigger the moment you create the feature branch, but in my opinion, at the start of the development of a feature there would not be too many things to test / look for).
This trigger will exist in several other Service repositories, so it delegates directly to a reusable workflow in 'fsm-akka-github-workflows' with input parameters such as…
the repository for which this workflow was triggered
the name of the feature branch
which base branch we should take for our Helm Umbrella Chart (maybe you have an Epic Story and you want to include several Services in your feature environment, so you would take another branch than 'development' as the base for the Helm Umbrella Chart)
and finally with which Helm values these Helm Charts should be deployed (let's say you want to have fewer / more instances of Kafka in the feature environment)
The first part of the workflow should be familiar to you by now; we explained the function of 'calculate-version' in the previous use case. With the input parameter 'branch-name' on the Service repository, it identifies the branch version and passes this parameter further along the workflow. To be able to do this calculation, we have to create a branch in 'fsm-akka-helm-umbrella-chart' by combining the name of the Service feature branch and the name of the Service repository (to provide uniqueness), so Gitversion can calculate the version for the Helm Umbrella Chart.
This dispatches the workflow to 'fsm-akka-helm-umbrella-chart' to run on the branch '${{ inputs.umbrella-chart-base-branch-name }}' given as an input parameter (for this use case it is 'development', which will be critical in the development of Epic Stories) to create a new branch with the same name as the Service repository's feature branch.
Then it dispatches the workflow 'publish-and-prepare-environment.yaml', which creates the Infrastructure and Service environments. Now let's look at these workflows more closely.
>> fsm-akka-helm-umbrella-chart Repository
name: Create Branch with reuse
run-name: Creating Branch ${{ inputs.branch-name }} - Base Branch Name ${{ inputs.base-branch-name }} triggered via ${{ github.event_name }} by ${{ github.actor }}
on:
  workflow_dispatch:
    inputs:
      branch-name:
        required: true
        type: string
      base-branch-name:
        required: true
        type: string
jobs:
  create:
    uses: mehmetsalgar/fsm-akka-github-workflows/.github/workflows/create-branch.yaml@master
    with:
      branch-name: ${{ inputs.branch-name }}
      base-branch-name: ${{ inputs.base-branch-name }}
    secrets: inherit
There is nothing special about 'create-branch-with-reuse.yaml'; since we create branches from several other workflows, it delegates to a centralised workflow that can create a branch with the given branch name and base branch name. For this use case the base is 'development', but for an epic branch it could be 'epic/xxxx', etc.
The next part of the workflow publishes the Helm Umbrella Chart to the Helm repository with the version number that we identified on this feature branch. We need the same Helm packaging functionality in the other workflows, so it is also implemented as a reusable workflow (it is the first workflow reused from the same repository, so please pay attention to the notation). Publishing to Google Cloud Platform is unfortunately a little bit complicated with OCI Helm repositories, and the Gradle plugin that I use unfortunately does not support them, so we have to program the Helm push ourselves. Then we set the version for the Service with '-P${{ inputs.source-repo }}-version=${{ inputs.service-version-number }}', finally set the Umbrella Chart version with '-Pversion=${{ inputs.umbrella-chart-version }}' and push.
The Helm push part is a little bit complicated because we have to use the key of the Service Account that we created in Google Cloud Platform to interact with our Google Kubernetes Engine; we take the key from the Github Secret that we created and pass it to the Google Cloud authentication mechanism.
Then, to create the new environment in the Dev Kubernetes Cluster, we need a new branch in 'fsm-akka-dev-environment' so ArgoCD can pick up this branch to deploy our services to the Kubernetes Cluster.
name: Prepare Dev Environments in Kubernetes
run-name: Prepare Dev Environments in Kubernetes for Branch ${{ inputs.branch-name }} Version ${{ inputs.version }} for Tag ${{ inputs.tag }} with Chart Base Branch ${{ inputs.umbrella-chart-base-branch-name }} triggered via ${{ github.event_name }} by ${{ github.actor }}
on:
  workflow_call:
    inputs:
      branch-name:
        required: true
        type: string
      umbrella-chart-base-branch-name:
        required: true
        type: string
      tag:
        required: true
        type: string
      version:
        required: true
        type: string
jobs:
  prepare-dev-environment:
    runs-on: ubuntu-latest
    steps:
      - name: Create Branch for PullRequest/ServiceRelease/Integration in dev-environment
        uses: aurelien-baudet/workflow-dispatch@v2
        with:
          workflow: 'create-branch-with-reuse.yaml'
          repo: 'mehmetsalgar/fsm-akka-dev-environment'
          ref: "${{ inputs.umbrella-chart-base-branch-name }}"
          token: ${{ secrets.PERSONAL_TOKEN }}
          wait-for-completion: true
          wait-for-completion-timeout: 5m
          wait-for-completion-interval: 10s
          inputs: '{"branch-name": "${{ inputs.branch-name }}",
            "base-branch-name": "${{ inputs.umbrella-chart-base-branch-name }}"}'
      - name: Prepare PullRequest/ServiceRelease/Integration in dev-environment
        uses: aurelien-baudet/workflow-dispatch@v2
        with:
          workflow: 'prepare-services-for-new-environment.yaml'
          repo: 'mehmetsalgar/fsm-akka-dev-environment'
          ref: '${{ inputs.branch-name }}'
          token: ${{ secrets.PERSONAL_TOKEN }}
          wait-for-completion: true
          wait-for-completion-timeout: 10m
          wait-for-completion-interval: 10s
          inputs: '{"tag": "${{ inputs.tag }}",
            "version": "${{ inputs.version }}"}'
This workflow creates a branch in 'fsm-akka-dev-environment' based on the branch defined at the start of the workflow (in this case 'development', but for an epic story this can be the branch of the epic), and in the next step we continue with 'prepare-services-for-new-environment.yaml' to render the k8s manifests with the help of Helmfile so ArgoCD can read those.
This part of the workflow checks out the newly created branch, removes the previous version of the manifests (Helmfile will not compute a delta and delete old files), sets up the Helmfile action and executes…
This command generates the Kubernetes manifests from the Helm Umbrella Chart version we prepared in earlier steps of the workflow. In the final step, we commit the newly generated manifests and tag them with a combination of the 'source-repo' name that originally triggered this workflow and the version at the Service repository, with the help of the following Helmfile configuration.
Please pay attention to the '---' document separators; these are a signal for Helmfile to group its render areas, and without them you might get problems with the template variables we use in the Helmfile configuration. Also note the OCI configuration parameter, so Helmfile knows how to connect to an OCI Helm repository.
This version of the workflow does not use the default version from the environment for the Helm Umbrella Chart version but gets it as a parameter from the workflow.
  create-infrastructure-in-k8s:
    name: Create Infrastructure in K8s with Branch Name as Namespace
    needs: create-branch-helm-umbrella
    uses: ./.github/workflows/create-infrastructure-in-k8s.yaml
    with:
      branch-name: ${{ inputs.branch-name }}-${{ inputs.repo-name }}
      base-branch-name: ${{ inputs.infrastructure-base-branch-name }}
      value-file: ${{ inputs.value-file }}
    secrets: inherit
This calls another reusable workflow, since we need the same functionality for the 'integration' and 'release' branch workflows, with parameters containing the 'branch-name' for the 'credit-score' feature branch, the base branch from which we will create this branch in the 'fsm-akka-helm-infrastructure-chart' repository (for this workflow, at the moment, the 'master' branch) and finally which environment configuration should be used, via 'value-file'.
This workflow tries to create an acceptable Kubernetes Namespace from the feature branch, so it removes unacceptable characters from the branch name (notice that the steps are chained so we can call the same GitHub Action twice). The last job installs our infrastructure (Kafka, Cassandra, Elasticsearch) into this new Namespace with the following command.
So we use the calculated namespace 'needs.calculate-namespace.outputs.namespace', the 'value-file' (values-dev.yaml, values-test.yaml, values-prod.yaml, values-xxxx.yaml, etc.) and finally the version of the Helm Infrastructure Chart.
This deploys our infrastructure to Google Kubernetes Engine (GKE) under the namespace we created for our feature branch.
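For illustration, the namespace calculation described above could also be sketched as a plain-shell job like this (the actual workflow chains a string-manipulation GitHub Action twice instead; this is only a hypothetical equivalent):

  calculate-namespace:
    runs-on: ubuntu-latest
    outputs:
      namespace: ${{ steps.sanitize.outputs.namespace }}
    steps:
      - name: Convert branch name to a valid Namespace
        id: sanitize
        run: |
          # e.g. 'feature/usecase-1' becomes 'feature-usecase-1'
          ns=$(echo "${{ inputs.branch-name }}" | tr '[:upper:]' '[:lower:]' | tr -c 'a-z0-9\n-' '-')
          echo "namespace=${ns}" >> "$GITHUB_OUTPUT"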
You are probably asking yourself: 'we are deploying our Services with ArgoCD, why are we deploying our infrastructure with Helm?'. Personally, I think that because of the amount of change happening to our Services, it is much more critical to track those changes via GitOps. Compared to that, the rate of change of the Helm Charts for Kafka, Cassandra and Elasticsearch is small, so I prefer to install them with 'helm install'. Of course, if you want to follow GitOps principles for the infrastructure too and use Helmfile / ArgoCD, you can modify the workflow by taking the Service deployments as an example.
One word of caution here: generally speaking, infrastructure components are the most resource-hungry ones, so if you need lots of environments because you are developing many features in parallel, you have to allocate lots of Kubernetes resources. But this brings a dilemma: if you need the peak resources only rarely, why should you constantly allocate and pay for them?
Thankfully GKE has one feature that would be really helpful.
'Enable cluster autoscaler' for your GKE node pool; this way, when GKE needs more resources it adds them to the node pool and removes them when they are no longer necessary. We also have a safety valve: if our pipelines go havoc, they will not allocate thousands of instances, because 'Maximum number of nodes' prevents things from going out of control.
  create-services-environment-in-k8s:
    name: Create Services Environment in K8s with Branch Name as Namespace
    needs: create-infrastructure-in-k8s
    uses: ./.github/workflows/create-services-environment-in-k8s.yaml
    with:
      branch-name: ${{ inputs.branch-name }}-${{ inputs.repo-name }}
      base-branch-name: 'master'
    secrets: inherit
The workflow now continues in the context of the 'credit-score' repository and creates an ArgoCD Application from the branch we previously created in the 'fsm-akka-dev-environment' repository. The ArgoCD repository 'fsm-akka-4eyes-argocd' will always be based on 'master' and will only install the ArgoCD Application Custom Resource Definition via Helm for the new environment.
This performs the same process of removing unwanted characters from the branch name so we can convert it to a Kubernetes Namespace; after that, we deploy with the following command.
The only interesting part of this command is the 'targetBranch' parameter, which tells ArgoCD which branch in the 'fsm-akka-dev-environment' repository to monitor and into which namespace the application should be deployed in the Kubernetes Cluster.
You will understand this better when you observe the ArgoCD Application Custom Resource Definition.
You can see in this Custom Resource Definition that we are using the feature branch name as the Kubernetes Namespace; ArgoCD will observe the 'fsm-akka-dev-environment' repository, under the directory 'gitops/fsmakka', for the branch defined in 'targetBranch'.
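A hedged sketch of what such an ArgoCD Application might look like for a feature environment (the names and the branch are illustrative; in this setup the 'targetBranch' Helm parameter ends up in 'targetRevision'):

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: fsm-akka-feature-usecase-1
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/mehmetsalgar/fsm-akka-dev-environment.git
    targetRevision: feature/usecase-1-credit-score   # the branch ArgoCD observes
    path: gitops/fsmakka                             # the rendered manifests directory
  destination:
    server: https://kubernetes.default.svc
    namespace: feature-usecase-1-credit-score        # feature branch name as Namespace
  syncPolicy:
    automated:
      prune: true      # remove resources that disappear from Git
      selfHeal: true   # revert manual drift in the cluster
    syncOptions:
      - CreateNamespace=true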
This completes the workflow for this use case. Now that we have completed a workflow that creates a complete new environment for testing from the opening of a pull request, let's look at a workflow that cleans up the environment when the pull request is merged or closed :).
Use Case 3: Environment Cleanup after a completed Pull Request [ Merged / Closed ]
Trigger Action: Pull Request completed or closed
This workflow is quite simple: after GitHub receives an event that the pull request is completed (merged / closed), it first removes the created feature branches from 'fsm-akka-helm-chart' and 'fsm-akka-dev-environment', then removes the Helm installations for the infrastructure and then the ArgoCD Application.
Use Case 4: Producing Release Candidates for Services
Trigger Action: Creation of ‘release/x.x.x’ branch or Commits to ‘release/x.x.x’ branch
This workflow is basically the same as the one in Use Case 2, which triggers when a pull request is created against the 'development' branch. This use case triggers when a release branch is created or a push to a release branch occurs.
The only change compared to Use Case 2, other than the trigger condition, is that this workflow takes the 'master' branches of the Helm Umbrella Charts for Services and Infrastructure as base branches to create a new environment for a release candidate. If you have an Epic Story and your configuration should depend on those epic branches (multiple Services collaborating on the implementation of an Epic Story) for the release candidate, you can change the workflow in the 'release/x' branch to those specific branches for the Service and Infrastructure Umbrella Charts, which we will look at closely in one of the following workflows.
Use Case 5: Release Environment Cleanup
Trigger Action: ‘release/*’ branch is deleted
This workflow functions with the same principles as Use Case 3, only its trigger condition is different: it triggers when a ‘release/*’ branch is merged to the ‘master‘ branch and the ‘release/*’ branch is deleted.
name: Cleanup after Branch Delete
run-name: Cleanup after Branch Delete triggered via ${{ github.event_name }} by ${{ github.actor }}
on:
  delete:
    branches:
      - release/**
jobs:
  call-cleanup-workflow:
    uses: mehmetsalgar/fsm-akka-github-workflows/.github/workflows/cleanup-for-service.yaml@master
    with:
      repo-name: "customer-relationship-adapter"
      branch-name: "${{ github.event.ref }}"
    secrets: inherit
Use Case 6: Integration Environment for Helm Umbrella Charts / Sanity Check
Trigger Action: Creation of ‘integration/xxx‘ branch in ‘helm-umbrella-chart‘ with concrete Release Candidate versions of multiple services
As I previously mentioned in this blog, I am using GitFlow concepts, which is great for many scenarios, but I have a problem with one specific topic. GitFlow dictates that you have to start a ‘release/x‘ branch before you advance your application to the production ‘master‘ branch, which means the artefacts produced from the ‘release/x’ branch will have ‘-rc.x’ versions, and we don’t want our application promoted to ‘test‘, ‘staging‘, ‘production‘ with ‘-rc.x’ versions.
We want the same concrete binary version of the application to be promoted between the environments, for example the ‘1.2.09’ version of the binary deployed to all environments and not ‘1.2-rc2.4’, as Martin Fowler discusses here. With GitFlow, if we test the software state from the ‘release/x’ branch in the ‘test’ environment, the moment we merge it to the ‘master’ branch the binary will get another version number. In my opinion, it is also not realistic that software developed in a Sprint by 500 developers / 50 Scrum teams can be directly merged to master and promoted between the environments.
Of course our automation tests can check for regressions and assure that our software quality has not deteriorated, but for new features / epic stories a robot can’t decide whether our requirements are correctly implemented or not, so we will need a software state on which we can do some Sanity Checks.
My solution to this dilemma is to introduce an ‘integration‘ branch to GitFlow for the ‘fsm-akka-helm-umbrella-chart‘ repository and create an Environment in our Dev Kubernetes Cluster, so preliminary Sanity Checks can be executed there before the software is promoted to the Test Environment. This way we can use the ‘release/x‘ branch, which will work with concrete versions of our Services.
So the Chart.yaml and gradle.properties, which contain the versions in ‘fsm-akka-helm-infrastructure-chart‘ for ‘integration/x’, will look like the following.
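As a hedged sketch (the property names and versions here are invented for illustration; the real files live in the repository), pinned Release Candidate versions in gradle.properties could look like this:
# hypothetical pinned versions for an integration/x branch
creditScoreVersion=1.2.0-rc.1
fraudPreventionVersion=2.0.1-rc.3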
This workflow triggers with a push to any ‘integration/x‘ branch; it will then reuse the workflows that we previously demonstrated to create a new ‘integration/x‘ branch in the ‘fsm-akka-dev-environment‘ and ‘fsm-akka-helm-infrastructure-chart’ Git Repositories, render the Kubernetes Manifests via Helmfile, and then dispatch the workflow to ‘fsm-akka-4eyes-argocd‘ to deploy an ArgoCD Application which will deliver our Kubernetes Manifests to our Dev Cluster under the namespace ‘integration-x‘.
Use Case 7: Integration Environment Cleanup for Helm Umbrella Charts
Trigger Action: Deletion of the ‘integration/x.x.x‘ branch of the ‘helm-umbrella-chart‘ repository after sanity checks are completed.
The previous Use Case created an Environment for us in the Dev Cluster for Sanity Checks; naturally we should have a process to clear these Environments when the Sanity Checks are complete. For this, a Cleanup Workflow (‘cleanup-after-branch-delete.yaml‘) will trigger on the deletion of an ‘integration/x‘ branch, reusing the workflows we already demonstrated.
name: Cleanup after Branch Delete
run-name: Cleanup after Branch ${{ github.event.ref }} delete triggered via ${{ github.event_name }} by ${{ github.actor }}
on:
  delete:
jobs:
  call-cleanup-workflow:
    if: ${{ contains(github.event.ref, 'integration/') }}
    uses: mehmetsalgar/fsm-akka-github-workflows/.github/workflows/cleanup-environment.yaml@master
    with:
      branch-name: "${{ github.event.ref }}"
    secrets: inherit
Use Case 8: Service Release Process
Trigger Action: Manual start of the Pipeline after setting a concrete version with ‘git tag‘, or automated start with the merge of the ‘release/x.x.x‘ branch to the ‘master‘ branch.
In the previous Use Cases, we discussed Steps that can help us bring our application closer to a Release; if you followed those workflows, you must have started having questions about the Release Process.
Our Release Process for Services is not fully automated; it will not trigger automatically if you merge a ‘release/x’ branch to ‘master’, for the reasons I will explain shortly. If you think these reasons do not apply to you, you can make the necessary changes to the workflow and convert it to a fully automated one. In its current state, the Release workflow should be triggered via the GitHub UI.
The main reason not to automate the Release workflow: it is not clear how to predict the end user's interaction with GitVersion.
If the version on the ‘master‘ branch is to be controlled via the ‘git tag‘ command, we have to give the end user the possibility to tag the ‘master‘ branch before they can start the Release workflow.
If the end user uses ‘+semver: major‘, ‘+semver: minor‘ or ‘+semver: patch‘ consistently in the commit messages on the ‘release‘ branch, then this workflow can also be automated.
name: Release GA
run-name: Release GA triggered via ${{ github.event_name }} by ${{ github.actor }}
on:
  workflow_dispatch:
jobs:
  release:
    uses: mehmetsalgar/fsm-akka-github-workflows/.github/workflows/build.yaml@master
This will publish the Helm Chart of the Service with a concrete release version, which we can use in the next Use Case to build the Release version of our Helm Umbrella Charts.
Use Case 9: Helm Umbrella Chart Release Process
Trigger Action: Manual triggering after the use of ‘git tag‘, or automated start after merging the ‘release/x.x.x‘ branch with concrete Service Versions in the ‘helm-umbrella-chart‘ repository to the ‘master‘ branch.
This will allow us to do Environment Promotion.
We can even implement Blue / Green deployment in Production.
After releasing our Services, to be able to promote our System between the Environments, we should also perform the release of our Helm Umbrella Chart for Services. To do that, the first thing we have to do is place concrete Release Version Numbers in the ‘gradle.properties‘ of ‘fsm-akka-helm-umbrella-chart‘ on the release branch, like the following.
As we discussed for the Service Releases, since we can’t dictate how the master branch is versioned, via a tag or with ‘+semver: major/minor/patch‘, we can’t automate this workflow. If it is going to be the ‘git tag‘ command, we have to give the End User the chance to tag the master branch before we start the release build and publish it to the Helm Repository. If ‘+semver: major/minor/patch‘ is used consistently in commit messages, then we can also automate this workflow.
name: Release GA
run-name: Release GA triggered via ${{ github.event_name }} by ${{ github.actor }}
on:
  workflow_dispatch:
jobs:
  release:
    uses: mehmetsalgar/fsm-akka-github-workflows/.github/workflows/build.yaml@master
This workflow is quite similar to the other Helm Publish workflow; it delegates the execution to ‘build.yaml‘ in ‘fsm-akka-github-workflows‘.
Use Case 10: Environment for Epic Stories
Until now we analysed the scenarios for Service Repositories, where we build environments to test one single Service. There might be scenarios where several Services have to collaborate for the realisation of an Epic Story, so we have to configure specific versions of the services in the Helm Umbrella Chart and create an environment for them in our Dev Kubernetes Cluster.
When we create an environment from a Service, the feature branch name in the Helm Umbrella Chart has the pattern ‘feature/x’-‘service source-repo’; our Epic Story, however, is not bound to a specific Service Repository, so for this use case the branch name in the Helm Umbrella Chart is simply ‘feature/x’.
This workflow activates with a push to a ‘feature/x‘ branch in the Helm Umbrella Chart repository and prepares the environment based on the development branch. If you want to change the configured versions, the only thing you have to do is change and push them in the workflow file on the feature branch; this will initialise the environment with that configuration.
Preparations
Google Cloud CLI
For a lot of the configuration in Google Cloud we will need the Google Cloud CLI; you can install it by following the instructions here.
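On macOS, for example, a minimal setup could look like this (assuming Homebrew; ‘gcloud init‘ walks you through login and default project selection):
> brew install --cask google-cloud-sdk
> gcloud init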
Google Cloud Project
After we get our test account, we have to create a Project that will contain all of our resources (our Artifact Registries, Kubernetes Clusters, etc).
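With the project id ‘fsmakka‘ used throughout this blog, the CLI equivalent would be:
> gcloud projects create fsmakka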
Google Cloud Artifact Registry
We will need two Artifact Repositories, one for Docker Images and another one for Helm Charts.
For Docker Images
You can create a Docker Registry with the following instructions, but you can also achieve it by following the screenshots below.
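The CLI equivalent, with the repository name and region used in this blog, would be roughly:
> gcloud artifacts repositories create fsmakka-ar --repository-format=docker --location=europe-west3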
As you can see, I already created a Docker Registry ‘fsmakka-ar‘.
Service Account for Artifact Registry
Now that we created our Artifact Registry, we have to arrange a permission mechanism so that our GitHub Actions pipelines can read and write artifacts to these registries.
Google Cloud has the concept of Service Accounts for controlling permissions / roles.
Now we have to give certain permissions / roles to this Service Account so we can upload our Docker Images / Helm Charts; in this case that is the ‘Artifact Registry Writer’ role.
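A hedged CLI sketch of these two steps (the service account name is illustrative):
# create a service account for the pipelines
> gcloud iam service-accounts create fsmakka-ar-service-account
# grant it the Artifact Registry Writer role on the project
> gcloud projects add-iam-policy-binding fsmakka --member="serviceAccount:fsmakka-ar-service-account@fsmakka.iam.gserviceaccount.com" --role="roles/artifactregistry.writer"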
With this setup, your GitHub Actions will be able to upload Docker Images and Helm Charts to the Artifact Repositories.
Google Cloud Kubernetes Engine
Now we have to configure a Kubernetes Cluster to be able to deploy our Services to Kubernetes.
You can create a Kubernetes Cluster by following these instructions.
You can create your Kubernetes Cluster in the Google Cloud portal using the menu point ‘Kubernetes Engine -> Clusters’.
There are two options to create a Kubernetes Cluster, ‘Autopilot mode’ and ‘Standard mode’; we are interested in ‘Standard mode’. The main difference: in ‘Autopilot’, GKE takes over a lot of the responsibility of keeping your Kubernetes Cluster up to date, autoscaling it, etc. These are really nice options if you are new to Kubernetes concepts, but I prefer Standard mode.
Then we have to do basic configuration, like giving a name to our Kubernetes Cluster and a zone to run in (I am living in Germany, so I have chosen ‘europe-west-3’, which is Frankfurt); by the way, on the right side you can see the monthly cost estimates of your choices.
The last relevant option is which version of Kubernetes we will use; we can pin our Kubernetes implementation to a specific version or let Google Cloud automatically update to the current stable release version.
Another basic configuration of the GKE Cluster is the Node Pool configuration.
Please pay close attention to the option ‘Enable Cluster autoscaler‘: since our Pipelines dynamically create new Environments for our ‘feature/xxx’, ‘release/xxx’, ‘integration/xxx’, etc. branches, we might need more Kubernetes Resources. Of course, we could install a hard-capped resource set, say 20 instances of 8 CPU / 32 GB machines, but there would be two negatives to this.
if we have more environments than we can host on these 20 machines, our pipelines will fail
this one is worse: if we don’t have enough environments to occupy the 20 machines and 90% of the resources sit idle, we are paying for them for nothing. This is the worst scenario for a Kubernetes environment; your main business objective is to pay for what you need, so paying for 90% of resources that you are not using is not good.
So a feature that enables us to allocate instances from GCP as we need them and give them back when we don’t is ideal for us; this is exactly what ‘Enable Cluster autoscaler‘ does. Of course, there is also a safety option so that our pipeline does not run crazy and allocate thousands of instances: with ‘Maximum number of nodes‘ we can say ‘Ok, if you need more resources allocate 10 more, but no more than that.’
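If you prefer the CLI over the portal, node pool autoscaling is enabled with flags like these (the pool name is illustrative; cluster name and zone are the ones from this blog):
> gcloud container node-pools create fsmakka-pool --cluster=fsmakka-gke-dev --zone=europe-west3-c --enable-autoscaling --min-nodes=1 --max-nodes=10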
And finally we choose the machine type for our Node Pools.
The next part of the configuration is about the Security of our Kubernetes Cluster. As I mentioned in the previous chapter, Google Cloud has a concept of Service Accounts; here we define which service account we will use for our Cluster. If we don’t do anything, GCP will create a Service Account for us; I will use this option, but you can also create an additional Service Account with the necessary roles / permissions so our GitHub Actions pipelines can interact with our Kubernetes Cluster.
Here you can see the default account that GCloud created for us and also the Service Account that we will create further on in the blog.
Service Account
Now let’s create and configure the Service Account that will interact with our GKE Cluster.
We should give the usual information, like the Service Account name and id (the id will look like an email address, which we will need in further steps).
Service Account Roles / Workflow Identities
GCloud has the concept of Workload Identity (see Workload Identity 1, Workload Identity 2); to manage Permissions in Google Kubernetes Engine we have to use this concept to add roles to our Service Account. You can find a general list of roles here.
These are the basic steps that we have to execute in the GCP CLI.
Here you see our GCP Project name ‘fsmakka’, our service account ‘fsmakka-gke-service-account@fsmakka.iam.gserviceaccount.com‘ and the role ‘roles/composer.worker‘, which contains most of the roles we need to access and configure our GKE Cluster from GitHub Actions (if a specific role is necessary for your action, the error message explicitly states which permission is missing; you can find it in the role list and add that role to your Service Account).
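With those names, the role binding looks like this:
> gcloud projects add-iam-policy-binding fsmakka --member="serviceAccount:fsmakka-gke-service-account@fsmakka.iam.gserviceaccount.com" --role="roles/composer.worker"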
Kubeconfig
Now that we created our GKE Cluster, let’s get the authentication information for it
and get the necessary input for ‘.kube/config‘ (of course, you should first log in to Google Cloud as described here). The input parameters that we need for this are the name of the GKE Cluster ‘fsmakka-gke-dev‘, the zone our cluster runs in ‘europe-west3-c‘ and the project our GKE Cluster runs in ‘fsmakka‘.
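Putting those parameters together, the command is:
> gcloud container clusters get-credentials fsmakka-gke-dev --zone europe-west3-c --project fsmakka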
First, if you want to observe what GitVersion is doing, you can install it locally; for me it was
> brew install gitversion
After GitVersion is installed, you can configure it for your Service Git Repository; I will demonstrate that in the ‘credit-score’ repository. You can initialise GitVersion with the following command.
> gitversion init
GitVersion will ask you some standard questions; personally, most of the companies I worked for are using GitFlow, so I also chose GitFlow.
Configuration
After this command, you can see the default configurations with the following command.
There are some really interesting things here: GitVersion gives you the ability to bump the version of a service via a certain commit message. What does this mean? You are developing a feature and you know it is going to break the backward compatibility of your application; you can just place ‘+semver: breaking’ (or major) in your commit message and it will bump the major version of your application (for example, if ‘credit-score’ has the version ‘1.1.8’, this will bump it to ‘2.0.0’).
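For example, a (hypothetical) commit that breaks compatibility would carry the marker in its message:
> git commit -m "Change credit score response format +semver: major"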
Branch Configurations
The second interesting thing you see in the default configuration: GitVersion treats every branch differently.
As you can see, GitVersion can identify your GitFlow branches with the help of regular expressions. For example, every commit to the ‘master‘ branch of your service will increase ‘patch‘ without any tag. Now you ask what a ‘tag‘ is; let’s look at the ‘develop‘ branch. There the tag is ‘alpha‘: every version that GitVersion delivers for this branch will contain the ‘alpha‘ tag. Now this is a little bit irritating for Java developers; we are used to ‘SNAPSHOT’ as the tag for the ‘development‘ branch. If you like, you can change this ‘tag‘ configuration value to ‘SNAPSHOT’, but personally I prefer it this way.
Similarly, the ‘release‘ branch uses ‘beta‘ as tag; personally I change this to ‘rc‘ as in ‘release candidate‘, so the version will look like ‘1.2.0-rc.1‘. One more fancy feature: if you look at the ‘feature‘ branch, the tag there is ‘{BranchName}‘, so the version number will contain the actual branch name.
Now you probably understand why this topic is important for me: without a human-interpretable versioning system it is not possible to build a completely automated Continuous Deployment system for our Feature, Release, Hotfix and Development branches in Kubernetes.
Lifecycle Operations
Now let’s look at it in action. We have our ‘credit-score‘ service in Git; to enable GitVersion to create version numbers for us, we first have to run
> git tag 1.1.8
for our service on the ‘master‘ branch (this is because I already developed this application; for a brand new Service your tag should of course be ‘git tag 1.0.0‘).
After that if we call the
> gitversion
command.
We will see which values GitVersion supplies for us to use in our pipelines.
We see that GitVersion incremented the ‘minor‘ part of the version and also placed the tag that was configured for the ‘development‘ branch, producing the version ‘1.2.0-alpha.108‘.
This configuration tells GitVersion to increment the ‘Minor’ part of the Version for the development branch (since, according to GitFlow, after you release your software from the ‘master‘ branch, development should continue on the next minor version), and as mentioned before, the tag is configured to be ‘alpha’.
Now let’s look at what happens in a feature branch after we create one with ‘git checkout -b feature/…‘.
As we discussed before, the ‘feature‘ branch is configured so that it places the branch name in the Version, and it also inherits the Version from the branch it originated from.
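A hedged illustration (the branch name and the exact counters are invented; the shape of the version is what matters):
> git checkout -b feature/epic-scoring
> gitversion /showvariable FullSemVer
1.2.0-epic-scoring.1+2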
Now that we completed the development of our feature, we would like to make a release; let’s look at how GitVersion acts for a Release branch by executing the following command.
> git checkout -b release/1.2
And look at what GitVersion delivers as the Version (for simplicity, from now on I will only show the value of FullSemVer, using the following command).
> gitversion /showvariable FullSemVer
which will deliver.
1.2.0-rc.1+0
As you can see, GitVersion is clever enough to tag this Version as a ‘release candidate’. Now we can make some bugfixes and merge them to our release branch, which will increment the Version number.
1.2.0-rc.1+1
Now that our Release Candidate 1 is tested and we want to go for Release Candidate 2, to achieve that we only have to run
> git tag 1.2.0-rc.1
the result of the ‘gitversion’ would be
1.2.0-rc.1
Now if we continue with the development of Release Candidate 2, the moment we make a commit to the ‘release/1.2′ branch the version number will look like the following, so we can continue with the process.
1.2.0-rc.2+3
When we want to release our Service to Production, we naturally have to merge the code state to the ‘master‘ branch as GitFlow suggests. Of course, merging the ‘release/1.2′ branch to ‘master’ will make ‘git tag 1.2.0-rc.1‘ visible in ‘master’; to state our intention to release our application with a final Version, we have to tag the ‘master’ branch with
> git tag 1.2.0
which will make our Version number.
1.2.0
Or we can use a smart feature of GitVersion, which is explained here: if we use ‘+semver: patch’ in the commit message on this Release Branch, it will automatically set the version to ‘1.2.0’; whichever fits you better.
In the previous chapters we used GitVersion extensively; now you probably understand why a solid versioning concept is extremely important for me.
ArgoCD
To achieve that, we should first install ArgoCD to our Kubernetes Cluster under the ‘argocd‘ namespace.
For this purpose, we use the Helm Chart that is in this Github Repository.
We are referencing a standard ArgoCD Helm Chart and configuring it for our needs.
If you are using these Helm Charts for a local environment too: I am writing this blog on a Mac Pro M1 and my local Kubernetes runs on the same hardware, so I need the ‘arm64’ image and use the following configuration.
When this must run on MS Azure, AWS, Google Cloud or any on-premise Linux Cluster, you can remove this.
The next configuration gives ArgoCD the permission to manipulate all namespaces in my Kubernetes Cluster (this is not a production-level configuration; for production, for security you should explicitly tell ArgoCD which namespaces it can manipulate, and wildcards also work, like ‘feature-*’). Finally, my local Kubernetes Cluster is not operating with ‘https’, so I have to turn that off for ArgoCD as well.
These are the necessary parts to run ArgoCD in my k8s Cluster. Now we can install it with the following command.
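With the chart checked out locally, the installation would be something along these lines (the chart directory name is an assumption):
> helm dependency update ./argocd
> helm install argocd ./argocd --namespace argocd --create-namespace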
Please pay attention to the ‘argocd’ namespace: some components of ArgoCD can only be installed in this namespace, and in a production-level configuration only your administrators would have the rights to install here, since these installations allow ArgoCD to deploy applications and modify your k8s Cluster.
Now we have installed the runtime components of ArgoCD, but ArgoCD at the moment knows nothing about our Business demands; so to tell ArgoCD that we want to deploy our ‘foureyes-fsm-akka’ Business Unit, we have to define an ArgoCD Project Custom Resource Definition for ‘fsm-akka-4eyes-project‘.
To deploy this ArgoCD Project, we are using a Kubernetes Custom Resource Definition from ArgoCD, the important points being:
Which k8s Clusters and Namespaces this ArgoCD Project can manipulate.
From which sources (Git and Helm Chart Repositories) it is allowed to install Applications (if you use a repository not listed here, you will get security exceptions and your application will not be installed).
Which k8s Cluster resources the Applications belonging to this Project can modify (again, what you see here is not a production configuration and is only for demonstration purposes; for production you should use sensible restrictions).
Which Namespace resources the Applications belonging to this Project can modify (the same caveat applies: for production, use sensible restrictions).
Before we installed ArgoCD in the k8s Cluster, this Custom Resource Definition would have been unknown and would have caused exceptions; for this reason I first installed ArgoCD, and now I can install the Project CRD with the following command.
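Assuming the Project CRD is saved in a file ‘fsm-akka-4eyes-project.yaml‘ (the file name is an assumption), the command would be:
> kubectl apply -n argocd -f fsm-akka-4eyes-project.yaml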
After these configurations, if we make a ‘port-forward’ to ArgoCD, we should see the following UI.
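For a standard installation where the server service is named ‘argocd-server‘, the port-forward looks like this:
> kubectl port-forward svc/argocd-server -n argocd 8080:443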
You can use the ‘admin’ user, and the password is in the following k8s Secret.
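ArgoCD stores the initial admin password in the ‘argocd-initial-admin-secret‘ Secret, which you can read like this:
> kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d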
Appendix
To Helm Umbrella Chart or Not
If you read the whole blog, you will remember that I told you I prefer the Helm Umbrella Chart concept because it makes local deployments to our ‘minikube’ possible, to increase development speed, prototyping, etc.
This can be a non-factor for you: your System could be so big that it would be unfeasible to install into a ‘minikube’, or you just don’t value this option. Another option, then, is removing the ‘Helm Umbrella Chart’ and directly using ‘Helmfile’ to manage your system. In the ‘fsm-akka-dev-environment’ Git Repository, you can convert your ‘helmfile.yaml‘ to the following.
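A hedged sketch of such a helmfile.yaml (release and chart names are illustrative; the version constraint syntax is standard Helm semver):
releases:
  - name: credit-score
    namespace: dev
    chart: fsmakka/credit-score
    version: ~1.2.0 # any 1.2.x patch release is picked up automatically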
This gives us the possibility of continuous deployment of our services within a certain version range (of course, for some major/minor/patch version changes we will have to adapt the range here).
Finally, in GitHub Actions, regarding the workflow part where helmfile renders the manifests: for the workflows working on release / integration branches we don’t have to change anything, but for workflows realising the deployment of single services, we need a slight modification.
One final topic I want to mention here: the startup times of Java Applications are becoming a problem in K8s environments (even with minimal functionality they need 5-6s, a realistic application 15 to 30s). When your application is under load and you have auto-scale configurations, 15-30s is a lifetime, so people were motivated to go for Native Image Java Applications, specially with the release of Spring Boot 3.0.x.
A Spring Boot native image built with GraalVM has around 0.7 to 1s start time, which is really powerful; now you will probably ask why not everybody is using it. There are several hurdles on the way:
first, you have to change a little bit how you develop: you have to really try to avoid the use of Java Reflection, or be ready to write lots of configuration files telling the native image generation mechanism how to interpret the reflection data
secondly, lots of the libraries Spring Boot uses depend on Java Reflection; the Spring Boot team tries to adapt most of the libraries they have a direct dependency on, but it will take time for most of the frameworks to catch up. GraalVM is also, at present, trying to establish a Reflection Metadata Repository for popular Java Libraries; you can find the configuration information here.
third, building cross-platform native images is really problematic: if you install GraalVM on, let’s say, Ubuntu 20.04, the image you produce will probably have problems on ‘alpine‘. To be able to produce a native image for ‘alpine‘, you first have to create a Docker image based on ‘alpine‘, install GraalVM on it, produce the native image there and upload it to a Docker Container Registry; only that native image will then work on alpine. At least for Github Actions, you might go for a Matrix Strategy for cross-platform builds as explained here; the Github Action ‘graalvm/setup-graalvm@v1’ is a really big help on this topic.
fourth, please read the paragraph below with a little bit of scepticism: as of June 14, 2023, there is a change to the GraalVM Free Terms and Conditions (GFTC) license, which you can read here. As I understand it, the previously Enterprise version of GraalVM is now free to use in production, so you get better Java Garbage Collectors and a faster JVM than the CE version; of course you have to check that with your legal department. If the new license is no option for you, then you are stuck with the problems below.
A point that is not communicated that clearly: GraalVM has a Community Edition and an Enterprise Edition, and as you may guess, Enterprise has some costs attached to it. What is not said is that the Community Edition only supports SerialGC, which in my opinion is no option for a serious enterprise application; it is the main culprit of the famous ‘Stop the World’ problem during Java Garbage Collection. For a modern application with, let’s say, 32GB of memory, a garbage collection with SerialGC will most probably mean a 30s non-reactive Java Application, which will most probably mean the end of that application. So either be ready to pay a huge amount of money to Oracle for GraalVM Enterprise Edition to get a reasonable GC, or try something more radical: if you read the link above carefully, CE also offers the possibility of using ‘Epsilon GC‘, a garbage collector that does not collect garbage at all and lets the application crash :). Does that sound weird? Well, think of it like this: if the SerialGC collector stops the world for something like 30s, wouldn’t it be better to let the Spring Boot application crash with out of memory and start a new one in 1s? One thing to think about 🙂
Now let’s look at the implementation. Google’s JiB with Gradle will be a huge help for us, so let’s look at the Gradle Configuration.
buildscript {
    dependencies {
        classpath('com.google.cloud.tools:jib-native-image-extension-gradle:0.1.0')
        classpath "org.unbroken-dome.gradle-plugins.helm:helm-plugin:1.7.0"
    }
}

plugins {
    id 'java'
    id 'org.springframework.boot' version '3.0.0'
    id 'io.spring.dependency-management' version '1.1.0'
    id 'org.graalvm.buildtools.native' version '0.9.20'
    id 'com.google.cloud.tools.jib' version '3.3.1'
    id 'com.github.johnrengelman.shadow' version '7.1.2'
}
As you can see, we need the GraalVM native build tools plugin, JiB and the JiB Native Image Extension.
At the start of the blog, you saw a diagram explaining our plan. In that picture, you saw that we would have identical Kubernetes Clusters for ‘development’, ‘test’, ‘staging’ and ‘production’ (or any additional environment you might need); instead of creating these environments manually, it is better to follow the ‘Infrastructure as Code‘ approach and use Terraform to create them.
Additionally, as I hinted previously, the real potential for cost saving in a Kubernetes Environment lies in the Development and Test Environments, so we have a really interesting use case here.
In the projects I was involved in, I always criticised Development / Test environments running idle for months while there was nothing to test; because the ordering mechanism for a new environment could take up to months, it was easier to keep them idle and pay for them than to create them when needed.
With Kubernetes, that is no longer the reality; we can increase / decrease the capacity of an environment in a matter of minutes, but there is still room for optimisation. If you follow the paradigms mentioned in this blog, you are aware that we are creating a new environment for ‘Pull Request‘, ‘Releases‘ and ‘Integration’ so those can be submitted to Quality Checks; these environments can be running for days, so they can’t be automatically downscaled by Kubernetes. The dilemma here: most of the work force of software companies works between 06:00 and 18:00, so between 18:00 and 06:00 we pay for these resources for 12 hours for nothing.
My solution to this dilemma: have a Kubernetes Environment for office hours; create this environment at the start of the working day, let’s say at 06:00, and destroy it at 19:00. When we create a feature branch, we can place a marker file, let’s say ‘day_over.txt’, so our pipelines know to install this feature into this special GKE Cluster.
As you can see in the pipeline that is responsible for creating new environments:
- id: checkDayOver
  shell: bash
  run: |
    if test -f "day_over.txt"; then
      echo "cluster_name=fsmakkaGKEDayOver" >> "$GITHUB_OUTPUT"
      echo "cluster_name_not_normalised=fsmakka-gke-dev-day-over" >> "$GITHUB_OUTPUT"
    else
      echo "cluster_name=fsmakkaGKE" >> "$GITHUB_OUTPUT"
      echo "cluster_name_not_normalised=fsmakka-gke-dev" >> "$GITHUB_OUTPUT"
    fi
create-branch-helm-umbrella:
create-infrastructure-in-k8s:
  name: Create Infrastructure in K8s with Branch Name as Namespace
  needs: [calculate-version, create-branch-helm-umbrella]
  uses: ./.github/workflows/create-infrastructure-in-k8s.yaml
  with:
    branch-name: ${{ inputs.branch-name }}-${{ inputs.repo-name }}
    base-branch-name: ${{ inputs.infrastructure-base-branch-name }}
    value-file: ${{ inputs.value-file }}
    cluster-name-not-normalised: ${{ needs.calculate-version.outputs.cluster-name-not-normalised }}
  secrets: inherit
create-services-environment-in-k8s:
  name: Create Services Environment in K8s with Branch Name as Namespace
  needs: [calculate-version, create-infrastructure-in-k8s]
  uses: ./.github/workflows/create-services-environment-in-k8s.yaml
  with:
    branch-name: ${{ inputs.branch-name }}-${{ inputs.repo-name }}
    base-branch-name: 'master'
    cluster-name: ${{ needs.calculate-version.outputs.cluster-name }}
    cluster-name-not-normalised: ${{ needs.calculate-version.outputs.cluster-name-not-normalised }}
Now I hear you saying: what about our batch jobs that run in the middle of the night? For testing those scenarios we will have a dedicated Kubernetes Environment and install it to our default GKE Cluster.
GKE Cluster Creation
So how do we create a Kubernetes Environment at 06:00? With the help of GitHub Actions and Terraform.
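The trigger is a scheduled workflow; a sketch (the cron values are an assumption, and GitHub cron runs in UTC):
on:
  schedule:
    - cron: '0 6 * * 1-5' # 06:00 UTC on weekdays
  workflow_dispatch: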
Now that Terraform created our new GKE Cluster, we have to install the necessary Kubernetes Operators that are responsible for installing Apache Kafka with the Strimzi Operator, Apache Cassandra with the k8ssandra-operator and Elasticsearch with the ECK Operator.
Unfortunately, the completion of the Terraform configuration does not mean that the GKE Cluster is ready to serve requests, so when we want to update ‘.kube/config’ we have to wait for Cluster initialisation.
until gcloud container clusters get-credentials fsmakka-${{ vars.GKE_CLUSTER_NAME }} --zone europe-west3-c --project fsmakka;
do
echo "Try again for get-credentials!"
sleep 10
done
When this succeeds, we can get the authentication information, but that does not mean GKE is ready, so we have to wait until GKE reports ‘RUNNING’ status.
until [[ $(gcloud container clusters describe fsmakka-${{ vars.GKE_CLUSTER_NAME }} --zone europe-west3-c --project fsmakka --format json | jq -j '.status') == 'RUNNING' ]];
do
echo "Try again for status!"
sleep 10
done
‘gcloud container clusters’ has a really nice function, ‘describe’, which displays the current state of the GKE Cluster; with the ‘--format json’ option this is delivered in JSON format, so we can query it with ‘jq -j .status’, and when it reports ‘RUNNING’ we can continue with the Pipeline, which will install the mentioned Kubernetes Operators.
The next step is to let ArgoCD know about the existence of the new Kubernetes Cluster.
prepare-argo-cd:
  name: Prepare ArgoCD for new GKE Cluster
  runs-on: ubuntu-latest
  steps:
    - name: Prepare ArgoCD for new GKE Cluster Step
      uses: aurelien-baudet/workflow-dispatch@v2
      with:
        workflow: 'prepare-new-gke-cluster.yaml'
        repo: 'mehmetsalgar/fsm-akka-argocd'
        ref: "master"
        token: ${{ secrets.PERSONAL_TOKEN }}
        wait-for-completion: true
        wait-for-completion-timeout: 5m
        wait-for-completion-interval: 10s
Now one really nice feature of the ArgoCD Operator: our installation in our default GKE Cluster can control the deployment to our newly created GKE Cluster with the help of the following configuration.
We can configure our GKE Cluster ‘fsmakkaGKEDayOver‘ in addition to ‘fsmakkaGKE‘, but we have to give ArgoCD the authorisation information with the help of the following Kubernetes Secret.
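The Secret follows ArgoCD’s declarative cluster registration format; a sketch with placeholder credentials (the endpoint and auth data must come from your own cluster):
apiVersion: v1
kind: Secret
metadata:
  name: fsmakka-gke-day-over
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
stringData:
  name: fsmakkaGKEDayOver
  server: https://<cluster-endpoint>
  config: |
    {
      "bearerToken": "<service-account-token>",
      "tlsClientConfig": { "caData": "<base64-encoded-ca-certificate>" }
    }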
This will create the necessary configuration in the ArgoCD ‘Project’ Custom Resource Definition.
If we create a Feature Branch for ‘fraud-prevention’ and open a Pull Request, ArgoCD creates the environment in the new Cluster.
As you can see, ArgoCD is deploying to the ‘fsmGkeDayOver’ cluster, and in the Lens IDE we see that our Infrastructure and Service are deployed to the new GKE Cluster (don’t worry about the yellow triangles, I didn’t give the new GKE Cluster enough resources).
GKE Cluster Destruction
We also need a mechanism to destroy the environment at 19:00.
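This is again a scheduled workflow; at its core it is just the inverse Terraform call, sketched here under the assumption that the GCP credentials are already configured for the job:
on:
  schedule:
    - cron: '0 19 * * 1-5' # 19:00 UTC on weekdays
jobs:
  destroy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: terraform init && terraform destroy -auto-approve -var="cluster_name=${{ vars.GKE_CLUSTER_NAME }}"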
Then we have to configure the network for our GKE Cluster; the only interesting parts are that we define different IP ranges for Pods and Services, and that Terraform should ignore IP range changes, because the Query API for Secondary IP ranges does not always deliver them in the same order and Terraform would unnecessarily try to update the network.
# GKE cluster
resource "google_container_cluster" "fsmakka_cluster" {
  name     = "${var.project}-${var.cluster_name}"
  location = var.zone

  # We can't create a cluster with no node pool defined, but we want to only use
  # separately managed node pools. So we create the smallest possible default
  # node pool and immediately delete it.
  remove_default_node_pool = true
  initial_node_count       = 1

  network    = google_compute_network.vpc.name
  subnetwork = google_compute_subnetwork.subnet.name

  vertical_pod_autoscaling {
    enabled = var.vpa_enabled
  }

  ip_allocation_policy {
    cluster_secondary_range_name  = "${var.project}-${var.cluster_name}-gke-pods-1"
    services_secondary_range_name = "${var.project}-${var.cluster_name}-gke-services-1"
  }

  addons_config {
    network_policy_config {
      disabled = false
    }
  }

  network_policy {
    enabled = true
  }

  lifecycle {
    ignore_changes = [
      node_pool,
      network,
      subnetwork,
      resource_labels,
    ]
  }
}

# Separately Managed Node Pool
resource "google_container_node_pool" "fsmakka_cluster_nodes" {
  name       = google_container_cluster.fsmakka_cluster.name
  location   = var.zone
  cluster    = google_container_cluster.fsmakka_cluster.name
  node_count = var.gke_num_nodes

  node_config {
    oauth_scopes = [
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
    ]

    labels = {
      env = var.project
    }

    # preemptible  = true
    machine_type = var.machine_type
    tags         = ["gke-node", "${var.project}-${var.cluster_name}"]
    metadata = {
      disable-legacy-endpoints = "true"
    }
  }
}
The GKE configuration, the location of the Cluster, the number of nodes and the machine type for the node pool are all delivered via Variables, mostly with default values; but if you want to customise them, you can do so via Github Variables.
One variable that does not have a default is the Cluster Name, which you have to define via a Github variable.
This blog is a detailed analysis of an implementation of how to host an Akka / Pekko Finite State Machine in a Spring Boot application (this Netflix blog also thinks that is a great idea), so you can use the Dependency Injection, Spring Data Elasticsearch, Spring Cloud Kubernetes, Spring Profile and SpringTest support features of Spring Boot, as part of a series of blogs explaining how to build an Event Sourcing application with the help of the Akka / Pekko Finite State Machine.
Additionally, I will explain here the configuration of the Google JiB tool to build Docker images, which saves Maven / Gradle users from hard-to-maintain, cryptic Dockerfiles and gives them the possibility of daemon-less building of Docker Images for Spring Boot applications, and how it optimises Docker Images into layers to speed up your development cycle, which you can observe with the awesome Dive tool.
And finally how to deploy this Spring Boot application with the help of the Helm Charts.
This blog is part of a series demonstrating the full implementation of a Proof of Concept application with Event Sourcing principles using the Akka Framework, Apache Kafka, Apache Cassandra and Elasticsearch, which you can find here. To understand the whole context of what is explained here, I advise starting to read from that blog; but if you just need the implementation details of the topic mentioned in the title, please continue reading.
PS: I would like to apologise for the quality of some images; unfortunately WordPress reduces the image quality during upload. For some images where you really have to see the details, I placed links to externally hosted images, so please click the images and you will be forwarded to the high-definition versions.
This project does not contain that much functionality; it only contains two classes, one for starting the Spring Boot application and one for configuring the Spring Context. But all the integration tests are here, and they are the proof that all the ideas in this blog function, so we will look at them closely.
Additionally, under this project there are the Helm Charts for the deployment to Kubernetes and the Docker Image creation configurations; I will not go into detail about this topic here, but there is a dedicated chapter / blog about it further on.
Docker / JiB
Our application is designed to run in Cloud / Kubernetes, so we have to create a Docker Image from it. To achieve this goal, we could use the classical approach and write Dockerfiles. My problem with that: as Java / Scala developers, we have to learn yet another Syntax / Grammar. The developers at Google had the same idea and developed JiB for Gradle and Maven, with the following mission statement: ‘Jib builds optimised Docker and OCI images for your Java applications without a Docker daemon – and without deep mastery of Docker best-practices‘.
Image Layers
It has additional advantages: you don’t need a Docker daemon on your build machine, and most importantly, it can do really good optimisation that reduces your build times by separating your application into multiple layers, splitting dependencies from classes. Now you don’t have to wait for Docker to rebuild your entire Java application – just deploy the layers that changed.
There is another extraordinary tool called Dive that shows how the Docker Image Layers are built and what advantages JiB brings.
For example, the following shows that JiB is clever enough to build a Layer for our Dependencies with concrete version numbers. The reason: we will not change the version number of Akka / Pekko or Spring with every commit / build (maybe every six months, maybe every few years, but definitely not every commit), so it makes no sense to rebuild this Image Layer for every check-in in our Continuous Build environment, or even for local development. As you can see, the Layer size is 150MB, which would cost a lot of time if we build 30 times per day. This is something you get for free with JiB; with Dockerfiles you can do it yourself, but I can tell you it will cost you lots of Lines of Code.
The second picture shows us that JiB is even clever enough to pack our direct project dependencies (the code that belongs to us but is provided as -SNAPSHOT dependencies) into another layer, with the assumption that the source code in the local project has a much bigger chance to change than our project dependencies, so this layer should not be built every time.
It even packs our application configuration data to another layer.
And finally another layer for our code.
Couldn’t we achieve the same goal by writing our own Dockerfiles? Sure, but why should we waste time teaching everybody in our project the syntax of Dockerfiles and dealing with its complexities while JiB is already doing it for us?
The most important configurations are the base image configuration, for which I am using the Zulu JDK ‘azul/zulu-openjdk:17.0.2‘ (mainly because I am using a Mac Notebook with an M1 chip and it is one of the JDKs that currently performs well on M1, and I hear really good things about it), and secondly the Image name, the version tag of the image (based on the Gradle Project version) and the destination Docker Registry where we will install the image.
You might see many more configuration parameters for JiB in the documentation, but the only other one interesting for us here is ‘extraDirectories‘, if we want an extra directory and its permissions. You should create this directory physically under ‘fsm-akka-4eyes-application’ with the following naming convention.
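By default, JiB copies everything under ‘src/main/jib‘ into the image root, so for a target directory of, say, ‘/var/fsm-akka‘ (this path is illustrative) you would create:
> mkdir -p fsm-akka-4eyes-application/src/main/jib/var/fsm-akka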
so our Akka / Pekko Cluster can write its Cluster information to this directory if necessary; we also let JiB give ‘644’ chmod permissions to this directory.
And voilà: with so little configuration and hassle you will get your layered Docker image.
Native Image
With Spring Boot 3.x and Google JiB it is really easy to create a native image with GraalVM, as I explained extensively in this blog, so I will not repeat it here; please check the link to learn how to reach unbelievably low start-up times with Spring Boot.
Kubernetes Configuration
Helm
As I mentioned multiple times in this Blog, our mechanism to deploy our Proof of Concept to Cloud / Kubernetes is via Helm Charts; for that we have to package our application via Helm, and we have to create a structure for it.
Thankfully the ‘helm create‘ command prepares a template that fulfils 90% of our requirements, and you just have to fill in the blanks. The industry ‘Best Practice‘ is to keep your Helm Chart definitions in the same repository as the Scala / Java Code, so you don’t have to jump between repositories, and because your developers know best how to configure your application. This Helm Chart will then be packaged with the help of a Gradle Plugin and deployed to a Helm Registry, so it can be referenced from an Umbrella Helm Chart managing your whole system (Application and Infrastructure components).
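For example:
> helm create fsm-pekko-foureyes-application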
If we look at the ‘Chart.yaml’:
apiVersion: v2
name: fsm-pekko-foureyes-application
description: A Helm chart for Kubernetes
# A chart can be either an 'application' or a 'library' chart.
#
# Application charts are a collection of templates that can be packaged into versioned archives
# to be deployed.
#
# Library charts provide useful utilities or functions for the chart developer. They're included as
# a dependency of application charts to inject those utilities and functions into the rendering
# pipeline. Library charts do not define any templates and therefore cannot be deployed.
type: application
# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
version: 1.0.0
# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application. Versions are not expected to
# follow Semantic Versioning. They should reflect the version the application is using.
# It is recommended to use it with quotes.
appVersion: "${appVersion}"
Here we define the Helm Chart Name ‘fsm-pekko-foureyes-application‘, the Version ‘1.0.0‘ and the appVersion. Helm Charts have a two-Version concept: one is the Version of the Helm Chart, which in theory should only change when the configuration of the Helm Chart changes, and one is the Application Version, which most people relate to the Version of the Docker Image. So as long as only your Docker Image version changes, you should not increment the version of the Helm Chart; increment it only for Helm configuration changes, and use ‘appVersion‘ for Image changes.
While this would be too nice to be true, there is one problem when you reference another Helm Chart as a dependency: the Helm dependency resolution mechanism respects only ‘version‘. So if you are going to use umbrella charts like me, you might have to change the ‘version‘ when the Docker Image version changes, otherwise not.
Now if you look at the above snippet, I am using a variable notation for ‘appVersion: “${appVersion}”‘; the Gradle Plugin that we use, ‘org.unbroken-dome.helm‘, has a nice feature called ‘filtering‘ which helps us replace some values in the Helm files, which you will see in the Gradle Configuration chapter.
Image Configuration
# Default values for fsm-pekko-4eyes-application.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.
replicaCount: 2
image:
  repository: "k3d-${imageRepository}"
  pullPolicy: Always
  # Overrides the image tag whose default is the chart appVersion.
  tag: "${imageTag}"
And the interesting parts of the ‘values.yaml’: since we want to test our application in Kubernetes with Akka Cluster Sharding, we are starting two instances of our application (this is the place where you upscale your application if you need more instances; in further chapters I will show how you can configure Horizontal Pod Autoscaling via Helm). The next point is telling the Helm Chart which Docker Image to use (the one previously built with JiB and deployed to the Docker Registry in our k3d instance); since we are working with a SNAPSHOT Version, the ‘pullPolicy’ is ‘Always’. Finally, ‘tag’ defines which Docker Image tag we want to use; since this is a development version, we have a SNAPSHOT version and not a concrete Version.
We are again using the filtering feature of the ‘org.unbroken-dome.helm‘ Gradle Plugin, and the values of ‘repository: “k3d-${imageRepository}”‘ and ‘tag: “${imageTag}”‘ will be replaced.
This configuration block tells Kubernetes how much CPU time and memory it should allocate for this pod.
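The values are in the screenshot; as a hedged sketch, such a block typically has this shape (the concrete numbers here are illustrative, not the PoC’s actual values):
resources:
  requests:
    cpu: 500m
    memory: 1Gi
  limits:
    cpu: "1"
    memory: 2Gi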
Readiness / Liveness Check
As you may remember, we discussed in the ‘fsm-pekko-akkasystem‘ chapter the topic of HealthCheck / Readiness / Liveness; this is the mechanism that decides for Kubernetes that an application has successfully started, the Pod initialisation process is complete and it is ready to accept network traffic.
If the Liveness check returns false for a certain period of time, Kubernetes will restart the Pod, up to a certain number of tries, before it switches to a complete Error mode. This is the reason you don’t need humans observing the system 24/7 to restart applications; it happens automatically.
The second check, Readiness, decides whether the Pod should receive network traffic. Imagine you have 5 Akka Nodes; if for some reason Node 5 can’t process requests because of some internal problem but is still receiving network requests, Kubernetes is clever enough not to direct requests to Node 5, knowing they can’t be processed anyway. At this point, thanks to Akka Cluster Sharding, the other 4 Akka Nodes will take over the load and process it. If the Akka Node becomes healthy again and the Readiness Check returns positive results, Node 5 will again be included in Cluster Sharding and can process requests. If the Liveness Check does not return positive after a while, the Pod will be started anew.
Below is the configuration of the Helm Chart for the Readiness / Liveness Check.
As you can see, we are installing Apache Cassandra, Traefik, Apache Kafka, Elasticsearch, Nexus and finally our Akka / Pekko Application ‘fsm-akka-4eyes-application’ with the help of this Helm Chart.
Autoscaling
This Topic became too big to cover in this blog entry, so I created a dedicated blog for it.
If you want to learn more about it, please check that one.
Gradle Configuration
To be able to use the ‘fsm-akka-4eyes-application‘ Helm Chart from the Umbrella Helm Chart, we have to upload it to a Helm Repository (for our PoC I chose Nexus; you can see here how to set it up), and to package / upload our Helm Chart we will use a Gradle Plugin.
For the Helm definition we have to give our chart a name, naturally our Gradle project name, and a Version, our Gradle project version; then things get interesting. Remember the filtering feature of the Helm plugin I mentioned: here we can replace the values of ‘appVersion’, ‘Docker Image’ and ‘Docker Tag’; please note that the values for these replacements are supplied in collaboration with the JiB plugin.
The rest of the configuration is about how to publish to the Nexus Repository, authentication information, etc.
One thing I would like to point out: for every Helm Chart publication you have to increase the Version of the Chart; Nexus does not accept an upload if the Version already exists in the Nexus Helm Repository (or you can delete the Version from Nexus).
You can upload Helm Charts to a Helm repository with following command.
gradle :fsm-pekko-4eyes-application:helmPublish
The commands for installing these Helm Charts to Kubernetes will be dealt with in another chapter.
Application Initialisation
Now that we have dealt with how we package our application in Docker and deliver it with Helm, let’s look at how we initialise it with Spring.
As you can see, there is nothing fancy about the Spring Boot configuration: we define in the ‘@Import‘ statement which Spring Context configuration classes should be inspected, and the ‘scanBasePackages‘ parameter tells Spring Boot which packages should be scanned during Component Scan.
package org.salgar.fsm.pekko.foureyes;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.salgar.fsm.pekko.elasticsearch.OffsetFacade;
import org.salgar.fsm.pekko.foureyes.addresscheck.kafka.stream.AdressCheckSMStreamConfig;
import org.salgar.fsm.pekko.foureyes.addresscheck.protobuf.AdressCheckSMCommand;
import org.salgar.fsm.pekko.foureyes.credit.kafka.config.TopicProperties;
import org.salgar.fsm.pekko.foureyes.credit.kafka.facade.AskFacade;
import org.salgar.fsm.pekko.foureyes.credit.kafka.stream.CreditSMStreamConfig;
import org.salgar.fsm.pekko.foureyes.credit.protobuf.CreditSMCommand;
import org.salgar.fsm.pekko.foureyes.creditscore.kafka.stream.CreditScoreSMStreamConfig;
import org.salgar.fsm.pekko.foureyes.creditscore.kafka.stream.MultiTenantCreditScoreSMStreamConfig;
import org.salgar.fsm.pekko.foureyes.creditscore.protobuf.CreditScoreSMCommand;
import org.salgar.fsm.pekko.foureyes.creditscore.protobuf.MultiTenantCreditScoreSMCommand;
import org.salgar.fsm.pekko.foureyes.fraudprevention.kafka.stream.FraudPreventionSMStreamConfig;
import org.salgar.fsm.pekko.foureyes.fraudprevention.protobuf.FraudPreventionSMCommand;
import org.salgar.fsm.pekko.foureyes.projections.CreditSMProjection;
import org.salgar.fsm.pekko.foureyes.projections.CreditSMProjectionHandler;
import org.salgar.fsm.pekko.kafka.config.ConsumerConfig;
import org.salgar.fsm.pekko.pekkosystem.ActorService;
import org.springframework.boot.context.event.ApplicationReadyEvent;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;
@Component
@RequiredArgsConstructor
@Slf4j
public class Starter {
    private final ActorService actorService;
    private final TopicProperties topicProperties;
    private final AskFacade askFacade;
    private final CreditSMProjectionHandler creditSMProjectionHandler;
    private final OffsetFacade offsetFacade;
    private final ConsumerConfig<String, CreditSMCommand> creditSMConsumerConfig;
    private final ConsumerConfig<String, CreditScoreSMCommand> creditScoreSMConsumerConfig;
    private final ConsumerConfig<String, MultiTenantCreditScoreSMCommand> multiTenantCreditScoreSMConsumerConfig;
    private final ConsumerConfig<String, AdressCheckSMCommand> adressCheckSMConsumerConfig;
    private final ConsumerConfig<String, FraudPreventionSMCommand> fraudPreventionSMConsumerConfig;

    @EventListener(ApplicationReadyEvent.class)
    public void initialised(ApplicationReadyEvent applicationReadyEvent) {
        log.info("FSM Pekko 4eyes Initialised!");
        CreditSMStreamConfig.apply(
                creditSMConsumerConfig,
                actorService,
                topicProperties,
                askFacade);
        ...
        CreditSMProjection.init(
                actorService.actorSystem(),
                creditSMProjectionHandler,
                offsetFacade);
    }
}
This class gets, via Spring Dependency Injection, all the components necessary to start the Akka Alpakka Kafka Streams / Pekko Kafka Connectors, to be able to receive Events for our Akka Finite State Machine from Apache Kafka. For starting the Kafka Streams, we wait for ‘@EventListener(ApplicationReadyEvent)‘, which signals that all Spring Components are initialised and our application is ready to accept Events.
Now let’s look at the integration tests, which are the proof that this Proof of Concept works.
Proof of Concept Tests
Until now we discussed the theory of Event Sourcing with Apache Kafka and the Akka / Pekko Finite State Machine; now it is time to see whether all these theories function or not.
Naturally, we first have to set up our tests; the interesting part here is how we configure @EmbeddedKafka for the test environment with the necessary Topics and the URL / Port that Apache Kafka should listen on, along with some configuration that is necessary for mocking the Notifier Service.
Our initial test will exercise the full workflow, from submitting a credit request to its positive resolution. First we have to prepare the payload objects of our first Event.
@Test
@SneakyThrows
public void creditAcceptedTest() {
    final String creditUuid = UUID.randomUUID().toString();
    final Customer customer =
            new Customer(
                    "John",
                    "Doe",
                    "123456789X",
                    new Address(
                            "muster strasse 1",
                            "11A",
                            "city1",
                            "country1"
                    ),
                    "customer1@test.info");
    final List<Customer> creditTenants = new ArrayList<>();
    creditTenants.add(customer);

    Map<String, Object> payload = preparePayload(creditUuid, 100000.0, creditTenants);
    creditSMFacade.submit(payload);

    Thread.sleep(WAIT_TIME_BETWEEN_STEPS);
    ...
After this part of the test runs, this is what we see in the logs.
As you can see, with the ‘onSubmit‘ event the State Machine transitions to the next State (please remember that the WAIT_APPROVAL state has a Nested Submachine; that is the reason the State name looks that complex).
but it will not be as straightforward as you would expect from the previous States. The Approval of the Sales Manager also triggers our inquiries to our Partner Systems about this Credit Approval. To handle this complexity, ‘SALES_MANAGER_APPROVED’ has a Nested / Sub State Machine (do you remember our discussion about State Explosion and managing complexity by dividing and conquering?).
Now, to progress with our test, we should receive responses from our Partner Systems; the order in which we receive the responses is not relevant for the State Machine. This test receives the response from the CustomerScore Slave State Machine first, but in other tests under this project the responses are simulated to arrive in a different order; you can examine those tests yourself to convince yourself.
As you can see, we are checking whether the ‘onResultReceived‘ Event originated from the Customer Score Slave State Machine or not; when we simulated the response in the test, we placed this flag in the Event payload, the flag ‘SOURCE_SLAVE_SM_TAG’.
Now, if you are paying attention, this Transition looks quite different from the previous ones, because we are using the wonderful advantages of Nested Submachines: the previous two States were handled in the Submachine, but the final State change returns us to the original State Machine.
Since this Guard, ‘isResultSufficientGuard‘, is not in the Submachine, it must deal with the fact that the answer can come from any of our Partner Systems; additionally, it should also check the information delivered by our Partner Systems to decide whether we can continue with our workflow.
There are two other Transitions that react to ‘onResultReceived‘.
One deals with the scenario where the Credit Score for an application is not quite good enough and a human interaction must happen: a Senior Manager should decide whether to continue with the credit process or reject the credit application.
Or the case, the results from Partner Systems dictates, the Credit Application must be flat out rejected .
These additional scenarios will be dealt in additional tests.
Now that we have the results, we place them in the Control Object of the Master State Machine and notify the next person, to let the workflow continue.
And the assertions, but please pay attention to the difference from the previous one: as I just explained, ‘CREDIT_ACCEPTED’ is a final State and it terminates / stops the CreditSM Actor, so if we use our standard ‘reportState’, the assertion will fail because the Actor has probably been stopped. So we will make our assertion over Elasticsearch; for this purpose we have to create a Spring Data Elasticsearch Repository for the CreditSM.
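A minimal sketch of such a Repository; ‘CreditSMDocument’ and its fields are assumed names, not the project’s actual types.

import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;

// hypothetical document repository over the index the State Machine reports its state to
public interface CreditSMRepository extends ElasticsearchRepository<CreditSMDocument, String> {
}

In the test we could then poll Elasticsearch until the final State appears, for example with Awaitility (again a sketch, the field name is assumed):

await().atMost(Duration.ofSeconds(30)).untilAsserted(() -> {
    CreditSMDocument document = creditSMRepository.findById(creditUuid).orElseThrow();
    assertEquals("CREDIT_ACCEPTED", document.getState());
});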
Now we have completed the analysis of our first test, which observed the positive path of a successful workflow. If you remember our initial discussion, I presented the iterative approach as one of the biggest advantages of a State Machine: either we are in the initial phase of our project and our design is not complete, or we made a mistake and overlooked a State or a Transition. Let’s observe in the following test how our System behaves in the case of missing Transition information.
The test is nearly the same as the previous one. The problem / bug that we simulate: the Sales Manager approved and our State Machine progressed further in the workflow to accept the results from our Partner Systems, but the client of our State Machine (maybe a Web Interface) sent the Sales Manager Approved Event twice (which can happen easily in any Web Application or Messaging System).
If you check the UML State Machine, the ‘WAITING_CREDIT_ANALYST_APPROVAL’ State has no Transition for the Trigger ‘onSalesManagerApproved’.
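In the test, the duplicate delivery can be simulated by simply firing the same Event twice; a sketch, where the facade method name is an assumption, analogous to ‘submit’ in the first test.

creditSMFacade.salesManagerApproved(payload);  // the legitimate approval
Thread.sleep(WAIT_TIME_BETWEEN_STEPS);
creditSMFacade.salesManagerApproved(payload);  // the duplicate the client sent by mistake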
And the State Machine does exactly what we expect: it reports the problem.
[o.s.f.a.s.f.StateMachineFacade] [FourEyes-akka.actor.default-dispatcher-18] [akka://FourEyes/system/creditSMGuardian] – We are processing onSalesManagerApproved(payload): {creditUuid=01e82520-cb35-41a2-9f27-1b5b4bbb4445, creditTenants=[Customer(firstName=John, lastName=Doe, personalId=123456789X, address=Address(street=muster strasse 1, houseNo=11A, city=city1, country=country1), email=customer1@test.info)]}
[WARN ] [o.s.a.f.a.FSMAspect] [FourEyes-akka.actor.default-dispatcher-18] [] – Unhandled transition! call(public akka.persistence.typed.scaladsl.EffectBuilder akka.persistence.typed.scaladsl.Effect..unhandled()) event: onSalesManagerApproved(org.salgar.fsm.akka.foureyes.credit.CreditSMGuardian$,{creditUuid=01e82520-cb35-41a2-9f27-1b5b4bbb4445, creditTenants=[Customer(firstName=John, lastName=Doe, personalId=123456789X, address=Address(street=muster strasse 1, houseNo=11A, city=city1, country=country1), email=customer1@test.info)]},null) state: CREDIT_APPLICATION_SUBMITTED_WAITING_CREDIT_ANALYST_APPROVAL ( {fraudPreventionResult=true, salesManagerNotificationList=[salesmanager1@example.com, salesmanager2@example.com], creditScoreTenantResults={123456789X=CreditTenantScoreResult(personalId=123456789X, creditScore=73.72)}, sourceSlaveSMTag=addressCheckSM, creditUuid=01e82520-cb35-41a2-9f27-1b5b4bbb4445, creditTenants=[Customer(firstName=John, lastName=Doe, personalId=123456789X, address=Address(street=muster strasse 1, houseNo=11A, city=city1, country=country1), email=customer1@test.info)], addressCheckResult=true})
java.lang.RuntimeException: Unhandled transition!
    at org.salgar.fsm.akka.foureyes.credit.CreditSM.unhandled_aroundBody53$advice(CreditSM.scala:27)
    at org.salgar.fsm.akka.foureyes.credit.CreditSM.$anonfun$commandHandlerInternal$14(CreditSM.scala:1128)
    at org.salgar.akka.fsm.base.actors.BaseActor.base(BaseActor.scala:34)
    at org.salgar.fsm.akka.foureyes.credit.CreditSM.commandHandlerInternal(CreditSM.scala:1074)
    at org.salgar.fsm.akka.foureyes.credit.CreditSM.commandHandler(CreditSM.scala:239)
    at org.salgar.fsm.akka.foureyes.credit.CreditSMGuardian$.$anonfun$prepare$3(CreditSMGuardian.scala:211)
    at akka.persistence.typed.internal.Running$HandlingCommands.onCommand(Running.scala:260)
You can see the State Machine reports exactly where the unexpected Event was received: the Event type, the Payload of the Event, the current State of the State Machine and the content of the Control Object.
What you see here is the biggest advantage of a State Machine: if this is a Use Case you overlooked, you have to teach this Use Case to the State Machine; if it is a bug, it is something you have to fix for the next iteration.
Now think how you used to program this with ‘IF Salads’: your program would just do something here, maybe the correct thing, maybe a colossal bug, whatever it thinks fits best. Not with the State Machine; it will stop here and ask you to teach it how exactly it should behave.
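To make the contrast concrete, here is a minimal sketch of the ‘IF Salad’ style, with all names hypothetical; notice how the duplicate Event is silently absorbed instead of being reported.

// a deliberately bad example: State handling scattered over conditionals
enum State { WAITING_SALES_MANAGER_APPROVAL, SALES_MANAGER_APPROVED, WAITING_CREDIT_ANALYST_APPROVAL }

State onSalesManagerApproved(State current) {
    if (current == State.WAITING_SALES_MANAGER_APPROVAL) {
        return State.SALES_MANAGER_APPROVED;   // the expected case
    } else if (current == State.WAITING_CREDIT_ANALYST_APPROVAL) {
        return State.SALES_MANAGER_APPROVED;   // the duplicate lands here and silently
                                               // rewinds the workflow, nobody is forced to decide
    }
    return current; // no branch matched: the Event is swallowed, no log, no error
}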
Long Running Workflow / Persistence Recovery Test
If you remember the previous chapters of this blog, I mentioned that one of the biggest advantages of the Akka / Pekko Framework is its capability to persist long running workflows. You have seen in the positive test scenario that our test executed in 10s, but this is not how the real world works, is it? Maybe the Sales Manager’s research to approve the Credit takes the next 10 days and he/she cannot give the approval directly.
So what should happen? Should we keep the CreditSM Actor ten days in memory? That might not be possible: we might run into a resource bottleneck and have to stop and unload some Actors from memory until another Event arrives for them, or our Kubernetes Cluster observes reduced load and decides to downscale the number of Pods for our application, or in these 10 days we release a new version of our application and have to restart it, so the Actor would not be in memory.
The following test simulates these scenarios: the first part of the test brings the State Machine to a certain State, stops it and shuts down the Java Virtual Machine. Then the second part of the test takes over, recovers the CreditSM Actor and continues with the workflow (the first part of the test creates a unique id for the Credit Application and the second part must recover with this unique id; this id is unique per day, so if you want to run this test twice in one day, you have to delete the Cassandra Database first).
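A sketch of how such a per-day id could look; the exact format in the project may differ.

// deterministic per day, so the second JVM run can address the same CreditSM Actor;
// running the test twice on the same day would collide, hence the Cassandra cleanup
final String creditUuid = "recoveryTest-" + java.time.LocalDate.now();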
Now if you look at the Mock configuration above, you will see something really interesting: we have two Credit Tenants for the Credit Application, so normally we should receive two Credit Scores, but we simulate here that a system shutdown occurs before the second Credit Score result is received (since we work with Apache Kafka, after the restart the Event will still be there), to prove that the State Machine can deal with any scenario after Akka / Pekko recovery.
Before the test shuts down, we check that we are at the correct State.
The application stops when the Fraud Prevention, Address Check and one of the Credit Score Results are received, so we will not be in the State ‘WAITING_CREDIT_ANALYST_APPROVAL’ but in ‘FRAUDPREVENTION_ADDRESSCHECK_RESULT_RECEIVED’.
Now let’s continue the test after the recovery of the CreditSM Actor.
First we check that after the recovery we are at the correct State, ‘FRAUDPREVENTION_ADDRESSCHECK_RESULT_RECEIVED’. That proves that after the restart, Akka recovered the CreditSM Actor.
Now we have to simulate the missing Credit Score Result Event (if you are asking whether Apache Kafka should do this: this module does not deal with Kafka, but there are additional Integration Tests dealing with Kafka under the module ‘fsm-pekko-4eyes-kafka’).
Customer customer = creditTenants.get(0);
log.info("Sending Credit Score Result for Customer: {}", customer);
// simulate the Credit Score response that was still missing before the shutdown
Map<String, Object> creditScorePayload = new HashMap<>();
creditScorePayload.put(PayloadVariableConstants.CREDIT_SCORE_RESULT, 83.45);
creditScoreSMFacade.resultReceived(
    () -> creditUuid + "_" + customer.getPersonalId(),
    creditScorePayload);
This will bring us to the following State.
And naturally we will assert that we reached the correct State.
This whole Proof of Concept application is called ‘Four Eyes Credit Approval’, but until now we didn’t see that much of this ‘Four Eyes’ principle. In the State Machine that we modelled, for credit amounts of more than 2 000 000 the State Machine switches to Four Eyes mode, which means every decision must be approved by two persons.
The approval States have Nested Sub State Machines (Submachines). For amounts of less than 2 000 000, these Submachines accept the approval of only one of the available Managers; for higher amounts, these Sub State Machines and their Guard Conditions will not accept the approvals unless all of the Managers approve.
Also, the normal workflow only requires the approval of a Senior Manager if the Credit Scores of the applicants are in a certain range; if the credit amount is more than 2 000 000, the approval of the Senior Manager is mandatory.
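A hypothetical sketch of such a Guard Condition; the 2 000 000 threshold and the ‘salesManagerNotificationList’ key are from the text and the logs above, while ‘creditAmount’ and ‘managerApprovals’ are assumed names.

private static final double FOUR_EYES_THRESHOLD = 2_000_000.0;

boolean approvalsSufficientGuard(Map<String, Object> controlObject) {
    Double creditAmount = (Double) controlObject.get("creditAmount");             // assumed key
    List<String> notifiedManagers =
        (List<String>) controlObject.get("salesManagerNotificationList");
    Set<String> approvals = (Set<String>) controlObject.get("managerApprovals");  // assumed key

    if (creditAmount < FOUR_EYES_THRESHOLD) {
        return !approvals.isEmpty();                 // one Manager's approval suffices
    }
    return approvals.containsAll(notifiedManagers);  // Four Eyes: all Managers must approve
}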
You can see this test, ‘creditAmountHighCreditAcceptedTest’, in the following Java Test Class.
The major difference to the previous tests is that two different Managers must send the Approval Events for the workflow to proceed.