Writing your own Kubernetes operator in Java
Operators, introduced by CoreOS in 2016, have now reached the early majority stage on the Categories of Adopters scale.
In other words, the technology is becoming mainstream. Operators use the Kubernetes API server to enforce operational patterns for an application's deployment. This lets application developers translate their "domain" knowledge into Kubernetes terms.
I experimented with them lately, but my limited Go skills discouraged me at first. Then I realised that I could try writing an operator in my "mother tongue": Java. In this post I will share a couple of reflections on the topic.
TL;DR: here is the hello world I wrote.
Anatomy of an operator
First and most important of all, an operator is an extension to Kubernetes. The core idea is to capture what a human operator or admin would normally do and encode it in software.
As Kubernetes was adopted by more and more users, an obvious need arose to extend its API. A resource in Kubernetes is an API endpoint that stores a collection of API objects of one kind. For example, the pods resource holds Pod objects and allows actions such as get, delete, and patch to be performed on them.
Operators are extensions to the system, so there must be a way to express their objectives in the proper (declarative) Kubernetes way. And there is: custom resources.
Whenever you write a new operator, you will want to create a custom resource for it. As soon as you register your custom resource definition (CRD) in Kubernetes, the API server starts serving it, and all the usual actions such as get and delete become available. Your operator then monitors this new resource and takes appropriate action when needed.
Control loop
Demystification no. 1: operators are nothing new. Kubernetes is driven internally by controllers, which by definition implement a control loop.
Funnily enough, the term comes from robotics: a control loop is a non-terminating loop that keeps correcting the state of a system. What does that mean? For example, a simple controller could make sure that a certain pod is present in the system at all times. Whenever it detects that the pod is down, it takes steps to restart it and correct the "glitch".
Operators are basically the same thing, but in the form of an extension. Kubernetes is very permissive when it comes to its API: as long as you can reach the API server (and have the rights), you can do anything with it. So here is the thing: an operator is a custom controller. And when you boil it down to its core, it is just a pod, like any other! You will probably want to add replication, leader election and so on, but it is still just a piece of software with access to the Kubernetes API.

To sum up: an operator watches events from the server, or reads the current state of resources, and compares it with what is specified in the custom resource. If there is a mismatch, it performs reconciliation. This is just another term for the control loop, only living outside the Kubernetes core.
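To make the idea of reconciliation concrete, here is a minimal sketch that uses plain Java maps in place of the Kubernetes API. The class name `ReconcileSketch`, the `reconcile` method and the page names are all hypothetical and not taken from the repo; the "actions" are just strings, where a real operator would call the API server instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReconcileSketch {

    // Compares desired state (what the custom resource asks for) with actual
    // state (what currently exists) and returns the corrective actions.
    // Both maps are hypothetical stand-ins: name -> content.
    public static List<String> reconcile(Map<String, String> desired, Map<String, String> actual) {
        List<String> actions = new ArrayList<>();
        // TreeMap copies give a stable, sorted iteration order.
        for (Map.Entry<String, String> entry : new TreeMap<>(desired).entrySet()) {
            String current = actual.get(entry.getKey());
            if (current == null) {
                actions.add("CREATE " + entry.getKey());
            } else if (!current.equals(entry.getValue())) {
                actions.add("UPDATE " + entry.getKey());
            }
        }
        for (String name : new TreeMap<>(actual).keySet()) {
            if (!desired.containsKey(name)) {
                actions.add("DELETE " + name);
            }
        }
        return actions;
    }

    public static void main(String[] args) {
        Map<String, String> desired = Map.of("page1", "<h1>Hello</h1>", "page2", "<h1>World</h1>");
        Map<String, String> actual = Map.of("page1", "<h1>Old</h1>", "page3", "<h1>Stale</h1>");
        // prints [UPDATE page1, CREATE page2, DELETE page3]
        System.out.println(reconcile(desired, actual));
    }
}
```

When desired and actual state already match, the returned list is empty. That is what makes it safe to run the loop again and again.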
Kubernetes Client in Java
The go-to solution for working with Kubernetes from Java is the Fabric8 client. You may also want to watch this presentation.
Initialising the client is very easy:
KubernetesClient client = new KubernetesClientBuilder().build();
Then it is a matter of navigating the API and invoking the appropriate methods:
// get() returns null when the deployment does not exist, hence the Optional wrapper
var currentServingDeploymentNullable = client.apps().deployments().inNamespace("default")
        .withName("web-serving-app-deployment").get();
var currentServingDeployment = Optional.ofNullable(currentServingDeploymentNullable);
Informers
The basic idea is to run the control loop forever: reconcile, sleep for a couple of seconds, and do everything again. But this is not the best way. Ideally you only do work when something changes, and that is easy to achieve as well.
Kubernetes has the notion of informers. These are long-lived WebSocket subscriptions to changes of particular resources. For example, you could watch all the pods in a namespace and get informed whenever any of them changes.
This leads to the following solution: at the end of each reconciliation, block on a monitor:
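// Monitor shared by the control loop and the informer callback.
private static final Object changes = new Object();
...
// Block until the informer callback calls notifyAll(). A spurious wakeup
// only causes one extra, harmless reconciliation pass.
synchronized (changes) {
    changes.wait();
}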
Whenever something changes our informer will let us unblock the control loop:
var handler = new GenericResourceEventHandler<>(update -> {
    synchronized (changes) {
        changes.notifyAll();
    }
});
crdClient.inform(handler).start();
The callback interface is a little verbose, so I abstracted it away like so:
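import io.fabric8.kubernetes.client.informers.ResourceEventHandler;
import java.util.function.Consumer;

// Funnels every informer event (add, update, delete) into a single callback.
public class GenericResourceEventHandler<T> implements ResourceEventHandler<T> {

    private final Consumer<T> handler;

    public GenericResourceEventHandler(Consumer<T> handler) {
        this.handler = handler;
    }

    @Override
    public void onAdd(T obj) {
        this.handler.accept(obj);
    }

    @Override
    public void onUpdate(T oldObj, T newObj) {
        this.handler.accept(newObj);
    }

    @Override
    public void onDelete(T obj, boolean deletedFinalStateUnknown) {
        // The resource is gone, so the callback receives null and must tolerate it.
        this.handler.accept(null);
    }
}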
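One caveat with the plain wait/notify handshake above: a change that arrives while the loop is still busy reconciling is lost, because nobody is waiting when notifyAll() fires. A common remedy is to remember the signal in a flag. The sketch below is not what the repo does; the class `ChangeGate` and its methods are hypothetical names. It also takes a timeout, so the loop falls back to a periodic pass even if no event arrives.

```java
import java.util.concurrent.TimeUnit;

public class ChangeGate {

    private final Object lock = new Object();
    private boolean dirty = false;

    // Called from the informer callback: remember the change and wake the loop.
    public void signal() {
        synchronized (lock) {
            dirty = true;
            lock.notifyAll();
        }
    }

    // Called by the control loop. Returns true if a change was signalled since
    // the last call, or false if the timeout elapsed without any signal.
    public boolean awaitChange(long timeoutMillis) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        synchronized (lock) {
            while (!dirty) {
                long remainingMillis = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMillis <= 0) {
                    return false; // timed out; wait(0) would block forever, so bail out here
                }
                lock.wait(remainingMillis);
            }
            dirty = false;
            return true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ChangeGate gate = new ChangeGate();
        gate.signal();                              // change arrives while the loop is busy
        System.out.println(gate.awaitChange(1000)); // true: the signal was remembered
        System.out.println(gate.awaitChange(50));   // false: nothing changed, timeout elapsed
    }
}
```

In the control loop, the informer callback would call `signal()` and the loop would call `awaitChange(...)` after each reconciliation, in place of the bare `changes.wait()`.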
Deployment
Once you are done with the implementation, you will probably want to deploy the operator. I wrote mine as a Spring Boot application, and some interesting things came up along the way.
Tip 1
You will want a private registry, and I chose GHCR (GitHub Container Registry). Set it up and generate an access token. Then create a secret for Kubernetes:
kubectl create secret docker-registry regcred \
    --docker-server=ghcr.io \
    --docker-username=dgawlik \
    --docker-password=$GITHUB_TOKEN
Tip 2
You have to create the CRD, of course. The Fabric8 client has you covered here:
<dependency>
    <groupId>io.fabric8</groupId>
    <artifactId>kubernetes-client</artifactId>
    <version>6.13.4</version>
</dependency>
<dependency>
    <groupId>io.fabric8</groupId>
    <artifactId>crd-generator-apt</artifactId>
    <version>6.13.4</version>
    <scope>provided</scope>
</dependency>
Whenever you compile your custom resource classes, the annotation processor generates the CRD manifest (by default under target/classes/META-INF/fabric8), so you can kubectl apply it. For example:
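import io.fabric8.kubernetes.api.model.Namespaced;
import io.fabric8.kubernetes.client.CustomResource;
import io.fabric8.kubernetes.model.annotation.Group;
import io.fabric8.kubernetes.model.annotation.ShortNames;
import io.fabric8.kubernetes.model.annotation.Version;

// Each public type below lives in its own file.
@Group("com.github.webserving")
@Version("v1alpha1")
@ShortNames("websrv")
public class WebServingResource extends CustomResource<WebServingSpec, WebServingStatus> implements Namespaced {
}

// Desired state: the two pages to serve.
public record WebServingSpec(String page1, String page2) {
}

// Observed state, reported back by the operator.
public record WebServingStatus(String status) {
}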
Tip 3
You will want to build native images with GraalVM to speed things up. If you don't have a lot of memory, you can trade binary quality for shorter build time and lower resource usage.
<build>
    <plugins>
        <plugin>
            <groupId>org.graalvm.buildtools</groupId>
            <artifactId>native-maven-plugin</artifactId>
            <configuration>
                <buildArgs>
                    <buildArg>-Ob</buildArg>
                </buildArgs>
            </configuration>
        </plugin>
        <plugin>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-maven-plugin</artifactId>
            <configuration>
                <image>
                    <publish>true</publish>
                    <builder>paketobuildpacks/builder-jammy-full:latest</builder>
                    <name>ghcr.io/dgawlik/webpage-serving:1.0.5</name>
                    <env>
                        <BP_JVM_VERSION>21</BP_JVM_VERSION>
                    </env>
                </image>
                <docker>
                    <publishRegistry>
                        <url>https://ghcr.io/dgawlik</url>
                        <username>dgawlik</username>
                        <password>${env.GITHUB_TOKEN}</password>
                    </publishRegistry>
                </docker>
            </configuration>
        </plugin>
    </plugins>
</build>
A few notes on this configuration:
- When you set the publish property to true, the package step will automatically push the image to your registry.
- The -Ob argument is passed to GraalVM's native-image and enables quick build mode: the fastest, cheapest build possible, at the cost of a less optimized binary.
- BP_JVM_VERSION has to match the Java version of your project, or things will not work.
- If you want to debug the container, you have to choose paketobuildpacks/builder-jammy-full:latest, as the other builders don't include a shell (shame).
Conclusion
I haven't covered everything, but the rest is in the repo. The repo is a proof of concept that operators in Java are not complicated at all. I would even say they are easier to write than in Go. In the repo you will find the following:
- a Spring Boot static server
- an operator that watches the custom resource and mounts config maps into the server so that it can serve the pages defined in the custom resource
So this is basically an operator hello world.