A step-by-step guide to writing a Kubernetes Operator in Golang

January 21, 2025

Prerequisites

  1. Basic Knowledge of Kubernetes: Familiarity with pods, deployments, and CRDs
  2. Golang Development Environment: Go version 1.18 or later
  3. kubectl and Minikube: For deploying and testing your operator
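  4. Operator SDK: A CLI tool for scaffolding and building Kubernetes Operators
  5. Docker and a container registry account (such as Docker Hub): For building and pushing the operator image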

As a Kubernetes user, you’ve likely experienced its power in managing stateless applications. With Kubernetes, you can easily deploy, scale, and manage these applications without much manual intervention.

However, when it comes to stateful applications, things get a bit more complex. These applications require persistent storage, complex configuration, and careful state management. Kubernetes, while powerful, doesn’t natively address these specific needs.

Why are Kubernetes operators needed?

Kubernetes works declaratively: you define the desired state of your application, and Kubernetes continuously works to maintain it. For instance, if you create an Nginx Deployment with 3 replicas, Kubernetes ensures all three pods are running and ready to handle web traffic, replacing any that fail.

Stateful applications are harder. Kubernetes offers building blocks such as StatefulSets and PersistentVolumes, but it has no built-in knowledge of how to operate a specific application: how to configure it, back it up, or upgrade it safely. This is where Kubernetes Operators come to the rescue. They encode that operational knowledge in software and automate the management of stateful applications, handling tasks like:

  • Configuration: Ensuring your application is configured correctly.
  • Scaling: Automatically scaling your application up or down based on demand.
  • Backups: Taking regular backups of your data to protect against failures.
  • Upgrades: Seamlessly upgrading your application to newer versions.

For example, a PostgreSQL operator can automate the entire lifecycle of a PostgreSQL database, from initialization to scaling and backups.

In this blog, we'll dive into the world of Kubernetes Operators, exploring how to build your own in Golang. We'll even build a simple "PodWatcher" operator to demonstrate the core concepts.

Key concepts of writing an operator

  • Custom resource definitions (CRDs) - Extend the Kubernetes API with custom resource types. For example, we can define a resource like PodWatcher with custom fields such as labelSelector or emailAddress. This lets users create and manage domain-specific objects in Kubernetes.
  • Custom resources (CRs) - Instances of the custom resource type defined by a CRD.
  • Controllers - Continuously observe resources, compare their actual state with the desired state defined in the CR, and take corrective action when the two differ.
  • Reconciliation - The core logic of an Operator that ensures the actual state matches the desired state.
  • Informers - Cache resource states locally and notify controllers of changes, so controllers do not have to query the API server on every read.
  • Watchers - Provide real-time event streams from the Kubernetes API.
  • Service accounts and RBAC - Control the permissions an Operator has in the cluster:
    • Service account - An identity for the Operator.
    • Role/ClusterRole - Defines what actions the Operator can take (e.g., list pods, update CRs).
    • RoleBinding/ClusterRoleBinding - Grants the permissions of a Role or ClusterRole to the Operator's service account.
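
To make the reconciliation idea concrete, here is a minimal, Kubernetes-free sketch. It assumes a hypothetical resource whose spec asks for a number of replicas; desiredState, observedState, and reconcile are illustrative names, not part of controller-runtime. What it shows is that reconciliation is level-triggered: it compares the current desired and observed state and computes what to change, so running it again after the cluster has converged produces no further work.

```go
package main

import (
	"fmt"
	"sort"
)

// desiredState and observedState are hypothetical, simplified stand-ins for
// a custom resource's spec and for the objects that exist in the cluster.
type desiredState struct {
	Replicas int
}

type observedState struct {
	Pods []string // names of the pods that currently exist
}

// reconcile compares desired and observed state and returns how many pods
// to create and which pods to delete. It only looks at the current state,
// never at the event that triggered it, so calling it again after the
// changes are applied returns no further work (level-triggered behavior).
func reconcile(want desiredState, have observedState) (toCreate int, toDelete []string) {
	if want.Replicas < 0 {
		want.Replicas = 0
	}
	pods := append([]string(nil), have.Pods...)
	sort.Strings(pods) // deterministic choice of which extra pods to delete
	if missing := want.Replicas - len(pods); missing > 0 {
		return missing, nil
	}
	return 0, pods[want.Replicas:]
}

func main() {
	want := desiredState{Replicas: 3}
	have := observedState{Pods: []string{"web-b", "web-a"}}
	fmt.Println(reconcile(want, have)) // 1 []

	// Pretend the missing pod was created, then reconcile again.
	have.Pods = append(have.Pods, "web-c")
	fmt.Println(reconcile(want, have)) // 0 []

	// Lower the desired replicas: the extra pods are selected for deletion.
	want.Replicas = 1
	fmt.Println(reconcile(want, have)) // 0 [web-b web-c]
}
```

A real controller follows the same pattern, but reads the observed state through informer caches and applies changes through the API server instead of returning them.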

Steps to write a Kubernetes Operator in Golang

Now let’s implement an operator that watches PodWatcher resources and reports when a pod matching a PodWatcher's label selector restarts.

Step 1: Install the Operator SDK

On macOS, you can install it with Homebrew:

brew install operator-sdk

Step 2: Create an Operator project

Start by creating an Operator project using the Operator SDK:

operator-sdk init --domain example.com --repo github.com/JonesJefferson/operator-example

This sets up the foundational structure and configuration files for your Operator.

  • --domain example.com: Defines the domain for your API groups. Each group you create is combined with it, so the core group used in Step 3 becomes core.example.com (API version core.example.com/v1).
  • --repo github.com/JonesJefferson/operator-example: Specifies the Go module path, which is written to go.mod and used in the project's import paths.

Generated File Structure

├── config/
│   ├── crd/
│   ├── default/
│   ├── manager/
│   ├── rbac/
│   ├── samples/
│   └── scorecard/
├── controllers/
│   └── <empty initially>
├── Dockerfile
├── go.mod
├── go.sum
├── main.go
├── Makefile
├── PROJECT
└── README.md

config/ Directory:
  • Contains Kubernetes manifests for deploying the Operator, including:
    • CRD definitions (config/crd).
    • RBAC configurations (config/rbac) for the Operator.
    • Deployment manifests for the Operator controller (config/manager).
  • config/default: a Kustomize configuration that combines the other directories and sets the namespace and name prefix used when deploying.
controllers/ Directory:
  • Placeholder for custom controller logic. Initially empty but will contain controllers for custom resources.
main.go:
  • Entry point for the Operator.
  • Initializes the manager, which runs the controllers and provides shared caches, leader election, metrics, and health probes.
Makefile:
  • Includes useful targets to build, test, and deploy the Operator (e.g., make run, make docker-build).

Step 3: Create the API and controller

operator-sdk create api --group=core --version=v1 --kind=PodWatcher --controller --resource

This command scaffolds a new API and its controller: the Go types that define the custom resource's schema (from which the CRD is generated) and the controller code that reconciles it.

  • --group=core: Specifies the API group, which combined with the domain becomes core.example.com.
  • --version=v1: Defines the API version (core.example.com/v1).
  • --kind=PodWatcher: Specifies the custom resource kind (PodWatcher).
  • --controller: Generates a controller for this resource.
  • --resource: Generates the resource's Go types, from which the CRD manifest is generated.
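
The names that appear later in this guide follow directly from these flags and the --domain passed to init. The sketch below uses a hypothetical gvNames helper to derive them; it assumes a naive lowercase-plus-"s" pluralization, which is correct for PodWatcher, whereas the real scaffolding uses a more complete pluralizer.

```go
package main

import (
	"fmt"
	"strings"
)

// gvNames is a hypothetical helper that derives the API version and CRD
// name from the --domain, --group, --version and --kind flags.
// Assumption: the plural is simply the lowercased kind plus "s".
func gvNames(domain, group, version, kind string) (apiVersion, crdName string) {
	fullGroup := group + "." + domain      // e.g. core.example.com
	apiVersion = fullGroup + "/" + version // e.g. core.example.com/v1
	plural := strings.ToLower(kind) + "s"  // e.g. podwatchers
	crdName = plural + "." + fullGroup     // e.g. podwatchers.core.example.com
	return apiVersion, crdName
}

func main() {
	apiVersion, crdName := gvNames("example.com", "core", "v1", "PodWatcher")
	fmt.Println(apiVersion) // core.example.com/v1
	fmt.Println(crdName)    // podwatchers.core.example.com
}
```

This is why the CRD in the deploy output is named podwatchers.core.example.com, and why a PodWatcher manifest needs apiVersion core.example.com/v1.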

Generated Files

podwatcher-operator/
├── api/
│   └── v1/
│       ├── podwatcher_types.go
│       ├── groupversion_info.go
│       └── zz_generated.deepcopy.go
├── controllers/
│   └── podwatcher_controller.go

api/v1/podwatcher_types.go:
  • Contains the Go struct definition for the PodWatcher resource.
  • Defines the schema of the resource (spec, status, etc.).
api/v1/groupversion_info.go:
  • Registers the API group and version with the Kubernetes scheme.
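  • Ensures the custom resource can be serialized/deserialized.
api/v1/zz_generated.deepcopy.go:
  • Contains generated DeepCopy methods for the API types. It is regenerated by make and should not be edited by hand.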
controllers/podwatcher_controller.go:
  • Contains the reconciliation logic for the PodWatcher custom resource.
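  • Scaffolds the Reconcile function, which is the heart of the controller:

func (r *PodWatcherReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    // Reconciliation logic here.
    // An empty Result and a nil error mark the request as handled;
    // returning an error makes controller-runtime retry it with backoff.
    return ctrl.Result{}, nil
}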

Step 4: Modify the PodWatcher resource

Modify the PodWatcher API in api/v1/podwatcher_types.go: add a LabelSelector field to the spec to choose which pods to watch, and a LastPodRestartTime field to the status to record when a restart was last observed.

package v1

import (
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// EDIT THIS FILE!  THIS IS SCAFFOLDING FOR YOU TO OWN!
// NOTE: json tags are required.  Any new fields you add must have json tags for the fields to be serialized.

// PodWatcherSpec defines the desired state of PodWatcher
type PodWatcherSpec struct {
    // INSERT ADDITIONAL SPEC FIELDS - desired state of cluster
    // Important: Run "make" to regenerate code after modifying this file

    // LabelSelector selects the pods to watch: a pod matches when it
    // carries every key/value pair listed here.
    LabelSelector map[string]string `json:"labelSelector,omitempty"`
}

// PodWatcherStatus defines the observed state of PodWatcher
type PodWatcherStatus struct {
    // Important: Run "make" to regenerate code after modifying this file

    // LastPodRestartTime records when the operator last observed a
    // restarted container in a matching pod.
    LastPodRestartTime string `json:"lastPodRestartTime,omitempty"`
}

// +kubebuilder:object:root=true
// +kubebuilder:subresource:status

// PodWatcher is the Schema for the podwatchers API
type PodWatcher struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`

    Spec   PodWatcherSpec   `json:"spec,omitempty"`
    Status PodWatcherStatus `json:"status,omitempty"`
}

// +kubebuilder:object:root=true

// PodWatcherList contains a list of PodWatcher
type PodWatcherList struct {
    metav1.TypeMeta `json:",inline"`
    metav1.ListMeta `json:"metadata,omitempty"`
    Items           []PodWatcher `json:"items"`
}

func init() {
    SchemeBuilder.Register(&PodWatcher{}, &PodWatcherList{})
}

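After making this change, run “make” to regenerate the code. The default target runs make manifests and make generate, which update the CRD manifest and zz_generated.deepcopy.go.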

Step 5: Implement the controller logic

Edit the PodWatcher controller in controllers/podwatcher_controller.go:

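// In addition to the scaffolded imports, this file needs "fmt", "time",
// corev1 "k8s.io/api/core/v1", and appv1 as the alias for this project's
// api/v1 package (github.com/JonesJefferson/operator-example/api/v1).
func (r *PodWatcherReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    logger := log.FromContext(ctx)

    // Fetch the PodWatcher resource that triggered this reconcile
    var podWatcher appv1.PodWatcher
    if err := r.Get(ctx, req.NamespacedName, &podWatcher); err != nil {
        logger.Error(err, "unable to fetch PodWatcher")
        // A NotFound error means the PodWatcher was deleted, so there is nothing to retry
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }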

The Reconcile function is the core of the controller. It is invoked whenever a PodWatcher custom resource, or any other resource the controller watches, is added, updated, or deleted. In the above snippet, we first fetch the PodWatcher resource that triggered the reconcile. Its LabelSelector determines which pods the operator should monitor.

    // List all pods in the same namespace as the PodWatcher
    podList := &corev1.PodList{}
    listOpts := &client.ListOptions{
        Namespace: req.Namespace,
    }
    if err := r.List(ctx, podList, listOpts); err != nil {
        logger.Error(err, "unable to list pods")
        return ctrl.Result{}, err
    }

    // A pod matches when it carries every key/value pair in the selector
    labelSelector := podWatcher.Spec.LabelSelector
    for _, pod := range podList.Items {
        matches := true
        for key, value := range labelSelector {
            if pod.Labels[key] != value {
                matches = false
                break
            }
        }
        if !matches {
            continue
        }
        for _, status := range pod.Status.ContainerStatuses {
            // A RestartCount above zero means the container has restarted
            if status.RestartCount > 0 {
                message := fmt.Sprintf("Pod '%s' in namespace '%s' has restarted %d times!",
                    pod.Name, pod.Namespace, status.RestartCount)
                fmt.Println(message)

                // Record when the restart was observed in the PodWatcher status
                podWatcher.Status.LastPodRestartTime = time.Now().Format(time.RFC3339)
                if err := r.Status().Update(ctx, &podWatcher); err != nil {
                    logger.Error(err, "failed to update PodWatcher status")
                }
            }
        }
    }
    // Check the pods again in one second
    return ctrl.Result{RequeueAfter: time.Second}, nil
}

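In the above snippet, we list the pods in the PodWatcher's namespace and keep only those that match its label selector. For every restarted container in a matching pod, we print the restart count and record the time in the PodWatcher status. Finally, we return RequeueAfter: time.Second so that the next reconcile runs one second later. Because the controller watches only PodWatcher objects, not pods, this periodic requeue is what picks up new restarts.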
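
The equality-based label matching that the controller performs inline can be pulled out into a small function, which makes it easy to unit test. This is a sketch; matchesSelector is a hypothetical helper name. Unlike the inline loop, it also requires each selector key to be present on the pod, which follows Kubernetes selector semantics and only makes a difference for selectors with empty values. Alternatively, controller-runtime's client.MatchingLabels list option can push this filtering into the List call itself.

```go
package main

import "fmt"

// matchesSelector reports whether a pod's labels satisfy an equality-based
// selector: every key in the selector must be present with the same value.
// An empty or nil selector matches every pod, as the inline loop does.
func matchesSelector(podLabels, selector map[string]string) bool {
	for key, want := range selector {
		if got, ok := podLabels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	selector := map[string]string{"app": "nginx"}
	fmt.Println(matchesSelector(map[string]string{"app": "nginx", "tier": "web"}, selector)) // true
	fmt.Println(matchesSelector(map[string]string{"app": "redis"}, selector))                 // false
	fmt.Println(matchesSelector(nil, selector))                                               // false
	fmt.Println(matchesSelector(map[string]string{"app": "redis"}, nil))                      // true
}
```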

Step 6: Build and push the image

make docker-build docker-push IMG=<your-dockerhub-username>/podwatcher-operator:v1.0.0

Step 7: Deploy the Operator

make deploy IMG=<your-dockerhub-username>/podwatcher-operator:v1.0.0

This deploys the operator along with the resources it needs, such as the CRD, RBAC rules, and the controller Deployment. The output looks similar to this:

build config/default | kubectl apply -f -
namespace/kubernetes-operators-system unchanged
customresourcedefinition.apiextensions.k8s.io/podwatchers.core.example.com unchanged
serviceaccount/kubernetes-operators-controller-manager unchanged
role.rbac.authorization.k8s.io/kubernetes-operators-leader-election-role unchanged
clusterrole.rbac.authorization.k8s.io/kubernetes-operators-manager-role unchanged
clusterrole.rbac.authorization.k8s.io/kubernetes-operators-metrics-auth-role unchanged
clusterrole.rbac.authorization.k8s.io/kubernetes-operators-metrics-reader unchanged
clusterrole.rbac.authorization.k8s.io/kubernetes-operators-podwatcher-editor-role unchanged
clusterrole.rbac.authorization.k8s.io/kubernetes-operators-podwatcher-viewer-role unchanged
rolebinding.rbac.authorization.k8s.io/kubernetes-operators-leader-election-rolebinding unchanged
clusterrolebinding.rbac.authorization.k8s.io/kubernetes-operators-manager-rolebinding unchanged
clusterrolebinding.rbac.authorization.k8s.io/kubernetes-operators-metrics-auth-rolebinding unchanged
service/kubernetes-operators-controller-manager-metrics-service unchanged
deployment.apps/kubernetes-operators-controller-manager configured

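Step 8: Apply the PodWatcher CR

Save the following manifest as podwatcher.yaml. Its apiVersion combines the group and domain chosen earlier:

apiVersion: core.example.com/v1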
kind: PodWatcher
metadata:
  name: podwatcher-example
spec:
  labelSelector:
    app: nginx

kubectl apply -f podwatcher.yaml

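Step 9: Create a ClusterRole and ClusterRoleBinding to allow the operator to list and watch pods

The scaffolded manager role only grants access to PodWatcher resources, so the operator's service account (kubernetes-operators-controller-manager in the kubernetes-operators-system namespace) also needs permission to read pods. Alternatively, you can add a // +kubebuilder:rbac:groups="",resources=pods,verbs=list;watch marker above Reconcile and run make manifests, so the permission is generated into the manager role.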

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: pod-watcher-role
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: pod-watcher-rolebinding
subjects:
- kind: ServiceAccount
  name: kubernetes-operators-controller-manager
  namespace: kubernetes-operators-system
roleRef:
  kind: ClusterRole
  name: pod-watcher-role
  apiGroup: rbac.authorization.k8s.io
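
RBAC permissions are purely additive: a request is allowed if at least one rule bound to the service account matches its API group, resource, and verb, and there are no deny rules. The simplified model below checks requests against the pod-watcher-role rule; policyRule, contains, and allowed are hypothetical names, and the model ignores resourceNames, subresources, and non-resource URLs. The role grants watch as well as list because controller-runtime's default client serves List calls from an informer cache, and the informer needs both verbs.

```go
package main

import "fmt"

// policyRule mirrors the apiGroups/resources/verbs fields of an RBAC rule.
type policyRule struct {
	APIGroups, Resources, Verbs []string
}

// contains reports whether list holds value or the "*" wildcard.
func contains(list []string, value string) bool {
	for _, v := range list {
		if v == value || v == "*" {
			return true
		}
	}
	return false
}

// allowed reports whether any rule grants verb on resource in apiGroup.
// RBAC has no deny rules, so a single matching rule is enough.
func allowed(rules []policyRule, apiGroup, resource, verb string) bool {
	for _, r := range rules {
		if contains(r.APIGroups, apiGroup) && contains(r.Resources, resource) && contains(r.Verbs, verb) {
			return true
		}
	}
	return false
}

func main() {
	// The pod-watcher-role rule; "" is the core API group.
	rules := []policyRule{{APIGroups: []string{""}, Resources: []string{"pods"}, Verbs: []string{"list", "watch"}}}
	fmt.Println(allowed(rules, "", "pods", "list"))    // true
	fmt.Println(allowed(rules, "", "pods", "delete"))  // false
	fmt.Println(allowed(rules, "", "secrets", "list")) // false
}
```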

Apply both manifests with kubectl apply -f, and we’re done!

Now let’s test it. Create a pod that restarts endlessly and carries the label app: nginx, so that it matches the PodWatcher's selector. Then follow the logs of the operator pod:

% kubectl logs -f kubernetes-operators-controller-manager-55967d886b-p8gw8 -n kubernetes-operators-system
2024-12-11T14:17:54Z    INFO    setup    starting manager
2024-12-11T14:17:54Z    INFO    controller-runtime.metrics    Starting metrics server
2024-12-11T14:17:54Z    INFO    setup    disabling http/2
2024-12-11T14:17:54Z    INFO    starting server    {"name": "health probe", "addr": "[::]:8081"}
I1211 14:17:54.942626       1 leaderelection.go:250] attempting to acquire leader lease kubernetes-operators-system/525f3881.example.com...
2024-12-11T14:17:55Z    INFO    controller-runtime.metrics    Serving metrics server    {"bindAddress": ":8443", "secure": true}
I1211 14:18:25.606874       1 leaderelection.go:260] successfully acquired lease kubernetes-operators-system/525f3881.example.com
2024-12-11T14:18:25Z    DEBUG    events    kubernetes-operators-controller-manager-55967d886b-p8gw8_e8c06ec2-85d1-47ba-845d-c9aa3f2a3eda became leader    {"type": "Normal", "object": {"kind":"Lease","namespace":"kubernetes-operators-system","name":"525f3881.example.com","uid":"1d3c41a8-2f84-42ce-9aae-ff852afed66e","apiVersion":"coordination.k8s.io/v1","resourceVersion":"17951"}, "reason": "LeaderElection"}
2024-12-11T14:18:25Z    INFO    Starting EventSource    {"controller": "podwatcher", "controllerGroup": "core.example.com", "controllerKind": "PodWatcher", "source": "kind source: *v1.PodWatcher"}
2024-12-11T14:18:25Z    INFO    Starting Controller    {"controller": "podwatcher", "controllerGroup": "core.example.com", "controllerKind": "PodWatcher"}
2024-12-11T14:18:25Z    INFO    Starting workers    {"controller": "podwatcher", "controllerGroup": "core.example.com", "controllerKind": "PodWatcher", "worker count": 1}
Pod 'ubuntu-pod' in namespace 'default' has restarted 13 times!
Pod 'ubuntu-pod' in namespace 'default' has restarted 13 times!
Pod 'ubuntu-pod' in namespace 'default' has restarted 13 times!
Pod 'ubuntu-pod' in namespace 'default' has restarted 13 times!
Pod 'ubuntu-pod' in namespace 'default' has restarted 13 times!
Pod 'ubuntu-pod' in namespace 'default' has restarted 13 times!
Pod 'ubuntu-pod' in namespace 'default' has restarted 13 times!
Pod 'ubuntu-pod' in namespace 'default' has restarted 13 times!
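
As the log shows, the same line is printed on every reconcile: the controller polls every second and reports the current restart count, not new restarts. One possible refinement, sketched here under stated assumptions, is to remember the last restart count seen per container and report only increases. containerStatus, restartTracker, and the container name "ubuntu" are hypothetical; a real controller would read RestartCount from corev1.ContainerStatus. This state lives in memory, so it is lost when the operator restarts (persisting it in the PodWatcher status would avoid that), and the map has no mutex, which is fine with the single worker shown in the logs but not if you raise MaxConcurrentReconciles.

```go
package main

import "fmt"

// containerStatus is a hypothetical, trimmed-down stand-in for the fields of
// corev1.ContainerStatus (plus the owning pod's name) that matter here.
type containerStatus struct {
	Pod, Container string
	RestartCount   int32
}

// restartTracker remembers the last restart count seen for each container,
// so that each new restart is reported once instead of on every poll.
type restartTracker struct {
	seen map[string]int32 // key: "pod/container"
}

func newRestartTracker() *restartTracker {
	return &restartTracker{seen: make(map[string]int32)}
}

// newRestarts returns one message per container whose restart count has
// grown since the previous call, then records the latest counts.
func (t *restartTracker) newRestarts(statuses []containerStatus) []string {
	var msgs []string
	for _, s := range statuses {
		key := s.Pod + "/" + s.Container
		if s.RestartCount > t.seen[key] {
			msgs = append(msgs, fmt.Sprintf("Pod '%s' container '%s' has restarted %d times!",
				s.Pod, s.Container, s.RestartCount))
		}
		t.seen[key] = s.RestartCount
	}
	return msgs
}

func main() {
	tracker := newRestartTracker()
	poll := []containerStatus{{Pod: "ubuntu-pod", Container: "ubuntu", RestartCount: 13}}
	fmt.Println(len(tracker.newRestarts(poll))) // 1: first time 13 restarts are seen
	fmt.Println(len(tracker.newRestarts(poll))) // 0: nothing new since the last poll

	poll[0].RestartCount = 14
	fmt.Println(tracker.newRestarts(poll)) // [Pod 'ubuntu-pod' container 'ubuntu' has restarted 14 times!]
}
```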

You've successfully built a Kubernetes Operator to manage PodWatcher resources using Golang. This example can be extended to include more complex logic, such as creating other Kubernetes resources or integrating external APIs. Operators empower you to encapsulate application logic in a declarative and Kubernetes-native way, enhancing automation and maintainability.


That the contents of third-party articles/blogs published here on the website, and the interpretation of all information in the article/blogs such as data, maps, numbers, opinions etc. displayed in the article/blogs and views or the opinions expressed within the content are solely of the author's; and do not reflect the opinions and beliefs of NASSCOM or its affiliates in any manner. NASSCOM does not take any liability w.r.t. content in any manner and will not be liable in any manner whatsoever for any kind of liability arising out of any act, error or omission. The contents of third-party article/blogs published, are provided solely as convenience; and the presence of these articles/blogs should not, under any circumstances, be considered as an endorsement of the contents by NASSCOM in any manner; and if you chose to access these articles/blogs , you do so at your own risk.


© Copyright nasscom. All Rights Reserved.