Peiman Jafari, Sr. Cloud Engineer at Skillz

Introducing Opvic

Introduction

Kubernetes is complicated. It becomes more so when there are hundreds, if not thousands, of add-ons and applications deployed on it. Thanks to the mature ecosystem, many aspects of everyday operations can be automated by various tools for networking, monitoring, GitOps, DNS, log aggregation, etc. Just to name a few: datadog-agent, external-dns, Istio, and ArgoCD.

Each of these add-ons has its own release cadence. When it comes to new version upgrades, there are two common strategies:

Conservative and reactive

  • Keep the add-ons untouched as long as they are “working”
  • Only upgrade when it’s absolutely needed
  • Sometimes resulting in upgrades which span several major versions

Radical and proactive

  • Always plan for a change as long as there is a remaining error budget
  • Upgrade as soon as a new stable version is released
  • Usually minor version or patch version upgrades

Depending on the team culture, industry, and levels of automation, different teams’ strategies may sit anywhere in between. At Skillz, we practice DevOps culture and prefer frequent and gradual changes. We like the bug fixes, features, and security updates that come with new versions and want to avoid potential issues before even encountering them.

The problem is: how can we stay on top of new releases for all the add-ons deployed on Kubernetes? It’s not an out-of-the-box feature, and we didn’t find any existing solution for this purpose. While GitHub lets us subscribe to repositories and emails us when there are new releases, this approach has some flaws:

  • Not all projects use GitHub Releases, and not all projects are on GitHub
  • The first question we often ask is: great, there’s a new version, but which version are we currently running, and what’s the difference between them?
  • Email notifications can quickly get out of control when many add-ons are watched
  • You cannot build automation on top of GitHub notifications

We need a cloud-native solution that can track new versions from different release channels, detect the currently running versions, aggregate and present the version gaps, and enable further automation solutions.

Since we didn’t find an existing solution, we created Opvic and open-sourced it. This blog will walk you through the design, implementation, and usage of Opvic.

Architecture Design

Opvic was designed with scalability in mind, which is why we decoupled three concerns: tracking the running versions, managing configuration, and fetching the remote versions.

Architecture Diagram

Agent

Agents are installed on all clusters and send the collected information to the centralized control plane.

  • Agents interact with the K8s API to figure out the running version of each component
  • App discovery and configuration can be managed by custom resources (CRD)
  • Each agent has a unique identifier (normally, your cluster identifier)

App Discovery

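By default, agents do not track anything. Opvic’s agent acts as a Kubernetes operator that watches certain custom resources, and each resource tells it which component to track and how. Because the configuration lives in custom resources (CRDs), it is highly flexible: tracking another component is a matter of deploying another resource.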

Guided by the custom resource, agents discover Kubernetes resources such as Nodes, Deployments, and Pods using the standard label selector configuration. They then extract a semver-formatted version from any field, such as the image, a label, or an annotation.
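
To make the discovery step concrete, here is a minimal Python sketch of label-selector matching. It is not Opvic’s implementation (the agent talks to the Kubernetes API); the functions matches and discover and the sample pods are hypothetical, and only matchLabels is handled.

```python
from typing import Dict, List


def matches(labels: Dict[str, str], match_labels: Dict[str, str]) -> bool:
    """Return True when every matchLabels pair is present on the resource."""
    return all(labels.get(key) == value for key, value in match_labels.items())


def discover(resources: List[dict], namespaces: List[str],
             match_labels: Dict[str, str]) -> List[dict]:
    """Filter resources by namespace (empty list means all) and by labels."""
    selected = []
    for resource in resources:
        meta = resource["metadata"]
        if namespaces and meta.get("namespace") not in namespaces:
            continue
        if matches(meta.get("labels", {}), match_labels):
            selected.append(resource)
    return selected


# Hypothetical sample data shaped like Kubernetes Pod objects.
pods = [
    {"metadata": {"name": "coredns-a", "namespace": "kube-system",
                  "labels": {"k8s-app": "kube-dns"}}},
    {"metadata": {"name": "web-1", "namespace": "default",
                  "labels": {"app": "web"}}},
]

selected = discover(pods, ["kube-system"], {"k8s-app": "kube-dns"})
print([pod["metadata"]["name"] for pod in selected])
```

An empty namespace list selects across all namespaces, mirroring the optional namespaces setting described later in this post.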

Control Plane

In the initial implementation, the control plane receives information from agents and stores it in an in-memory cache. It then aggregates that information and retrieves the remote versions based on the configuration sent by each agent. Overall, this Opvic component is responsible for:

  • Exposing an API endpoint for agents to send the collected information, as well as endpoints to query detailed information about each component
  • Storing the information in a memory cache with a configurable expiration duration
  • Interacting with external systems such as GitHub, Helm registries, etc. to retrieve the versions between the running version and the latest one
  • Exposing Prometheus-format metrics that show running versions across all clusters, as well as the major, minor, and patch versions available to upgrade to
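
The idea of a memory cache with a configurable expiration can be sketched in a few lines of Python. This is only an illustration of the concept, not the control plane’s actual code; the ExpiringCache class, its key format, and the injectable clock are all assumptions made for the example.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class ExpiringCache:
    """In-memory key/value store whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be demonstrated without sleeping
        self._items: Dict[str, Tuple[float, Any]] = {}

    def set(self, key: str, value: Any) -> None:
        self._items[key] = (self._clock() + self._ttl, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # drop stale data lazily on read
            return None
        return value


# Demonstration with a fake clock (hypothetical agent/tracker key).
now = [0.0]
cache = ExpiringCache(ttl_seconds=60, clock=lambda: now[0])
cache.set("test/coredns", {"runningVersion": "1.7.0"})
now[0] = 30.0
fresh = cache.get("test/coredns")   # still within the 60 second window
now[0] = 61.0
stale = cache.get("test/coredns")   # expired, so None is returned
```

With an expiration in place, data from an agent that stops reporting eventually disappears instead of being shown forever.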

Installation

There is a Helm chart available for Opvic. Begin by adding the repository:

helm repo add opvic https://skillz.github.io/opvic

Then deploy it with the minimum required configuration. Check out values.yaml for all the available options:

helm upgrade --install --namespace opvic --create-namespace opvic opvic/opvic \
--set controlplane.enabled=true \
--set agent.enabled=true --set agent.identifier=test \
--set sharedAuthentication.token=test

Ensure that the pods are running and that you can connect to the control plane:

kubectl get pods -n opvic

NAME                                   READY   STATUS    RESTARTS   AGE
opvic-agent-9b64fb88c-qfvnv            1/1     Running   0          34m
opvic-control-plane-56d6955f7d-wgttd   1/1     Running   0          34m

kubectl port-forward deployment/opvic-control-plane 8080:8080 -n opvic

Forwarding from 127.0.0.1:8080 -> 8080
Forwarding from [::1]:8080 -> 8080

curl -H "Authorization: Bearer test" localhost:8080/api/v1alpha1/ping
pong
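
If you would rather script this check than run curl by hand, the same request can be made from Python’s standard library. This is a small sketch: the helper names build_request and ping are hypothetical, and the URL, path, and token are simply the ones used in the commands above.

```python
import urllib.request


def build_request(base_url: str, path: str, token: str) -> urllib.request.Request:
    """Build an authenticated GET request for the control plane API."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def ping(base_url: str, token: str, timeout: float = 5.0) -> str:
    """Call the ping endpoint and return the response body as text."""
    request = build_request(base_url, "/api/v1alpha1/ping", token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


# With the port-forward from above still running, you could call:
#   ping("http://localhost:8080", "test")
```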

Note:

  • You only need to run the control plane in one of your clusters
  • You need to deploy the agent and the CRD in all clusters
  • You don’t need an ingress if the control plane and agent run on the same cluster
  • VersionTracker resources should be deployed in all clusters

Examples

Now that the control plane and agent are running, you can deploy some VersionTracker resources. Below is an example of a VersionTracker resource:

apiVersion: opvic.skillz.com/v1alpha1
kind: VersionTracker
metadata:
  name: myApp # name of the subject
spec:
  name: myApp # Unique identifier of the app to track
  resources: # How agent should find the resources to extract the version
    strategy: Pods # Resource Kind. Pods, Nodes, Deployments, etc. (defaults to Pods)
    namespaces: # (optional) namespaces of the app (defaults to querying all namespaces)
      - kube-system
    selector: # Kubernetes standard selector configuration
      matchLabels:
        app.kubernetes.io/name: myApp
  localVersion: # How agent should extract the version
   strategy: FieldSelector # strategy to use for the app (ImageTag, FieldSelector)
   fieldSelector: '.metadata.labels.myApp\.io/version' # Valid JsonPath to extract the version from the resource
   extraction: # Regex to extract the version from the resource
     regex:
       pattern: '^v([0-9]+\.[0-9]+\.[0-9]+)$'
       result: '$1'
  remoteVersion: # How control plane should find the remote versions
    provider: github # name of the provider (github, helm)
    strategy: releases # method to use to get the remote versions (releases, tags)
    repo: owner/repoName # name of the repository (owner/repoName)
    extraction:
     regex:
       pattern: '^myApp-v([0-9]+\.[0-9]+\.[0-9]+)$'
       result: '$1'
    constraint: '~>3' # Semver constraint to use to filter the remote versions

Note that if the remote versions are not exposed or the provider is not supported by Opvic yet, you can still track the running versions and not specify the remote version configuration.
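
To see how a pattern and result pair behaves, here is a rough Python equivalent. It is a sketch under assumptions: the extract function is hypothetical, and it only mimics the idea of expanding $1 to the first capture group, since Python’s re module does not use the $1 syntax itself.

```python
import re
from typing import Optional


def extract(raw: str, pattern: str, result: str) -> Optional[str]:
    """Apply pattern to raw and expand $N placeholders in result.

    Returns None when the pattern does not match, so non-matching
    release names or tags can simply be skipped.
    """
    match = re.search(pattern, raw)
    if match is None:
        return None
    # Replace each $N with the text captured by group N.
    return re.sub(r"\$(\d+)", lambda m: match.group(int(m.group(1))), result)


remote_pattern = r"^myApp-v([0-9]+\.[0-9]+\.[0-9]+)$"

print(extract("myApp-v3.2.1", remote_pattern, "$1"))      # a matching release name
print(extract("myApp-v3.2.1-rc1", remote_pattern, "$1"))  # rejected by the trailing $
```

Anchoring the pattern with ^ and $ is a simple way to keep pre-releases and unrelated tags out of the list of remote versions.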

Tracking CoreDNS From Container Image Tag

Let’s say you want to track the running version of CoreDNS on your cluster, as well as the available upstream versions. You can deploy a VersionTracker resource coredns.yaml like this:

apiVersion: opvic.skillz.com/v1alpha1
kind: VersionTracker
metadata:
  name: coredns
spec:
  name: coredns
  resources:
    namespaces:
      - kube-system
    selector:
      matchLabels:
        k8s-app: kube-dns
  localVersion:
   strategy: ImageTag
  remoteVersion:
   provider: github
   strategy: releases
   repo: coredns/coredns
   extraction:
     regex:
       pattern: ^v([0-9]+\.[0-9]+\.[0-9]+)$
       result: $1

Let’s apply this:

kubectl apply -f coredns.yaml -n opvic

Since most versions can be extracted from containers’ image tags, you can use the ImageTag strategy, which extracts the version from the first container image tag of the resource.

For remote versions, you can use the GitHub provider and look at releases with the releases strategy. You need to specify the GitHub repository and a regex for extraction.
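
As an illustration of what reading the first container’s image tag involves, here is a small Python sketch. It is not the agent’s code; image_tag and first_container_tag are hypothetical helpers, and the pod dictionary only imitates the shape of a Pod object.

```python
def image_tag(image: str) -> str:
    """Return the tag of an image reference, or '' when no tag is set."""
    name = image.split("@", 1)[0]            # drop a digest such as @sha256:...
    last_segment = name.rsplit("/", 1)[-1]   # a ':' before the last '/' is a registry port
    if ":" not in last_segment:
        return ""
    return last_segment.rsplit(":", 1)[1]


def first_container_tag(pod: dict) -> str:
    """Read the tag from the first container of a Pod-shaped dictionary."""
    return image_tag(pod["spec"]["containers"][0]["image"])


pod = {"spec": {"containers": [{"image": "k8s.gcr.io/coredns:1.7.0"}]}}
print(first_container_tag(pod))
```

Splitting on the last path segment matters because a registry address can contain a port, as in localhost:5000/coredns, which has no tag at all.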

Now you can query the control plane for running versions:

curl -H "Authorization: Bearer test" localhost:8080/api/v1alpha1/agents/test/coredns | jq

And you would get a response like this:

{
 "id": "coredns",
 "namespace": "opvic",
 "count": 1,
 "uniqVersions": [
   "1.7.0"
 ],
 "versions": [
   {
     "runningVersion": "1.7.0",
     "resourceCount": 1,
     "resourceKind": "Pods",
     "extractedFrom": "k8s.gcr.io/coredns:1.7.0"
   }
 ],
 "remoteVersion": {
   "provider": "github",
   "strategy": "releases",
   "repo": "coredns/coredns",
   "extraction": {
     "regex": {
       "pattern": "^v([0-9]+\\.[0-9]+\\.[0-9]+)$",
       "result": "$1"
     }
   }
 }
}

It normally takes up to a minute for the control plane to aggregate the collected data and retrieve remote versions, since that process runs on an interval in the background. After that, you can query the version gaps:

curl -H "Authorization: Bearer test" localhost:8080/api/v1alpha1/agents/test/coredns/versions | jq

And you would get a response like this:

{
 "id": "coredns",
 "agentId": "test",
 "resourceCount": 1,
 "runningVersions": [
   "1.7.0"
 ],
 "latestVersion": "1.8.6",
 "remoteProvider": "github",
 "remoteRepo": "coredns/coredns",
 "versions": [
   {
     "currentVersion": "1.7.0",
     "resourceCount": 1,
     "resourceKind": "Pods",
     "extractedFrom": "k8s.gcr.io/coredns:1.7.0",
     "latestVersion": "1.8.6",
     "availableVersions": [
       "1.7.1",
       "1.8.0",
       "1.8.1",
       "1.8.2",
       "1.8.3",
       "1.8.4",
       "1.8.5",
       "1.8.6"
     ],
     "availableMajors": [],
     "availableMinors": [
       "1.8.0",
       "1.8.1",
       "1.8.2",
       "1.8.3",
       "1.8.4",
       "1.8.5",
       "1.8.6"
     ],
     "availablePatches": [
       "1.7.1"
     ],
     "majorAvailable": false,
     "minorAvailable": true,
     "patchAvailable": true
   }
 ]
}
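
The split into major, minor, and patch gaps can be reproduced with a short Python sketch. This is an assumption about the classification rule, inferred from the response above rather than taken from Opvic’s source; parse and classify are hypothetical names, and only plain MAJOR.MINOR.PATCH versions are handled.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, int, int]


def parse(version: str) -> Version:
    """Turn '1.8.6' into (1, 8, 6) so versions compare numerically."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def classify(running: str, remote: List[str]) -> Dict[str, List[str]]:
    """Group remote versions newer than running by the kind of upgrade."""
    current = parse(running)
    gaps: Dict[str, List[str]] = {
        "availableMajors": [], "availableMinors": [], "availablePatches": [],
    }
    for candidate in sorted(remote, key=parse):
        version = parse(candidate)
        if version <= current:
            continue  # older or equal versions are not upgrades
        if version[0] > current[0]:
            gaps["availableMajors"].append(candidate)
        elif version[1] > current[1]:
            gaps["availableMinors"].append(candidate)
        else:
            gaps["availablePatches"].append(candidate)
    return gaps


remote = ["1.6.9", "1.7.0", "1.7.1", "1.8.0", "1.8.1", "1.8.2",
          "1.8.3", "1.8.4", "1.8.5", "1.8.6"]
gaps = classify("1.7.0", remote)
print(gaps)
```

Run against the CoreDNS versions listed above, this yields one patch upgrade (1.7.1), seven minor upgrades (1.8.0 through 1.8.6), and no major upgrades, which matches the response.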

The control plane also exposes Prometheus metrics at the /metrics endpoint, so let’s take a look at those:

# HELP opvic_controlplane_agent_last_heartbeat Last time the agent was seen
# TYPE opvic_controlplane_agent_last_heartbeat gauge
opvic_controlplane_agent_last_heartbeat{agent_id="test",tags=""} 1.639773192e+09
# HELP opvic_controlplane_major_versions_count Number of available major versions to upgrade to
# TYPE opvic_controlplane_major_versions_count gauge
opvic_controlplane_major_versions_count{agent_id="test",available_major_versions="",remote_provider="github",remote_repo="coredns/coredns",resource_kind="Pods",running_version="1.7.0",version_id="coredns"} 0
# HELP opvic_controlplane_minor_versions_count Number of available minor versions to upgrade to
# TYPE opvic_controlplane_minor_versions_count gauge
opvic_controlplane_minor_versions_count{agent_id="test",available_minor_versions="1.8.0,1.8.1,1.8.2,1.8.3,1.8.4,1.8.5,1.8.6",remote_provider="github",remote_repo="coredns/coredns",resource_kind="Pods",running_version="1.7.0",version_id="coredns"} 7
# HELP opvic_controlplane_patch_versions_count Number of available patch versions to upgrade to
# TYPE opvic_controlplane_patch_versions_count gauge
opvic_controlplane_patch_versions_count{agent_id="test",available_patch_versions="1.7.1",remote_provider="github",remote_repo="coredns/coredns",resource_kind="Pods",running_version="1.7.0",version_id="coredns"} 1
# HELP opvic_controlplane_requests_total The number of HTTP requests processed
# TYPE opvic_controlplane_requests_total counter
opvic_controlplane_requests_total{method="GET",path="/api/v1alpha1/agents/test/coredns",status="200"} 1
opvic_controlplane_requests_total{method="GET",path="/api/v1alpha1/agents/test/coredns/versions",status="200"} 1
opvic_controlplane_requests_total{method="GET",path="/api/v1alpha1/ping",status="200"} 1
opvic_controlplane_requests_total{method="POST",path="/api/v1alpha1/agents",status="202"} 15
# HELP opvic_controlplane_version_resource_count Number of resources running with a specific version
# TYPE opvic_controlplane_version_resource_count gauge
opvic_controlplane_version_resource_count{agent_id="test",extracted_from="k8s.gcr.io/coredns:1.7.0",latest_version="1.8.6",remote_provider="github",remote_repo="coredns/coredns",resource_kind="Pods",running_version="1.7.0",version_id="coredns"} 1
# HELP opvic_provider_github_rate_limit_remaining The number of requests remaining in the current rate limit window.
# TYPE opvic_provider_github_rate_limit_remaining gauge
opvic_provider_github_rate_limit_remaining 58
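
These metrics are a natural starting point for automation. In practice you would usually alert on them through Prometheus itself, but as a rough sketch of consuming the raw text, the Python below lists trackers with minor upgrades available. The samples and outdated functions are hypothetical, the parser only handles labelled samples, and the METRICS text is abridged from the output above.

```python
import re
from typing import Dict, Iterator, List, Tuple

SAMPLE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$')
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def samples(text: str) -> Iterator[Tuple[str, Dict[str, str], float]]:
    """Yield (metric name, labels, value) for every labelled sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip HELP/TYPE comments and blank lines
        match = SAMPLE.match(line)
        if match is None:
            continue  # samples without labels are ignored in this sketch
        labels = dict(LABEL.findall(match.group("labels")))  # values kept unescaped as-is
        yield match.group("name"), labels, float(match.group("value"))


def outdated(text: str,
             metric: str = "opvic_controlplane_minor_versions_count") -> List[tuple]:
    """Return (agent, tracker, running version, count) where upgrades exist."""
    return [
        (labels["agent_id"], labels["version_id"], labels["running_version"], int(value))
        for name, labels, value in samples(text)
        if name == metric and value > 0
    ]


METRICS = """\
# TYPE opvic_controlplane_major_versions_count gauge
opvic_controlplane_major_versions_count{agent_id="test",available_major_versions="",running_version="1.7.0",version_id="coredns"} 0
# TYPE opvic_controlplane_minor_versions_count gauge
opvic_controlplane_minor_versions_count{agent_id="test",available_minor_versions="1.8.0,1.8.1,1.8.2,1.8.3,1.8.4,1.8.5,1.8.6",running_version="1.7.0",version_id="coredns"} 7
"""

print(outdated(METRICS))
```

Note that label values can contain commas, as available_minor_versions does, so the labels are matched as quoted pairs rather than split on commas.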

Extract the Version From Any Field

Now that you know how everything works, let’s look at some other examples. The local version can be extracted from any field, so let’s grab the kubelet version of your nodes:

apiVersion: opvic.skillz.com/v1alpha1
kind: VersionTracker
metadata:
  name: kubernetes
spec:
  name: kubernetes
  resources:
    strategy: Nodes
    selector:
      matchExpressions:
      - key: kubernetes.io/hostname
        operator: Exists
  localVersion:
    strategy: FieldSelector
    fieldSelector: '.status.nodeInfo.kubeletVersion'
    extraction:
      regex:
        pattern: '^v([0-9]+\.[0-9]+\.[0-9]+)'
        result: $1
  remoteVersion:
    provider: github
    strategy: tags
    repo: kubernetes/kubernetes
    extraction:
      regex:
        pattern: ^v([0-9]+\.[0-9]+\.[0-9]+)
        result: $1

In this example, we use the FieldSelector strategy in the localVersion configuration and set the JsonPath to .status.nodeInfo.kubeletVersion to extract the kubelet version from the node status. We also use the tags strategy in the remoteVersion configuration to read the available versions from the tags of the remote repository.
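
Here is a simplified Python sketch of what selecting a field and then applying the regex looks like. It is an approximation, not real JsonPath: the resolve function is hypothetical, it only supports dotted keys with dots escaped as \. (the form used in the first VersionTracker example), and the kubelet version string is a made-up sample with a vendor suffix.

```python
import re
from typing import Any


def resolve(resource: dict, path: str) -> Any:
    """Walk a dotted path such as '.status.nodeInfo.kubeletVersion'.

    A dot escaped as '\\.' is treated as part of the key, which is how a
    label like myApp.io/version can be addressed.
    """
    keys = [key.replace("\\.", ".") for key in re.split(r"(?<!\\)\.", path) if key]
    value: Any = resource
    for key in keys:
        value = value[key]
    return value


# Hypothetical Node-shaped object; the suffix after the patch number is an example.
node = {"status": {"nodeInfo": {"kubeletVersion": "v1.21.5-eks-bc4871b"}}}

raw = resolve(node, ".status.nodeInfo.kubeletVersion")
match = re.search(r"^v([0-9]+\.[0-9]+\.[0-9]+)", raw)
version = match.group(1) if match else None
print(version)
```

The pattern in this VersionTracker has no trailing $, unlike the CoreDNS one. That is what lets it match kubelet versions that carry a suffix after the patch number.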

Use appVersion of a Helm Repository

If the application you are tracking does not have a GitHub repository, check whether it has a Helm chart, because the appVersion in a Helm chart normally represents the actual application version. In that case, you can use the helm provider with the appVersion strategy:

apiVersion: opvic.skillz.com/v1alpha1
kind: VersionTracker
metadata:
  name: artifactory
spec:
  name: artifactory
  resources:
    namespaces:
      - artifactory
    selector:
      matchLabels:
        component: artifactory
  localVersion:
    strategy: ImageTag
  remoteVersion:
    provider: helm
    strategy: appVersion
    repo: https://charts.jfrog.io
    chart: artifactory

Track Your Helm Chart Versions

You can also track Helm chart versions using the helm provider and the chartVersion strategy. The example below assumes the deployed Helm chart version is exposed in a label called chart:

apiVersion: opvic.skillz.com/v1alpha1
kind: VersionTracker
metadata:
  name: artifactory-helm
spec:
  name: artifactory-helm-chart
  resources:
    namespaces:
      - artifactory
    selector:
      matchLabels:
        component: artifactory
  localVersion:
    strategy: FieldSelector
    fieldSelector: '.metadata.labels.chart'
    extraction:
      regex:
        pattern: ^artifactory-([0-9]+\.[0-9]+\.[0-9]+)$
        result: $1
  remoteVersion:
    provider: helm
    strategy: chartVersion
    repo: https://charts.jfrog.io
    chart: artifactory
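
The difference between the appVersion and chartVersion strategies comes down to which field of the repository index is read. A Helm repository publishes an index.yaml whose entries map each chart name to a list of releases, each with a version (the chart version) and usually an appVersion. The Python sketch below assumes the index has already been parsed into a dictionary; remote_versions is a hypothetical helper, the version numbers are made up, and how Opvic reads the index internally is not shown here.

```python
from typing import List


def remote_versions(index: dict, chart: str, strategy: str) -> List[str]:
    """Collect the versions a strategy would see for a chart.

    strategy is 'chartVersion' (reads 'version') or 'appVersion'
    (reads 'appVersion'); duplicates are dropped, order is preserved.
    """
    field = {"chartVersion": "version", "appVersion": "appVersion"}[strategy]
    seen: List[str] = []
    for entry in index.get("entries", {}).get(chart, []):
        value = entry.get(field)
        if value and value not in seen:
            seen.append(value)
    return seen


# Made-up excerpt of a parsed index.yaml, for illustration only.
index = {
    "entries": {
        "artifactory": [
            {"name": "artifactory", "version": "2.1.0", "appVersion": "7.4.0"},
            {"name": "artifactory", "version": "2.0.1", "appVersion": "7.3.2"},
            {"name": "artifactory", "version": "2.0.0", "appVersion": "7.3.2"},
        ]
    }
}

print(remote_versions(index, "artifactory", "appVersion"))
print(remote_versions(index, "artifactory", "chartVersion"))
```

Several chart releases can ship the same application version, which is why the two strategies can report different numbers of available upgrades for the same chart.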

Conclusion

Opvic is a tool that runs on Kubernetes, detects version gaps between releases and your deployments for your cluster add-ons, and enables you to develop further automation that suits your team’s version upgrade strategy. It currently supports software release channels on GitHub and Helm, and more will be added in the future.

Skillz has been benefiting from and contributing to the open-source community, and ultimately, that’s why we are open-sourcing Opvic today.

Please check out our GitHub repository Opvic and feel free to give it a try.

Also if this is something that interests you, we encourage you to learn more about joining our team here!