Introducing Opvic
Introduction
Kubernetes is complicated, and it becomes more so when there are hundreds, if not thousands, of add-ons and applications deployed on it. Thanks to its mature ecosystem, many aspects of everyday operations, such as networking, monitoring, GitOps, DNS, and log aggregation, can be automated by various tools. To name just a few: datadog-agent, external-dns, Istio, and ArgoCD.
Each of these add-ons has its own release cadence. When it comes to upgrading to new versions, there are two common strategies:
Conservative and reactive
- Keep the add-ons untouched as long as they are “working”
- Only upgrade when it’s absolutely needed
- Sometimes resulting in upgrades which span several major versions
Radical and proactive
- Always plan for a change as long as there is a remaining error budget
- Upgrade as soon as a new stable version is released
- Usually minor version or patch version upgrades
Depending on team culture, industry, and level of automation, a team’s strategy may sit anywhere in between. At Skillz, we embrace a DevOps culture and prefer frequent, gradual changes. We want the bug fixes, features, and security updates that come with new versions, and we want to address potential issues before we run into them.
The problem is: how can we stay on top of new releases for all the add-ons deployed on Kubernetes? This is not an out-of-the-box feature, and we didn’t find any existing solution for this purpose. GitHub lets us watch repositories and emails us when there are new releases, but this approach has some flaws:
- Not all projects use GitHub Releases, and not all projects are hosted on GitHub
- The first question we usually ask is: “Great, there’s a new version, but which version are we currently running, and what changed in between?”
- Email notifications quickly get out of control when many add-ons are watched
- You cannot build automation on top of GitHub notifications
We need a cloud-native solution that can track new versions from different release channels, detect the currently running versions, aggregate and present the version gaps, and enable further automation solutions.
Since we didn’t find an existing solution, we created Opvic and open-sourced it. This post walks you through the design, implementation, and usage of Opvic.
Architecture Design
Opvic was designed with scalability in mind, which is why we decoupled tracking the running versions and their configuration (done by agents) from retrieving the remote versions (done by the control plane).
Agent
Agents are installed on all clusters and send the collected information to the centralized control plane.
- Agents query the Kubernetes API to figure out the running version of each component
- App discovery and configuration are managed through custom resources (defined by a CRD)
- Each agent has a unique identifier (normally, your cluster identifier)
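The agent protocol is not documented in this post, but the request counter in the metrics further down shows agents sending POST requests to /api/v1alpha1/agents and receiving 202 responses. A minimal Go sketch of building such a report request, assuming a hypothetical JSON payload (AgentReport, AppVersion, and newReportRequest are illustrative names, not Opvic’s schema) and the shared bearer token configured at install time:

```go
package main

import (
	"bytes"
	"encoding/json"
	"net/http"
	"strings"
)

// AgentReport is a HYPOTHETICAL payload shape; Opvic's real schema may differ.
type AgentReport struct {
	AgentID string       `json:"agentId"` // e.g. the agent.identifier Helm value
	Apps    []AppVersion `json:"apps"`
}

// AppVersion is one tracked app and the running versions found for it (hypothetical).
type AppVersion struct {
	ID       string   `json:"id"`
	Versions []string `json:"versions"`
}

// newReportRequest builds the POST an agent could send to the control plane.
func newReportRequest(baseURL, token string, report AgentReport) (*http.Request, error) {
	body, err := json.Marshal(report)
	if err != nil {
		return nil, err
	}
	endpoint := strings.TrimSuffix(baseURL, "/") + "/api/v1alpha1/agents"
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+token) // same scheme as the curl examples
	return req, nil
}
```

Sending it with http.DefaultClient.Do(req) and checking for 202 Accepted would complete the round trip.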
App Discovery
By default, agents do not perform any action: Opvic’s agent acts as a Kubernetes operator that watches VersionTracker custom resources. Configuring tracking through custom resources is highly flexible, since tracking another component only takes another resource.
Based on each custom resource, agents discover Kubernetes resources such as Nodes, Deployments, and Pods using the standard label selector configuration. Agents then extract semver-formatted versions from any field, such as the image, a label, or an annotation.
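Label selectors follow standard Kubernetes semantics: every matchLabels entry and every matchExpressions requirement must hold. A Go operator would typically convert the selector with metav1.LabelSelectorAsSelector from k8s.io/apimachinery; the stdlib-only sketch below re-implements the matching rules to show what an agent selects. LabelSelector, Requirement, and Matches here are hypothetical local types, not Opvic’s code:

```go
package main

// Requirement mirrors one entry of a Kubernetes matchExpressions list.
type Requirement struct {
	Key      string
	Operator string // In, NotIn, Exists, DoesNotExist
	Values   []string
}

// LabelSelector mirrors the shape of metav1.LabelSelector (stdlib-only copy).
type LabelSelector struct {
	MatchLabels      map[string]string
	MatchExpressions []Requirement
}

func contains(values []string, v string) bool {
	for _, x := range values {
		if x == v {
			return true
		}
	}
	return false
}

// Matches reports whether a resource's labels satisfy every matchLabels
// entry and every matchExpressions requirement (all terms are ANDed).
func (s LabelSelector) Matches(labels map[string]string) bool {
	for k, v := range s.MatchLabels {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	for _, r := range s.MatchExpressions {
		val, ok := labels[r.Key]
		switch r.Operator {
		case "In":
			if !ok || !contains(r.Values, val) {
				return false
			}
		case "NotIn": // a missing key satisfies NotIn
			if ok && contains(r.Values, val) {
				return false
			}
		case "Exists":
			if !ok {
				return false
			}
		case "DoesNotExist":
			if ok {
				return false
			}
		default:
			return false // unknown operator: match nothing (Kubernetes would reject it)
		}
	}
	return true
}
```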
Control Plane
In the initial implementation, the control plane receives information from agents and stores it in an in-memory cache. It then aggregates the information and retrieves the remote versions based on the configuration sent by each agent. Overall, this Opvic component is responsible for:
- Exposing an API endpoint for agents to send the collected information (as well as endpoints to query detailed information about each component)
- Storing the information in an in-memory cache with a configurable expiration duration
- Interacting with external systems, such as GitHub and Helm repositories, to retrieve the versions between the running version and the latest one
- Exposing Prometheus metrics that show the running versions across all clusters, as well as the available major, minor, and patch versions to upgrade to
Installation
A Helm chart is available for Opvic. Begin by adding the repository:
helm repo add opvic https://skillz.github.io/opvic
Then deploy it with the minimal required configuration (check out values.yaml for all available options):
helm upgrade --install --namespace opvic --create-namespace opvic opvic/opvic \
--set controlplane.enabled=true \
--set agent.enabled=true --set agent.identifier=test \
--set sharedAuthentication.token=test
Ensure that the pods are running and that you can connect to the control plane:
kubectl get pods -n opvic
NAME READY STATUS RESTARTS AGE
opvic-agent-9b64fb88c-qfvnv 1/1 Running 0 34m
opvic-control-plane-56d6955f7d-wgttd 1/1 Running 0 34m
kubectl port-forward deployment/opvic-control-plane 8080:8080 -n opvic
Forwarding from 127.0.0.1:8080 -> 8080
Forwarding from [::1]:8080 -> 8080
curl -H "Authorization: Bearer test" localhost:8080/api/v1alpha1/ping
pong
Note:
- You only need to run the control plane in one of your clusters
- You need to deploy the agent and the CRD in every cluster
- You don’t need an ingress if the control plane and the agent run in the same cluster
- VersionTracker resources should be deployed in every cluster
Examples
Now that the control plane and agent are running, you can deploy some VersionTracker resources. Below is an example of a VersionTracker resource:
apiVersion: opvic.skillz.com/v1alpha1
kind: VersionTracker
metadata:
  name: my-app # Kubernetes object name (must be lowercase)
spec:
  name: myApp # Unique identifier of the app to track
  resources: # How the agent finds the resources to extract the version from
    strategy: Pods # Resource kind: Pods, Nodes, Deployments, etc. (defaults to Pods)
    namespaces: # (optional) namespaces of the app (defaults to all namespaces)
      - kube-system
    selector: # Standard Kubernetes label selector
      matchLabels:
        app.kubernetes.io/name: myApp
  localVersion: # How the agent extracts the version
    strategy: FieldSelection # ImageTag or FieldSelection
    fieldSelector: '.metadata.labels.myApp\.io/version' # JSONPath to the field that holds the version
    extraction: # Regex to extract the version from the field
      regex:
        pattern: '^v([0-9]+\.[0-9]+\.[0-9]+)$'
        result: '$1'
  remoteVersion: # How the control plane finds the remote versions
    provider: github # github or helm
    strategy: releases # releases or tags (github); appVersion or chartVersion (helm)
    repo: owner/repoName # owner/repoName for github; the chart repository URL for helm
    extraction:
      regex:
        pattern: '^myApp-v([0-9]+\.[0-9]+\.[0-9]+)$'
        result: '$1'
    constraint: '~>3' # Semver constraint used to filter the remote versions
If the remote versions are not exposed, or the provider is not supported by Opvic yet, you can still track the running versions by omitting the remoteVersion configuration.
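Each extraction block pairs a regex pattern with a result template. Go’s regexp package uses the same $1 template syntax, so one way to implement this step is to match and then expand only the template, discarding any text outside the capture groups. This is a sketch under that assumption (extractVersion is a hypothetical helper), not necessarily how Opvic evaluates the rule:

```go
package main

import "regexp"

// extractVersion applies a pattern + result rule to a raw string. Only the
// expanded template is returned, so text outside the capture groups (such as
// a "-eks-..." suffix after an unanchored pattern) is dropped.
func extractVersion(pattern, result, raw string) (string, bool, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return "", false, err
	}
	match := re.FindStringSubmatchIndex(raw)
	if match == nil {
		return "", false, nil // no match: this input yields no version
	}
	return string(re.ExpandString(nil, result, raw, match)), true, nil
}
```

A non-matching input simply yields no version, so an anchored pattern such as ^myApp-v([0-9]+\.[0-9]+\.[0-9]+)$ ignores tags with a different prefix.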
Tracking CoreDNS From Container Image Tag
Let’s say you want to track the running version of CoreDNS on your cluster, as well as the available upstream versions. You can deploy a VersionTracker resource like the following, saved as coredns.yaml:
apiVersion: opvic.skillz.com/v1alpha1
kind: VersionTracker
metadata:
  name: coredns
spec:
  name: coredns
  resources:
    namespaces:
      - kube-system
    selector:
      matchLabels:
        k8s-app: kube-dns
  localVersion:
    strategy: ImageTag
  remoteVersion:
    provider: github
    strategy: releases
    repo: coredns/coredns
    extraction:
      regex:
        pattern: ^v([0-9]+\.[0-9]+\.[0-9]+)$
        result: $1
Let’s apply this:
kubectl apply -f coredns.yaml -n opvic
Since the version of most applications can be read from the container image tag, you can use the ImageTag strategy, which extracts the version from the image tag of the resource’s first container.
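The ImageTag strategy depends on pulling the tag out of an image reference such as k8s.gcr.io/coredns:1.7.0. A simplified sketch of that parsing (imageTag is a hypothetical helper, not Opvic’s parser) that ignores digests and avoids mistaking a registry port for a tag:

```go
package main

import "strings"

// imageTag returns the tag of an image reference such as "k8s.gcr.io/coredns:1.7.0".
// A trailing "@sha256:..." digest is ignored, and a colon that belongs to a
// registry host:port (before the last "/") is not treated as a tag separator.
func imageTag(image string) (string, bool) {
	if i := strings.Index(image, "@"); i >= 0 {
		image = image[:i]
	}
	lastSlash := strings.LastIndex(image, "/")
	lastColon := strings.LastIndex(image, ":")
	if lastColon <= lastSlash {
		return "", false // untagged, e.g. "coredns" or "localhost:5000/coredns"
	}
	tag := image[lastColon+1:]
	return tag, tag != ""
}
```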
For remote versions, you can use the github provider with the releases strategy, which looks at the repository’s GitHub releases. You need to specify the GitHub repository and a regex to extract the version.
Now you can query the control plane for running versions:
curl -H "Authorization: Bearer test" localhost:8080/api/v1alpha1/agents/test/coredns | jq
And you would get a response like this:
{
  "id": "coredns",
  "namespace": "opvic",
  "count": 1,
  "uniqVersions": [
    "1.7.0"
  ],
  "versions": [
    {
      "runningVersion": "1.7.0",
      "resourceCount": 1,
      "resourceKind": "Pods",
      "extractedFrom": "k8s.gcr.io/coredns:1.7.0"
    }
  ],
  "remoteVersion": {
    "provider": "github",
    "strategy": "releases",
    "repo": "coredns/coredns",
    "extraction": {
      "regex": {
        "pattern": "^v([0-9]+\\.[0-9]+\\.[0-9]+)$",
        "result": "$1"
      }
    }
  }
}
It normally takes up to a minute for the control plane to aggregate the collected data and retrieve the remote versions, since that process runs periodically in the background. After that, you can query the available versions:
curl -H "Authorization: Bearer test" localhost:8080/api/v1alpha1/agents/test/coredns/versions | jq
And you would get a response like this:
{
  "id": "coredns",
  "agentId": "test",
  "resourceCount": 1,
  "runningVersions": [
    "1.7.0"
  ],
  "latestVersion": "1.8.6",
  "remoteProvider": "github",
  "remoteRepo": "coredns/coredns",
  "versions": [
    {
      "currentVersion": "1.7.0",
      "resourceCount": 1,
      "resourceKind": "Pods",
      "extractedFrom": "k8s.gcr.io/coredns:1.7.0",
      "latestVersion": "1.8.6",
      "availableVersions": [
        "1.7.1",
        "1.8.0",
        "1.8.1",
        "1.8.2",
        "1.8.3",
        "1.8.4",
        "1.8.5",
        "1.8.6"
      ],
      "availableMajors": [],
      "availableMinors": [
        "1.8.0",
        "1.8.1",
        "1.8.2",
        "1.8.3",
        "1.8.4",
        "1.8.5",
        "1.8.6"
      ],
      "availablePatches": [
        "1.7.1"
      ],
      "majorAvailable": false,
      "minorAvailable": true,
      "patchAvailable": true
    }
  ]
}
The control plane also exposes Prometheus metrics at the /metrics endpoint, so let’s take a look at those:
# HELP opvic_controlplane_agent_last_heartbeat Last time the agent was seen
# TYPE opvic_controlplane_agent_last_heartbeat gauge
opvic_controlplane_agent_last_heartbeat{agent_id="test",tags=""} 1.639773192e+09
# HELP opvic_controlplane_major_versions_count Number of available major versions to upgrade to
# TYPE opvic_controlplane_major_versions_count gauge
opvic_controlplane_major_versions_count{agent_id="test",available_major_versions="",remote_provider="github",remote_repo="coredns/coredns",resource_kind="Pods",running_version="1.7.0",version_id="coredns"} 0
# HELP opvic_controlplane_minor_versions_count Number of available minor versions to upgrade to
# TYPE opvic_controlplane_minor_versions_count gauge
opvic_controlplane_minor_versions_count{agent_id="test",available_minor_versions="1.8.0,1.8.1,1.8.2,1.8.3,1.8.4,1.8.5,1.8.6",remote_provider="github",remote_repo="coredns/coredns",resource_kind="Pods",running_version="1.7.0",version_id="coredns"} 7
# HELP opvic_controlplane_patch_versions_count Number of available patch versions to upgrade to
# TYPE opvic_controlplane_patch_versions_count gauge
opvic_controlplane_patch_versions_count{agent_id="test",available_patch_versions="1.7.1",remote_provider="github",remote_repo="coredns/coredns",resource_kind="Pods",running_version="1.7.0",version_id="coredns"} 1
# HELP opvic_controlplane_requests_total The number of HTTP requests processed
# TYPE opvic_controlplane_requests_total counter
opvic_controlplane_requests_total{method="GET",path="/api/v1alpha1/agents/test/coredns",status="200"} 1
opvic_controlplane_requests_total{method="GET",path="/api/v1alpha1/agents/test/coredns/versions",status="200"} 1
opvic_controlplane_requests_total{method="GET",path="/api/v1alpha1/ping",status="200"} 1
opvic_controlplane_requests_total{method="POST",path="/api/v1alpha1/agents",status="202"} 15
# HELP opvic_controlplane_version_resource_count Number of resources running with a specific version
# TYPE opvic_controlplane_version_resource_count gauge
opvic_controlplane_version_resource_count{agent_id="test",extracted_from="k8s.gcr.io/coredns:1.7.0",latest_version="1.8.6",remote_provider="github",remote_repo="coredns/coredns",resource_kind="Pods",running_version="1.7.0",version_id="coredns"} 1
# HELP opvic_provider_github_rate_limit_remaining The number of requests remaining in the current rate limit window.
# TYPE opvic_provider_github_rate_limit_remaining gauge
opvic_provider_github_rate_limit_remaining 58
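Each sample above uses the Prometheus text exposition format: a metric name, a label set in braces, and a value. Exporters normally rely on the Prometheus client library to render this; the stdlib-only sketch below (formatSample and escapeLabel are hypothetical helpers) shows the layout, including label-value escaping and a deterministic label order:

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// escapeLabel escapes a label value as the text format requires:
// backslash, double quote, and line feed.
func escapeLabel(v string) string {
	return strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`).Replace(v)
}

// formatSample renders one sample line, name{k="v",...} value, with labels
// sorted by name and the value in Go's shortest 'g' float formatting.
func formatSample(name string, labels map[string]string, value float64) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, fmt.Sprintf(`%s="%s"`, k, escapeLabel(labels[k])))
	}
	return fmt.Sprintf("%s{%s} %s", name, strings.Join(pairs, ","), strconv.FormatFloat(value, 'g', -1, 64))
}
```

Because lists such as available_minor_versions are carried in label values, each new upstream release produces a new time series. For alerting, an expression such as opvic_controlplane_patch_versions_count > 0 would flag pending patch upgrades.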
Extract the Version From Any Field
Now that you know how everything works, let’s look at some other examples. Since the local version can be extracted from any field, let’s grab the kubelet version of your nodes:
apiVersion: opvic.skillz.com/v1alpha1
kind: VersionTracker
metadata:
  name: kubernetes
spec:
  name: kubernetes
  resources:
    strategy: Nodes
    selector:
      matchExpressions:
        - key: kubernetes.io/hostname
          operator: Exists
  localVersion:
    strategy: FieldSelection
    fieldSelector: '.status.nodeInfo.kubeletVersion'
    extraction:
      regex:
        pattern: '^v([0-9]+\.[0-9]+\.[0-9]+)'
        result: $1
  remoteVersion:
    provider: github
    strategy: tags
    repo: kubernetes/kubernetes
    extraction:
      regex:
        pattern: ^v([0-9]+\.[0-9]+\.[0-9]+)$
        result: $1
In this example, we use the FieldSelection strategy in the localVersion configuration and set the JSONPath to .status.nodeInfo.kubeletVersion to extract the kubelet version from the node status. We also use the tags strategy in the remoteVersion configuration to get the versions from the remote repository’s Git tags. The local pattern is not anchored at the end, so kubelet versions with a provider suffix (such as v1.21.5-eks-bc4871b) still match, while the remote pattern ends with $ so that pre-release tags such as v1.23.0-rc.0 are not reported as available upgrades.
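The fieldSelector value is a JSONPath expression. In Go, client-go ships a full implementation in k8s.io/client-go/util/jsonpath; the sketch below handles only the dotted subset used in this post, including the escaped dot in '.metadata.labels.myApp\.io/version', to show how such paths resolve against a resource. selectFromJSON and its helpers are hypothetical:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// splitPath splits ".a.b\.c" into ["a", "b.c"], treating `\.` as a literal dot.
func splitPath(path string) []string {
	var keys []string
	var cur strings.Builder
	for i := 0; i < len(path); i++ {
		switch {
		case path[i] == '\\' && i+1 < len(path) && path[i+1] == '.':
			cur.WriteByte('.')
			i++ // skip the escaped dot
		case path[i] == '.':
			if cur.Len() > 0 {
				keys = append(keys, cur.String())
				cur.Reset()
			}
		default:
			cur.WriteByte(path[i])
		}
	}
	if cur.Len() > 0 {
		keys = append(keys, cur.String())
	}
	return keys
}

// selectField walks a decoded JSON object along the path and returns the string there.
func selectField(obj map[string]any, path string) (string, error) {
	var cur any = obj
	for _, k := range splitPath(path) {
		m, ok := cur.(map[string]any)
		if !ok {
			return "", fmt.Errorf("%q: not an object at %q", path, k)
		}
		if cur, ok = m[k]; !ok {
			return "", fmt.Errorf("%q: key %q not found", path, k)
		}
	}
	s, ok := cur.(string)
	if !ok {
		return "", fmt.Errorf("%q: value is not a string", path)
	}
	return s, nil
}

// selectFromJSON decodes a resource (as JSON) and selects a field from it.
func selectFromJSON(raw []byte, path string) (string, error) {
	var obj map[string]any
	if err := json.Unmarshal(raw, &obj); err != nil {
		return "", err
	}
	return selectField(obj, path)
}
```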
Use appVersion of a Helm Repository
If the application you are tracking does not have a GitHub repository, check whether it has a Helm chart: a chart’s appVersion normally represents the version of the application it deploys. In that case, you can use the helm provider with the appVersion strategy:
apiVersion: opvic.skillz.com/v1alpha1
kind: VersionTracker
metadata:
  name: artifactory
spec:
  name: artifactory
  resources:
    namespaces:
      - artifactory
    selector:
      matchLabels:
        component: artifactory
  localVersion:
    strategy: ImageTag
  remoteVersion:
    provider: helm
    strategy: appVersion
    repo: https://charts.jfrog.io
    chart: artifactory
Track Your Helm Chart Versions
You can also track Helm chart versions using the helm provider and the chartVersion strategy, assuming the deployed chart version is exposed in a label called chart:
apiVersion: opvic.skillz.com/v1alpha1
kind: VersionTracker
metadata:
  name: artifactory-helm
spec:
  name: artifactory-helm-chart
  resources:
    namespaces:
      - artifactory
    selector:
      matchLabels:
        component: artifactory
  localVersion:
    strategy: FieldSelection
    fieldSelector: '.metadata.labels.chart'
    extraction:
      regex:
        pattern: ^artifactory-([0-9]+\.[0-9]+\.[0-9]+)$
        result: $1
  remoteVersion:
    provider: helm
    strategy: chartVersion
    repo: https://charts.jfrog.io
    chart: artifactory
Conclusion
Opvic is a tool that runs on Kubernetes, detects the version gaps between upstream releases and the add-ons deployed on your clusters, and enables you to build further automation that suits your team’s upgrade strategy. It currently supports the GitHub and Helm release channels, and more will be added in the future.
Skillz has been benefiting from and contributing to the open-source community, and ultimately, that’s why we are open-sourcing Opvic today.
Check out the Opvic GitHub repository and feel free to give it a try.
Also if this is something that interests you, we encourage you to learn more about joining our team here!