Linkerd at loveholidays — Deploying Linkerd in a GitOps world

Dan Williams
Published in loveholidays tech · 4 min read · Mar 3, 2023


In our accompanying post, Linkerd at loveholidays — Our journey to a production service mesh, we describe the journey we took from initial investigations to running all of our applications in the Linkerd service mesh. We also describe how we monitor our applications using Linkerd metrics in another post from this series: Linkerd at loveholidays — Monitoring our apps using Linkerd metrics.

In this post, I will be talking about how we deploy Linkerd.

During our initial testing of Linkerd we deployed it using the Linkerd CLI, which works extremely well but doesn’t fit with our GitOps way of working. It also deploys short-lived certificate chains, so it’s recommended to generate your own certificates even when using the CLI.

We use a combination of Kustomize, Flux, Helm, Cert-Manager, and Cert-Manager-Trust to deploy Linkerd to our Google Kubernetes Engine (GKE) clusters in a GitOps fashion, all using examples found either in Linkerd’s documentation or on GitHub.

It did take us a long time to find the right combination to get all of this working together, and we do see a lot of questions around this topic in the Linkerd Slack, so I thought I would share our configuration and the particular settings/workarounds we have come across.

Creating the certificate chains

Two certificate chains are required to deploy Linkerd: one for the control plane and one for the webhooks. The Linkerd documentation suggests creating a root certificate manually using step, but we opted to automate this step too. Below you can see our cert-manager configuration:

control plane certificates
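
The control-plane chain follows the pattern from Linkerd’s cert-manager documentation: a self-signed trust anchor plus a short-lived identity issuer that cert-manager rotates for us. A rough sketch (names, namespaces, durations and key algorithms here are illustrative rather than our exact values):

```
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: linkerd-self-signed-issuer
spec:
  selfSigned: {}
---
# Long-lived root CA (trust anchor). It lives in the cert-manager namespace
# so that Cert-Manager-Trust can later read its secret as a source.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: linkerd-trust-anchor
  namespace: cert-manager
spec:
  isCA: true
  commonName: root.linkerd.cluster.local
  secretName: linkerd-trust-anchor
  duration: 87600h # illustrative; pick a duration that suits your rotation policy
  privateKey:
    algorithm: ECDSA
  issuerRef:
    name: linkerd-self-signed-issuer
    kind: ClusterIssuer
---
# CA issuer backed by the trust anchor secret.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: linkerd-trust-anchor
spec:
  ca:
    secretName: linkerd-trust-anchor
---
# Short-lived intermediate used by the Linkerd identity service,
# rotated automatically by cert-manager.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: linkerd-identity-issuer
  namespace: linkerd
spec:
  secretName: linkerd-identity-issuer
  duration: 48h
  renewBefore: 25h
  issuerRef:
    name: linkerd-trust-anchor
    kind: ClusterIssuer
  commonName: identity.linkerd.cluster.local
  dnsNames:
    - identity.linkerd.cluster.local
  isCA: true
  privateKey:
    algorithm: ECDSA
  usages:
    - cert sign
    - crl sign
    - server auth
    - client auth
```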

webhook certificates
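
The webhook chain is similar: a separate root plus one certificate per webhook service. Another sketch (again illustrative; the Viz extension’s tap and tap-injector webhooks follow the same pattern in the linkerd-viz namespace):

```
# Separate root for the webhook chain, kept distinct from the identity trust anchor.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: webhook-issuer-tls
  namespace: linkerd
spec:
  isCA: true
  commonName: webhook.linkerd.cluster.local
  secretName: webhook-issuer-tls
  privateKey:
    algorithm: ECDSA
  issuerRef:
    name: linkerd-self-signed-issuer
    kind: ClusterIssuer
---
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: webhook-issuer
  namespace: linkerd
spec:
  ca:
    secretName: webhook-issuer-tls
---
# One certificate per webhook; the proxy injector is shown here, and the
# policy validator and sp-validator follow the same pattern with their own
# service names and secrets.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: linkerd-proxy-injector
  namespace: linkerd
spec:
  secretName: linkerd-proxy-injector-k8s-tls
  duration: 24h
  renewBefore: 1h
  issuerRef:
    name: webhook-issuer
    kind: Issuer
  commonName: linkerd-proxy-injector.linkerd.svc
  dnsNames:
    - linkerd-proxy-injector.linkerd.svc
  isCA: false
  privateKey:
    algorithm: ECDSA
  usages:
    - server auth
```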

The control-plane bundle relies on Cert-Manager-Trust to distribute the required certificate secret as the ConfigMap that Linkerd expects. This method was shared by an exceptionally helpful buoyant.io staff member called Matei David, whose original post is here.
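
The Bundle itself is small. A sketch, assuming the trust anchor secret from the chain above and the ConfigMap name and key the control-plane chart expects by default:

```
apiVersion: trust.cert-manager.io/v1alpha1
kind: Bundle
metadata:
  # The target ConfigMap takes the Bundle's name.
  name: linkerd-identity-trust-roots
spec:
  sources:
    # Read from the trust namespace (cert-manager by default).
    - secret:
        name: linkerd-trust-anchor
        key: tls.crt
  target:
    configMap:
      key: ca-bundle.crt
```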

Linkerd supports distributed tracing via the Jaeger extension. Until recently, Linkerd traces only supported B3 propagation, while we use the W3C Trace Context header in our applications, so we have not enabled this feature. This has now been resolved in the most recent edge release (see the PR here), aiming for the 2.13 release of Linkerd, so hopefully we’ll soon be enabling Linkerd traces to further enrich our application traces.

Deploying Linkerd with these certificates

We deploy Linkerd using the provided Helm charts, via Flux’s HelmRelease CRD. The resources we use are described below:

linkerd-crds

The linkerd-crds Helm chart must be installed before the control plane. Flux requires us to create a HelmRepository resource before it can deploy a HelmRelease:
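
A minimal sketch of that pairing (the URL is Linkerd’s stable Helm repository; intervals and the version pin are illustrative):

```
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: linkerd
  namespace: linkerd
spec:
  interval: 1h
  url: https://helm.linkerd.io/stable
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: linkerd-crds
  namespace: linkerd
spec:
  interval: 10m
  chart:
    spec:
      chart: linkerd-crds
      version: "1.x" # illustrative; pin to the version you are rolling out
      sourceRef:
        kind: HelmRepository
        name: linkerd
```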

linkerd-control-plane

The linkerd-control-plane Helm chart installs the control plane. The externalCA, issuer.scheme, externalSecret and injectCaFrom fields are all related to passing certificates in from cert-manager.

Linkerd provides a secondary Helm values file, values-ha.yaml, in its repository, which contains all of the recommended settings to run the Linkerd control plane in High Availability (HA) mode. You will see that referenced in-line in the Gist.
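
Putting that together, a sketch of the HelmRelease (the cert-related value names follow the Linkerd Helm documentation; intervals and the dependsOn wiring are illustrative):

```
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: linkerd-control-plane
  namespace: linkerd
spec:
  interval: 10m
  dependsOn:
    - name: linkerd-crds
  chart:
    spec:
      chart: linkerd-control-plane
      sourceRef:
        kind: HelmRepository
        name: linkerd
  values:
    # Certificates come from cert-manager rather than being passed in as PEM.
    identity:
      externalCA: true
      issuer:
        scheme: kubernetes.io/tls
    proxyInjector:
      externalSecret: true
      injectCaFrom: linkerd/linkerd-proxy-injector
    profileValidator:
      externalSecret: true
      injectCaFrom: linkerd/linkerd-sp-validator
    policyValidator:
      externalSecret: true
      injectCaFrom: linkerd/linkerd-policy-validator
    # ...plus the contents of values-ha.yaml copied in-line for HA mode.
```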

Other settings to note (sketched as Helm values after this list):

  • In GKE we have seen inconsistencies when using EndpointSlices, so we disable them and revert to using Endpoints, which are slower but have been 100% consistent for us. This is a complicated topic, and the maintainers are making improvements in this area.
  • We run iptables in legacy mode, due to missing modules in Google’s Container-Optimized OS. More details here.
  • At the time of deployment we used the Docker runtime, which requires proxyInit.runAsRoot: true. More details here. We have recently switched to the containerd runtime, so this setting needs revisiting. You will likely not need this setting, given the Docker runtime’s deprecation in recent Kubernetes versions.
  • linkerd-proxy runs in single-core mode by default (this can be seen in the proxy start-up logs), which is controlled by the number of cores in the proxy’s CPU limit. Official documentation on this is here. We have defaulted proxy.cores to 2 to ensure all proxies run in multi-core mode. We have not seen any proxies get even close to the 2 CPU limit, but we also did not notice any performance gains going from single-core to multi-core mode in the proxy. If this is an area you have more experience with, please do share details with us.
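
Expressed as Helm values, the settings above look roughly like this fragment (a sketch only; in particular, double-check the iptables key against the chart’s values.yaml for your version):

```
# Merged into the linkerd-control-plane HelmRelease values above.
enableEndpointSlices: false   # fall back to Endpoints on GKE
proxyInit:
  iptablesMode: legacy        # assumed key; legacy mode for Container-Optimized OS
  runAsRoot: true             # only needed with the Docker runtime
proxy:
  cores: 2                    # run every proxy in multi-core mode
```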

linkerd-viz

The linkerd-viz Helm chart installs the Viz extension (Viz is an awesome dashboard that presents real-time traffic information within your cluster; see here for more details). The externalSecret and injectCaFrom fields are related to passing certificates in from cert-manager.

As with the control plane, there is an additional values-ha.yaml file for running Linkerd Viz in HA mode, which you will see in-line in the Gist.
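
A sketch of the Viz HelmRelease (the tap and tap-injector value names follow the Linkerd Helm documentation for externally managed webhook certificates; intervals and dependencies are illustrative):

```
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: linkerd-viz
  namespace: linkerd-viz
spec:
  interval: 10m
  dependsOn:
    - name: linkerd-control-plane
      namespace: linkerd
  chart:
    spec:
      chart: linkerd-viz
      sourceRef:
        kind: HelmRepository
        name: linkerd
        namespace: linkerd
  values:
    # Webhook certificates come from cert-manager, as with the control plane.
    tap:
      externalSecret: true
      injectCaFrom: linkerd-viz/tap
    tapInjector:
      externalSecret: true
      injectCaFrom: linkerd-viz/tap-injector
    # ...plus the viz values-ha.yaml contents copied in-line.
```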

Other settings to note (sketched as Helm values after this list):

  • dashboard.enforcedHostRegexp is set to match the external hostname that points to our Viz ingress (an Ingress is not included as standard, so we have added a GKE Ingress resource to expose Viz)
  • grafana.externalUrl is the external URL for our Grafana (this powers the Grafana clickthrough from Viz UI)
  • We disable the Prometheus bundled with Viz. Instead, we have added the Viz Prometheus scrape config to our prometheus-operator Prometheus HA pair. That Prometheus scrapes the Linkerd proxies and sends the data to Mimir for retention. The prometheusUrl field instructs Viz to retrieve the metrics that power the Viz dashboard from Mimir (via an internal ClusterIP service) instead of from its own Prometheus.
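
As Helm values, those settings look roughly like this fragment (the hostname and Mimir URL are placeholders, not our real endpoints):

```
# Merged into the linkerd-viz HelmRelease values above.
dashboard:
  enforcedHostRegexp: ^viz\.example\.com$    # placeholder hostname for our Viz ingress
grafana:
  externalUrl: https://grafana.example.com   # placeholder; powers the Grafana clickthrough links
prometheus:
  enabled: false                             # don't run the bundled Prometheus
prometheusUrl: http://mimir-query-frontend.mimir.svc.cluster.local/prometheus # placeholder Mimir endpoint
```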

⚠️ Disabling the internal Prometheus and copying the scrape config does come with a bit of toil: during Linkerd upgrades we need to check whether the scrape config provided in the chart has changed.

These settings work for us, but may not be optimal. If this helps you in any way we’d love to hear from you, and we’re always open to any feedback or comments. You’ll find us over on the Linkerd Slack both asking and answering questions, and if this sort of thing interests you, we have an open vacancy for a Senior Site Reliability Engineer.
