After several hours investigating why the metrics-server component running in my home Kubernetes cluster was getting invalid-certificate errors when retrieving metrics from the nodes, I have been able to fix the issue and get metrics correctly from all the Kubernetes nodes over HTTPS.
The sections below describe the problem and how to fix it.
Issue
After deploying metrics-server, the metrics-server pod does not start correctly, and its logs show a lot of messages like the following:
E1229 07:09:05.013998 1 summary.go:97] error while getting metrics summary from Kubelet node2(192.168.x.x:10250): Get https://192.168.x.x:10250/metrics/resource/: x509: cannot validate certificate for 192.168.x.x because it doesn't contain any IP SANs
The error message is quite explicit: the certificate served by the kubelet on each node does not include the node IP addresses as Subject Alternative Names (SANs). metrics-server connects to the kubelets by IP address, so it cannot validate that certificate.
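To confirm the diagnosis, you can inspect the SANs of the certificate that a kubelet actually serves on port 10250. The sketch below is a hypothetical helper (print_sans is not part of any tool) and assumes openssl is installed; if the node's IP address does not appear in its output, metrics-server will fail with the error shown above.

```shell
# print_sans (hypothetical helper): read a PEM certificate on stdin and print
# its Subject Alternative Name entries, or a notice if the extension is absent.
print_sans() {
  sans_text=$(openssl x509 -noout -text) || return 1
  if printf '%s\n' "$sans_text" | grep -q 'X509v3 Subject Alternative Name'; then
    # The SAN values are on the line right after the extension header.
    printf '%s\n' "$sans_text" | grep -A1 'X509v3 Subject Alternative Name' \
      | tail -n 1 | sed 's/^ *//'
  else
    echo "no SAN extension"
  fi
}

# Example: inspect the certificate served by the kubelet on node4.
# openssl s_client -connect 192.168.30.16:10250 </dev/null 2>/dev/null | print_sans
```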
Fix
Thanks to a GitHub issue, I found a solution that worked when the issue was raised but no longer works as-is.
After a few small tweaks, I got it working on Kubernetes 1.23.
- Install the software needed for the next steps. I am using a laptop running Ubuntu, so all the steps below are for Ubuntu.
sudo apt install -y golang-cfssl jq
- Generate a private key and a Certificate Signing Request (CSR) with cfssl. The hosts list must contain the hostname and the IP address of every node. After running the command below, two new files should be created: kubelet-server-key.pem and kubelet-server.csr
cat <<EOF | cfssl genkey - | cfssljson -bare kubelet-server
{
  "hosts": [
    "master1",
    "node2",
    "node3",
    "node4",
    "192.168.30.12",
    "192.168.30.11",
    "192.168.30.15",
    "192.168.30.16"
  ],
  "CN": "system:node:kubelet-server",
  "names": [
    {
      "C": "ES",
      "ST": "Spain",
      "L": "Malaga",
      "O": "system:nodes",
      "OU": "system:nodes"
    }
  ]
}
EOF
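Before submitting kubelet-server.csr, it is worth checking it against the requirements of the kubernetes.io/kubelet-serving signer: the subject organization must be system:nodes, the common name must start with system:node:, and the request must contain at least one DNS or IP SAN. The following is a minimal sketch using openssl; check_kubelet_serving_csr is a hypothetical name, and it only covers those three points, not every check the signer performs.

```shell
# check_kubelet_serving_csr (hypothetical helper): verify that a CSR file has
# O=system:nodes, a CN starting with system:node:, and at least one DNS/IP SAN.
check_kubelet_serving_csr() {
  csr_file=$1
  # RFC 2253 output gives a stable "CN=...,OU=...,O=..." form across versions.
  subject=$(openssl req -in "$csr_file" -noout -subject -nameopt RFC2253) || return 1
  printf '%s\n' "$subject" | grep -Eq '[=,]O=system:nodes(,|$)' \
    || { echo "missing O=system:nodes"; return 1; }
  printf '%s\n' "$subject" | grep -Eq '[=,]CN=system:node:' \
    || { echo "CN does not start with system:node:"; return 1; }
  sans=$(openssl req -in "$csr_file" -noout -text \
    | grep -A1 'X509v3 Subject Alternative Name' | tail -n 1)
  case "$sans" in
    *"IP Address:"*|*"DNS:"*) echo "ok:$sans" ;;
    *) echo "no DNS or IP SANs"; return 1 ;;
  esac
}

# Example:
# check_kubelet_serving_csr kubelet-server.csr
```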
- Next, create the CSR object in the Kubernetes cluster so that Kubernetes can sign it and issue the certificate
cat <<EOF | kubectl create -f -
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: kubelet-server
spec:
  request: $(cat kubelet-server.csr | base64 | tr -d '\n')
  signerName: kubernetes.io/kubelet-serving
  groups:
  - system:nodes
  - system:authenticated
  usages:
  - digital signature
  - key encipherment
  - server auth
EOF
- After creating the CSR in Kubernetes, we need to approve it
kubectl certificate approve kubelet-server
- Retrieve the signed certificate and save it as kubelet-server.pem
kubectl get csr kubelet-server -o jsonpath='{.status.certificate}' | base64 --decode > kubelet-server.pem
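Before copying anything to the nodes, a quick sanity check is to confirm that kubelet-server.pem really belongs to kubelet-server-key.pem, by comparing the public key embedded in the certificate with the one derived from the private key. This is a sketch with a hypothetical helper name, assuming openssl is available. An empty kubelet-server.pem (for example, if the certificate had not been issued yet when it was retrieved) also fails the check.

```shell
# cert_matches_key (hypothetical helper): succeed only if the certificate in $1
# and the private key in $2 carry the same public key.
cert_matches_key() {
  cert_pub=$(openssl x509 -in "$1" -noout -pubkey) || return 1
  key_pub=$(openssl pkey -in "$2" -pubout) || return 1
  # Both commands print the public key as a PEM SubjectPublicKeyInfo block.
  [ -n "$cert_pub" ] && [ "$cert_pub" = "$key_pub" ]
}

# Example:
# cert_matches_key kubelet-server.pem kubelet-server-key.pem && echo "certificate and key match"
```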
- At this point, we should have two files: the private key (kubelet-server-key.pem) and the signed certificate holding the public key (kubelet-server.pem). We need to copy both files to all the Kubernetes nodes (masters and workers). I decided to place them in /var/lib/kubelet/pki, so the files should be located at /var/lib/kubelet/pki/kubelet-server-key.pem and /var/lib/kubelet/pki/kubelet-server.pem, which are the paths used in the kubelet configuration below.
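The copy step can be scripted. The sketch below assumes passwordless SSH and sudo rights on every node, and the helper name is hypothetical. It installs the files under /var/lib/kubelet/pki, the directory used by the tlsCertFile and tlsPrivateKeyFile settings later in this post, and restricts the private key to mode 600.

```shell
# distribute_kubelet_cert (hypothetical helper): copy kubelet-server.pem and
# kubelet-server-key.pem to each node given as an argument and install them
# under /var/lib/kubelet/pki. Assumes SSH access and sudo on every node.
distribute_kubelet_cert() {
  pki_dir=/var/lib/kubelet/pki
  for node in "$@"; do
    scp kubelet-server.pem kubelet-server-key.pem "$node:/tmp/" || return 1
    # Install with explicit permissions, then remove the temporary copies.
    ssh "$node" "sudo install -d -m 755 $pki_dir &&
      sudo install -m 644 /tmp/kubelet-server.pem $pki_dir/kubelet-server.pem &&
      sudo install -m 600 /tmp/kubelet-server-key.pem $pki_dir/kubelet-server-key.pem &&
      rm -f /tmp/kubelet-server.pem /tmp/kubelet-server-key.pem" || return 1
  done
}

# Example, using the node names from the CSR:
# distribute_kubelet_cert master1 node2 node3 node4
```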
- Next, we need to configure kubelet to use the files we have just copied to all the Kubernetes nodes. First, find out which file the kubelet component loads its configuration from
sudo ps aux | grep kubelet
- The output of the above command should look similar to the following lines
root 2020195 5.6 0.3 2160136 121488 ? Ssl Sep18 73:58 /opt/bin/kubelet --logtostderr=true --v=2 --node-ip=192.168.30.16 --hostname-override=node4
--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --config=/etc/kubernetes/kubelet-config.yaml --kubeconfig=/etc/kubernetes/kubelet.conf
--container-runtime=remote --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock --runtime-cgroups=/systemd/system.slice --network-plugin=cni --cni-conf-dir=/etc/cni/net.d
--cni-bin-dir=/opt/cni/bin --volume-plugin-dir=/var/lib/kubelet/volumeplugins
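Instead of reading the whole ps line by eye, the value of --config can be extracted directly. A minimal sketch (the helper name is hypothetical) that accepts both the --config=PATH and the --config PATH forms:

```shell
# kubelet_config_path (hypothetical helper): read a kubelet command line on
# stdin and print the value of its --config flag.
kubelet_config_path() {
  awk '{
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^--config=/) { v = $i; sub(/^--config=/, "", v); print v }
      else if ($i == "--config" && i < NF) { print $(i + 1) }
    }
  }'
}

# Example on a node (ps -C is supported by the procps ps shipped with Ubuntu):
# ps -o args= -C kubelet | kubelet_config_path
```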
- The --config flag shows that kubelet reads its configuration from /etc/kubernetes/kubelet-config.yaml
We need to edit this file to tell kubelet where the private key and the certificate are located. Bear in mind that the commands below must be run as root.
echo "tlsPrivateKeyFile: /var/lib/kubelet/pki/kubelet-server-key.pem" >> /etc/kubernetes/kubelet-config.yaml
echo "tlsCertFile: /var/lib/kubelet/pki/kubelet-server.pem" >> /etc/kubernetes/kubelet-config.yaml
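The echo commands above append the two settings, so running them twice, or on a file that already sets these keys, leaves duplicate entries in the YAML. A slightly more defensive sketch replaces existing entries and appends only the missing ones. It assumes GNU sed (as on Ubuntu) and that both keys sit at the top level of the file without indentation, as in a KubeletConfiguration; the helper name is hypothetical.

```shell
# set_kubelet_tls (hypothetical helper): set tlsCertFile and tlsPrivateKeyFile
# in a kubelet config file without creating duplicate keys. Run as root.
set_kubelet_tls() {
  cfg=$1 cert=$2 key=$3
  [ -f "$cfg" ] || return 1
  # Make sure the file ends with a newline before appending anything.
  if [ -s "$cfg" ] && [ -n "$(tail -c 1 "$cfg")" ]; then echo >> "$cfg"; fi
  for pair in "tlsCertFile=$cert" "tlsPrivateKeyFile=$key"; do
    name=${pair%%=*} value=${pair#*=}
    if grep -q "^$name:" "$cfg"; then
      sed -i "s|^$name:.*|$name: $value|" "$cfg"   # replace the existing entry
    else
      echo "$name: $value" >> "$cfg"               # append a missing entry
    fi
  done
}

# Example (as root):
# set_kubelet_tls /etc/kubernetes/kubelet-config.yaml /var/lib/kubelet/pki/kubelet-server.pem /var/lib/kubelet/pki/kubelet-server-key.pem
```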
- The last step is to restart kubelet on all nodes
sudo systemctl daemon-reload
sudo systemctl restart kubelet
At this point, kubelet should be using a certificate that contains the IPs of all the Kubernetes nodes and that has been signed by the cluster CA, so metrics-server is able to retrieve all the metrics over HTTPS.
I hope you find this post useful and that it helps you fix this issue quickly if you run into it.
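Once the kubelets have restarted, kubectl top nodes is a simple end-to-end check that metrics-server can scrape every node. It may take a short while before metrics appear, so the hypothetical helper below retries a few times. If it keeps failing, the metrics-server logs (for example kubectl -n kube-system logs deploy/metrics-server, assuming metrics-server runs as a deployment in kube-system) should show whether the x509 error is still there.

```shell
# wait_for_node_metrics (hypothetical helper): retry `kubectl top nodes` until
# it succeeds or the attempts run out. Arguments: attempts (default 12) and
# delay in seconds between attempts (default 10).
wait_for_node_metrics() {
  attempts=${1:-12} delay=${2:-10}
  i=1
  while [ "$i" -le "$attempts" ]; do
    if kubectl top nodes; then return 0; fi
    echo "metrics not available yet (attempt $i/$attempts), retrying in ${delay}s" >&2
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example:
# wait_for_node_metrics 12 10
```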
See you in the next post!
