Bug ID 1135853: Openshift kubelet-server and kubelet-client certificates expire after 365 days

Last Modified: Nov 14, 2022

Affected Product:
F5OS Velos (all modules)

Known Affected Versions:
1.3.0, 1.3.1, 1.3.2, 1.5.0

Opened: Aug 11, 2022
Severity: 2-Critical

Symptoms

See https://support.f5.com/csp/article/K64001020

The kubelet-server and kubelet-client certificates on each blade and controller expire after 365 days and are not automatically renewed when they expire. When the blade kubelet-server and kubelet-client certificates expire, the blade(s) will go offline in the Openshift cluster, and be re-added to the Openshift cluster by the orchestration-manager daemon. This will cause a tenant outage.

On the active system controller, messages appear similar to the following example, indicating the certificates are expired:

controller-2.chassis.local dockerd-current[4212]: E0809 19:48:01.601509 1 authentication.go:62] Unable to authenticate the request due to an error: [x509: certificate has expired or is not yet valid, x509: certificate has expired or is not yet valid]

The systemd journal on the system controller logs messages similar to the following example:

controller-2.chassis.local origin-node[19920]: E0808 08:35:03.754013 19930 certificate_manager.go:326] Certificate request was not signed: timed out waiting for the condition
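
The two example messages in this section can be searched for mechanically in saved log output. The following is a minimal Python sketch, not an F5-provided tool: the function name find_cert_errors and the idea of feeding it lines from a saved log file are assumptions made for illustration, and the only strings it matches are the ones quoted in the examples above.

```python
# Substrings copied from the example messages in this article.
SIGNATURES = (
    "x509: certificate has expired or is not yet valid",
    "Certificate request was not signed",
)


def find_cert_errors(lines):
    """Return (line_number, signature) pairs for lines containing a known signature.

    `lines` is any iterable of log lines (hypothetical input, for example the
    lines of a saved journal export). Only the first matching signature per
    line is reported.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        for signature in SIGNATURES:
            if signature in line:
                hits.append((number, signature))
                break
    return hits
```

A non-empty result suggests that the log contains the same kind of message as the examples, but it does not by itself confirm which certificate has expired.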

Impact

The blade(s) will go offline in the Openshift cluster and be re-added to the Openshift cluster by the orchestration-manager daemon. This will cause a tenant outage, and the tenants may not restart correctly after the blades have been re-added to the cluster.

Conditions

Any system where the Openshift cluster was installed with a release of 1.5.0 or earlier.

Workaround

The renew_nodes.sh script mentioned in K64001020 can be used to renew the kubelet-server and kubelet-client certificates for one more year. It is not possible to renew these certificates for more than a year without rebuilding the Openshift cluster. At 2 years, other certificates in the Openshift cluster will expire, so it is necessary to rebuild the Openshift cluster with the fix for this issue.
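
Because the renewal only extends the certificates by one year, it can help to track how many days remain before the next expiry. The following is a minimal Python sketch under stated assumptions: the expiry date has already been obtained as a "notAfter=" line of the kind printed by `openssl x509 -enddate -noout`, the location of the certificate files is not given in this article and is not assumed here, and the 30-day warning threshold and the function names are illustrative only.

```python
import ssl
import time

WARN_DAYS = 30  # hypothetical warning threshold, not an F5 recommendation


def days_until_expiry(enddate_line, now=None):
    """Return the days remaining until the certificate's notAfter time.

    `enddate_line` is a string such as "notAfter=Aug  9 19:48:01 2023 GMT".
    `now` is seconds since the epoch; it defaults to the current time.
    The result is negative if the certificate has already expired.
    """
    value = enddate_line.strip()
    if value.startswith("notAfter="):
        value = value[len("notAfter="):]
    expires = ssl.cert_time_to_seconds(value)
    if now is None:
        now = time.time()
    return (expires - now) / 86400.0


def classify(days_left, warn_days=WARN_DAYS):
    """Map the days remaining to a coarse status string."""
    if days_left < 0:
        return "expired"
    if days_left < warn_days:
        return "renew soon"
    return "ok"
```

For example, classify(days_until_expiry("notAfter=Aug  9 19:48:01 2023 GMT")) reports whether that date has passed or falls within the warning window at the time of the call.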

Fix Information

None

Behavior Change