Bug ID 1135853: Openshift kubelet-server and kubelet-client certificates expire after 365 days

Last Modified: Mar 30, 2024

Affected Product(s):
F5OS Velos (all modules)

Known Affected Versions:
F5OS-C 1.3.1, F5OS-C 1.3.2, F5OS-C 1.5.0

Fixed In:
F5OS-C 1.6.0, F5OS-C 1.5.1

Opened: Aug 11, 2022

Severity: 2-Critical

Symptoms

See https://support.f5.com/csp/article/K64001020

The kubelet-server and kubelet-client certificates on each blade and controller expire after 365 days and are not automatically renewed when they expire. When the blade kubelet-server and kubelet-client certificates expire, the blade(s) will go offline in the Openshift cluster, and be re-added to the Openshift cluster by the orchestration-manager daemon. This will cause a tenant outage.

On the active system controller, messages appear similar to the following example, indicating the certificates are expired:

controller-2.chassis.local dockerd-current[4212]: E0809 19:48:01.601509 1 authentication.go:62] Unable to authenticate the request due to an error: [x509: certificate has expired or is not yet valid, x509: certificate has expired or is not yet valid]

The systemd journal on the system controller logs messages similar to the following example:

controller-2.chassis.local origin-node[19920]: E0808 08:35:03.754013 19930 certificate_manager.go:326] Certificate request was not signed: timed out waiting for the condition
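As an illustration only, the two example messages above can be used as search signatures when reviewing saved log output. The sketch below is not an F5 tool. The function name `find_cert_expiry_lines` and the sample list are hypothetical, and the matched substrings are taken directly from the examples in this article, so other releases may word the messages differently.

```python
import re

# Substrings taken from the two example messages in this article.
# Assumption: other releases may word these messages differently.
EXPIRY_SIGNATURES = [
    re.compile(r"x509: certificate has expired or is not yet valid"),
    re.compile(r"Certificate request was not signed"),
]


def find_cert_expiry_lines(lines):
    """Return the log lines that contain any of the known signatures."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(sig.search(line) for sig in EXPIRY_SIGNATURES)
    ]


# The first two lines are the examples from this article; the third is a
# hypothetical unrelated line added for contrast.
sample_log = [
    "controller-2.chassis.local dockerd-current[4212]: E0809 19:48:01.601509 1 "
    "authentication.go:62] Unable to authenticate the request due to an error: "
    "[x509: certificate has expired or is not yet valid, "
    "x509: certificate has expired or is not yet valid]\n",
    "controller-2.chassis.local origin-node[19920]: E0808 08:35:03.754013 19930 "
    "certificate_manager.go:326] Certificate request was not signed: "
    "timed out waiting for the condition\n",
    "an unrelated log line\n",
]

for hit in find_cert_expiry_lines(sample_log):
    print(hit)
```

The same function could be given the lines of a log file that has been copied off the system controller. A match only suggests that this issue may be present; K64001020 remains the reference for confirming it.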

Impact

The blade(s) will go offline in the Openshift cluster and be re-added to the Openshift cluster by the orchestration-manager daemon. This will cause a tenant outage, and the tenants may not restart correctly after the blades have been re-added to the cluster.

Conditions

Any system where the Openshift cluster was installed with F5OS-C release 1.5.0 or earlier.

Workaround

The renew_nodes.sh script mentioned in K64001020 can be used to renew the kubelet-server and kubelet-client certificates for one more year. It is not possible to renew these certificates for more than a year without rebuilding the Openshift cluster. At 2 years, other certificates in the Openshift cluster will expire, so it is necessary to rebuild the Openshift cluster with the fix for this issue.
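The timing described above can be sketched as simple date arithmetic. This is a rough planning aid under stated assumptions: it treats "one year" as 365 days and "2 years" as 730 days counted from the day the Openshift cluster was installed. The function name and the install date are hypothetical, and the authoritative dates are the validity periods stored in the certificates themselves.

```python
from datetime import date, timedelta


def certificate_timeline(cluster_install: date) -> dict:
    """Estimate key dates, assuming 365-day years counted from install."""
    kubelet_expiry = cluster_install + timedelta(days=365)
    # Assumption: the "2 years" limit is measured from the install date.
    rebuild_deadline = cluster_install + timedelta(days=730)
    return {
        "kubelet_expiry": kubelet_expiry,
        "rebuild_deadline": rebuild_deadline,
    }


# Hypothetical install date, for illustration only.
timeline = certificate_timeline(date(2022, 8, 20))
for name, when in timeline.items():
    print(name, when.isoformat())
```

Under these assumptions, a renewal with renew_nodes.sh only covers the period between the first date and the second; the cluster rebuild has to happen before the second date.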

Fix Information

Openshift has been updated to use a certificate expiration time of 10 years, and new Openshift containers have been added to releases with this fix. To make use of these new containers with longer certificate expiration times, it is necessary to rebuild the Openshift cluster.

Warning messages have been added to the “show cluster cluster-status” output on the system controller CLI that warn when certificates are within 90 days of expiring, and when the Openshift cluster needs to be rebuilt to take advantage of the new containers with the longer certificate expiration times.

syscon-1-active# show cluster cluster-status
cluster cluster-status summary-status "Openshift cluster is healthy, and all controllers and blades are ready. WARNING: 1 or more Openshift certificates expiring within 90 days. WARNING: Manual Openshift cluster rebuild necessary to update containers."
INDEX  STATUS
--------------------------------------------------------------------------------------------------------------------
15     2023-08-20 12:03:09.773660 - WARNING: Openshift cluster needs manual rebuild to upgrade to latest version.
16     2023-08-20 12:05:05.373785 - WARNING: Openshift certificates expiring within 90 days.

The Openshift cluster can be rebuilt after upgrading to a release containing the fix by issuing a “touch /var/omd/CLUSTER_REINSTALL” command from the shell on the active system controller. This rebuild will take 90+ minutes and will cause a tenant outage.

Once the cluster rebuild is complete, all chassis partitions should be disabled and re-enabled, and all tenants should be cycled to provisioned and back to deployed to ensure they have restarted correctly after the cluster rebuild. At this point all certificates in the cluster will have a 10 year expiration.

Once the Openshift cluster is rebuilt using this fix, it is not possible to downgrade without rebuilding the Openshift cluster after the downgrade. This is due to the new Openshift containers not being available after the downgrade. If a downgrade is done before the Openshift cluster is rebuilt, there will not be any issues.
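For sites that capture CLI output for monitoring, the two warnings shown in the “show cluster cluster-status” example above can be detected with a plain text match. The following is a minimal sketch, not an F5-provided interface: the function name `cluster_status_flags` is hypothetical, and the matched phrases are copied from the example output in this article, so they may need adjusting if a release words the warnings differently.

```python
def cluster_status_flags(output: str) -> dict:
    """Report which of the two documented warnings appear in the output."""
    text = output.lower()
    return {
        # Phrase from "WARNING: ... certificates expiring within 90 days."
        "certs_expiring": "certificates expiring within 90 days" in text,
        # Phrases from the two rebuild warnings in the example output.
        "rebuild_needed": (
            "rebuild necessary" in text or "needs manual rebuild" in text
        ),
    }


# Status lines copied from the example output in this article.
sample_output = (
    "15 2023-08-20 12:03:09.773660 - WARNING: Openshift cluster needs manual "
    "rebuild to upgrade to latest version.\n"
    "16 2023-08-20 12:05:05.373785 - WARNING: Openshift certificates expiring "
    "within 90 days.\n"
)

print(cluster_status_flags(sample_output))
```

Either flag being set indicates that the cluster rebuild described in this section should be scheduled, keeping in mind that the rebuild causes a tenant outage.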

Behavior Change

Guides & references

K10134038: F5 Bug Tracker Filter Names and Tips