Known Issues

General Issues


Existing clusters show Kubernetes version as 0.0.0 after CSE is upgraded to 2.6.0

The way the Kubernetes version of a cluster is determined changed between CSE 2.5.x and 2.6.0. If the cluster metadata is not properly updated, CSE 2.6.0 defaults the version to 0.0.0.

Workaround: CSE 2.6.1 addresses this issue by defaulting to the Kubernetes version of the template from which the cluster was deployed. However, if the template itself was created by CSE 2.5.x, this approach is not foolproof. In such cases, it’s better to recreate the template in CSE 2.6.1 and then run the cse convert-cluster command against the affected cluster to fix its metadata. If the template is not recreated and cse convert-cluster is not run, possible errors include (but are not limited to) the following:

Kubernetes version field shows N/A or is missing the patch version

$ vcd cse cluster list
Name                        VDC          Org      Kubernetes      Status      Provider
--------------------------  -----------  -------  --------------  ----------  ----------
used_old_tempalte           new-org-vdc  new-org  upstream 1.16   POWERED_ON  native
didn_t_run_cluster_convert  new-org-vdc  new-org  N/A             POWERED_ON  native

Kubernetes upgrade operation fails

$ vcd cse cluster upgrade "used_old_tempalte" ubuntu-16.04_k8-1.17_weave-2.6.0 1
cluster operation: Upgrading cluster 'used_old_tempalte' software to match template
ubuntu-16.04_k8-1.17_weave-2.6.0 (revision 1): Kubernetes: 1.16 -> 1.17.2,
Docker-CE: 18.09.7 -> 19.03.5, CNI: weave 2.6.0 -> 2.6.0,
.
.
task: [REDACTED uuid], result: error, message: Unexpected error while upgrading
cluster 'used_old_tempalte': Invalid version string: '1.16'
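
The error above suggests that the upgrade expects a full MAJOR.MINOR.PATCH version string. The following is a minimal sketch, assuming that this is the requirement (inferred from the error message, not from the CSE source), for checking values such as those shown by vcd cse cluster list. The helper is_full_version is hypothetical and is not part of CSE or vcd-cli.

```shell
# Hypothetical helper (not part of CSE or vcd-cli): succeeds only when the
# argument looks like MAJOR.MINOR.PATCH, for example 1.17.2.
is_full_version() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'
}

# Sample values taken from the outputs above.
for v in 1.16 1.17.2 N/A; do
    if is_full_version "$v"; then
        echo "$v: full version"
    else
        echo "$v: patch version missing or not a version"
    fi
done
```

A cluster whose version fails this check is a likely candidate for recreating the template and running cse convert-cluster.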

Never-ending CSE tasks in VCD UI / Failed CSE tasks without proper error message

If the CSE server encounters an error during cluster or node creation, users may see CSE tasks in VCD that never reach completion, or tasks that show up as failed without a proper error message. Currently, the UI lacks the ability to properly display error messages when an operation fails. Examples include an invalid user input parameter or an unexpected error such as a network connection outage. In these cases, please inspect the CSE server logs or file a GitHub issue.


Fresh installation of CSE 2.5.1 or below via pip install is broken

CSE 2.5.1 and earlier versions have open-ended dependencies, which permit pip to pull and install the latest versions of those dependencies. Two such dependencies are pyvcloud and vcd-cli, and their latest available versions are incompatible with CSE 2.5.1 and earlier. We are reviewing our design on dependencies and hope to bring improvements in the near future.

Workaround: Uninstall the incompatible pyvcloud and vcd-cli libraries, and manually install compatible versions.

# Uninstall pyvcloud and vcd-cli
pip3 uninstall pyvcloud vcd-cli --yes

# Install specific versions of the libraries that are compatible with CSE 2.5.1 and CSE 2.0.0
pip3 install pyvcloud==21.0.0 vcd-cli==22.0.0 --upgrade --user
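
After reinstalling, it can help to confirm which versions pip actually resolved. The following is a small sketch that only reads the installed versions and does not change anything. The function name installed_versions is hypothetical, and the sketch assumes pip3 is on the PATH.

```shell
# Hypothetical helper: print the installed version of each library so that
# it can be compared with the pinned versions in the workaround.
installed_versions() {
    for pkg in pyvcloud vcd-cli; do
        # pip3 show prints a "Version:" line for an installed package and
        # nothing useful otherwise, so fall back to "not installed".
        version=$(pip3 show "$pkg" 2>/dev/null | awk '/^Version:/ {print $2}')
        echo "$pkg: ${version:-not installed}"
    done
}

installed_versions
```

If either line reports a version other than the pinned one, repeat the uninstall and install steps.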

vcd cse ovdc list operation times out when many OrgVDCs exist

CSE makes one API call per OrgVDC to access the required metadata, and the operation can time out when there is a large number of OrgVDCs.

Example: running vcd cse ovdc list with 250+ VDCs:

vcd cse ovdc list
Usage: vcd cse ovdc list [OPTIONS]
Try "vcd cse ovdc list -h" for help.

Error: Unknown error. Please contact your System Administrator

Workaround: Extend the cell timeout so that the request can wait long enough to complete. See the section ‘Setting the API Extension Timeout’ under CSE Server Management.


CSE server fails to start up after disabling the Service Provider Access to the Legacy API Endpoint

Workaround: Don’t disable Service Provider Access to the Legacy API Endpoint

VCD 10.0 deprecates the /api/sessions REST endpoint and introduces a new /cloudapi/-based REST endpoint for authenticating VCD users. CSE relies on the ‘/api’ endpoint for operations, so the legacy API endpoint must not be disabled in vCloud Director.

More details

Update: CSE 2.6.0 has resolved this issue.


Failures during template creation or installation


CSE service fails to start


CSE 1.2.6 and up are incompatible with VCD 9.0


Cluster creation fails when VCD external network has a DNS suffix and the DNS server resolves localhost.my.suffix to a valid IP

This is due to a bug in etcd (More detail HERE, with the kubeadm config file contents necessary for the workaround specified in this comment).

The main issue is that etcd prioritizes the DNS server (if it exists) over the /etc/hosts file to resolve hostnames, whereas the conventional behavior is to check any hosts files before going to the DNS server. This becomes problematic when kubeadm attempts to initialize the control plane node using localhost. etcd checks the DNS server for an entry like localhost.suffix, and if that entry resolves to an IP, etcd attempts operations against that incorrect IP instead of localhost.

The workaround (More detail HERE) is to create a kubeadm config file (the listen-peer-urls argument cannot be specified on the command line) and modify the kubeadm init command in the CSE control plane script for the template of the cluster you are attempting to deploy. The CSE control plane script is located at ~/.cse-scripts/<template name>_rev<template_revision>/scripts/mstr.sh

Change the command from kubeadm init --kubernetes-version=v1.13.5 > /root/kubeadm-init.out to kubeadm init --config /path/to/kubeadm.yaml > /root/kubeadm-init.out

Kubernetes version has to be specified within the configuration file itself, since --kubernetes-version and --config are incompatible.
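
As a rough illustration of such a config file, the sketch below writes one with a heredoc. The authoritative contents are those given in the linked comment. Everything here that is not in the text above is an assumption: the output path, the kubeadm.k8s.io/v1beta1 API version, and the listen-peer-urls value.

```shell
# ASSUMPTION: output path chosen for illustration; use the path that is
# passed to "kubeadm init --config" in mstr.sh.
KUBEADM_CONFIG="${KUBEADM_CONFIG:-/tmp/kubeadm.yaml}"

cat > "$KUBEADM_CONFIG" <<'EOF'
# ASSUMPTION: v1beta1 is the config API version for this kubeadm release.
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
# The version is set here because --kubernetes-version cannot be combined
# with --config.
kubernetesVersion: v1.13.5
etcd:
  local:
    extraArgs:
      # ASSUMPTION: an explicit loopback IP so that etcd does not need to
      # resolve "localhost" through DNS. Check the linked comment for the
      # exact value.
      listen-peer-urls: https://127.0.0.1:2380
EOF

echo "Wrote $KUBEADM_CONFIG"
```

Review the generated file against the linked comment before pointing the kubeadm init command in mstr.sh at it.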


NFS Limitations

Currently, NFS servers in a Kubernetes cluster are accessible not only by nodes of that cluster but also by any VM (outside of the cluster) residing in the same OrgVDC. The ideal solution is to have a vApp network created for each Kubernetes cluster, which is on our roadmap. Until then, please choose one of the workarounds below to avert this problem if the need arises.


Enterprise PKS Limitations