Kubernetes Cluster Management

Overview

This page shows basic commands that allow tenants to create, manage, and remove Kubernetes clusters using CSE. The primary tool for these operations is the vcd cse client command.

Useful Commands

vcd cse ... commands are used by tenant organization administrators and tenant users to view templates and to manage Kubernetes clusters and nodes. Here is a summary of the available commands:

CSE server configured against VCD 10.3.z, 10.2.z in non-legacy mode

| Command | Description | Native | TKG |
| --- | --- | --- | --- |
| vcd cse template list | List templates that a Kubernetes cluster can be deployed from. | Yes | Yes |
| vcd cse cluster apply CLUSTER_CONFIG.YAML | Create or update a Kubernetes cluster. | Yes | Yes |
| vcd cse cluster list | List available Kubernetes clusters. | Yes | Yes |
| vcd cse cluster info CLUSTER_NAME | Retrieve detailed information about a Kubernetes cluster. | Yes | Yes |
| vcd cse cluster config CLUSTER_NAME | Retrieve the kubectl configuration file of the Kubernetes cluster. | Yes | Yes |
| vcd cse cluster delete CLUSTER_NAME | Delete a Kubernetes cluster. | Yes | Yes |
| vcd cse cluster delete CLUSTER_NAME --force | Delete a Kubernetes cluster even if it is in an unrecoverable state. | Yes | Yes |
| vcd cse cluster upgrade-plan CLUSTER_NAME | Retrieve the allowed paths for upgrading Kubernetes software on the cluster. | Yes | No |
| vcd cse cluster upgrade CLUSTER_NAME TEMPLATE_NAME TEMPLATE_REVISION | Upgrade cluster software to the specified template's software versions. | Yes | No |
| vcd cse cluster delete-nfs CLUSTER_NAME NFS_NODE_NAME | Delete an NFS node of a given Kubernetes cluster. | Yes | No |
| vcd cse cluster share --name CLUSTER_NAME --acl FullControl USER1 | Share the cluster CLUSTER_NAME with FullControl access with USER1. | Yes | No |
| vcd cse cluster share-list --name CLUSTER_NAME | View the ACL info for a cluster. | Yes | No |
| vcd cse cluster unshare --name CLUSTER_NAME USER1 | Unshare the cluster from USER1. | Yes | No |

CSE server configured against VCD 10.3.z, 10.2.z, 10.1.z in legacy mode

| Command | Description |
| --- | --- |
| vcd cse template list | List templates that a Kubernetes cluster can be deployed from. |
| vcd cse cluster create CLUSTER_NAME | Create a new Kubernetes cluster. |
| vcd cse cluster create CLUSTER_NAME --enable-nfs | Create a new Kubernetes cluster with NFS Persistent Volume support. |
| vcd cse cluster list | List available Kubernetes clusters. |
| vcd cse cluster info CLUSTER_NAME | Retrieve detailed information about a Kubernetes cluster. |
| vcd cse cluster resize CLUSTER_NAME | Grow a Kubernetes cluster by adding new nodes. |
| vcd cse cluster config CLUSTER_NAME | Retrieve the kubectl configuration file of the Kubernetes cluster. |
| vcd cse cluster upgrade-plan CLUSTER_NAME | Retrieve the allowed paths for upgrading Kubernetes software on the cluster. |
| vcd cse cluster upgrade CLUSTER_NAME TEMPLATE_NAME TEMPLATE_REVISION | Upgrade cluster software to the specified template's software versions. |
| vcd cse cluster delete CLUSTER_NAME | Delete a Kubernetes cluster. |
| vcd cse node create CLUSTER_NAME --nodes n | Add n nodes to a Kubernetes cluster. |
| vcd cse node create CLUSTER_NAME --nodes n --enable-nfs | Add an NFS node to a Kubernetes cluster. |
| vcd cse node list CLUSTER_NAME | List nodes of a cluster. |
| vcd cse node info CLUSTER_NAME NODE_NAME | Retrieve detailed information about a node in a Kubernetes cluster. |
| vcd cse node delete CLUSTER_NAME NODE_NAME | Delete nodes from a cluster. |

CSE 3.1 Cluster apply command

  1. vcd cse cluster apply <create_cluster.yaml> command - Takes a cluster specification file as input and applies it to a cluster resource. The cluster resource is created if it does not exist. The command can be used to create a cluster, scale workers up or down, scale up NFS nodes, and upgrade the cluster to a new Kubernetes version.
    • Note that a new property spec.settings.network.expose can be used to expose the cluster to the external world. This requires the user to have EDIT rights on the edge gateway. Refer to expose cluster for more details.
    • Command usage examples:
        vcd cse cluster apply <create_cluster.yaml> (creates the cluster if the resource does not already exist.)
        vcd cse cluster apply <resize_cluster.yaml> (resizes the cluster as per the specification in the provided file.)
        vcd cse cluster apply <upgrade_cluster.yaml> (upgrades the cluster to match the user-specified template and revision.)
        vcd cse cluster apply --sample --tkg-s (generates the sample specification file for tkg-s clusters).
        vcd cse cluster apply --sample --tkg (generates the sample specification file for tkg clusters).
        vcd cse cluster apply --sample --native (generates the sample specification file for native clusters).
      
    • How to construct the specification for cluster creation?
      • Get a sample native cluster specification from vcd cse cluster apply -s -n.
      • Populate the required properties. Note that the sample file has detailed comments to identify the required and optional properties.
      • Run vcd cse cluster apply <create_cluster.yaml>
    • How to construct the specification for an update operation (scale workers up/down, upgrade Kubernetes, scale up NFS nodes)?
      • Retrieve the current status of the cluster: Save the result of vcd cse cluster info for further editing.
      • Update the saved specification with the current status of the cluster:
        • update the spec section with the accurate values provided in the status section. Note that the status section of the output is what represents the true current state of the cluster, while the spec portion only represents the latest desired state expressed by the user. For example, the current count of status.nodes.workers could be different from spec.topology.workers.count because of a failure in a previous resize operation.
      • Update the new specification with the desired status of the cluster:
        • update the spec with the new desired state of the cluster. Note that only a few properties can be updated: scale workers up/down (spec.topology.workers.count), scale up NFS (spec.topology.nfs.count), and upgrade (spec.distribution.templateName and spec.distribution.templateRevision).
      • Save the file as update_cluster.yaml and issue the command vcd cse cluster apply update_cluster.yaml
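      • For illustration only, an update_cluster.yaml that scales the workers of the sample cluster to 4 nodes keeps every other value from the saved cluster info output and changes just the worker count, e.g.:
        # fragment of update_cluster.yaml - all other fields are carried over unchanged
        # from the saved output of vcd cse cluster info
        spec:
          topology:
            workers:
              count: 4          # new desired worker count (scale up or down)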

    • CSE 3.1.3 for TKG - CPI, CSI, and CNI fields in RDE 2.1
      • CSE 3.1.3 introduces the following fields, which apply only to TKG clusters: spec.settings.cpi, spec.settings.csi, and spec.settings.cni. Each of these fields is covered by CSE defaults: if a version is not specified for a component, the version specified in the CSE server configuration for that component is used. If no version is specified in the CSE server configuration either, a default is used for csi (1.2.0 in CSE 3.1.3) and cpi (1.1.1 in CSE 3.1.3); for cni, a TKG-compatible version is used.
        • cpi.name will be cloud-provider-for-cloud-director
        • cpi.version can be specified
        • csi.name will be cloud-director-named-disk-csi-driver
        • csi.version can be specified
        • cni.name will be antrea
        • cni.version can be specified
      • spec.settings.csi.defaultK8sStorageClass allows users to create a default storage class, which has the following 4 fields
        • filesystem: This can be ext4 (default) or xfs
        • k8sStorageClassName: the name of the storage class
        • useDeleteReclaimPolicy: If true the Delete reclaim policy is used, which deletes the PV when the PVC is deleted. If false, the Retain reclaim policy is used, which allows the PV to be manually reclaimed after the PVC is deleted.
        • vcdStorageProfileName: The VCD storage profile to use
        • If a default storage class is not needed, set spec.settings.csi.defaultK8sStorageClass: null.
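      • The YAML below is an illustrative sketch of how these TKG-only settings could be populated, inferred from the field paths described above; the version numbers, storage class name, and storage profile name are placeholders, not recommendations:
        spec:
          settings:
            cpi:
              name: cloud-provider-for-cloud-director
              version: 1.1.1                    # optional; CSE server configuration or CSE default is used if omitted
            csi:
              name: cloud-director-named-disk-csi-driver
              version: 1.2.0                    # optional; CSE server configuration or CSE default is used if omitted
              defaultK8sStorageClass:           # set to null if no default storage class is needed
                filesystem: ext4                # ext4 (default) or xfs
                k8sStorageClassName: my-storage-class              # placeholder storage class name
                useDeleteReclaimPolicy: true    # true = Delete reclaim policy, false = Retain
                vcdStorageProfileName: Gold_storage_profile_name   # placeholder VCD storage profile
            cni:
              name: antrea
              version: null                     # optional; a TKG-compatible version is chosen if omitted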

    • Sample input specification file
        # Short description of various properties used in this sample cluster configuration
        # apiVersion: Represents the payload version of the cluster specification. By default, "cse.vmware.com/v2.1" is used.
        # kind: The kind of the Kubernetes cluster.
        #
        # metadata: This is a required section
        # metadata.name: Name of the cluster to be created or resized.
        # metadata.orgName: The name of the Organization in which cluster needs to be created or managed.
        # metadata.virtualDataCenterName: The name of the Organization Virtual Data Center in which the cluster needs to be created or managed.
        # metadata.site: VCD site domain name where the cluster should be deployed.
        #
        # spec: User specification of the desired state of the cluster.
        # spec.topology.controlPlane: An optional sub-section for desired control-plane state of the cluster. The properties "sizingClass" and "storageProfile" can be specified only during the cluster creation phase. These properties will no longer be modifiable in further update operations like "resize" and "upgrade".
        # spec.topology.controlPlane.count: Number of control plane node(s). Only single control plane node is supported.
        # spec.topology.controlPlane.sizingClass: The compute sizing policy with which control-plane node needs to be provisioned in a given "ovdc". The specified sizing policy is expected to be pre-published to the given ovdc.
        # spec.topology.controlPlane.storageProfile: The storage-profile with which control-plane needs to be provisioned in a given "ovdc". The specified storage-profile is expected to be available on the given ovdc.
        #
        # spec.distribution: This is a required sub-section.
        # spec.distribution.templateName: Template name based on guest OS, Kubernetes version, and the Weave software version
        # spec.distribution.templateRevision: revision number
        #
        # spec.topology.nfs: Optional sub-section for desired nfs state of the cluster. The properties "sizingClass" and "storageProfile" can be specified only during the cluster creation phase. These properties will no longer be modifiable in further update operations like "resize" and "upgrade".
        # spec.topology.nfs.count: Number of NFS nodes. NFS nodes can only be scaled up; they cannot be scaled down. Default value is 0.
        # spec.topology.nfs.sizingClass: The compute sizing policy with which nfs node needs to be provisioned in a given "ovdc". The specified sizing policy is expected to be pre-published to the given ovdc.
        # spec.topology.nfs.storageProfile: The storage-profile with which nfs needs to be provisioned in a given "ovdc". The specified storage-profile is expected to be available on the given ovdc.
        #
        # spec.settings: This is a required sub-section
        # spec.settings.ovdcNetwork: This value is mandatory. Name of the Organization's virtual data center network
        # spec.settings.rollbackOnFailure: Optional value that is true by default. On any cluster operation failure, if the value is set to true, affected node VMs will be automatically deleted.
        # spec.settings.sshKey: Optional ssh key that users can use to log into the node VMs without explicitly providing passwords.
        # spec.settings.network.expose: Optional value that is false by default. Set to true to enable access to the cluster from the external world.
        #
        # spec.topology.workers: Optional sub-section for the desired worker state of the cluster. The properties "sizingClass" and "storageProfile" can be specified only during the cluster creation phase. These properties will no longer be modifiable in further update operations like "resize" and "upgrade". Non-uniform worker nodes in the cluster are not yet supported.
        # spec.topology.workers.count: Number of worker nodes (default value: 1). Worker nodes can be scaled up and down.
        # spec.topology.workers.sizingClass: The compute sizing policy with which worker nodes need to be provisioned in a given "ovdc". The specified sizing policy is expected to be pre-published to the given ovdc.
        # spec.topology.workers.storageProfile: The storage-profile with which worker nodes need to be provisioned in a given "ovdc". The specified storage-profile is expected to be available on the given ovdc.
        #
        # status: Current state of the cluster in the server. This is not a required section for any of the operations.
      
        apiVersion: cse.vmware.com/v2.1
        kind: native
        metadata:
          name: cluster_name
          orgName: organization_name
          site: VCD_site
          virtualDataCenterName: org_virtual_data_center_name
        spec:
          distribution:
            templateName: ubuntu-16.04_k8-1.17_weave-2.6.0
            templateRevision: 2
          settings:
            network:
              expose: false
            ovdcNetwork: ovdc_network_name
            rollbackOnFailure: true
            sshKey: null
          topology:
            controlPlane:
              count: 1
              cpu: null
              memory: null
              sizingClass: Large_sizing_policy_name
              storageProfile: Gold_storage_profile_name
            nfs:
              count: 0
              sizingClass: Large_sizing_policy_name
              storageProfile: Platinum_storage_profile_name
            workers:
              count: 2
              cpu: null
              memory: null
              sizingClass: Medium_sizing_policy_name
              storageProfile: Silver_storage_profile
      

      Note: CSE 3.1.3 introduces RDE 2.1, which is reflected in the spec field apiVersion: cse.vmware.com/v2.1

CSE 3.1 Cluster share command

The vcd cse cluster share command is supported for both TKG-S and native clusters.
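
For example, using the syntax from the command table above ('mycluster' and 'user1' are placeholder names):

vcd cse cluster share --name 'mycluster' --acl FullControl user1
vcd cse cluster share-list --name 'mycluster'
vcd cse cluster unshare --name 'mycluster' user1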

Upgrading software installed on Native Kubernetes clusters

Kubernetes is a fast-paced piece of software that gets a new minor release every three months and numerous patch releases (including security patches) in between those minor releases. To keep already deployed clusters up to date, CSE 2.6.0 added support for in-place software upgrades of Kubernetes clusters. The software components that can be upgraded to a newer version are Kubernetes, Docker, and Weave (CNI).

The upgrade matrix is built on the CSE native templates (read more about them here). The template originally used to deploy a cluster determines the valid target templates for upgrade. The supported upgrade paths can be discovered using the following command:

vcd cse cluster upgrade-plan 'mycluster'

Let’s say our cluster was deployed using template T1, which is based on Kubernetes version x.y.z. Our potential target templates for upgrade will satisfy at least one of the following criteria:

If you don’t see a desired target template for upgrading your cluster, please feel free to file a GitHub issue.

The actual upgrade of the cluster is done via the following command.

vcd cse cluster upgrade 'mycluster'

The upgrade process requires little to zero downtime if the following conditions are met:

  1. Docker is not being upgraded.
  2. Weave (CNI) is not being upgraded.
  3. Kubernetes version upgrade is restricted to patch version only.

If any of the conditions mentioned above is not met, the cluster will go down for about a minute or more (depending on the actual upgrade process).
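
As an illustration of the full flow (the cluster name, template name, and revision below are placeholders; use the values reported by vcd cse cluster upgrade-plan together with the syntax from the command table above):

vcd cse cluster upgrade-plan 'mycluster'
vcd cse cluster upgrade 'mycluster' ubuntu-16.04_k8-1.18_weave-2.6.5 2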

Creating clusters in Organizations with routed OrgVDC networks backed by NSX-T

Traditionally, CSE requires a directly connected OrgVDC network for K8s cluster deployment. This is to make sure that the cluster VMs are reachable from outside the scope of the OrgVDC network. With NSX-T, directly connected OrgVDC networks are not offered and routed OrgVDC networks are used to deploy K8s clusters. In order to grant Internet access to the cluster VMs connected to NSX-T backed routed OrgVDC networks, and maintain accessibility to the clusters, CSE 3.0.2 offers an option to expose the cluster.

It is a prerequisite that the network is configured to receive external traffic and to send traffic out externally.

Users deploying clusters must have the following rights, if they want to leverage the expose functionality.

If even one of these rights is missing, CSE will ignore the request to expose the K8s cluster.

Users can expose their K8s cluster during the initial vcd cse cluster apply command by setting spec.settings.network.expose: true in the cluster specification file. Note that any attempt to expose the cluster after it has been created will be ignored by CSE. Once a cluster has been exposed, the status section of the cluster shows a new field, exposed, which is set to True.

Users can de-expose a cluster by setting the value of the expose field to false and applying the updated specification on the cluster via vcd cse cluster apply. The value of the exposed field is False for clusters that are not exposed. Once an exposed cluster has been de-exposed, it cannot be re-exposed.
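
For illustration, the relevant fragment of the creation specification looks like the following (field names are taken from the sample specification shown earlier; only the expose value changes):

    spec:
      settings:
        network:
          expose: true        # expose the cluster at creation time; ignored once the cluster exists
        ovdcNetwork: ovdc_network_name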

Creating clusters with VMware Tanzu Kubernetes Grid

Starting with CSE 3.1.1, VMware Tanzu Kubernetes Grid (TKG) clusters can be deployed like native clusters using the vcd cse cluster apply command. A TKG cluster specification file differs from a native cluster specification file in the values of the kind and template name fields. A sample file for TKG cluster deployment can be generated using the following command:

vcd cse cluster apply --sample --tkg
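
A rough sketch of the differing fields is shown below; the exact kind value and template names must be taken from the generated sample file and from vcd cse template list, so treat the values here as assumptions:

    # fragment of a TKG cluster specification - values are placeholders; copy the exact
    # kind and templateName from the output of "vcd cse cluster apply --sample --tkg"
    kind: TKGm                                  # assumed TKG kind value; confirm against the generated sample
    spec:
      distribution:
        templateName: tkg_template_name         # placeholder; use a TKG template listed by vcd cse template list
        templateRevision: 1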

Please note:

CSE 3.1.3 Core package installation

In CSE 3.1.3, TKG clusters automatically install kapp-controller and metrics-server. Nothing needs to be specified in the cluster spec for these installations. The installed kapp-controller and metrics-server versions can be viewed using vcd cse cluster info or from the cluster info view in the container UI plugin.

Note: It is a known issue (see more here) that TKG 1.3.Z OVAs will not have automatic kapp-controller and metrics-server installation.

Force deleting clusters

If cluster deployment fails on CSE 3.1.1+ and VCD 10.3.1+, it is possible for the cluster to end up in a state where users are unable to delete it from the UI/CLI. The issue and a workaround are mentioned here. In CSE 3.1.2, a new option -f/--force has been added to the command vcd cse cluster delete. If specified, this option deletes clusters (and their associated resources) that were not fully created and were left in an unremovable state. Users must have the following rights to use this option:

cse:nativeCluster: Full Access (Administrator Full Control if not owner)
vApp: Delete
Organization vDC Gateway: Configure NAT
Organization vDC Gateway: View

and a FullControl access level on the RDE type cse:nativeCluster:2.0.0, granted via an ACL entry like the following:

{
    "grantType": "MembershipAccessControlGrant",
    "accessLevelId": "urn:vcloud:accessLevel:FullControl",
    "memberId": "urn:vcloud:user:uuid"
}