This the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Troubleshooting Guide

The Photon OS Troubleshooting Guide provides solutions for common problems that you might encounter while using Photon OS.

Product version: 3.0

This documentation applies to all 3.0.x releases.

Intended Audiences

This information is intended for Photon OS administrators who install and set up Photon OS.

1 - Introduction

The Troubleshooting Guide covers the basics of troubleshooting systemd, packages, network interfaces, services such as SSH and Sendmail, the file system, and the Linux kernel. The guide also includes information about the tools that you can use for troubleshooting with examples, how to access the logs, and best practices.

1.1 - Systemd and TDNF

By using systemd, Photon OS adopts a contemporary Linux standard to bootstrap the user space and concurrently start services, an architecture that differs from traditional Linux systems such as SUSE Linux Enterprise Server 11.

A traditional Linux system contains an initialization system called SysVinit. With SLES 11, for instance, SysVinit-style init programs control how the system starts up and shuts down. Init implements system runlevels. A SysVinit runlevel defines a state in which a process or service runs. In contrast to a SysVinit system, systemd defines no such runlevels. Instead, systemd uses a dependency tree of targets to determine which services to start when.

Because the systemd commands differ from those of an init.d-based Linux system, a section later in this guide illustrates how to troubleshoot by using systemctl commands instead of init.d-style commands.

Tdnf keeps the operating system as small as possible while preserving yum’s robust package-management capabilities. On Photon OS, tdnf is the default package manager for installing new packages. Since troubleshooting with tdnf differs from using yum, a later section of this guide describes how to solve problems with packages and repositories by using tdnf commands.

1.2 - The Root Account and the `sudo` and `su` Commands

The Troubleshooting Guide assumes that you are logged in to Photon OS with the root account and running commands as root. The sudo program comes with the full version of Photon OS. On the minimal version, you must install sudo with tdnf if you want to use it. As an alternative to installing sudo on the minimal version, you can switch users as needed with the su command to run commands that require root privileges.

1.3 - Checking the Version and Build Number

To check the version and build number of Photon OS, concatenate /etc/photon-release.

Example:

cat /etc/photon-release
VMware Photon Linux 1.0
PHOTON_BUILD_NUMBER=a6f0f63

The build number in the results maps to the commit number on the VMware Photon OS GitHub commits page.

1.4 - General Best Practices

When troubleshooting, it is recommended that you follow some general best practices:

  • Take a snapshot. Before you do anything to a virtual machine running Photon OS, take a snapshot of the VM so that you can restore it if need be.

  • Make a backup copy. Before you change a configuration file, make a copy of the original file. For example: cp /etc/tdnf/tdnf.conf /etc/tdnf/tdnf.conf.orig

  • Collect logs. Save the log files associated with a Photon OS problem. Include not only the log files on the guest but also the vmware.log file on the host. The vmware.log file is in the host’s directory that contains the VM.

  • Know what is in your toolbox. View the man page for a tool before you use it so that you know what your options are. The options can help focus the command’s output on the problem you’re trying to solve.

  • Understand the system. The more you know about the operating system and how it works, the better you can troubleshoot.

1.5 - Photon OS Logs

On Photon OS, all the system logs except the installation logs and the cloud-init logs are written into the systemd journal. The journalctl command queries the contents of the systemd journal.

The installation log files and the cloud-init log files reside in /var/log. If Photon OS is running on a virtual machine in a VMware hypervisor, the log file for the VMware tools, vmware-vmsvc.log, also resides in /var/log.

##Journalctl Journalctl is a utility to query and display logs from journald and systemd’s logging service. Since journald stores log data in a binary format instead of a plain text format, journalctl is the standard way of reading log messages processed by journald.

Journald is a service provided by systemd. To see the staus of the daemon, run the following commands:

# systemctl status systemd-journald
● systemd-journald.service - Journal Service
Loaded: loaded (/lib/systemd/system/systemd-journald.service; static; vendor preset: enabled)
Active: active (running) since Tue 2020-04-07 14:33:41 CST; 2 days ago
Docs: man:systemd-journald.service(8)
man:journald.conf(5)
Main PID: 943 (systemd-journal)
Status: "Processing requests..."
Tasks: 1 (limit: 4915)
Memory: 18.0M
CGroup: /system.slice/systemd-journald.service
└─943 /lib/systemd/systemd-journald



Apr 07 14:33:41 photon-4a0e7f2307d4 systemd-journald[943]: Journal started
Apr 07 14:33:41 photon-4a0e7f2307d4 systemd-journald[943]: Runtime journal (/run/log/journal/b8cebc61a6cb446a968ee1d4c5bbbbd5) is 8.0M, max 1.5G, 1.5G free.
Apr 07 14:33:41 photon-4a0e7f2307d4 systemd-journald[943]: Time spent on flushing to /var is 88.263ms for 1455 entries.
Apr 07 14:33:41 photon-4a0e7f2307d4 systemd-journald[943]: System journal (/var/log/journal/b8cebc61a6cb446a968ee1d4c5bbbbd5) is 40.0M, max 4.0G, 3.9G free.
root@photon-4a0e7f2307d4 [ ~ ]#

The following command are related to journalctl:

  • journalctl : This command displays all the logs after the system has booted up. journalctl splits the results into pages, similar to the less command in Linux. You can navigate using the arrow keys, the Page Up, Page Down keys or the Space bar. To quit navigation, press the q key.
  • journalctl -b : This command displays the logs for the current boot.

The following commands pull logs based on a time range:

  • journalctl --since "1 hour ago" : This command displays the journal logs from the past 1 hour.
  • journalctl --since "2 days ago" : This command displays the logs generated in the past 2 days.
  • journalctl --since "2020-03-25 00:00:00" --until "2020-04-09 00:00:00" : This command displays the logs generated between the mentioned time frame.

To traverse for logs in the reverse order, run the following command:

  • journalctl -r : This command displays the logs in reverse order.

Note: If you add -r at the end of a command, the logs are displayed in the reverse order. For example: journalctl -u unit.service -r

To pull logs related to a particular daemon, run the following command:

  • journalctl -u unit.service : This command displays logs for a specific service. mention the name of the service instead of unit. This command helps when a service is not behaving properly or when there are crash/core dumps.

To see Journal logs by their priority, run the following command:

  • journalctl -p "emerg".."crit : This command displays logs emerg to critical. For example: core dumps.

Journalctl can print log messages to the console as they are added, like the Linux tail command. Add the -f switch to follow a specific service or daemon.

journalctl -u unit.service -f

To list the boots of the system, run the following command:

journalctl --list-boots

You can maintain the journalctl logs manually, by running the following vacuum commands:

  • journalctl --vacuum-time=2d : This command retains the logs from the last 2 days.
  • journalctl --vacuum-size=500M : This command helps retain logs with a maximum size of 500 MB.

You can configure Journald using the conf file located at /etc/systemd/journald.conf. Run the following command to configure the file:

# cat /etc/systemd/journald.conf
# This file is part of systemd.
#
# systemd is free software; you can redistribute it and/or modify it
# under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation; either version 2.1 of the License, or
# (at your option) any later version.
#
# Entries in this file show the compile time defaults.
# You can change settings by editing this file.
# Defaults can be restored by simply deleting this file.
#
# See journald.conf(5) for details.

[Journal]
#Storage=auto
#Compress=yes
#Seal=yes
#SplitMode=uid
#SyncIntervalSec=5m
#RateLimitIntervalSec=30s
#RateLimitBurst=10000
#SystemMaxUse=
#SystemKeepFree=
#SystemMaxFileSize=
#SystemMaxFiles=100
#RuntimeMaxUse=
#RuntimeKeepFree=
#RuntimeMaxFileSize=
#RuntimeMaxFiles=100
#MaxRetentionSec=
#MaxFileSec=1month
#ForwardToSyslog=no
#ForwardToKMsg=no
#ForwardToConsole=no
#ForwardToWall=yes
#TTYPath=/dev/console
#MaxLevelStore=debug
#MaxLevelSyslog=debug
#MaxLevelKMsg=notice
#MaxLevelConsole=info
#MaxLevelWall=emerg
#LineMax=48K
root@photon-4a0e7f2307d4 [ ~ ]#

By default rotate is disable in Photon. Once the changes are made to the conf file, for the changes to take effect you must restart the systemd-journald by running the systemctl restart systemd-journald command.

##Cloud-init Logs Cloud-init is the industry standard multi-distribution method for cross-platform cloud instance initialisation.

If there are with the Cloud-init behaviour, we can debug them by looking at the logs. Run the following command to look at Cloud-init logs:

journalctl -u cloud-init

For better understanding/debugging, You can also look at logs from the following locations:

  • /var/log/cloud-init.log : This log contains information from each stage of Cloud-init.
  • /var/log/cloud-init-output.log : This log contains errors, warnings, etc..

##Syslog Syslog is the general standard for logging system and program messages in the Linux environment.

Photon provides the following two packages to support syslog:

  • syslog-ng : syslog-ng is syslog with some advanced next gen features. It supports TLS encryption, TCP for transport with other existing features. Configurations can be added to the /etc/syslog-ng/syslog-ng.conf file.
  • rsyslog : The official RSYSLOG website defines the utility as “the rocket-fast system for log processing”. rsyslog supports some advanced features like relp, imfile, omfile, gnutls protocols. Configurations can be added to the /etc/rsyslog.conf file. You can configure the required TLS certificates by editing the conf file.

##Logs for RPMS on Photon Logs for a particular RPM can be checked in the following ways:

  • If the RPM provides a daemon, we can see the status of daemon by running systemctl command and check logs using journactl -u <service name> command.
  • For additional logs, check if a conf file is provided by the RPM by running the rpm -ql <rpm name> | grep conf command and find the file path of the log file. You can also check the /var/log folder.

1.6 - Troubleshooting Progression

If you encounter a problem running an application or appliance on Photon OS and you suspect it involves the operating system, you can troubleshoot by proceeding as follows.

  1. Check the services running on Photon OS:

    systemctl status

  2. Check your application log files for errors. For VMware applications, see Location of Log Files for VMware Products.)

  3. Check the service controller or service monitor for your application or appliance.

  4. Check the network interfaces and other aspects of the network service with systemd-network commands.

  5. Check the operating system log files:

    journalctl

    Next, run the following commands to view all services according to the order in which they were started:

    systemd-analyze critical-chain

  6. Use the troubleshooting tool that you think is most likely to help with the issue at hand. For example, use strace to identify the location of the failure.

2 - Solutions to Common Problems

This section describes solutions to problems that you might encounter:

2.1 - Boot in Emergency Mode

If you encounter problems during normal boot, you can boot in Emergency Mode.

Perform the following steps to boot in Emergency Mode:

  1. Restart the Photon OS machine or the virtual machine running Photon OS.

    When the Photon OS splash screen appears, as it restarts, type the letter e quickly.

  2. Append emergency to the kernel command line.

  3. Press F10 to proceed with the boot.

  4. At the command prompt, provide the root password to log in to Emergency Mode.

    By default, / is mounted as read-only.

    To make modifications, run the following command to remount with write access:

    mount -o remount,rw /

2.2 - Resetting a Lost Root Password

Perform the following steps to rest a lost password:

  1. Restart the Photon OS machine or the virtual machine running Photon OS.

    When the Photon OS splash screen appears as it restarts, type the letter e to go to the GNU GRUB edit menu quickly. Because Photon OS reboots so quickly, you won’t have much time to type e. Remember that in vSphere and Workstation, you might have to give the console focus by clicking in its window before it will register input from the keyboard.

Second, in the GNU GRUB edit menu, go to the end of the line that starts with linux, add a space, and then add the following code exactly as it appears below:

rw init=/bin/bash

After you add this code, the GNU GRUB edit menu should look exactly like this:

The modified GNU GRUB edit menu

Now type F10.

At the command prompt, type passwd and then type (and re-enter) a new root password that conforms to the password complexity rules of Photon OS. Remember the password.

Next, type the following command:

umount /

Finally, type the following command. You must include the -f option to force a reboot; otherwise, the kernel enters a state of panic.

reboot -f

This sequence of commands should look like this:

The series of commands to reset the root password

After the Photon OS machine reboots, log in with the new root password.

Resetting the failed logon count

Resetting the root password will not reset the failed logon count, if you’ve had to many failed attempts, you may not be able to logon after resetting the password.

You will know if this is the case, if you see Account locked due to X failed logins at the photon console.

To reset the count, before you unmount the filesystem, run the following…

/sbin/pam_tally2 --reset --user root

2.3 - Fixing Permissions on Network Config Files

When you create a new network configuration file as root user, the network service might be unable to process it until you set the file mode bits to 644.

If you query the journal with journalctl -u systemd-networkd, you might see the following error message along with an indication that the network service did not start:

`could not load configuration files. permission denied`

The permissions on the network files might cause this problem. Without the correct permissions, networkd-systemd cannot parse and apply the settings, and the network configuration that you created will not be loaded.

After you create a network configuration file with a .network extension, you must run the chmod command to set the new file’s mode bits to 644. Example:

`chmod 644 10-static-en.network`

For Photon OS to apply the new configuration, you must restart the systemd-networkd service by running the following command:

`systemctl restart systemd-networkd`

2.4 - Permitting Root Login with SSH

The full version of Photon OS prevents root login with SSH by default. To permit root login over SSH, open /etc/ssh/sshd_config with the vim text editor and set PermitRootLogin to yes.

Vim is the default text editor available in both the full and minimal versions of Photon OS. The full version also contains Nano. After you modify the SSH daemon’s configuration file, you must restart the sshd daemon for the changes to take effect. Example:

vim /etc/ssh/sshd_config

# override default of no subsystems
Subsystem       sftp    /usr/libexec/sftp-server

# Example of overriding settings on a per-user basis
#Match User anoncvs
#       X11Forwarding no
#       AllowTcpForwarding no
#       PermitTTY no
#       ForceCommand cvs server
PermitRootLogin yes
UsePAM yes

Save your changes in vim and then restart the sshd daemon:

systemctl restart sshd

You can then connect to the Photon OS machine with the root account over SSH:

steve@ubuntu:~$ ssh root@198.51.100.131

2.5 - Fixing Sendmail

If Sendmail is not behaving as expected or hangs during installation, it might be because FQDN is not set.

Perform the following steps:

  1. Set an FQDN for your Photon OS machine.

  2. Run the following commands in the order below:

    echo $(hostname -f) > /etc/mail/local-host-names
    
        cat > /etc/mail/aliases << "EOF"
            postmaster: root
            MAILER-DAEMON: root
            EOF
    
        /bin/newaliases
    
        cd /etc/mail
    
        m4 m4/cf.m4 sendmail.mc > sendmail.cf
    
        chmod 700 /var/spool/clientmqueue
    
        chown smmsp:smmsp /var/spool/clientmqueue
    

3 - Photon OS General Troubleshooting

The section includes general troubleshooting instruction for Photon OS.

3.1 - Photon Code

Photon is an RPM based Linux distribution similar to variants like CentOS and Fedora. With RPM based distributions granular updates as opposed to updating the whole OS image is possible.

##SPEC File The “Recipe” for creating an RPM package is a spec file. The Photon code base’s SPECS folder hast the following directory structure:

SourceRoot

       SPECS
            linux
                patch1
                patch2
                linux.spec

##To Check if a Package is Signed Run the following commands to check if the package is signed:

#check if a package is signed
rpm -q linux --qf '%{NAME}-%{VERSION}-%{RELEASE} %{SIGPGP:pgpsig} %{SIGGPG:pgpsig}\n'
linux-4.19.79-2.ph3 RSA/SHA1, Thu 31 Oct 2019 10:05:05 AM UTC, Key ID c0b5e0ab66fd4949 (none)
 
#or
rpm -qi linux | grep "Signature"
Signature   : RSA/SHA1, Thu 31 Oct 2019 10:05:05 AM UTC, Key ID c0b5e0ab66fd4949
 
#Last 8 chars of Key ID: 66fd4949
#See if it matches the version of any of the gpg keys installed.
rpm -qa | grep gpg-pubkey | xargs -n1 rpm -q --queryformat "%{NAME} %{VERSION} %{PACKAGER}\n"
gpg-pubkey 66fd4949 VMware, Inc. -- Linux Packaging Key -- linux-packages@vmware.com
gpg-pubkey 3e1ba8d5 Google Cloud Packages RPM Signing Key gc-team@google.com

##To Check if Your Image Has Vulnerabilities Use the security scanners to find security issues. Alternatively The tdnf updateinfo info command displays all the applicable security updates the host needs.

##To Check if a CVE is Fixed The Photon team fix the vulnerabilities and then publish the advisories to (https://github.com/vmware/photon/wiki/Security-Advisories).

##To Check if Security Updates are Available Use the tdnf updateinfo info, tdnf update --security or tdnf update ---sec-severity <level> commands to check if security updates are available. For example:

#check if there are any security updates
root@photon-9a8c05dd97e9 [ ~ ]# tdnf updateinfo
70 Security notice(s)
 
#check if there are security updates for libssh2. note this is relative to what is installed in local
root@photon-9a8c05dd97e9 [ ~ ]# tdnf updateinfo list libssh2
patch:PHSA-2020-3.0-0047 Security libssh2-1.9.0-2.ph3.x86_64.rpm
patch:PHSA-2019-3.0-0025 Security libssh2-1.9.0-1.ph3.x86_64.rpm
patch:PHSA-2019-3.0-0009 Security libssh2-1.8.2-1.ph3.x86_64.rpm
patch:PHSA-2019-3.0-0008 Security libssh2-1.8.0-2.ph3.x86_64.rpm
 
#show details of all the libssh2 updates
root@photon-9a8c05dd97e9 [ ~ ]# tdnf updateinfo info libssh2
       Name : libssh2-1.9.0-2.ph3.x86_64.rpm
  Update ID : patch:PHSA-2020-3.0-0047
       Type : Security
    Updated : Wed Jan 15 10:48:25 2020
Needs Reboot: 0
Description : Security fixes for {'CVE-2019-17498'}
       Name : libssh2-1.9.0-1.ph3.x86_64.rpm
  Update ID : patch:PHSA-2019-3.0-0025
       Type : Security
    Updated : Sat Aug 17 16:14:35 2019
Needs Reboot: 0
Description : Security fixes for {'CVE-2019-13115'}
       Name : libssh2-1.8.2-1.ph3.x86_64.rpm
  Update ID : patch:PHSA-2019-3.0-0009
       Type : Security
    Updated : Sat Apr 13 03:34:22 2019
Needs Reboot: 0
Description : Security fixes for {'CVE-2019-3859', 'CVE-2019-3862', 'CVE-2019-3861', 'CVE-2019-3857', 'CVE-2019-3858', 'CVE-2019-3863', 'CVE-2019-3860', 'CVE-2019-3856'}
       Name : libssh2-1.8.0-2.ph3.x86_64.rpm
  Update ID : patch:PHSA-2019-3.0-0008
       Type : Security
    Updated : Fri Mar 29 16:04:18 2019
Needs Reboot: 0
Description : Security fixes for {'CVE-2019-3855'}
 
 
 
#install all security updates >= score 9.0 (CVSS_v3.0_Severity)
root@photon-9a8c05dd97e9 [ ~ ]# tdnf update --sec-severity 9.0
Upgrading:
apache-tomcat                  noarch          8.5.50-1.ph3         photon-updates    9.00M 9440211
bash                           x86_64          4.4.18-2.ph3         photon-updates    3.16M 3315720
bzip2                          x86_64          1.0.8-1.ph3          photon-updates  124.99k 127990
bzip2-libs                     x86_64          1.0.8-1.ph3          photon-updates   74.31k 76096
file                           x86_64          5.34-2.ph3           photon-updates   43.02k 44056
file-libs                      x86_64          5.34-2.ph3           photon-updates    5.21M 5458536
git                            x86_64          2.23.1-2.ph3         photon-updates   24.34M 25519969
glib                           x86_64          2.58.0-4.ph3         photon-updates    3.11M 3265152
libseccomp                     x86_64          2.4.0-2.ph3          photon-updates  315.79k 323368
libssh2                        x86_64          1.9.0-2.ph3          photon-updates  238.41k 244136
linux-esx                      x86_64          4.19.97-2.ph3        photon-updates   12.68M 13299655
 
Total installed size:  58.28M 61114889

3.2 - Package Management

TDNF is the default package manager for Photon OS. The standard syntax for tdnf commands is the same as that for DNF and YUM. TDNF reads YUM repositories from /etc/yum.repos.d/.

To find the main configuration file and see its contents, run the following command:

cat /etc/tdnf/tdnf.conf
[main]
gpgcheck=1
installonly_limit=3
clean_requirements_on_remove=true
repodir=/etc/yum.repos.d
cachedir=/var/cache/tdnf

Repositories have a .repo file extension, The following repositories are available in /etc/yum.repos.d/ :

ls /etc/yum.repos.d/
lightwave.repo
photon-extras.repo
photon-iso.repo
photon-updates.repo
photon.repo

Use the tdnf repolist command to list the repositories. Tdnf filters the results by their status enabled, disabled, and all. Running the tdnf repolist command without arguments displays the enabled repositories.

#tdnf repolist

repo id repo name status
photon-extras        VMware Photon Extras 3.0(x86_64) enabled
photon-debuginfo VMware Photon Linux debuginfo 3.0(x86_64)enabled
photon                    VMware Photon Linux 3.0(x86_64) enabled
photon-updates     VMware Photon Linux 3.0(x86_64) Updates enabled
root@photon-75829bfd01d0 [ ~ ]#

The following repositories are important for Photon:

  • photon-updates : This repo contains RPM updates for CVE/version and updates/others fixes.
  • photon-debuginfo : This repo contains information about RPMs with debug symbols.
  • photon : This repo generally contains the RPM versions packaged with the released ISO.

To check the local cache data from the repository, run the following command:

# ls -l /var/cache/tdnf/photon
total 12
-r--r----- 1 root root 0 Apr 3 22:34 lastrefresh
drwxr-x--- 2 root root 4096 Apr 3 22:34 repodata
drwxr-x--- 4 root root 4096 Feb 4 14:31 rpms
drwxr-x--- 2 root root 4096 Apr 3 22:34 solvcache

##Usage The tdnf command can be used in the following ways:

#tdnf repolist --refresh : This command is used to refresh the repolist. Generally there is a cache of the repo data stored in the local VM.

#tdnf install <rpm name> : This command is used to install a RPM. This command installs the latest version of the RPM.

#tdnf install <pkg-name>-<verison>-<release>.<photon-release> : This command is used to install a particular RPM version. For example, run # tdnf install systemd-239-11.ph3.

#tdnf list systemd : This command is used to list the available RPM versions in the repository.

#tdnf makecache : This command updates the cached binary metadata for all known repositories.

tdnf clean all : This command cleans up temporary files, data, and metadata. It takes the argument all.

#tdnf list systemd

Refreshing metadata for: 'VMware Photon Linux 3.0(x86_64)'

systemd.x86_64                                                                       239-15.ph3                                            @System

systemd.x86_64                                                                       239-11.ph3                                     photon-updates

systemd.x86_64                                                                       239-12.ph3                                     photon-updates

systemd.x86_64                                                                       239-13.ph3                                     photon-updates

systemd.x86_64                                                                       239-14.ph3                                     photon-updates

systemd.x86_64                                                                       239-15.ph3                                     photon-updates

systemd.x86_64                                                                       239-17.ph3                                     photon-updates

systemd.x86_64                                                                       239-18.ph3                                     photon-updates

systemd.x86_64                                                                       239-19.ph3                                     photon-updates

systemd.x86_64                                                                       239-10.ph3                                             photon

systemd.x86_64                                                                       239-10.ph3                                             photon

root@photon-4a0e7f2307d4 [ /WS/photon-dev/photon ]#

Here, @System indicates that the particular RPM is installed in the VM.

To upgrade/downgrade RPMs run the following commands:

#tdnf upgrade <pkg-name>

#tdnf downgrade <pkg-name>

After upgrade/downgrade the dependent packages must be manually upgraded/downgraded as well. Use the #tdnf remove <pkg-name> command to remove packages and # tdnf clean all to clear cached packages, metadata, dbcache, plugins and expire-cache.

#RPM RPM is an open source package management system capable of building software from source into easily distributable packages. It is used for installing, updating and uninstalling packaged software. RPM can also be used to query detailed information about the packaged software and to check if a particular package is installed or not.

You can do the following operation using the RPM binaries:

  • Install/Upgrade/Downgrade/Remove RPMs from a virtual machine.
  • Check the version of the packages installed.
  • Check the package contents.
  • Check the dependencies of a package.
  • Find the source package of a file.

To find the package that contains a particular binary, run rpm -q —whatprovides <binary/file path> command.

##Usage The rpm command can be used in the following ways:

  • rpm -ivh <rpm file path> : This command installs the RPM in a virtual machine.
  • rpm -Uvh <rpm file path> : This command is used to upgrade/downgrade the RPM.
  • rpm -e <rpm file path> : This command uninstalls the RPM from the virtual machine.
  • rpm -qp <rpm file path> --provides : This displays the libraries provided by the RPM.
  • rpm -qp <rpm file path> --requires : This displays the binaries/libraries required to install a particular rpm.
  • rpm -qa : This displays a list of all installed packages.
  • rpm -ql <package file.rpm> : This command lists all files in the package file.

3.3 - Network Configuration

systemd-networkd is a system daemon that manages network configurations. It detects and configures network devices as they appear. It can also create virtual network devices.

##Configuration Examples All configurations are stored as foo.network in the /etc/systemd/network/, /lib/systemd/network/ and /run/systemd/network/ folder. Use the networkctl list command to list all the devices on the system.

After making changes to a configuration file, restart the systemd-networkd.service if version is < 245, for other version run the following commands:

root@photon [ /home/sus ]# networkctl reload
root@photon [ /home/sus ]# networkctl reconfigure eth0

Note:

  • The options mentioned in the configuration files are case sensitive.
  • Set DHCP=yes to accept IPv4 and IPv6 DHCP requests.
  • Set DHCP=ipv4 to accept IPv4 DHCP requests.
  • Set LinkLocalAddressing=no to disable IPv6. Please do not disable IPv6 via sysctl. When LinkLocalAddressing=no in the .network file, the kernel drops addresses starting with fe80, for example fe80::20c:29ff:fe4c:7eca. If IPv6LL address is not available networkd will not start IPv6 configurations.

To link network configurations using DHCPv4 (IPv6 disabled), run the following command:

/etc/systemd/network/20-eth0.network
[Match]
Name=eth0

[Network]
LinkLocalAddressing=no

DHCP=ipv4

To link network configurations using DHCPv6, run the following command:

/etc/systemd/network/20-eth0.network
[Match]
Name=eth0

[Network]
IPv6AcceptRA=yes

DHCP=ipv6

To link network configurations using a static IP address, run the following command:

/etc/systemd/network/20-wired.network
[Match]
Name=enp1s0

[Network]
Address=10.1.10.9/24
Gateway=10.1.10.1
DNS=10.1.10.1

Here Address= can be used more than once to configure multiple IPv4 or IPv6 addresses.

A .link file can be used to rename an interface. For example, set a predictable interface name for a Ethernet adapter based on its MAC address by running the following command:

/etc/systemd/network/10-test0.link
[Match]
MACAddress=12:34:56:78:90:ab

[Link]
Description=my custom name
Name=test123

##Configuration Files Configuration files are located in /usr/lib/systemd/network/ folder, the volatile runtime network directory in /run/systemd/network/ folder and the local administration network directory in /etc/systemd/network/ folder. Configuration files in /etc/systemd/network/ folder have the highest priority.

There are three types of configuration files and they use a format similar to systemd unit files.

  • .network : These files apply a network configuration to a matching device.

  • .netdev : These files are used to create a virtual network device for a matching environment.

  • .link : When a network device appears, udev looks for the first matching .link file. These link files follow the following rules:

  • Only if all conditions in the [Match] section are matched, the profile will be activated.

  • An empty [Match] section means the profile can apply to any case (can be compared to the * wild card)

  • All configuration files are collectively sorted and processed in lexical order, regardless of the directory it resides in.

  • Files with identical names replace each other.

##Dupliate Matches If we have multiple configuration files matching an interface, the first (in lexical order) network file matching a given device is applied. All other files are ignored even if they match. The following is an example of matching configuration files:

builder@localhost [ ~ ]$ cat /etc/systemd/network/10-eth0.network
[Match]
Name=eth0
[Network]
DHCP=yes
  
builder@localhost [ ~ ]$ cat /etc/systemd/network/99-dhcp-en.network
[Match]
Name=e*
 
[Network]
DHCP=yes
IPv6AcceptRA=no

##Network Files These files are used to set network configuration variables for servers and containers. .network files have the following sections:

###[Match]

ParameterDescriptionAccepted Values
Name=Matches device names. For example: en*. By using ! prefix the list can be inverted.Device names separated by a white space, logical negation (!).
MACAddress=Matches MAC addresses. For example: MACAddress=01:23:45:67:89:ab 00-11-22-33-44-55 AABB.CCDD.EEFFMAC addresses with full colon-, hyphen- or dot-delimited hexadecimal separated by a white space.
Host=Matches the host name or the machine ID of the host.Hostname string or Machine ID
Virtualization=Checks whether the system is running in a virtual environment. Virtualization=false will only match your host machine, while Virtualization=true matches containers or VMs. It is also possible to check for a specific virtualization type or implementation.boolean, logical negation (!), type (vm, container), implementation (qemu, kvm, zvm, vmware, microsoft, oracle, xen, bochs, uml, bhyve, qnx, openvz, lxc, lxc-libvirt, systemd-nspawn, docker, podman, rkt, wsl, acrn)

###[Link]

  • MACAddress= : Used to spoof MAC address.
  • MTUBytes= : Setting a larger MTU value (For example: when using jumbo frames) can significantly speed up your network transfers.
  • Multicast : Enables the use of multicast on interface(s).

###[Network]

ParameterDescriptionAccepted ValuesDefault Value
DHCP=Controls DHCPv4 and/or DHCPv6 client support.Boolean, ipv4, ipv6false
DHCPServer=If enabled, a DHCPv4 server will be started.Booleanfalse
MulticastDNS=Enables multicast DNS support. When set to resolve, only resolution is enabled.Boolean, resolvefalse
DNSSEC=Controls the DNSSEC DNS validation support on the link. When set to allow-downgrade, compatibility with non-DNSSEC capable networks is increased, by automatically turning off DNSSEC.Boolean, allow-downgradefalse
DNS=Configures static DNS addresses. can be specified more than once.inet_pton
Domains=Indicates domains which must be resolved using the DNS servers.domain name, optionally prefixed with a ~
IPForward=If enabled, incoming packets on any network interface will be forwarded to any other interfaces according to the routing table.Boolean, ipv4, ipv6false
IPMasquerade=If enabled, packets forwarded from the network interface appear as if they are coming from the local host.Booleanfalse
IPv6PrivacyExtensions=Configures use of stateless temporary addresses that change over time. When set to prefer-public, the privacy extensions are enabled, but prefers public addresses over temporary addresses. When set to kernel, the kernel’s default setting will be left in place.Boolean, prefer-public, kernelfalse

###[Address] Address= option is mandatory unless DHCP is used.

###[Route]

  • Gateway= option is mandatory unless DHCP is used.
  • Destination= option defines the destination prefix of the route, possibly followed by a slash and the prefix length. If Destination is not present in [Route] section it is treated as a default route. Note: You can add the Address= and Gateway= keys in the [Network] section as a short-hand, if the [Address] section contains only an Address key and [Route] section contains only a Gateway key.

###DHCP

ParameterDescriptionAccepted ValuesDefault Value
UseDNS=Defines the DHCP server to be used.Booleantrue
Anonymize=When set to true, the options sent to the DHCP server will follow RFC7844 (Anonymity Profiles for DHCP Clients) to minimize disclosure of identifying information.Booleanfalse
UseDomains=Defines the DHCP server to be used as the DNS search domain. If set to route, the domain name received from the DHCP server will be used for routing DNS queries only and not for searching. This option can sometimes fix local name resolving when using systemd-resolved.Boolean, routefalse

###[DHCPServer] The following is an example of a DHCP server configuration which works well with hostapd to create a wireless hotspot. IPMasquerade adds the firewall rules for NAT and IPForward enables packet forwarding.

/etc/systemd/network/wlan0.network
[Match]
Name=wlan0

[Network]
Address=10.1.1.1/24
DHCPServer=true
IPMasquerade=true
IPForward=true

[DHCPServer]
PoolOffset=100
PoolSize=20
EmitDNS=yes
DNS=9.9.9.9

##Netdev Files These files create virtual network devices. They have the following two sections:

###[Match]

  • Host= : The host name.
  • Virtualization= : Checks if it is running in a virtual environment.

###[NetDev]

  • Name= : The interface’s name. This is a mandatory field.
  • Kind= : For example: bridge, bond, vlan, veth, sit, etc. This is a mandatory field.

##Link Files These files are an alternative to custom udev rules and will be applied by udev as the device appears. They have the following two sections:

###[Match]

  • MACAddress= : The MAC address.
  • Host= : The host name.
  • Virtualization= : Checks if it is running in a virtual environment.
  • Type= : the device type. For example: vlan.

###[Link]

  • MACAddressPolicy= : Persistent or random addresses.
  • MACAddress= : The MAC address. Note: The system /usr/lib/systemd/network/99-default.link file is sufficient for most cases.

##Debugging Systemd-networkd The log can be generated by creating a drop-in config. For example:

# /etc/systemd/system/systemd-networkd.service.d/override.conf
[Service]
Environment=SYSTEMD_LOG_LEVEL=debug

3.4 - Cloud-init

Cloud-init is mixture of Python and Shell scripts that initialize cloud instances of Linux machines. Cloud-init performs boot time configuration of a system. We can configure users, hostname, host network, write files to disk, manage packages, run custom scripts and so on.

##DataSources Datasource is the source of configuration data for cloud-init that is typically given by a user (For example: userdata) or obtained from the cloud that created the configuration drive (For example: metadata). Userdata includes files, YAML configuration files and shell scripts. Metadata includes server name, instance id, display name and other cloud specific details.

Currently there are two datasources used in Photon OS, it’s usage is described in the following sections:

  • DataSourceOVF - Used for GuestOS customization in vSphere.
  • VMwareGuestInfo - Used to read meta, user, and vendor data from VMware vSphere’s GuestInfo interface and initialize the system.

###DataSourceOVF The OVF (Open Virtualization Format) Datasource provides a datasource for reading data from an OVF transport ISO. The vmtoolsd service extracts the customization spec cab file from the OVF and calls either cloud-init or the GuestOS customization scripts. The disable_vmware_customization flag in /etc/cloud/cloud.cfg file determines if GOSC scripts or cloud-init is used.

  • disable_vmware_customization: false : Cloud-init is used for Guest OS customization
  • disable_vmware_customization: true : GuestOS customization scripts is used for Guest OS customization

Note: The default value for disable_vmware_customization is set to true in the /etc/cloud/cloud.cfg file

###VMwareGuestInfo VMwareGuestInfo data source is configured by setting guestinfo properties on a VM. This can be set by performing one of the following:

  • Using the vmware-rpctool provided by open-vmtools.
  • Modifying the vmx file to set the guestinfo properties.

##Debugging Cloud-init Failures Cloud-init has four services which are started in the following sequence:

  1. cloud-init-local - This service locates local data sources and applies networking configurations provided n the metadata (If there is no metadata it applies Fallback). Use $ systemctl status cloud-init-local command to check its status.
  2. cloud-init - This service processes any user-data that is found and runs the cloud_init_modules in /etc/cloud/cloud.cfg. Use $ systemctl status cloud-init command to check its status.
  3. cloud-config - This service runs the cloud_config_modules in /etc/cloud/cloud.cfg file. Use $ systemctl status cloud-config command to check its status.
  4. cloud-final - This service runs any script that a user is accustomed to running after logging into a system (For example: package installations, configs, user-scripts) and runs cloud_final_modules in /etc/cloud/cloud.cfg file. Use $ systemctl status cloud-final command to check its status.

Cloud-init logs are available in the /var/log/cloud-init.log file. Logs for GuestOS customization using DataSourceOVF are available in the /var/log/vmware-imc/toolsDeployPkg.log and /var/log/cloud-init.log files.

To analyze the cloud-init boot time performance, run the following commands:

  • $ cloud-init analyze blame - The blame command prints in descending order, the units that took the longest to run. This output is useful for observe where cloud-init is spending its time during execution.
  • $ cloud-init analyze show - The show command prints a list of units, the time they started and how long they took to complete. It also prints a summary of total time per boot.
  • $ cloud-init analyze dump - The dump command dumps the cloud-init logs for the analyze modules and displays a list of dictionaries that can be consumed for other reporting needs.
  • $ cloud-init status - To know the overall status of clouf-init.

Cloud-init doesn’t configure the network if /etc/cloud/cloud.cfg.d/99-disable-networking-config.cfg file is present and has the following content:

  • network:Item
  • config: disabled

Take a backup of /etc/cloud/cloud.cfg.d/99-disable-networking-config.cfg file and remove it from it’s location. Reconfigure the machine using metadata, userdata and vendordata. Once the configurations are done copy the backup file to the same location. Cloud-init will push it’s fallback configuration when service is restarted or rebooted and there is no local datasource to configure. To avoid this /etc/cloud/cloud.cfg.d/99-disable-networking-config.cfg file is required.

##Run Cloud-init Manually To run cloud-init manually, run the following commands:

/usr/bin/cloud-init -d init  (-d for debug)
/usr/bin/cloud-init -d modules (run all modules)
/usr/bin/cloud-init --file <config-yaml-file-path> init (if you want to run cloud-init with a configuration yaml file)

When cloud-init is running, to force it to run with all configs engaged run the following command:

rm -rf /var/lib/cloud/*

For more information about cloud-init, see https://cloudinit.readthedocs.io/en/latest/index.htmlhttps://cloudinit.readthedocs.io/en/latest/index.html

For more information about cloud-init CLI, see https://cloudinit.readthedocs.io/en/latest/topics/cli.htmlhttps://cloudinit.readthedocs.io/en/latest/topics/cli.html

Note:Include the cloud-init log tarball and the vmtoolsd logs when you raise an issue.

  1. Collect cloud-init log tarball by running the cloud-init collect-logs command.
  2. Collect the vmtoolsd logs from /var/log/vmware-imc/toolsDeployPkg.log file.
  3. Attach the collected logs to the issue ticket.

3.5 - Open-vm-tools/Vmtoolsd

Vmtoolsd is a systemd service, using which we can set guestinfo properties metadata, userdata and vendordata etc., which in turn are consumed by cloud-init. VMwareGuestInfo Datasource uses this guestinfo properties and applies them to the system.

vmware-rpctool is a utility provided by open-vm-tools to set metadata, userdata and vendordata. vmware-rpctool provides info.set and info.get options to set and get the guestinfo properties respectively.

##Debugging To check the status of the vmtoolsd service (vmtoolsd is dependant on vgauthd), run the following commands:

$ systemctl status vmtoolsd vgauthd

$ journalctl -u vmtoolsd

$ journalctl -u vgauthd

To set and get metadata, userdata and vendordata, run the following commands:

$ /usr/bin/vmware-rpctool 'info-get guestinfo.metadata'

$ /usr/bin/vmware-rpctool 'info-get guestinfo.userdata'

$ /usr/bin/vmware-rpctool 'info-get guestinfo.vendordata'

A YAML file can be used as input to the rpctool using following commands:

vmware-rpctool "info-set guestinfo.userdata.encoding base64"
vmware-rpctool "info-set guestinfo.metadata.encoding base64"

vmware-rpctool "info-set guestinfo.metadata ${metadata file contents}"

vmware-rpctool "info-set guestinfo.userdata ${userdata file contents}"

Note:Include the cloud-init log tarball and the vmtoolsd logs when you raise an issue.

  1. Collect cloud-init log tarball by running the cloud-init collect-logs command.
  2. Collect the vmtoolsd logs from /var/log/vmware-imc/toolsDeployPkg.log file.
  3. Attach the logs collected to the issue ticket.

4 - Troubleshooting Tools

Photon OS includes tools that help troubleshoot problems. These tools are installed by default on the full version of Photon OS. On the minimal version of Photon OS, you might have to install a tool before you can use it.

There is a man page on Photon OS for all the tools covered in this section. The man pages provide more information about each tool’s commands, options, and output. To view a tool’s man page, on the Photon OS command line, type man and then the name of the tool. Example:

man strace

4.1 - Common Tools

The following are some tools that you can use to troubleshoot:

Note: Some of the examples in this section are marked as abridged with ellipsis (...).

top

The top tool monitors system resources, workloads, and performance. It can unmask problems caused by processes or applications overconsuming CPUs, time, or RAM.

To view a textual display of resource consumption, run the top command:

top

Use can use ’top’ to kill a runaway or stalled process by typing k followed by its process ID (PID).

Top on Photon OS

If the percent of CPU utilization is consistently high with little idle time, there might be a runaway process overconsuming CPUs. Restarting the service might solve the problem.

To troubleshoot an unknown issue, run Top in the background in batch mode to write its output to a file and collect data about performance:

top d 120 b >> top120second.output

For a list of options that filter top output and other information, see the man page for top.

ps

The ps tool shows the processes running on the machine. The ps tool derives flexibility and power from its options, all of which are covered in the tool’s Photon OS man page:

man ps

You can use the following options of ps for troubleshooting:

  • Show processes by user:

    ps aux

  • Show processes and child processes by user:

    ps auxf

  • Show processes containing the string ssh: ps aux | grep ssh

  • Show processes and the command and options with which they were started: ps auxww

Example abridged output:

ps auxww
USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root          1  0.0  0.9  32724  3300 ?        Ss   07:51   0:32 /lib/systemd/systemd --switched-root --system --deserialize 22

netstat

The netstat command can identify bottlenecks causing performance issues. It lists network connections, listening sockets, port information, and interface statistics for different protocols. Examples:

netstat --statistics
netstat --listening

find

Use the find command to troubleshoot a Photon OS machine that has stopped working. The following command lists the files in the root directory that have changed in the past day:

	find / -mtime -1 

See the find manual. Take note of the security considerations listed in the find manual if you are using find to troubleshoot an appliance running on Photon OS.

locate

The locate command is a fast way to find files and directories you onlay have a keyword. This command is similar to find and part of the same findutils package preinstalled on the full version of Photon OS by default. It finds file names in the file names database.

Before you can use locate accurately, update its database:

updatedb

Then run locate to quickly find a file, such as any file name containing .network, which can be helpful to see all the system’s .network configuration files. The following is an abridged example:

locate .network
/etc/dbus-1/system.d/org.freedesktop.network1.conf
/etc/systemd/network/10-dhcp-en.network
/usr/lib/systemd/network/80-container-host0.network
/usr/lib/systemd/network/80-container-ve.network
/usr/lib/systemd/system/busnames.target.wants/org.freedesktop.network1.busname
/usr/lib/systemd/system/dbus-org.freedesktop.network1.service
/usr/lib/systemd/system/org.freedesktop.network1.busnname
/usr/share/dbus-1/system-services/org.freedesktop.network1.service

The locate command is also a quick way to see whether a troubleshooting tool is installed on Photon OS. Examples:

locate strace
/usr/bin/strace
/usr/bin/strace-graph
/usr/bin/strace-log-merge
/usr/share/man/man1/strace.1.gz
/usr/share/vim/vim74/syntax/strace.vim

locate traceroute

In this example, the strace tool is installed but traceroute is not.

You can install traceroute from the Photon OS repository:

tdnf install traceroute

df

The df command reports the disk space available on the file system. Running out of disk space can lead an application to fail and a quick check of the available space makes sense as an early troubleshooting step:

df -h

The -h option prints out the available and used space in human-readable sizes. After checking the space, you should also check the number of available inodes. Too few available inodes can lead to difficult-to-diagnose problems:

df -i

md5sum

The md5sum tool calculates 128-bit RSA Data Security, Inc. MD5 Message Digest Algorithm hashes (a message digest, or digital signature, of a file) to uniquely identify a file and verify its integrity after file transfers, downloads, or disk errors when the security of the file is not in question.

md5sum can help troubleshooting installation issues by verifying that the version of Photon OS being installed matches the version on the Bintray download page. If, for instance, bytes were dropped during the download, the checksums will not match. Try downloading it again.

sha256sum

The sha256sum tool calculates the authenticity of a file to prevent tampering when security is a concern. Photon OS also includes shasum, sha1sum, sha384sum, and sha512sum. See the man pages for md3sum, sha256sum, and the other SHA utilities.

strace

The strace utility follows system calls and signals as they are executed so that you can see what an application, command, or process is doing. strace can trace failed commands, identify where a process obtains its configuration, monitor file activity, and find the location of a crash.

By tracing system calls, strace can help troubleshoot a broad range of problems, including issues with input-output, memory, interprocess communication, network usage, and application performance.

For troubleshooting a problem that gives off few or no clues, the following command displays every system call:

strace ls -al

With strace commands, you can route the output to a file to make it easier to analyze:

strace -o output.txt ls -al

strace can reveal the files that an application tries to open with the -eopen option. This combination can help troubleshoot an application that is failing because it is missing files or being denied access to a file it needs. If, for example, you see “No such file or directory” in the results of strace -eopen, something might be wrong:

strace -eopen sshd
open("/usr/lib/x86_64/libpam.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/libpam.so.0", O_RDONLY|O_CLOEXEC) = 3

The results above indicate that the first file is missing because it is found in the next line. In other cases, the application might be unable to open one of its configuration files or it might be reading the wrong one. If the results say “permission denied” for one of the files, check the permissions of the file with ls -l or stat.

When troubleshooting with strace, you can include the process ID in its commands. Here’s an example of how to find a process ID:

ps -ef | grep apache

You can then use strace to examine the file a process is working with:

strace -e trace=file -p 1719

A similar command can trace network traffic:

strace -p 812 -e trace=network

If an application is crashing, use strace to trace the application and then analyze what happens right before the application crashes.

You can also trace the child processes that an application spawns with the fork system call, and you can do so with systemctl commands that start a process to identify why an application crashes immediately or fails to start:

strace -f -o output.txt systemctl start httpd

Example: If journalctl is showing that networkd is failing, you can run strace to troubleshoot:

strace -o output.txt systemctl restart systemd-networkd

Then grep inside the results for something, such as exit or error:

grep exit output.txt

If the results indicate systemd-resolved is going wrong, you can then strace it:

strace -f -o output.txt systemctl restart systemd-resolved

file

The file command determines the file type, which can help troubleshoot problems when an application mistakes one type of file for another, leading it to errors. Example:

file /usr/sbin/sshd
/usr/sbin/sshd: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, stripped

stat

The stat command can help troubleshoot problems with files or the file system by showing the last date it was modified and other information. Example:

stat /dev/sda1
File: '/dev/sda1'
Size: 0               Blocks: 0          IO Block: 4096   block special file
Device: 6h/6d   Inode: 6614        Links: 1     Device type: 8,1
Access: (0660/brw-rw----)  Uid: (    0/    root)   Gid: (    8/    disk)
Access: 2016-09-02 12:23:56.135999936 +0000
Modify: 2016-09-02 12:23:52.879999981 +0000
Change: 2016-09-02 12:23:52.879999981 +0000
Birth: -

On Photon OS, stat is handy to show permissions for a file or directory in both their absolute octal notation and their read-write-execute abbreviation; truncated example:

chmod 777 tester.md
stat tester.md
  File: 'tester.md'
  Size: 0               Blocks: 0          IO Block: 4096   regular empty file
Device: 801h/2049d      Inode: 316385      Links: 1
Access: (0777/-rwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)

watch

The watch utility runs a command at regular intervals so you can observe how its output changes over time. watch can help dynamically monitor network links, routes, and other information when you are troubleshooting networking or performance issues. Examples:

watch -n0 --differences ss
watch -n1 --differences ip route

The following is an example with a screenshot of the output. This command monitors the traffic on your network links. The highlighted numbers are updated every second so you can see the traffic fluctuating:

watch -n1 --differences ip -s link show up

The dynamic output of the watch utility

vmstat and fdisk

The vmstat tool displays statistics about virtual memory, processes, block input-output, disks, and CPU activity. This tool can help diagnose performance problems, especially system bottlenecks.

Its output on a Photon OS virtual machine running in VMware Workstation 12 Pro without a heavy load looks like this:

vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0   5980  72084 172488    0    0    27    44  106  294  1  0 98  1  0

These codes are explained in the vmstat man page.

- If `r`, the number of runnable processes, is higher than 10, the machine is under stress; consider intervening to reduce the number of processes or to distribute some of the processes to other machines. In other words, the machine has a bottleneck in executing processes.
- If `cs`, the number of context switches per second, is really high, there may be too many jobs running on the machine.
- If `in`, the number of interrupts per second, is relatively high, there might be a bottleneck for network or disk IO.

You can investigate disk IO further by using vmstat’s -d option to report disk statistics. The following is an abridged example on a machine with little load:

vmstat -d
disk- ------------reads------------ ------------writes----------- -----IO------
       total merged sectors      ms  total merged sectors      ms    cur    sec
ram0       0      0       0       0      0      0       0       0      0      0
ram1       0      0       0       0      0      0       0       0      0      0
loop0      0      0       0       0      0      0       0       0      0      0
loop1      0      0       0       0      0      0       0       0      0      0
sr0        0      0       0       0      0      0       0       0      0      0
sda    22744    676  470604   12908  72888  24949  805224  127692      0    130

The -D option summarizes disk statistics:

vmstat -D
           26 disks
            2 partitions
        22744 total reads
          676 merged reads
       470604 read sectors
        12908 milli reading
        73040 writes
        25001 merged writes
       806872 written sectors
       127808 milli writing
            0 inprogress IO
          130 milli spent IO

You can also get statistics about a partition. First, run the fdisk -l command to list the machine’s devices. Then run vmstat -p with the name of a device to view its stats:

fdisk -l
Disk /dev/ram0: 4 MiB, 4194304 bytes, 8192 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
...
Device        Start      End  Sectors Size Type
/dev/sda1      2048 16771071 16769024   8G Linux filesystem
/dev/sda2  16771072 16777182     6111   3M BIOS boot

vmstat -p /dev/sda1
sda1          reads   read sectors  writes    requested writes
               22579     473306      78510     866088

See the vmstat man page for more options.

lsof

The lsof command lists open files. The tool’s definition of an open file includes directories, libraries, streams, domain sockets, and Internet sockets. THis enables it to identify the files a process is using. Because a Linux system like Photon OS uses files to do its work, you can run lsof as root to see how the system is using them and to see how an application works.

If you cannot unmount a disk because it is in use, you can run lsof to identify the files on the disk that are being used.

The following is an example that shows the processes that are using the root directory:

lsof /root
COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
bash       879 root  cwd    DIR    8,1     4096 262159 /root
bash      1265 root  cwd    DIR    8,1     4096 262159 /root
sftp-serv 1326 root  cwd    DIR    8,1     4096 262159 /root
gdb       1351 root  cwd    DIR    8,1     4096 262159 /root
bash      1395 root  cwd    DIR    8,1     4096 262159 /root
lsof      1730 root  cwd    DIR    8,1     4096 262159 /root

You can do the same with an application or virtual appliance by running lsof with the user name or process ID of the app. The following example lists the open files used by the Apache HTTP Server:

lsof -u apache

Running the command with the -i option lists all the open network and Internet files, which can help troubleshoot network problems:

lsof -i

See the Unix socket addresses of a user like zookeeper:

lsof -u zookeeper -U

The following example shows the processes running on Ports 1 through 80:

lsof -i TCP:1-80
COMMAND  PID   USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
httpd    403   root    3u  IPv6  10733      0t0  TCP *:http (LISTEN)
httpd    407 apache    3u  IPv6  10733      0t0  TCP *:http (LISTEN)
httpd    408 apache    3u  IPv6  10733      0t0  TCP *:http (LISTEN)
httpd    409 apache    3u  IPv6  10733      0t0  TCP *:http (LISTEN)
sshd     820   root    3u  IPv4  11336      0t0  TCP *:ssh (LISTEN)
sshd     820   root    4u  IPv6  11343      0t0  TCP *:ssh (LISTEN)
sshd    1258   root    3u  IPv4  48040      0t0  TCP 198.51.100.143:ssh->198.51.100.1:49759 (ESTABLISHED)
sshd    1319   root    3u  IPv4  50866      0t0  TCP 198.51.100.143:ssh->198.51.100.1:51054 (ESTABLISHED)
sshd    1388   root    3u  IPv4  56438      0t0  TCP 198.51.100.143:ssh->198.51.100.1:60335 (ESTABLISHED)

You can also inspect the files opened by a process ID. The following example queries the files open by the systemd network service:

lsof -p 1917
COMMAND    PID            USER   FD      TYPE             DEVICE SIZE/OFF   NODE NAME
systemd-n 1917 systemd-network  cwd       DIR                8,1     4096      2 /
systemd-n 1917 systemd-network  txt       REG                8,1   887896 272389 /usr/lib/systemd/systemd-networkd
systemd-n 1917 systemd-network  mem       REG                8,1   270680 262267 /usr/lib/libnss_files-2.22.so
systemd-n 1917 systemd-network    0r      CHR                1,3      0t0   5959 /dev/null
systemd-n 1917 systemd-network    1u     unix 0x0000000000000000      0t0  45734 type=STREAM
systemd-n 1917 systemd-network    3u  netlink                         0t0   6867 ROUTE
systemd-n 1917 systemd-network    4u     unix 0x0000000000000000      0t0  45744 type=DGRAM
systemd-n 1917 systemd-network    9u  netlink                         0t0  45754 KOBJECT_UEVENT
systemd-n 1917 systemd-network   12u  a_inode               0,11        0   5955 [timerfd]
systemd-n 1917 systemd-network   13u     IPv4             104292      0t0    UDP 198.51.100.143:bootpc

fuser

The fuser command identifies the process IDs of processes using files or sockets. The term process is, in this case, synonymous with user. To identify the process ID of a process using a socket, run fuser with its namespace option and specify tcp or udp and the name of the process or port. Examples:

fuser -n tcp ssh
ssh/tcp:               940  1308
fuser -n tcp http
http/tcp:              592   594   595   596
fuser -n tcp 80
80/tcp:                592   594   595   596

ldd

By revealing the shared libraries that a program depends on, ldd can help troubleshoot an application that is missing a library or finding the wrong one.

For example, if you get a “file not found” output, check the path to the library.

ldd /usr/sbin/sshd
linux-vdso.so.1 (0x00007ffc0e3e3000)
libpam.so.0 => (file not found)
libcrypto.so.1.0.0 => /usr/lib/libcrypto.so.1.0.0 (0x00007f624e570000)

You can also use the objdump command to show dependencies for a program’s object files; example:

objdump -p /usr/sbin/sshd | grep NEEDED

gdb

The gdb tool is the GNU debugger. It lets you see inside a program while it executes or when it crashes so that you can catch errors as they occur. The gdb tool is typically used to debug programs written in C and C++. On Photon OS, gdb can help you determine why an application crashed. See the man page for gdb for instructions on how to run it.

For an extensive example on how to use gdb to troubleshoot Photon OS running on a VM when you cannot login to Photon OS, see the section on troubleshooting boot and logon problems.

4.2 - Troubleshooting Tools Installed by Default

The following troubleshooting tools are included in the full version of Photon OS:

  • grep. Searches files for patterns.
  • ping. Tests network connectivity.
  • strings. Displays the characters in a file to identify its contents.
  • lsmod. Lists loaded modules.
  • ipcs. Shows data about the inter-process communication (IPC) resources to which a process has read access. This data includes shared memory segments, message queues, and semaphore arrays.
  • nm. Lists symbols from object files.
  • diff. Compares files side by side. This tool is useful to compare configuration files of two versions when one version works and the other does not.

4.3 - Installing Tools from Repositories

You can install several troubleshooting tools from the Photon OS repositories by using the default package management system, tdnf.

If a tool you require is not installed, search the repositories to see if it is available.

For example, the traceroute tool is not installed by default. You can search for it in the repositories as follows:

tdnf search traceroute
traceroute : Traces the route taken by packets over an IPv4/IPv6 network

The results of the above command show that traceroute exists in the repository. You install it with tdnf:

tdnf install traceroute

The following tools are not installed by default but are in the repository and can be installed with tdnf:

  • net-tools. Networking tools.

  • ltrace. Tool for intercepting and recording dynamic library calls. It can identify the function an application was calling when it crashed, making it useful for debugging.

  • nfs-utils. Client tools for the kernel Network File System, or NFS, including showmount. These are installed by default in the full version of Photon OS but not in the minimal version.

  • pcstat. A tool that inspects which pages of a file or files are being cached by the Linux kernel.

  • sysstat and sar. Utilities to monitor system performance and usage activity. Installing sysstat also installs sar.

  • systemtap and crash. The systemtap utility is a programmable instrumentation system for diagnosing problems of performance or function. Installing systemtap also installs crash, which is a kernel crash analysis utility for live systems and dump files.

  • dstat. Tool for viewing and analyzing statistics about system resources.

    The dstat tool can help troubleshoot system performance. The tool shows live, running list of statistics about system resources:

      dstat
      You did not select any stats, using -cdngy by default.
      ----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
      usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
        1   0  98   1   0   0|4036B   42k|   0     0 |   0     0 |  95   276
        1   0  98   1   0   0|   0    64k|  60B  940B|   0     0 | 142   320
        1   1  98   0   0   0|   0    52k|  60B  476B|   0     0 | 149   385
    

4.4 - Linux Troubleshooting Tools

The following Linux troubleshoot tools are neither installed on Photon OS by default nor available in the Photon OS repositories:

  • iostat
  • telnet (use SSH instead)
  • Iprm
  • hdparm
  • syslog (use journalctl instead)
  • ddd
  • ksysmoops
  • xev
  • GUI tools (because Photon OS has no GUI)

5 - Systemd

Photon OS manages services with systemd and systemctl, its command-line utility for inspecting and controlling the system. It does not use the deprecated commands of init.d.

Basic system administration commands on Photon OS differ from those on operating systems that use SysVinit. Since Photon OS uses systemd instead of SysVinit, you must use systemd commands to manage services.

For example, instead of running the /etc/init.d/ssh script to stop and start the OpenSSH server on a init.d-based Linux system, you control the service by running the following systemctl commands on Photon OS:

systemctl stop sshd
systemctl start sshd

For an overview of systemd, see systemd System and Service Manager and the man page for systemd. The systemd man pages are listed at https://www.freedesktop.org/software/systemd/man/.

5.1 - Enabling `systemd` Debug Shell During Boot

To diagnose systemd related boot issues, you can enable early shell access during boot.

Perform the following steps to enable early shell access:

  1. Restart the Photon OS machine or the virtual machine running Photon OS.

    When the Photon OS splash screen appears, as it restarts, type the letter e quickly.

  2. Append systemd.debug-shell=1 to the kernel command line.

    Optionally, to change logging level to debug, you can append systemd.log_level=debug.

  3. Press F10 to proceed with the boot.

  4. Press Alt+Ctrl+F9 to switch to tty9 to access the debug shell.

5.2 - Troubleshooting Services With 'systemctl`

To view a description of all the active, loaded units, execute the systemctl command without any options or arguments:

systemctl

To see all the loaded, active, and inactive units and their description, run this command:

systemctl --all

To see all the unit files and their current status but no description, run this command:

systemctl list-unit-files

The grep command filters the services by a search term, a helpful tactic to recall the exact name of a unit file without looking through a long list of names. Example:

systemctl list-unit-files | grep network
org.freedesktop.network1.busname           static
dbus-org.freedesktop.network1.service      enabled
systemd-networkd-wait-online.service       enabled
systemd-networkd.service                   enabled
systemd-networkd.socket                    enabled
network-online.target                      static
network-pre.target                         static
network.target  

For example, to list all the services that you can manage on Photon OS, you run the following command instead of ls /etc/rc.d/init.d/:

systemctl list-unit-files --type=service

Similarly, to check whether the sshd service is enabled, on Photon OS you run the following command instead of chkconfig sshd:

systemctl is-enabled sshd

The chkconfig --list command that shows which services are enabled for which runlevel on a SysVinit computer becomes substantially different on Photon OS because there are no runlevels, only targets:

ls /etc/systemd/system/*.wants

You can also display similar information with the following command:

systemctl list-unit-files --type=service

The following is list of some of the systemd commands that take the place of SysVinit commands on Photon OS:

USE THIS SYSTEMD COMMAND 	INSTEAD OF THIS SYSVINIT COMMAND
systemctl start sshd 		service sshd start
systemctl stop sshd 		service sshd stop
systemctl restart sshd 		service sshd restart
systemctl reload sshd 		service sshd reload
systemctl condrestart sshd 	service sshd condrestart
systemctl status sshd 		service sshd status
systemctl enable sshd 		chkconfig sshd on
systemctl disable sshd 		chkconfig sshd off
systemctl daemon-reload		chkconfig sshd --add

5.3 - Analyzing System Logs with `journalctl`

The journalctl tool queries the contents of the systemd journal. On Photon OS, all the system logs except the installation log and the cloud-init log are written into the systemd journal.

When you run the journalctl command without any parameters, it displays all the contents of the journal, beginning with the oldest entry.

To display the output in reverse order with new entries first, include the -r option in the command:

journalctl -r

The journalctl command includes many options to filter its output. For help troubleshooting systemd, two journalctl queries are particularly useful:

  • Showing the log entries for the last boot.

    The following command displays the messages that systemd generated during the last time the machine started:

    journalctl -b

  • Showing the log entries for a systemd service unit.Item

    The following command reveals the messages for only the systemd service unit specified by the -u option, which in the following example is the auditing service:

    journalctl -u auditd

You can look at the messages for systemd itself or for the network service:

journalctl -u systemd
journalctl -u systemd-networkd

Example:

root@photon-1a0375a0392e [ ~ ]# journalctl -u systemd-networkd
-- Logs begin at Tue 2016-08-23 14:35:50 UTC, end at Tue 2016-08-23 23:45:44 UTC. --
Aug 23 14:35:52 photon-1a0375a0392e systemd[1]: Starting Network Service...
Aug 23 14:35:52 photon-1a0375a0392e systemd-networkd[458]: Enumeration completed
Aug 23 14:35:52 photon-1a0375a0392e systemd[1]: Started Network Service.
Aug 23 14:35:52 photon-1a0375a0392e systemd-networkd[458]: eth0: Gained carrier
Aug 23 14:35:53 photon-1a0375a0392e systemd-networkd[458]: eth0: DHCPv4 address 198.51.100.1
Aug 23 14:35:54 photon-1a0375a0392e systemd-networkd[458]: eth0: Gained IPv6LL
Aug 23 14:35:54 photon-1a0375a0392e systemd-networkd[458]: eth0: Configured

For more information, see journalctl or the journalctl man page by running this command: man journalctl

5.4 - Inspecting Services with `systemd-analyze`

The systemd-analyze command reveals performance statistics for boot times, traces system services, and verifies unit files. It can help troubleshoot slow system boots and incorrect unit files. See the man page for a list of options.

Examples:

systemd-analyze blame

systemd-analyze dump

5.5 - Inspecting Services with `systemd-analyze`systemd

systemd is a suite of basic building blocks for a Linux system. It provides a system and service manager that runs as Process ID 1 and starts the rest of the system.

To manage the services run the following commands:

  • systemctl or systemctl list-units : This command lists the running units.
  • systemctl --failed : This command lists failed units.
  • systemctl list-unit-files : This command lists all the installed unit files. The unit files are usually present in /usr/lib/systemd/system/ and /etc/systemd/system/.
  • systemctl status pid : This command displays the cgroup slice, memory and parent for a PID.
  • systemctl start unit : This command starts a unit immediately.
  • systemctl stop unit : This command stops a unit.
  • systemctl restart unit : This command restarts a unit.
  • systemctl reload unit : This command asks a unit to reload its configuration.
  • systemctl status unit : This command displays the status of a unit.
  • systemctl enable unit : This command enables a unit to run on startup.
  • systemctl enable --now unit : This command enables a unit to run on startup and start immediately.
  • systemctl disable unit : This command disables a unit and removes it from the startup program.
  • systemctl mask unit : This command masks a unit to make it impossible to start.
  • systemctl unmask unit : This command unmasks a unit.

To get an overview of the system boot-up time, run the following command:

systemd-analyze

To view a list of all running units, sorted by the time they took to initialize (highest time on top), run the following command:

systemd-analyze blame

6 - Network Troubleshooting

Use the systemd suite of commands and not deprecated init.d commands or other deprecated commands, to manage networking.

For information about tcpdump and netcat, see Installing the Packages for tcpdump and netcat with tdnf

6.1 - Managing the Network Configuration

The network service, which is enabled by default, starts when the system boots. You manage the network service by using systemd commands, such as systemd-networkd, systemd-resolvd, and networkctl.

You can check the status of the network service by running the following command:

systemctl status systemd-networkd

The following is a result of the command:

* systemd-networkd.service - Network Service
   Loaded: loaded (/usr/lib/systemd/system/systemd-networkd.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2016-04-29 15:08:51 UTC; 6 days ago
     Docs: man:systemd-networkd.service(8)
 Main PID: 291 (systemd-network)
   Status: "Processing requests..."
   CGroup: /system.slice/systemd-networkd.service
           `-291 /lib/systemd/systemd-networkd

6.2 - Inspecting IP Addresses

VMware recommends that you use the ip or ss commands as the ifconfig and netstat commands are deprecated.

To display a list of network interfaces, run the ss command. Similarly, to display information for IP addresses, run the ip addr command.

Examples:

USE THIS IPROUTE COMMAND 	INSTEAD OF THIS NET-TOOL COMMAND
ip addr 					ifconfig -a
ss 							netstat
ip route 					route
ip maddr 					netstat -g
ip link set eth0 up 		ifconfig eth0 up
ip -s neigh					arp -v
ip link set eth0 mtu 9000	ifconfig eth0 mtu 9000

Use the ip route version of a command instead of the net-tools to get accurate information:

ip neigh
198.51.100.2 dev eth0 lladdr 00:50:56:e2:02:0f STALE
198.51.100.254 dev eth0 lladdr 00:50:56:e7:13:d9 STALE
198.51.100.1 dev eth0 lladdr 00:50:56:c0:00:08 DELAY

arp -a
? (198.51.100.2) at 00:50:56:e2:02:0f [ether] on eth0
? (198.51.100.254) at 00:50:56:e7:13:d9 [ether] on eth0
? (198.51.100.1) at 00:50:56:c0:00:08 [ether] on eth0

Important: If you modify an IPv6 configuration or add an IPv6 interface, you must restart systemd-networkd. Traditional methods of using ifconfig commands will be inadequate to register the changes. Run the following command instead:

systemctl restart systemd-networkd

6.3 - Inspecting the Status of Network Links with `networkctl`

The networkctl command displays information about network connections that helps you configure networking services and troubleshoot networking problems.

You can progressively add options and arguments to the networkctl command to move from general information about network connections to specific information about a network connection.

Running networkctl without options defaults to the list command:

networkctl
IDX LINK             TYPE               OPERATIONAL SETUP
  1 lo               loopback           carrier     unmanaged
  2 eth0             ether              routable    configured
  3 docker0          ether              routable    unmanaged
 11 vethb0aa7a6      ether              degraded    unmanaged
 4 links listed.

Run the networkctl with the status command to display active network links with IP addresses for not only the Ethernet connection, but also the Docker container.

root@photon-rc [ ~ ]# networkctl status
*      State: routable
     Address: 198.51.100.131 on eth0
              172.17.0.1 on docker0
              fe80::20c:29ff:fe55:3ca6 on eth0
              fe80::42:f0ff:fef7:bd81 on docker0
              fe80::4c84:caff:fe76:a23f on vethb0aa7a6
     Gateway: 198.51.100.2 on eth0
         DNS: 198.51.100.2

You can add a network link, such as the Ethernet connection, as the argument of the status command to show specific information about the link:

root@photon-rc [ ~ ]# networkctl status eth0
* 2: eth0
       Link File: /usr/lib/systemd/network/99-default.link
    Network File: /etc/systemd/network/10-dhcp-en.network
            Type: ether
           State: routable (configured)
            Path: pci-0000:02:01.0
          Driver: e1000
      HW Address: 00:0c:29:55:3c:a6 (VMware, Inc.)
             MTU: 1500
         Address: 198.51.100.131
                  fe80::20c:29ff:fe55:3ca6
         Gateway: 198.51.100.2
             DNS: 198.51.100.2
        CLIENTID: ffb6220feb00020000ab116724f520a0a77337

You can add a Docker container as follows:

networkctl status docker0
* 3: docker0
       Link File: /usr/lib/systemd/network/99-default.link
    Network File: n/a
            Type: ether
           State: routable (unmanaged)
          Driver: bridge
      HW Address: 02:42:f0:f7:bd:81
             MTU: 1500
         Address: 172.17.0.1
                  fe80::42:f0ff:fef7:bd81

In the example above, the output indicates that state of the Docker container is unmanaged. Docker uses the bridge drive to handle managing the networking for the containers and not systemd-resolved or systemd-networkd.

For more information about networkctl commands and options, see https://www.freedesktop.org/software/systemd/man/networkctl.html.

6.4 - Network Debugging

You can set systemd-networkd to work in debug mode so that you can analyze log files with debugging information to help troubleshoot networking problems.

The following procedure turns on network debugging by adding a drop-in file in /etc/systemd to customize the default systemd configuration in /usr/lib/systemd.

  1. Run the following command as root to create a directory with this exact name, including the .d extension:

    mkdir -p /etc/systemd/system/systemd-networkd.service.d/

  2. Run the following command as root to establish a systemd drop-in unit with a debugging configuration for the network service:

    cat > /etc/systemd/system/systemd-networkd.service.d/10-loglevel-debug.conf << "EOF"
    	[Service]
    	Environment=SYSTEMD_LOG_LEVEL=debug
    	EOF
    
  3. Reload the systemctl daemon and restart the systemd-networkd service for the changes to take effect:

    systemctl daemon-reload
    	systemctl restart systemd-networkd
    
  4. Verify that your changes took effect:

    `systemd-delta --type=extended`
    
  5. View the log files by running this command:

    `journalctl -u systemd-networkd`
    
  6. After debugging the network connections, turn debugging off by deleting the drop-in file:

    rm /etc/systemd/system/systemd-networkd.service.d/10-loglevel-debug.conf

6.5 - Checking Firewall Rules

The design of Photon OS emphasizes security. On the minimal and full versions of Photon OS, the default security policy turns on the firewall and drops packets from external interfaces and applications. As a result, you might need to add rules to iptables to permit forwarding, allow protocols like HTTP, and open ports. In other words, you must configure the firewall for your applications and requirements.

The default iptables settings on the full version look like this:

iptables --list
Chain INPUT (policy DROP)
target     prot opt source               destination
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:ssh

Chain FORWARD (policy DROP)
target     prot opt source               destination

Chain OUTPUT (policy DROP)
target     prot opt source               destination
ACCEPT     all  --  anywhere             anywhere

To find out how to adjust the settings, see the man page for iptables.

Although the default iptables policy accepts SSH connections, the sshd configuration file on the full version of Photon OS is set to reject SSH connections. See Permitting Root Login with SSH.

If you are unable to ping a Photon OS machine, check the firewall rules. Verify if the rules allow connectivity for the port and protocol.

You can supplement the iptables commands by using lsof to, for instance, see the processes listening on ports:

lsof -i -P -n

6.6 - Inspect Network Settings with `netmgr`

If you are running a VMware appliance on Photon OS and the VAMI module has problems or if there are networking issues, you can use the Photon OS netmgr utility to inspect the networking settings. Make sure that the IP addresses for the DNS server and other infrastructure are correct. Use tcpdump to analyze the issues.

The error code that you get from netmgr is a standard Unix error code. Enter it into a search engine to obtain more information on the error.

7 - File System Troubleshooting

Photon OS includes commands to check and troubleshoot file systems.

7.1 - Checking Disk Space

One of the first simple steps to take while troubleshooting is to check how much disk space is available by running the df command:

df -h

7.2 - Adding a Disk and Partitioning It

If the df command shows that the file system is indeed nearing capacity, you can add a new disk on the fly and partition it to increase capacity.

  1. Add a new disk.

    For example, you can add a new disk to a virtual machine by using the VMware vSphere Client. After adding a new disk, check for the new disk by using fdisk. In the following example, the new disk is named /dev/sdb:

     fdisk -l
     Device        Start      End  Sectors Size Type
     /dev/sda1      2048 16771071 16769024   8G Linux filesystem
     /dev/sda2  16771072 16777182     6111   3M BIOS boot
    
     Disk /dev/sdb: 1 GiB, 1073741824 bytes, 2097152 sectors
     Units: sectors of 1 * 512 = 512 bytes
     Sector size (logical/physical): 512 bytes / 512 bytes
     I/O size (minimum/optimal): 512 bytes / 512 bytes
    
  2. Partition it with the parted wizard.

    The command to partition the disk on Photon OS is as follows:

    parted /dev/sdb

    Use the parted wizard to create it as follows:

    mklabel gpt mkpart ext3 1 1024

  3. Create a file system on the partition:

    mkfs -t ext3 /dev/sdb1
    
  4. Make a directory where you will mount the new file system:

    mkdir /newdata
    
  5. Open /etc/fstab and add the new file system with the options that you require:

       	#system mnt-pt  type    options dump    fsck
       	/dev/sda1       /       ext4    defaults,barrier,noatime,noacl,data=ord$
       	/dev/cdrom      /mnt/cdrom      iso9660 ro,noauto       0       0
       	/dev/sdb1       /newdata        ext3    defaults        0		0
    
  6. Mount it using the following command:

    `mount /newdata`
    

    Verify the results:

df -h Filesystem Size Used Avail Use% Mounted on /dev/root 7.8G 4.4G 3.1G 59% / devtmpfs 172M 0 172M 0% /dev tmpfs 173M 0 173M 0% /dev/shm tmpfs 173M 664K 172M 1% /run tmpfs 173M 0 173M 0% /sys/fs/cgroup tmpfs 173M 36K 173M 1% /tmp tmpfs 35M 0 35M 0% /run/user/0 /dev/sdb1 945M 1.3M 895M 1% /newdata

7.3 - Expanding Disk Partition

If you require more space, you can expand the last partition of your disk after resizing the disk.

The commands in this section assume sda as disk device.

  1. After the disk is resized in the virtual machine, use the following command to enable the system to recognize the new disk ending boundary without rebooting:

    echo 1 > /sys/class/block/sda/device/rescan
    
  2. Install the parted package to resize the disk partition by running the following command to install it:

     `tdnf install parted`.
    
    	# parted /dev/sda
    	GNU Parted 3.2
    	Using /dev/sda
    	Welcome to GNU Parted! Type 'help' to view a list of commands.
    
  3. List all partitions available to fix the GPT and check the last partition number:

    
    (parted) print
    
    Warning: Not all of the space available to /dev/sda appears to be used, you can
    fix the GPT to use all of the space (an extra 4194304 blocks) or continue with
    the current setting? 
    Fix/Ignore?
    
    Press `f` to fix the GPT layout.
    
    Model: VMware Virtual disk (scsi)
    Disk /dev/sda: 34.4GB
    Sector size (logical/physical): 512B/512B
    Partition Table: gpt
    Disk Flags: 
    
    Number  Start   End     Size    File system  Name  Flags
    1      1049kB  3146kB  2097kB                     bios_grub
    2      3146kB  8590MB  8587MB  ext4
    

    In this case we have the partition `2` as last, then we extend the partition to 100% of the remaining size:
    
    	(parted) resizepart 2 100%

1. Expand the filesystem to the new size:
	
    ```
    resize2fs /dev/sda2
    	resize2fs 1.42.13 (17-May-2015)
    	Filesystem at /dev/sda2 is mounted on /; on-line resizing required
    	old_desc_blocks = 1, new_desc_blocks = 2
    	The filesystem on /dev/sda2 is now 8387835 (4k) blocks long.
    ```

    The new space is already available in the system:
    
    	df -h
    	Filesystem      Size  Used Avail Use% Mounted on
    	/dev/root        32G  412M   30G   2% /
    	devtmpfs       1001M     0 1001M   0% /dev
    	tmpfs          1003M     0 1003M   0% /dev/shm
    	tmpfs          1003M  252K 1003M   1% /run
    	tmpfs          1003M     0 1003M   0% /sys/fs/cgroup
    	tmpfs          1003M     0 1003M   0% /tmp
    	tmpfs           201M     0  201M   0% /run/user/0

7.4 - List Disk Partitions with `fdisk`

The fdisk command manipulates the disk partition table. You can, for example, use fdisk to list the disk partitions so that you can identify the root Linux file system.

The following example shows /dev/sda1 to be the root Linux partition:

fdisk -l
Disk /dev/ram0: 4 MiB, 4194304 bytes, 8192 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
...
Disk /dev/sda: 8 GiB, 8589934592 bytes, 16777216 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 3CFA568B-2C89-4290-8B52-548732A3972D

Device        Start      End  Sectors Size Type
/dev/sda1      2048 16771071 16769024   8G Linux filesystem
/dev/sda2  16771072 16777182     6111   3M BIOS boot

7.5 - File System Consistency Check Tool

You can manually check the file system by using the file system consistency check tool, fsck, after you unmount the file system.

The Photon OS file system includes btrfs and ext4. The default root file system is ext4, which you can see by looking at the file system configuration file, /etc/fstab:

```
cat /etc/fstab
	#system mnt-pt  type    options dump    fsck
	/dev/sda1       /       ext4    defaults,barrier,noatime,noacl,data=ordered     1       1
	/dev/cdrom      /mnt/cdrom      iso9660 ro,noauto       0       0
```

The 1 in the fifth column, under fsck, indicates that fsck checks the file system when the system boots.

You can also perform a read-only check without unmounting it:

```
fsck -nf /dev/sda1
	fsck from util-linux 2.27.1
	e2fsck 1.42.13 (17-May-2015)
	Warning!  /dev/sda1 is mounted.
	Warning: skipping journal recovery because doing a read-only filesystem check.
	Pass 1: Checking inodes, blocks, and sizes
	Pass 2: Checking directory structure
	Pass 3: Checking directory connectivity
	Pass 4: Checking reference counts
	Pass 5: Checking group summary information
	Free blocks count wrong (1439651, counted=1423942).
	Fix? no
	Free inodes count wrong (428404, counted=428397).
	Fix? no
	/dev/sda1: 95884/524288 files (0.3% non-contiguous), 656477/2096128 blocks
```

The inodes count might be wrong because the file system is mounted and in use.

To fix errors, you must first unmount the file system and then run fsck again:

```
umount /dev/sda1
umount: /: target is busy
```

You can find information about processes that use the device by using lsof or fuser.

```
lsof | grep ^jbd2/sd
	jbd2/sda1   99                root  cwd       DIR                8,1     4096          2 /
	jbd2/sda1   99                root  rtd       DIR                8,1     4096          2 /
	jbd2/sda1   99                root  txt   unknown                                        /proc/99/exe
```

The above example indicates that file system is in use.

7.6 - Fixing File System Errors When fsck Fails

Sometimes when fsck runs during startup, it encounters an error that prevents the system from fully booting until you fix the issue by running fsck manually. This error might occur when Photon OS is the operating system for a VM running an appliance.

If fsck fails when the computer boots and an error message says to run fsck manually, you can troubleshoot by restarting the VM, altering the GRUB edit menu to enter emergency mode before Photon OS fully boots, and running fsck.

Perform the following steps:

  1. Take a snapshot of the virtual machine.

  2. Restart the virtual machine running Photon OS.

    When the Photon OS splash screen appears as it restarts, type the letter e quickly to go to the GNU GRUB edit menu.

    Note: You must type e quickly as Photon OS reboots quickly. Also, in VMware vSphere or VMware Workstation Pro, you might have to give the console focus by clicking in its window before it will register input from the keyboard.

  3. In the GNU GRUB edit menu, go to the end of the line that starts with linux, add a space, and then add the following code exactly as it appears below:

    systemd.unit=emergency.target

  4. Type F10.

  5. In the bash shell, run one of the following commands to fix the file system errors, depending on whether sda1 or sda2 represents the root file system:

    e2fsck -y /dev/sda1

    or

    e2fsck -y /dev/sda2

  6. Restart the virtual machine.

8 - Troubleshooting Packages

On Photon OS, tdnf is the default package manager. The standard syntax for tdnf commands is the same as that for DNF and Yum:

tdnf [options] <command> [<arguments>...]

The main configuration files reside in /etc/tdnf/tdnf.conf. The repositories appear in /etc/yum.repos.d/ with .repo file extensions. For more information, see the Photon OS Administration Guide.

The cache files for data and metadata reside in /var/cache/tdnf. The local cache is populated with data from the repository:

ls -l /var/cache/tdnf/photon
	total 8
	drwxr-xr-x 2 root root 4096 May 18 22:52 repodata
	d-wxr----t 3 root root 4096 May  3 22:51 rpms

You can clear the cache to help troubleshoot a problem, but doing so might slow the performance of tdnf until the cache becomes repopulated with data. Cleaning the cache can remove stale information. Clear the cache as follows:

tdnf clean all
	Cleaning repos: photon photon-extras photon-updates lightwave
	Cleaning up everything

Some tdnf commands can help you troubleshoot problems with packages:

  • makecache

    This command updates the cached binary metadata for all known repositories. You can run it after you clean the cache to make sure you are working with the latest repository data as you troubleshoot.

    Example:

    tdnf makecache
           	Refreshing metadata for: 'VMware Lightwave 1.0(x86_64)'
           	Refreshing metadata for: 'VMware Photon Linux 1.0(x86_64)Updates'
           	Refreshing metadata for: 'VMware Photon Extras 1.0(x86_64)'
           	Refreshing metadata for: 'VMware Photon Linux 1.0(x86_64)'
           	Metadata cache created.
    
  • tdnf check-local

    This command resolves dependencies by using the local RPMs to help check RPMs for quality assurance before publishing them. To check RPMs with this command, you must create a local directory and place your RPMs in it. The command, which includes no options, takes the path to the local directory containing the RPMs as its argument. The command does not, however, recursively parse directories; it checks the RPMs only in the directory that you specify.

    For example, after creating a directory named /tmp/myrpms and placing your RPMs in it, you can run the following command to check them:

      tdnf check-local /tmp/myrpms
      Checking all packages from: /tmp/myrpms
      Found 10 packages
      Check completed without issues
    
  • tdnf provides

    This command finds the packages that provide the package that you supply as an argument. If you are used to a package name for another system, you can use tdnf provides to find the corresponding name of the package on Photon OS.

    Example:

      tdnf provides docker
      docker-1.11.0-1.ph1.x86_64 : Docker
      Repo     : photon
      docker-1.11.0-1.ph1.x86_64 : Docker
      Repo     : @System
    

    For a file, you must provide the full path. Example:

    tdnf provides /usr/include/stdio.h glibc-devel-2.22-8.ph1.x86_64 : Header files for glibc Repo : photon glibc-devel-2.22-8.ph1.x86_64 : Header files for glibc Repo : @System

    The following example shows you how to find the package that provides a pluggable authentication module, which you might need to find if the system is mishandling passwords.

    tdnf provides /etc/pam.d/system-account
    	shadow-4.2.1-7.ph1.x86_64 : Programs for handling passwords in a secure way
    	Repo     : photon
    	shadow-4.2.1-8.ph1.x86_64 : Programs for handling passwords in a secure way
    	Repo     : photon-updates
    

    For more commands see the Photon OS Administration Guide.

If a package that is installed is not working, try re-installing it. Example:

```
tdnf reinstall shadow
	Reinstalling:
	shadow 	x86_64 	4.2.1-7.ph1   3.85 M
```

9 - Kernel Problems and Boot and Login Errors

Photon OS includes commands to troubleshoot kernel problems and boot and login errors.

9.1 - Kernel Overview

You can use dmesg command to troubleshooting kernel errors. The dmesg command prints messages from the kernel ring buffer.

The following command, for example, presents kernel messages in a human-readable format:

dmesg --human --kernel

To examine kernel messages as you perform actions, such as reproducing a problem, in another terminal, you can run the command with the --follow option, which waits for new messages and prints them as they occur:

dmesg --human --kernel --follow

The kernel buffer is limited in memory size. As a result, the kernel cyclically overwrites the end of the information in the buffer from which dmesg pulls information. The systemd journal, however, saves the information from the buffer to a log file so that you can access older information.

To view it, run the following command:

journalctl -k

If required, you can check the modules that are loaded on your Photon OS machine by running the lsmod command. For example:

lsmod
Module                  Size  Used by
vmw_vsock_vmci_transport    28672  1
vsock                  36864  2 vmw_vsock_vmci_transport
coretemp               16384  0
hwmon                  16384  1 coretemp
crc32c_intel           24576  0
hid_generic            16384  0
usbhid                 28672  0
hid                   106496  2 hid_generic,usbhid
xt_conntrack           16384  1
iptable_nat            16384  0
nf_conntrack_ipv4      16384  2
nf_defrag_ipv4         16384  1 nf_conntrack_ipv4
nf_nat_ipv4            16384  1 iptable_nat
nf_nat                 24576  1 nf_nat_ipv4
iptable_filter         16384  1
ip_tables              24576  2 iptable_filter,iptable_nat

9.2 - Boot Process Overview

When a Photon OS machine boots, the BIOS initializes the hardware and uses a boot loader to start the kernel. After the kernel starts, systemd takes over and boots the rest of the operating system.

The BIOS checks the memory and initializes the keyboard, the screen, and other peripherals. When the BIOS finds the first hard disk, the boot loader–GNU GRUB 2.02–takes over. From the hard disk, GNU GRUB loads the master boot record (MBR) and initializes the root partition of the random-access memory by using initrd. The device manager, udev, provides initrd with the drivers it needs to access the device containing the root file system. Here’s what the GNU GRUB edit menu looks like in Photon OS with its default commands to load the boot record and initialize the RAM disk:

The GNU GRUB edit menu in the full and minimal versions of Photon OS

At this point, the Linux kernel in Photon OS, which is kernel version 4.4.8, takes control. Systemd kicks in, initializes services in parallel, mounts the rest of the file system, and checks the file system for errors.

9.3 - Blank Screen on Reboot

If the Photon OS kernel enters a state of panic during a reboot and all you see is a blank screen, note the name of the virtual machine running Photon OS and then power off the VM.

In the host, open the vmware.log file for the VM. When a kernel panics, the guest VM prints the entire kernel log in vmware.log in the host directory containing the VM. This log file contains the output of the dmesg command from the guest, and you can analyze it to help identify the cause of the boot problem.

Example

After searching for Guest: in the following abridged vmware.log, this line appears, identifying the root cause of the reboot problem:

```
2016-08-30T16:02:43.220-07:00| vcpu-0| I125: Guest: 
	<0>[1.125804] Kernel panic - not syncing: 
	VFS: Unable to mount root fs on unknown-block(0,0)
```

Further inspection finds the following lines:

2016-08-30T16:02:43.217-07:00| vcpu-0| I125: Guest: 
<4>[    1.125782] VFS: Cannot open root device "sdc1" or unknown-block(0,0): error -6
2016-08-30T16:02:43.217-07:00| vcpu-0| I125: Guest: 
<4>[    1.125783] Please append a correct "root=" boot option; 
here are the available partitions: 
2016-08-30T16:02:43.217-07:00| vcpu-0| I125: Guest: 
<4>[    1.125785] 0100            4096 ram0  (driver?)
...
0800         8388608 sda  driver: sd
2016-08-30T16:02:43.220-07:00| vcpu-0| I125: Guest: 
<4>[    1.125802]   0801         8384512 sda1 611e2d9a-a3da-4ac7-9eb9-8d09cb151a93
2016-08-30T16:02:43.220-07:00| vcpu-0| I125: Guest: 
<4>[    1.125803]   0802            3055 sda2 8159e59c-b382-40b9-9070-3c5586f3c7d6

In this unlikely case, the GRUB configuration points to a root device named sdc1 instead of the correct root device, sda1. You can resolve the problem by restoring the GRUB GNU edit screen and the GRUB configuration file (/boot/grub/grub.cfg) to their original configurations.

9.4 - Investigating Unexpected Behavior

If you rebooted to address unexpected behavior before the reboot or if you encountered unexpected behavior during the reboot but have reached the shell, you must analyze what happened since the previous boot.

  1. Run the following command to check the logs:

       journalctl
    
  2. Run the following command to look at what happened since the penultimate reboot:

    journalctl --boot=-1
    

    Look at the log from the reboot:

    journalctl -b
    
  3. If required, examine the logs for the kernel:

    journalctl -k
    
  4. Check which kernel is in use:

    uname -r
    

    As example for Photon OS 1.0, the kernel version in the full version is 4.4.8. The kernel version of in the OVA version is 4.4.8-esx. With the ESX version of the kernel, some services might not start.

  5. Run this command to check the overall status of services:

    systemctl status
    

    If a service is in red, check it:

    systemctl status service-name
    

    Start it if required:

    systemctl start service-name
    
  6. If looking at the journal and checking the status of services does not resolve your error, run the following systemd-analyze commands to examine the boot time and the speed with which services start.

    systemd-analyze time
    systemd-analyze blame
    systemd-analyze critical-chain
    

Note: The output of these commands might be misleading because one service might just be waiting for another service to finish initializing.

9.5 - Investigating the Guest Kernel

If a VM running Photon OS and an application or virtual appliance is behaving preventing you from logging in to the machine, you can troubleshoot by extracting the kernel logs from the guest’s memory and analyzing them with gdb.

This advanced troubleshooting method works when you are running Photon OS as the operating system for an application or appliance on VMware Workstation, Fusion, or ESXi. The procedure in this section assumes that the virtual machine running Photon OS is functioning normally.

The process to use this troubleshooting method varies by environment. The examples in this section assume that the troublesome Photon OS virtual machine is running in VMware Workstation 12 Pro on a Microsoft Windows 8 Enterprise host. The examples also use an additional, fully functional Photon OS virtual machine running in Workstation.

You can use other hosts, hypervisors, and operating systems–but you will have to adapt the example process below to them. Directory paths, file names, and other aspects might be different on other systems.

Prerequisites

Verify that you have the following resources:

  • Root access to a Linux machine other than the one you are troubleshooting. It can be another Photon OS machine, Ubuntu, or another Linux variant.
  • The vmss2core utility from VMware. It is installed by default in VMware Workstation and some other VMware products. If your system doesn’t already contain it, you can download it for free from https://labs.vmware.com/flings/vmss2core.
  • A local copy of the Photon OS ISO of the exact same version and release number as the Photon OS machine that you are troubleshooting.

Procedure Overview

The process to apply this troubleshooting method is as follows:

  • On a local computer, you open a file on the Photon OS ISO that contains Linux debugging information. Then you suspend the troublesome Photon OS VM and extract the kernel memory logs from the VMware hypervisor running Photon OS.
  • Next, you use the vmss2core tool to convert the memory logs into core dump files. The vmss2core utility converts VMware checkpoint state files into formats that third-party debugging tools understand. It can handle both suspend (.vmss) and snapshot (.vmsn) checkpoint state files (hereafter referred to as a vmss file) as well as monolithic and non-monolithic (separate .vmem file) encapsulation of checkpoint state data. See Debugging Virtual Machines with the Checkpoint to Core Tool.
  • Finally, you prepare to run the gdb tool by using the debug info file from the ISO to create a .gdbinit file, which you can then analyze with the gdb shell on your local Linux machine.

All three components must be in the same directory on a Linux machine.

Procedure

  1. Obtain a local copy of the Photon OS ISO of the exact same version and release number as the Photon OS machine that you are troubleshooting and mount the ISO on a Linux machine (or open it on a Windows machine):

    mount /mnt/cdrom
    
  2. Locate the following file. (If you opened the Photon OS ISO on a Windows computer, copy the following file to the root folder of a Linux machine.)

    /RPMS/x86_64/linux-debuginfo-4.4.8-6.ph1.x86_64.rpm
    
  3. On a Linux machine, run the following rpm2cpio command to convert the RPM file to a cpio file and to extract the contents of the RPM to the current directory:

    rpm2cpio /mnt/cdrom/RPMS/x86_64/linux-debuginfo-4.4.8-6.ph1.x86_64.rpm | cpio -idmv
    
  4. From the extracted files, copy the following file to your current directory:

    cp usr/lib/debug/lib/modules/4.4.8/vmlinux-4.4.8.debug
    
  5. Run the following command to download the dmesg functions that will help extract the kernel log from the coredump:

    wget https://www.kernel.org/doc/Documentation/kdump/gdbmacros.txt
    wget https://github.com/vmware/photon/blob/master/tools/scripts/gdbmacros-for-linux.txt
    
  6. Move the file as follows:

    mv gdbmacros-for-linux.txt .gdbinit
    
  7. Switch to your host machine so you can get the kernel memory files from the VM. Suspend the troublesome VM and locate the .vmss and .vmem files in the virtual machine’s directory on the host.

    Example:

    C:\Users\tester\Documents\Virtual Machines\VMware Photon 64-bit (7)>dir
    	 Volume in drive C is Windows
    	 Directory of C:\Users\tester\Documents\Virtual Machines\VMware Photon 64-bit
    	 (7)
    	09/20/2016  12:22 PM    <DIR>          .
    	09/20/2016  12:22 PM    <DIR>          ..
    	09/19/2016  03:39 PM       402,653,184 VMware Photon 64-bit (7)-f6b070cd.vmem
    	09/20/2016  12:11 PM         5,586,907 VMware Photon 64-bit (7)-f6b070cd.vmss
    	09/20/2016  12:11 PM     1,561,001,984 VMware Photon 64-bit (7)-s001.vmdk
    	...
    	09/20/2016  12:11 PM           300,430 vmware.log
    	...
    
  8. Now that you have located the .vmss and .vmem files, convert them to one or more core dump files by using the vmss2core tool that comes with Workstation. Here is an example of how to run the command. Be careful with your pathing, escaping, file names, and so forth–all of which might be different from this example on your Windows machine.

    
    	C:\Users\shoenisch\Documents\Virtual Machines\VMware Photon 64-bit (7)>C:\"Program Files (x86)\VMware\VMware Workstation"\vmss2core.exe "VMware Photon 64-bit (7)-f6b070cd.vmss" "VMware Photon 64-bit (7)-f6b070cd.vmem"
    
    The result of this command is one or more files with a `.core` extension plus a digit. Truncated example: 
    
    	C:\Users\tester\Documents\Virtual Machines\VMware Photon 64-bit (7)>dir
    	 Directory of C:\Users\tester\Documents\Virtual Machines\VMware Photon 64-bit(7)
    	09/20/2016  12:22 PM       729,706,496 vmss.core0
    
  9. Copy the .core file or files to the your current directory on the Linux machine where you so that you can analyze it with gdb.

    Run the following gdb command to enter the gdb shell attached to the memory core dump file. You might have to change the name of the vmss.core file in the example to match your .core file:

gdb vmlinux-4.4.8.debug vmss.core0

	GNU gdb (GDB) 7.8.2
	Copyright (C) 2014 Free Software Foundation, Inc.
	License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
	This is free software: you are free to change and redistribute it. 
	There is NO WARRANTY, to the extent permitted by law. ...
	Type "show configuration" for configuration details.
	For bug reporting instructions, please see:
	<http://www.gnu.org/software/gdb/bugs/>.
	Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>.
	For help, type "help".
	Type "apropos word" to search for commands related to "word"...
	Reading symbols from vmlinux-4.4.8.debug...done.
	warning: core file may not match specified executable file.
	[New LWP 12345]
	Core was generated by `GuestVM'.
	Program terminated with signal SIGSEGV, Segmentation fault.
	#0  0xffffffff813df39a in insb (count=0, addr=0xffffc90000144000, port=<optimized out>)
	    at arch/x86/include/asm/io.h:316
	316     arch/x86/include/asm/io.h: No such file or directory.
	(gdb)

Result

In the results above, the (gdb) of the last line is the prompt of the gdb shell. You can now analyze the core dump by using commands like bt, to perform a backtrace, and dmesg, to view the Photon OS kernel log and see Photon OS kernel error messages.

9.6 - Kernel Log Replication with VProbes

Replicating the Photon OS kernel logs on the VMware ESXi host is an advanced but powerful method of troubleshooting a kernel problem.

Replication Method

This method is applicable when the virtual machine running Photon OS is hanging or inaccessible because, for instance, the hard disk has failed.

As a prerequisite, you must have preemptively enabled the VMware VProbes facility on the VM before an error rendered it inaccessible. You must also create a VProbes script on the ESXi host, but you can do that after the error.

The method is useful in analyzing kernel issues when testing an application or appliance that is running on Photon OS.

There are two similar ways in which you can replicate the Photon OS kernel logs on ESXi by using VProbes.

  • The first modifies the VProbes script so that it works only for the VM that you set. It uses a hard-coded address.

  • The second uses an abstraction instead of a hard-coded address so that the same VProbes script can be used for any VM on an ESXi host that you have enabled for VProbe and copied its kernel symbol table (kallsyms) to ESXi.

For more information on VMware VProbes, see Archived VProbe Toolkit and the VProbes Programming Reference.

Using VProbes Script with a Hard-Coded Address

Perform the following steps to set a VProbe for an individual VM:

  1. Power off the VM so that you can turn on the VProbe facility.

    Edit the .vmx configuration file for the VM. The file resides in the directory that contains the VM in the ESXi data store. Add the following line of code to the .vmx file and then power the VM on:

     vprobe.enable = "TRUE"
    

    When you edit the .vmx file to add the above line of code, you must first turn off the VM–otherwise, your changes will not persist.

  2. Obtain the kernel log_store function address by connecting to the VM with SSH and running the following commands as root.

    Photon OS uses the kptr_restrict setting to place restrictions on the kernel addresses exposed through /proc and other interfaces. This setting hides exposed kernel pointers to prevent attackers from exploiting kernel write vulnerabilities. When you are done using VProbes, you should return kptr_restrict to the original setting of 2 by rebooting.)

     echo 0 > /proc/sys/kernel/kptr_restrict
     grep log_store /proc/kallsyms
    

    The output of the grep command will look similar to the following string. The first set of characters (without the t) is the log_store function address:

     ffffffff810bb680 t log_store
    
  3. Connect to the ESXi host with SSH so that you can create a VProbes script.

    Below is the template for the script. log_store in the first line is a placeholder for the VM’s log_store function address:

    GUEST:ENTER:log_store {
               string dst;
               getgueststr(dst, getguest(RSP+16) & 0xff, getguest(RSP+8));
               printf("%s\n", dst);
            }
    

    On the ESXi host, create a new file, add the template to it, and then change log_store to the function address that was the output from the grep command on the VM.

  4. Add a 0x prefix to the function address. In this example, the modified template looks like this:

    GUEST:ENTER:0xffffffff810bb680 {
           string dst;
           getgueststr(dst, getguest(RSP+16) & 0xff, getguest(RSP+8));
           printf("%s\n", dst);
        }
    
  5. Save your VProbes script as console.emt in the /tmp directory. (The file extension for VProbe scripts is .emt.)

    While still connected to the ESXi host with SSH, run the following command to obtain the ID of the virtual machine that you want to troubleshoot:

    vim-cmd vmsvc/getallvms

    This command lists all the VMs running on the ESXi host. Find the VM you want to troubleshoot in the list and make a note of its ID.

  6. Run the following command to print all the kernel messages from Photon OS in your SSH console; replace <VM ID> with the ID of your VM:

    vprobe -m <VM ID> /tmp/console.emt

    When you’re done, type Ctrl-C to stop the loop.

A Reusable VProbe Script Using the kallsyms File

Perform the following steps to create one VProbe script and use for all the VMs on your ESXi host.

  1. Power off the VM and turn on the VProbe facility on each VM that you want to be able to analyze.

    Add vprobe.enable = "TRUE" to the VM’s .vmx configuration file. See the instructions above.

  2. Power on the VM, connect to it with SSH, and run the following command as root:

    `echo 0 > /proc/sys/kernel/kptr_restrict`
    
  3. Connect to the ESXi host with SSH to create the following VProbes script and save it as /tmp/console.emt:

    GUEST:ENTER:log_store {
           string dst;
           getgueststr(dst, getguest(RSP+16) & 0xff, getguest(RSP+8));
           printf("%s\n", dst);
        }
    
  4. From the ESXi host, run the following command to copy the VM’s kallysms file to the tmp directory on the ESXi host:

    `scp root@<vm ip address>:/proc/kallsyms /tmp`
    

    While still connected to the ESXi host with SSH, run the following command to obtain the ID of the virtual machine that you want to troubleshoot:

     `vim-cmd vmsvc/getallvms`
    

    This command lists all the VMs running on the ESXi host. Find the VM you want to troubleshoot in the list and make a note of its ID.

  5. Run the following command to print all the kernel messages from Photon OS in your SSH console.

    Replace <VM ID> with the ID of your VM. When you’re done, type Ctrl-C to stop the loop.

    vprobe -m <VM ID> -k /tmp/kallysyms /tmp/console.emt

    You can use a directory other than tmp if you want.

9.7 - Linux Kernel

The Linux kernel is the main component of Photon OS and is the core interface between a computer’s hardware and its processes. It communicates between the two, managing resources as efficiently as possible.

##Kernel Flavours and Versions The following list contains the different Linux kernel flavours available:

  • linux - A generic kernel designed to run everywhere and support everything.
  • linux-esx - Optimized to run only on VMware hypervisor (ESXi, WS, Fusion). It has minimal set of device drivers to support VMware virtual devices. uname -r displays Linux . For additional features switch to the generic flavour.
  • linux-secure - Security hardened variant of the generic kernel. uname -r displays -secure suffix.
  • linux-rt - This is a Photon Real Time kernel. uname -r displays -rt suffix.
  • linux-aws - Optimized for AWS hypervisor kernel. uname -r displays -aws suffix.

To see the version of kernel installed, run the following command:

# rpm -qa | grep -e "^linux\(\|-esx\|-secure\|rt\|aws\)-[[:digit:]]"
linux-4.9.111-1.ph2.x86_64
linux-esx-4.9.111-1.ph2.x86_64

To see the version of the Kernel that is running currently, run the following command:

# uname -r
4.9.107-1.ph2-esx

From the output, you can see that the kernel running currently doesn’t match the installer. This happens when linux-* rpms were updated but was not restarted. Restart is required.

##Configuration

To find the configurations of the installed Kernel, check the /boot directory by running the following command:

# ls /boot/config-*
config-4.9.111-1.ph2 config-4.9.111-1.ph2-esx

To get a copy of the kernel configuration (Not all flavours support this feature), run the zcat /proc/config.gz command.

##Boot Parameters and initrd Several kernel flavors can be installed on the system, but only one is used during boot. /boot/photon.cfg symlink points to the kernel which is used for boot.

# ls -l /boot/photon.cfg
lrwxrwxrwx 1 root root 23 Jun 12  2018 /boot/photon.cfg -> linux-4.9.111-1.ph2.cfg

Its contents can be checked by running the following command:

# cat /boot/photon.cfg

# GRUB Environment Block

photon_cmdline=init=/lib/systemd/systemd ro loglevel=3 quiet no-vmw-sta

photon_linux=vmlinuz-4.9.111-1.ph2

photon_initrd=initrd.img-4.9.111-1.ph2

Where:

  • photon_cmdline - Kernel parameters. This list will be extended by values from /boot/systemd.cfg file and the values are hardcoded to /boot/grub2/grub.cfg file (For example: root=).
  • photon_linux - Kernel image to boot.
  • photon_initrd - Initrd to use at boot.

Parameters of the kernel loading currently can be found by running the /proc/cmdline command:

# cat /proc/cmdline

BOOT_IMAGE=/boot/vmlinuz-4.9.107-1.ph2-esx root=PARTUUID=29194d05-4a6e-4e0c-b1f4-5020e5e8472c net.ifnames=0 init=/lib/systemd/systemd ro loglevel=3 quiet no-vmw-sta

##Dmesg

To view message buffer of the kernel run the dmesg command.

##Sysctl State

To view a list of all active units run the systemctl list-units command.

##Kernel Statistics

The kernel statitics can be found by running the following commands:

  • procfs
  • sysfs
  • debugfs

##Kernel Modules

To view the kernel log buffer run the journalctl -k command.

To view a list of available kernel modules run the lsmod command.

To view detailed information about all connected PCI buses run the lspci command.

10 - Performance Issues

Performance issues can be difficult to troubleshoot because so many variables play a role in overall system performance. Interpreting performance data often depends on the context and the situation. To better identify and isolate variables and to gain insight into performance data, you can use the troubleshooting tools on Photon OS to diagnose the system.

10.1 - General Performance Guidelines

If you have no indication what the cause of a performance degradation might be, start by getting a high-level picture of the system’s state. Then look for signs in the data that might point to a cause.

Use the following guidelines to gain insight into performance data:

  • Start with the systemd journal.

  • The top tool can unmask problems caused by processes or applications overconsuming CPUs, time, or RAM. If the percent of CPU utilization is consistently high with little idle time, for example, there might be a runaway process. Restart it.

  • The netstat --statistics command can identify bottlenecks causing performance issues. It lists interface statistics for different protocols.

  • If top and netstat reveal no errors, run the strace ls -al to view every system call.

  • The watch command can help dynamically monitor a command to help troubleshoot performance issues:

    watch -n0 --differences <command>

    You can also combine watch with the vmstat command to dig deeper into statistics about virtual memory, processes, block input-output, disks, and CPU activity. Are there any bottlenecks?

  • You can use the dstat utility to see the live, running list of statistics about system resources.

  • The systemd-analyze reveals performance statistics for boot time and can help troubleshoot slow system boots and incorrect unit files.

The additional tools that you select depend on the clues that your initial investigation reveals. The following tools can also help troubleshoot performance: sysstat, sar, systemtap, and crash.

10.2 - Throughput Performance

Throughtput performance over TCP might be reduced.

This might occur because timestamps are enabled by default and the parameter net.ipv4.tcp_timestamps has a value of 1.

Setting a value of 1 or 2 for this parameter may impact performance. Setting a value of 0 or 2 for this parameter might cause a security vulnerability.