1 - Kernel Overview

You can use the dmesg command to troubleshoot kernel errors. The dmesg command prints messages from the kernel ring buffer.

The following command, for example, presents kernel messages in a human-readable format:

dmesg --human --kernel

To watch kernel messages while you reproduce a problem in another terminal, run the command with the --follow option, which waits for new messages and prints them as they occur:

dmesg --human --kernel --follow

The kernel ring buffer has a fixed size. As a result, the kernel overwrites the oldest messages in the buffer as new ones arrive, so dmesg shows only the most recent messages. The systemd journal, however, saves the messages from the buffer to a log file so that you can access older information.

To view the kernel messages saved in the journal, run the following command:

journalctl -k

If required, you can check the modules that are loaded on your Photon OS machine by running the lsmod command. For example:

lsmod
Module                  Size  Used by
vmw_vsock_vmci_transport    28672  1
vsock                  36864  2 vmw_vsock_vmci_transport
coretemp               16384  0
hwmon                  16384  1 coretemp
crc32c_intel           24576  0
hid_generic            16384  0
usbhid                 28672  0
hid                   106496  2 hid_generic,usbhid
xt_conntrack           16384  1
iptable_nat            16384  0
nf_conntrack_ipv4      16384  2
nf_defrag_ipv4         16384  1 nf_conntrack_ipv4
nf_nat_ipv4            16384  1 iptable_nat
nf_nat                 24576  1 nf_nat_ipv4
iptable_filter         16384  1
ip_tables              24576  2 iptable_filter,iptable_nat
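
The "Used by" column shows which modules depend on a given module. The following is a minimal sketch that reads lsmod output on standard input and prints that column for one module. The function name module_users is hypothetical and is not a Photon OS tool.

```shell
# module_users NAME: read `lsmod` output on stdin and print the
# comma-separated "Used by" list for module NAME (empty if none).
# The function name is illustrative, not part of Photon OS.
module_users() {
    awk -v mod="$1" '$1 == mod { print $4 }'
}

# Usage:
#   lsmod | module_users hid
```

With the sample output above, the hid module reports hid_generic,usbhid.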

2 - Boot Process Overview

When a Photon OS machine boots, the BIOS initializes the hardware and uses a boot loader to start the kernel. After the kernel starts, systemd takes over and boots the rest of the operating system.

The BIOS checks the memory and initializes the keyboard, the screen, and other peripherals. When the BIOS finds the first hard disk, it loads the master boot record (MBR), and the boot loader, GNU GRUB 2.02, takes over. GNU GRUB loads the kernel and the initial RAM disk (initrd), which provides a temporary root file system in memory. The device manager, udev, provides initrd with the drivers it needs to access the device containing the root file system. Here’s what the GNU GRUB edit menu looks like in Photon OS with its default commands to load the boot record and initialize the RAM disk:

The GNU GRUB edit menu in the full and minimal versions of Photon OS

At this point, the Linux kernel in Photon OS, which is kernel version 4.4.8, takes control. Systemd kicks in, initializes services in parallel, mounts the rest of the file system, and checks the file system for errors.

3 - Blank Screen on Reboot

If the Photon OS kernel enters a state of panic during a reboot and all you see is a blank screen, note the name of the virtual machine running Photon OS and then power off the VM.

In the host, open the vmware.log file for the VM. When a kernel panics, the guest VM prints the entire kernel log in vmware.log in the host directory containing the VM. This log file contains the output of the dmesg command from the guest, and you can analyze it to help identify the cause of the boot problem.

Example

After searching for Guest: in the following abridged vmware.log, this line appears, identifying the root cause of the reboot problem:

```
2016-08-30T16:02:43.220-07:00| vcpu-0| I125: Guest: 
	<0>[1.125804] Kernel panic - not syncing: 
	VFS: Unable to mount root fs on unknown-block(0,0)
```

Further inspection finds the following lines:

2016-08-30T16:02:43.217-07:00| vcpu-0| I125: Guest: 
<4>[    1.125782] VFS: Cannot open root device "sdc1" or unknown-block(0,0): error -6
2016-08-30T16:02:43.217-07:00| vcpu-0| I125: Guest: 
<4>[    1.125783] Please append a correct "root=" boot option; 
here are the available partitions: 
2016-08-30T16:02:43.217-07:00| vcpu-0| I125: Guest: 
<4>[    1.125785] 0100            4096 ram0  (driver?)
...
0800         8388608 sda  driver: sd
2016-08-30T16:02:43.220-07:00| vcpu-0| I125: Guest: 
<4>[    1.125802]   0801         8384512 sda1 611e2d9a-a3da-4ac7-9eb9-8d09cb151a93
2016-08-30T16:02:43.220-07:00| vcpu-0| I125: Guest: 
<4>[    1.125803]   0802            3055 sda2 8159e59c-b382-40b9-9070-3c5586f3c7d6

In this unlikely case, the GRUB configuration points to a root device named sdc1 instead of the correct root device, sda1. You can resolve the problem by restoring the GNU GRUB edit screen and the GRUB configuration file (/boot/grub/grub.cfg) to their original configurations.
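
To pull the requested root device out of a long vmware.log, a short filter can help. The following sketch assumes that the log has been copied to a machine with a POSIX shell and that the message has the form shown in the example. The function name requested_root_device is hypothetical.

```shell
# requested_root_device FILE: print the root device name that the guest
# kernel failed to open, taken from the "Cannot open root device" message.
# The function name is illustrative.
requested_root_device() {
    sed -n 's/.*Cannot open root device "\([^"]*\)".*/\1/p' "$1"
}

# Usage (the path is a placeholder):
#   requested_root_device /path/to/vmware.log
```

You can then compare the printed name with the partitions that the kernel lists in the same log.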

4 - Investigating Unexpected Behavior

If you rebooted to address unexpected behavior before the reboot or if you encountered unexpected behavior during the reboot but have reached the shell, you must analyze what happened since the previous boot.

  1. Run the following command to check the logs:

       journalctl
    
  2. Run the following command to view the log from the previous boot:

    journalctl --boot=-1
    

    Look at the log from the current boot:

    journalctl -b
    
  3. If required, examine the logs for the kernel:

    journalctl -k
    
  4. Check which kernel is in use:

    uname -r
    

    For example, in Photon OS 1.0, the kernel version in the full version is 4.4.8. The kernel version in the OVA version is 4.4.8-esx. With the ESX version of the kernel, some services might not start.

  5. Run this command to check the overall status of services:

    systemctl status
    

    If a service is shown in red, check its status:

    systemctl status service-name
    

    Start it if required:

    systemctl start service-name
    
  6. If looking at the journal and checking the status of services does not resolve your error, run the following systemd-analyze commands to examine the boot time and the speed with which services start.

    systemd-analyze time
    systemd-analyze blame
    systemd-analyze critical-chain
    

Note: The output of these commands might be misleading because one service might just be waiting for another service to finish initializing.
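
The service check and the journal can be combined so that each failed unit is shown together with its recent log lines. The following is a minimal sketch, not a Photon OS tool. Both function names are hypothetical, and the limit of 20 lines is an arbitrary choice.

```shell
# failed_unit_names: read the output of
#   systemctl list-units --state=failed --no-legend --plain
# on stdin and print only the unit names (first column).
failed_unit_names() {
    awk 'NF { print $1 }'
}

# show_failed_unit_logs: for each failed unit, print the last 20 journal
# lines from the current boot. Both function names are illustrative.
show_failed_unit_logs() {
    systemctl list-units --state=failed --no-legend --plain |
        failed_unit_names |
        while read -r unit; do
            echo "== $unit"
            journalctl -b -u "$unit" --no-pager | tail -n 20
        done
}
```

Run show_failed_unit_logs as root so that the journal of every unit is readable.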

5 - Investigating the Guest Kernel

If a VM running Photon OS and an application or virtual appliance is behaving in a way that prevents you from logging in to the machine, you can troubleshoot by extracting the kernel logs from the guest’s memory and analyzing them with gdb.

This advanced troubleshooting method works when you are running Photon OS as the operating system for an application or appliance on VMware Workstation, Fusion, or ESXi. The procedure in this section assumes that the virtual machine running Photon OS is functioning normally.

The process to use this troubleshooting method varies by environment. The examples in this section assume that the troublesome Photon OS virtual machine is running in VMware Workstation 12 Pro on a Microsoft Windows 8 Enterprise host. The examples also use an additional, fully functional Photon OS virtual machine running in Workstation.

You can use other hosts, hypervisors, and operating systems, but you will have to adapt the example process below to them. Directory paths, file names, and other aspects might be different on other systems.

Prerequisites

Verify that you have the following resources:

  • Root access to a Linux machine other than the one you are troubleshooting. It can be another Photon OS machine, Ubuntu, or another Linux variant.
  • The vmss2core utility from VMware. It is installed by default in VMware Workstation and some other VMware products. If your system doesn’t already contain it, you can download it for free from https://labs.vmware.com/flings/vmss2core.
  • A local copy of the Photon OS ISO of the exact same version and release number as the Photon OS machine that you are troubleshooting.

Procedure Overview

The process to apply this troubleshooting method is as follows:

  • On a local computer, you open a file on the Photon OS ISO that contains Linux debugging information. Then you suspend the troublesome Photon OS VM and extract the kernel memory logs from the VMware hypervisor running Photon OS.
  • Next, you use the vmss2core tool to convert the memory logs into core dump files. The vmss2core utility converts VMware checkpoint state files into formats that third-party debugging tools understand. It can handle both suspend (.vmss) and snapshot (.vmsn) checkpoint state files (hereafter referred to as a vmss file) as well as monolithic and non-monolithic (separate .vmem file) encapsulation of checkpoint state data. See Debugging Virtual Machines with the Checkpoint to Core Tool.
  • Finally, you prepare to run the gdb tool by creating a .gdbinit file that defines the dmesg macros. You then load the debug info file from the ISO and the core dump into the gdb shell on your local Linux machine and analyze the dump.

All three components must be in the same directory on a Linux machine.

Procedure

  1. Obtain a local copy of the Photon OS ISO of the exact same version and release number as the Photon OS machine that you are troubleshooting and mount the ISO on a Linux machine (or open it on a Windows machine):

    mount /mnt/cdrom
    
  2. Locate the following file. (If you opened the Photon OS ISO on a Windows computer, copy the following file to the root folder of a Linux machine.)

    /RPMS/x86_64/linux-debuginfo-4.4.8-6.ph1.x86_64.rpm
    
  3. On a Linux machine, run the following rpm2cpio command to convert the RPM file to a cpio file and to extract the contents of the RPM to the current directory:

    rpm2cpio /mnt/cdrom/RPMS/x86_64/linux-debuginfo-4.4.8-6.ph1.x86_64.rpm | cpio -idmv
    
  4. From the extracted files, copy the following file to your current directory:

    cp usr/lib/debug/lib/modules/4.4.8/vmlinux-4.4.8.debug .
    
  5. Run the following commands to download the dmesg functions that will help extract the kernel log from the core dump:

    wget https://www.kernel.org/doc/Documentation/kdump/gdbmacros.txt
    wget https://github.com/vmware/photon/blob/master/tools/scripts/gdbmacros-for-linux.txt
    
  6. Move the file as follows:

    mv gdbmacros-for-linux.txt .gdbinit
    
  7. Switch to your host machine so you can get the kernel memory files from the VM. Suspend the troublesome VM and locate the .vmss and .vmem files in the virtual machine’s directory on the host.

    Example:

    C:\Users\tester\Documents\Virtual Machines\VMware Photon 64-bit (7)>dir
    	 Volume in drive C is Windows
    	 Directory of C:\Users\tester\Documents\Virtual Machines\VMware Photon 64-bit
    	 (7)
    	09/20/2016  12:22 PM    <DIR>          .
    	09/20/2016  12:22 PM    <DIR>          ..
    	09/19/2016  03:39 PM       402,653,184 VMware Photon 64-bit (7)-f6b070cd.vmem
    	09/20/2016  12:11 PM         5,586,907 VMware Photon 64-bit (7)-f6b070cd.vmss
    	09/20/2016  12:11 PM     1,561,001,984 VMware Photon 64-bit (7)-s001.vmdk
    	...
    	09/20/2016  12:11 PM           300,430 vmware.log
    	...
    
  8. Now that you have located the .vmss and .vmem files, convert them to one or more core dump files by using the vmss2core tool that comes with Workstation. Here is an example of how to run the command. Be careful with your pathing, escaping, file names, and so forth, all of which might be different from this example on your Windows machine.

    
    	C:\Users\shoenisch\Documents\Virtual Machines\VMware Photon 64-bit (7)>C:\"Program Files (x86)\VMware\VMware Workstation"\vmss2core.exe "VMware Photon 64-bit (7)-f6b070cd.vmss" "VMware Photon 64-bit (7)-f6b070cd.vmem"
    
    The result of this command is one or more files with a `.core` extension plus a digit. Truncated example: 
    
    	C:\Users\tester\Documents\Virtual Machines\VMware Photon 64-bit (7)>dir
    	 Directory of C:\Users\tester\Documents\Virtual Machines\VMware Photon 64-bit(7)
    	09/20/2016  12:22 PM       729,706,496 vmss.core0
    
  9. Copy the .core file or files to your current directory on the Linux machine so that you can analyze them with gdb.

    Run the following gdb command to enter the gdb shell attached to the memory core dump file. You might have to change the name of the vmss.core file in the example to match your .core file:

gdb vmlinux-4.4.8.debug vmss.core0

	GNU gdb (GDB) 7.8.2
	Copyright (C) 2014 Free Software Foundation, Inc.
	License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
	This is free software: you are free to change and redistribute it. 
	There is NO WARRANTY, to the extent permitted by law. ...
	Type "show configuration" for configuration details.
	For bug reporting instructions, please see:
	<http://www.gnu.org/software/gdb/bugs/>.
	Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>.
	For help, type "help".
	Type "apropos word" to search for commands related to "word"...
	Reading symbols from vmlinux-4.4.8.debug...done.
	warning: core file may not match specified executable file.
	[New LWP 12345]
	Core was generated by `GuestVM'.
	Program terminated with signal SIGSEGV, Segmentation fault.
	#0  0xffffffff813df39a in insb (count=0, addr=0xffffc90000144000, port=<optimized out>)
	    at arch/x86/include/asm/io.h:316
	316     arch/x86/include/asm/io.h: No such file or directory.
	(gdb)

Result

In the results above, the (gdb) of the last line is the prompt of the gdb shell. You can now analyze the core dump by using commands like bt, to perform a backtrace, and dmesg, to view the Photon OS kernel log and see Photon OS kernel error messages.

6 - Kernel Log Replication with VProbes

Replicating the Photon OS kernel logs on the VMware ESXi host is an advanced but powerful method of troubleshooting a kernel problem.

Replication Method

This method is applicable when the virtual machine running Photon OS is hanging or inaccessible because, for instance, the hard disk has failed.

As a prerequisite, you must have preemptively enabled the VMware VProbes facility on the VM before an error rendered it inaccessible. You must also create a VProbes script on the ESXi host, but you can do that after the error.

The method is useful in analyzing kernel issues when testing an application or appliance that is running on Photon OS.

There are two similar ways in which you can replicate the Photon OS kernel logs on ESXi by using VProbes.

  • The first modifies the VProbes script so that it works only for the VM that you set. It uses a hard-coded address.

  • The second uses an abstraction instead of a hard-coded address so that the same VProbes script can be used for any VM on an ESXi host, provided that you have enabled VProbes on the VM and copied its kernel symbol table (kallsyms) to the ESXi host.

For more information on VMware VProbes, see Archived VProbe Toolkit and the VProbes Programming Reference.

Using VProbes Script with a Hard-Coded Address

Perform the following steps to set a VProbe for an individual VM:

  1. Power off the VM so that you can turn on the VProbe facility.

    Edit the .vmx configuration file for the VM. The file resides in the directory that contains the VM in the ESXi data store. Add the following line of code to the .vmx file and then power the VM on:

     vprobe.enable = "TRUE"
    

    When you edit the .vmx file to add the above line of code, you must first turn off the VM; otherwise, your changes will not persist.

  2. Obtain the kernel log_store function address by connecting to the VM with SSH and running the following commands as root.

    Photon OS uses the kptr_restrict setting to place restrictions on the kernel addresses exposed through /proc and other interfaces. This setting hides exposed kernel pointers to prevent attackers from exploiting kernel write vulnerabilities. When you are done using VProbes, you should return kptr_restrict to the original setting of 2 by rebooting.

     echo 0 > /proc/sys/kernel/kptr_restrict
     grep log_store /proc/kallsyms
    

    The output of the grep command will look similar to the following string. The first set of characters (without the t) is the log_store function address:

     ffffffff810bb680 t log_store
    
  3. Connect to the ESXi host with SSH so that you can create a VProbes script.

    Below is the template for the script. log_store in the first line is a placeholder for the VM’s log_store function address:

    GUEST:ENTER:log_store {
               string dst;
               getgueststr(dst, getguest(RSP+16) & 0xff, getguest(RSP+8));
               printf("%s\n", dst);
            }
    

    On the ESXi host, create a new file, add the template to it, and then change log_store to the function address that was the output from the grep command on the VM.

  4. Add a 0x prefix to the function address. In this example, the modified template looks like this:

    GUEST:ENTER:0xffffffff810bb680 {
           string dst;
           getgueststr(dst, getguest(RSP+16) & 0xff, getguest(RSP+8));
           printf("%s\n", dst);
        }
    
  5. Save your VProbes script as console.emt in the /tmp directory. (The file extension for VProbe scripts is .emt.)

    While still connected to the ESXi host with SSH, run the following command to obtain the ID of the virtual machine that you want to troubleshoot:

    vim-cmd vmsvc/getallvms

    This command lists all the VMs running on the ESXi host. Find the VM you want to troubleshoot in the list and make a note of its ID.

  6. Run the following command to print all the kernel messages from Photon OS in your SSH console; replace <VM ID> with the ID of your VM:

    vprobe -m <VM ID> /tmp/console.emt

    When you’re done, type Ctrl-C to stop the loop.
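
Looking up the address and filling in the template can also be scripted. The following is a minimal sketch, assuming a POSIX shell with awk on the machine where it runs. The function name make_vprobe_script is hypothetical. It reads kallsyms lines on standard input and prints the template from the steps above with the 0x-prefixed log_store address.

```shell
# make_vprobe_script: read /proc/kallsyms-style lines on stdin and print a
# VProbes script whose probe point is the address of log_store.
# The function name is illustrative. If kptr_restrict is not 0, the
# addresses in kallsyms are hidden and the result is not usable.
make_vprobe_script() {
    # Use the first symbol that is named exactly log_store.
    addr=$(awk '$3 == "log_store" { print $1; exit }')
    [ -n "$addr" ] || { echo "log_store not found" >&2; return 1; }
    cat <<EOF
GUEST:ENTER:0x$addr {
    string dst;
    getgueststr(dst, getguest(RSP+16) & 0xff, getguest(RSP+8));
    printf("%s\\n", dst);
}
EOF
}

# Usage on the guest, as root (the output file name is a placeholder):
#   make_vprobe_script < /proc/kallsyms > console.emt
```

Copy the resulting file to the ESXi host before you run the vprobe command.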

A Reusable VProbe Script Using the kallsyms File

Perform the following steps to create one VProbe script and use it for all the VMs on your ESXi host.

  1. Power off the VM and turn on the VProbe facility on each VM that you want to be able to analyze.

    Add vprobe.enable = "TRUE" to the VM’s .vmx configuration file. See the instructions above.

  2. Power on the VM, connect to it with SSH, and run the following command as root:

    echo 0 > /proc/sys/kernel/kptr_restrict
    
  3. Connect to the ESXi host with SSH to create the following VProbes script and save it as /tmp/console.emt:

    GUEST:ENTER:log_store {
           string dst;
           getgueststr(dst, getguest(RSP+16) & 0xff, getguest(RSP+8));
           printf("%s\n", dst);
        }
    
  4. From the ESXi host, run the following command to copy the VM’s kallsyms file to the /tmp directory on the ESXi host:

    scp root@<vm ip address>:/proc/kallsyms /tmp
    

    While still connected to the ESXi host with SSH, run the following command to obtain the ID of the virtual machine that you want to troubleshoot:

     vim-cmd vmsvc/getallvms
    

    This command lists all the VMs running on the ESXi host. Find the VM you want to troubleshoot in the list and make a note of its ID.

  5. Run the following command to print all the kernel messages from Photon OS in your SSH console.

    Replace <VM ID> with the ID of your VM. When you’re done, type Ctrl-C to stop the loop.

    vprobe -m <VM ID> -k /tmp/kallsyms /tmp/console.emt

    You can use a directory other than tmp if you want.

7 - Linux Kernel

The Linux kernel is the main component of Photon OS and is the core interface between a computer’s hardware and its processes. It communicates between the two, managing resources as efficiently as possible.

##Kernel Flavours and Versions

The following list contains the different Linux kernel flavours available:

  • linux - A generic kernel designed to run everywhere and support everything.
  • linux-esx - Optimized to run only on VMware hypervisors (ESXi, Workstation, Fusion). It has a minimal set of device drivers to support VMware virtual devices. uname -r displays the -esx suffix. For additional features, switch to the generic flavour.
  • linux-secure - Security-hardened variant of the generic kernel. uname -r displays the -secure suffix.
  • linux-rt - The Photon Real Time kernel. uname -r displays the -rt suffix.
  • linux-aws - Optimized for the AWS hypervisor. uname -r displays the -aws suffix.

To see the version of kernel installed, run the following command:

# rpm -qa | grep -e "^linux\(\|-esx\|-secure\|-rt\|-aws\)-[[:digit:]]"
linux-4.9.111-1.ph2.x86_64
linux-esx-4.9.111-1.ph2.x86_64

To see the version of the kernel that is currently running, run the following command:

# uname -r
4.9.107-1.ph2-esx

From the output, you can see that the kernel running currently doesn’t match the installed version. This happens when the linux-* RPMs were updated but the system was not restarted. A restart is required.
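
The comparison can be scripted by converting an RPM name to the string that uname -r prints. The following sketch assumes the naming pattern visible in the examples in this section, where the flavour moves from the package name to a suffix of the release string. The function name pkg_to_release is hypothetical.

```shell
# pkg_to_release PKG: convert an RPM name such as
#   linux-esx-4.9.111-1.ph2.x86_64
# to the string that `uname -r` is assumed to print for that kernel:
#   4.9.111-1.ph2-esx
# The function name is illustrative.
pkg_to_release() {
    printf '%s\n' "$1" |
        sed -E 's/^linux(-(esx|secure|rt|aws))?-(.+)\.[^.]+$/\3\1/'
}

# Usage: report when the running kernel is no longer among the installed
# kernel packages.
#   rpm -qa 'linux*' | while read -r pkg; do pkg_to_release "$pkg"; done |
#       grep -Fxq "$(uname -r)" || echo "restart required"
```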

##Configuration

To find the configurations of the installed kernels, list the configuration files in the /boot directory by running the following command:

# ls /boot/config-*
config-4.9.111-1.ph2 config-4.9.111-1.ph2-esx

To get a copy of the configuration of the running kernel, run the zcat /proc/config.gz command. Not all flavours support this feature.
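
To check a single option instead of reading the whole file, a small filter is enough. The following sketch assumes that the configuration file of the running kernel is named /boot/config- followed by the output of uname -r, as in the listing above. The function name config_value is hypothetical.

```shell
# config_value OPTION [FILE]: print the value of a kernel config option
# (for example y or m). Prints nothing if the option is not set.
# FILE defaults to the config file of the running kernel.
# The function name is illustrative.
config_value() {
    sed -n "s/^$1=//p" "${2:-/boot/config-$(uname -r)}"
}

# Usage:
#   config_value CONFIG_KALLSYMS
```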

##Boot Parameters and initrd

Several kernel flavours can be installed on the system, but only one is used during boot. The /boot/photon.cfg symlink points to the configuration of the kernel that is used for boot.

# ls -l /boot/photon.cfg
lrwxrwxrwx 1 root root 23 Jun 12  2018 /boot/photon.cfg -> linux-4.9.111-1.ph2.cfg

Its contents can be checked by running the following command:

# cat /boot/photon.cfg

# GRUB Environment Block

photon_cmdline=init=/lib/systemd/systemd ro loglevel=3 quiet no-vmw-sta

photon_linux=vmlinuz-4.9.111-1.ph2

photon_initrd=initrd.img-4.9.111-1.ph2

Where:

  • photon_cmdline - Kernel parameters. This list is extended by values from the /boot/systemd.cfg file and by values that are hardcoded in the /boot/grub2/grub.cfg file (for example, root=).
  • photon_linux - Kernel image to boot.
  • photon_initrd - Initrd to use at boot.

The parameters of the currently running kernel can be found by reading the /proc/cmdline file:

# cat /proc/cmdline

BOOT_IMAGE=/boot/vmlinuz-4.9.107-1.ph2-esx root=PARTUUID=29194d05-4a6e-4e0c-b1f4-5020e5e8472c net.ifnames=0 init=/lib/systemd/systemd ro loglevel=3 quiet no-vmw-sta
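
To read one parameter from the command line, you can split it on spaces and match the key. The following is a minimal sketch. The function name cmdline_param is hypothetical, and it does not handle quoted values that contain spaces.

```shell
# cmdline_param KEY: read a kernel command line on stdin and print the
# value of KEY=value. Flags without a value, such as ro, print nothing.
# The function name is illustrative.
cmdline_param() {
    tr ' ' '\n' |
        awk -v key="$1" 'index($0, key "=") == 1 { print substr($0, length(key) + 2) }'
}

# Usage:
#   cmdline_param root < /proc/cmdline
```

For the command line shown above, cmdline_param root prints the PARTUUID value of the root partition.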

##Dmesg

To view the message buffer of the kernel, run the dmesg command.

##Sysctl State

To view a list of all active units, run the systemctl list-units command.

##Kernel Statistics

The kernel statistics can be found in the following pseudo file systems:

  • procfs
  • sysfs
  • debugfs

##Kernel Modules

To view the kernel log buffer, run the journalctl -k command.

To view a list of loaded kernel modules, run the lsmod command.

To view detailed information about all PCI buses and the devices connected to them, run the lspci command.