Monitoring IBM AIX Servers with Hardware Sentry

How to monitor IBM AIX Servers with Hardware Sentry KM.


This document covers the IBM AIX servers running on the PowerPC processor architecture. This includes:

  • IBM RS/6000
  • IBM pSeries
  • IBM eServer p5
  • IBM eServer p6
  • IBM eServer p7

This guide partially covers the IBM Regatta product line, as well as the IBM pSeries p690 and eServer p5 595, also known as IBM UNIX mainframes.

About IBM AIX servers

Since 1990, AIX has served as the primary operating system for the RS/6000 series (later renamed IBM eServer pSeries, then IBM System p, and now IBM Power Systems). Hardware Sentry KM supports AIX versions from 4.2.x.

IBM AIX servers have always been built from a basic set of internal components:

  • PowerPC processors
  • Memory modules
  • Ethernet cards
  • Fiber cards
  • SCSI disk controllers

A few of these servers also included a RAID adapter (IBM SSA RAID controllers).

Hardware instrumentation

In-band: AIX system utilities (lsdev, dd, entstat, machstat, etc.)

The IBM AIX operating system comes with several command line utilities that provide useful information about the underlying hardware. However, it is important to note that AIX does not offer any way to retrieve the actual value of environment sensors in the system.

The following utilities are used by Hardware Sentry KM to discover, monitor, and process the various hardware components of IBM AIX systems:

  • uname, prtconf, lsdev, lscfg (general device discovery, status of processors and memory modules)
  • entstat (network card discovery and status)
  • fcstat (HBA discovery and status)
  • uesensor (environment on a few IBM pSeries systems)
  • bootinfo, machstat (environment on CHRP systems)
  • dd, errpt, lspv (disk monitoring)
  • awk, tail, head

The bootinfo, machstat and dd utilities require sudo or root access.
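Before configuring monitoring, it can help to confirm that the utilities listed above are actually reachable on the target system. The following is a minimal sketch for illustration only and is not part of Hardware Sentry. It assumes a Python interpreter is available on the host (not a given on AIX), and the function name `find_utilities` is hypothetical.

```python
import shutil

# Utilities named in the list above.
AIX_UTILITIES = [
    "uname", "prtconf", "lsdev", "lscfg", "entstat", "fcstat",
    "uesensor", "bootinfo", "machstat", "dd", "errpt", "lspv",
    "awk", "tail", "head",
]

def find_utilities(names=AIX_UTILITIES):
    """Map each utility name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}

if __name__ == "__main__":
    for name, path in find_utilities().items():
        print(f"{name:10s} {path or 'NOT FOUND'}")
```

A utility reported as not found may still exist outside the PATH of the account running the check, so a miss here is a hint to investigate rather than proof that the command is absent.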

Setting up Hardware Sentry on IBM AIX servers

Pre-requisites

The server must be running IBM AIX 4.x or above.

Configuration

Some system utilities used by Hardware Sentry require root privileges. To ensure that Hardware Sentry can use these utilities to discover and monitor the hardware components of an IBM AIX server, you can either configure Hardware Sentry to execute all of its external commands as root or configure it to use the sudo utility for a specified list of commands.

The following commands require root privileges:

  • /usr/sbin/bootinfo (on CHRP systems, i.e. most AIX 5.x and 6.x systems)
  • /usr/sbin/machstat (on CHRP systems)
  • /usr/bin/dd

Please note that the sudo utility must have been installed on the system and configured to allow the PATROL Agent’s default account to execute the selected commands as root. This can be done in the /etc/sudoers file.
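As an illustration of what such an /etc/sudoers entry might look like, the sketch below builds one line from the three commands listed above. The account name `patrol` is a placeholder for whatever account the PATROL Agent actually runs under, and the `NOPASSWD:` tag is an assumption made here so that the commands can run non-interactively; neither comes from the product documentation. The helper name `sudoers_entry` is hypothetical.

```python
# Commands listed above as requiring root privileges.
ROOT_COMMANDS = [
    "/usr/sbin/bootinfo",
    "/usr/sbin/machstat",
    "/usr/bin/dd",
]

def sudoers_entry(account, commands=ROOT_COMMANDS, nopasswd=True):
    """Build one sudoers line letting `account` run `commands` as root."""
    # NOPASSWD is an assumption: a monitoring agent cannot answer a prompt.
    tag = "NOPASSWD: " if nopasswd else ""
    return f"{account} ALL=(root) {tag}{', '.join(commands)}"

# "patrol" is a placeholder account name, not taken from the product.
print(sudoers_entry("patrol"))
# patrol ALL=(root) NOPASSWD: /usr/sbin/bootinfo, /usr/sbin/machstat, /usr/bin/dd
```

If a line like this is used, editing the file through visudo rather than directly is the usual way to catch syntax errors before they lock out sudo.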

In Monitoring Studio X

To add the monitoring of a new AIX system in Monitoring Studio X:

  1. Log in to the Monitoring Studio X Web UI

Accessing the Monitoring Studio X Web Interface

  2. Go to KMs > Hardware Sentry

Configuring Hardware Sentry in Monitoring Studio X

  3. Click Monitored Systems… > New System…

List of Systems Monitored by Hardware Sentry

  4. Enter the hostname or IP address of the IBM AIX server to be monitored

Configuring the monitoring of IBM AIX servers

  5. In the System Properties section, select IBM AIX

Selecting IBM AIX as the device type

  6. Enable SSH and provide the root login and password.

Enabling SSH for IBM AIX Server monitoring

  7. Scroll down to the Connectors section and select Automatic

Selecting the automatic connector detection

  8. Click Create. Click Console to check the hardware health status of your IBM AIX server:

    Monitoring IBM AIX servers in Monitoring Studio X

In TrueSight

To add a new AIX system in TrueSight, specify the following in the infrastructure policy’s monitoring configuration.

  1. The Device Type should be set to IBM AIX.
  2. Credentials for the device should be entered in the SSH section of the configuration.
  3. Check the box under Sudo Options labelled Use When Root Privileges are Needed, if root credentials have not been specified.

TrueSight Device Configuration

  4. Click Save. After a few minutes, your device will be displayed in TrueSight, under Monitoring > Hardware Devices.

Monitoring IBM AIX servers in TrueSight Presentation Server (Hardware Devices View)

  5. Click the IBM AIX server to access its details:

    Monitoring IBM AIX servers in TrueSight Presentation Server

Troubleshooting

If Hardware Sentry KM does not seem to monitor the power supplies and fans of IBM AIX 5.x or later systems, it probably means that you have not configured the product with the root account or sudo as explained above.

It is normal not to have distinct instances for each sensor, power supply and fan. IBM AIX systems cannot report the status of the environment with per-sensor granularity. You only get a general “System cooling” instance and a general “System power” instance. These objects trigger a warning or an alarm when a fan or a power supply fails or when the temperature rises too high.

If Hardware Sentry reports the status of the disks as “Unknown”, it is likely that the access rights on the /dev/hdiskN device files do not grant read access to the PATROL Agent’s default account, and that Hardware Sentry has not been configured to execute external commands as root or to use the sudo utility for the dd command.
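One way to check the access-rights hypothesis is to list the disk device files that the current account cannot read. The sketch below is illustrative only and is meant to be run as the PATROL Agent’s account. The `/dev/hdisk*` pattern follows the /dev/hdiskN naming mentioned above, the function name `unreadable_disks` is hypothetical, and the check says nothing about the exact dd invocation Hardware Sentry uses.

```python
import glob
import os

def unreadable_disks(pattern="/dev/hdisk*"):
    """Return device files matching `pattern` that the current user cannot read."""
    return sorted(p for p in glob.glob(pattern) if not os.access(p, os.R_OK))

if __name__ == "__main__":
    blocked = unreadable_disks()
    if blocked:
        print("No read access to:", ", ".join(blocked))
    else:
        print("All matching disk device files are readable (or none were found).")
```

If the list is not empty, running as root or adding dd to the sudo configuration, as described in the Configuration section, should let the disk status be collected.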

Discovered components and monitored parameters

When configured properly, the following connectors should be automatically selected by Hardware Sentry in order to monitor an IBM AIX server:

  • IBM AIX - Common
  • IBM AIX - CHRP Environment
  • IBM AIX - SCSI disks
  • IBM AIX - Environment (uesensor) (only on a few pSeries servers)

In turn, the following components and parameters are discovered and monitored:

  • Server model
  • Overall cooling status
  • Overall powering status
  • Memory modules, size, status, error count
  • Processors, type and frequency, status
  • Physical disks, vendor, size, serial number, error count and status
  • Network cards, vendor, model, status, link status, speed and duplex, input and output (bytes, packets and error percentage), bandwidth utilization
  • HBA, model, WWN, serial number, device type, bandwidth, link status, error count, total packets

VIO Servers

Warning! Because installing third-party software invalidates support from IBM, it is highly recommended to perform hardware monitoring from a remote PATROL Agent.

Principles

On IBM pSeries systems partitioned into several LPARs, the monitoring of the hardware needs to be configured in a specific way. Typically, one of the LPARs is dedicated to the processing of the I/Os and is called the VIO Server.

The monitoring of CPU, Overall Cooling Status, Overall Power Status and Memory is done at the LPAR level. As each LPAR is only able to see components that have been exclusively dedicated to it or those components that are being shared with another LPAR, you will need to monitor several LPARs to be able to see all components.

Monitoring network cards, physical disks and HBAs requires access to the VIO Server: the LPARs only see virtual versions of these components, so their real status can only be obtained from the VIO Server.
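The split described above can be summarized as a small lookup table. This is only a sketch to make the rule explicit when planning which systems to add; the names `MONITORED_AT` and `systems_to_monitor` are hypothetical and do not correspond to any Hardware Sentry setting.

```python
# Where each component class is monitored, per the description above.
MONITORED_AT = {
    "CPU": "LPAR",
    "Memory": "LPAR",
    "Overall Cooling Status": "LPAR",
    "Overall Power Status": "LPAR",
    "Network Cards": "VIO Server",
    "Physical disks": "VIO Server",
    "HBAs": "VIO Server",
}

def systems_to_monitor(components):
    """Return the set of system types needed to cover `components`."""
    return {MONITORED_AT[c] for c in components}

# Covering both processors and HBAs needs an LPAR and the VIO Server.
print(sorted(systems_to_monitor(["CPU", "HBAs"])))
```

Keep in mind that each LPAR only sees its own dedicated or shared components, so full coverage of the LPAR-level items may still require monitoring several LPARs.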

Configuration

Because server components and resources can be dynamically allocated to or de-allocated from LPARs, we recommend turning off missing device detection when monitoring LPARs.

  • In TrueSight, this setting is within the Monitoring Policy > Monitoring Configuration > Missing Device Detection. Missing device detection can be left on for the VIO Server.
  • In Monitoring Studio X, the Missing Device Detection setting is available through KMs > Hardware Sentry > KM Settings > Parameter Alerts.

Turning off missing device detection in Monitoring Studio X when monitoring LPARs

System utilities used by Hardware Sentry when monitoring the VIO require padmin privileges. To ensure that Hardware Sentry can use these utilities to discover and monitor the hardware components of an IBM AIX server, you can either configure Hardware Sentry to execute all external commands as padmin, or use a user account with equal permission settings.

Using Hardware Management Console

If an IBM Hardware Management Console is available, AIX systems can instead be monitored through the HMC. The status of the System Attention LED of all IBM AIX servers connected to the Hardware Management Console will be reported. An ALARM alert will be triggered for a hardware problem based on the LED status. These alarms are automatically cleared at midnight of the day they occur.

This form of monitoring is less detailed, as it reports only LED status and system presence, but it may be easier to configure because only one device, the Hardware Management Console, needs to be monitored. The HMC is monitored as a Linux system.
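To make the midnight clearing behavior of HMC alarms concrete, the sketch below computes when an alarm raised at a given time would be cleared. It assumes that "midnight of the day they occur" means the midnight that ends that day, and that times are naive local times; both are interpretations made here, not documented behavior. The function name `alarm_clear_time` is hypothetical.

```python
from datetime import datetime, time, timedelta

def alarm_clear_time(raised_at):
    """Return the midnight that ends the day on which the alarm was raised."""
    # Assumption: clearing happens at 00:00 of the following calendar day.
    return datetime.combine(raised_at.date() + timedelta(days=1), time.min)

raised = datetime(2024, 3, 5, 14, 30)
print(alarm_clear_time(raised))  # 2024-03-06 00:00:00
```

Under that reading, an alarm raised late in the evening stays visible only briefly, so LED-based alerts are best forwarded or acknowledged promptly rather than reviewed the next day.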