Monitoring Sun SPARC T1 and T2 systems (sun4v) with Hardware Sentry

How to monitor Sun SPARC T1 and T2 systems with Hardware Sentry KM.

This section covers Sun Solaris systems running on UltraSPARC T1 or T2 processors, including the Sun Fire T2000 and all of its successors, also called “sun4v” systems.

About Sun SPARC T1 and T2 systems (sun4v)

Sun Solaris systems running on SPARC T1 and T2 processors form a new class of Sun servers (sun4v). They are very similar to the classic UltraSPARC (II, III, IV) architectures, except for the processor, which has been designed for massively parallel operations: each SPARC T1 processor has 24 individual cores clocked at 1 GHz.

Hardware Instrumentation

In-band: Solaris system utilities (prtdiag, psrinfo, kstat, etc.)

Unlike in the classic sun4u architecture, there is no Solaris built-in utility that provides information about the environment of the server (temperature sensors, voltage sensors, fans and power supplies): neither prtdiag nor prtpicl is able to report such information.

However, all of the other Solaris utilities used by Hardware Sentry in the sun4u architecture work the very same way in sun4v systems:

  • psrinfo for the processors
  • ifconfig, kstat and ndd (if required) for the network cards
  • iostat and dd for the internal disks

Some of these utilities require root privileges for proper execution: ndd and dd (in some cases). Please read carefully the section about sun4u servers to learn more about how Hardware Sentry deals with these utilities.
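
Before relying on the in-band connectors, it can help to confirm that the utilities listed above can be found on the search path of the account running the PATROL Agent. The following is a minimal sketch and not part of Hardware Sentry: the function name and the injectable lookup are illustrative assumptions, it assumes Python 3 is available on the host (which is not a given on Solaris 9 or 10), and it only checks that each command can be located, not that it runs with sufficient privileges.

```python
import shutil

# Utilities named in this section, grouped as in the list above.
SUN4V_UTILITIES = {
    "processors": ["psrinfo"],
    "network cards": ["ifconfig", "kstat", "ndd"],
    "internal disks": ["iostat", "dd"],
}


def find_missing_utilities(utilities=SUN4V_UTILITIES, which=shutil.which):
    """Return {component: [commands not found on PATH]}, missing ones only."""
    missing = {}
    for component, commands in utilities.items():
        not_found = [cmd for cmd in commands if which(cmd) is None]
        if not_found:
            missing[component] = not_found
    return missing


if __name__ == "__main__":
    for component, commands in find_missing_utilities().items():
        print(f"{component}: missing {', '.join(commands)}")
```

A command reported as missing may simply live in a directory (such as /usr/sbin) that is not on the account's PATH.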

In-band: Sun Management Center (SMC)

The Sun Management Center agent (“SunMC” or “SMC” agent) uses Solaris system commands and logs to monitor the health of the server. As such, it adds nothing to what the Solaris utilities already provide. Moreover, installing the SMC environment requires deep knowledge of the SMC architecture and internals.

Hardware Sentry does not need, and thus does not use, the SMC agent to discover and monitor the hardware components of Sun Solaris systems (sun4v).

Out-of-band: Advanced Lights-Out Management card (ALOM)

Sun SPARC T1 and T2 systems are equipped with a management card called the “Advanced Lights-Out Management” (ALOM) card. This management card provides detailed information about the internal hardware components of sun4v systems, including the environment (temperature sensors, voltage sensors, current sensors, fans, power supplies and internal hard drives).

The ALOM card is able to communicate through various protocols including telnet and SSH, which are supported by Hardware Sentry.

The ALOM card’s firmware needs to be updated to the latest version in order to get support for SSH.

In-band: Sun Explorer

When installed on sun4v systems, the Sun Explorer package (SUNWexplo) comes with the snapshot utility, which is able to communicate with the ALOM card “in-band” and thus retrieve various hardware information, such as temperatures, voltages, currents, fans, power supplies and internal physical drives.

To communicate with the ALOM card “in-band”, Sun Explorer requires root privileges.

Setting up Hardware Sentry on Sun Solaris servers (sun4v with Sun Explorer)

Principle

Hardware Sentry will be installed on the managed system and will operate “in-band” only, i.e. there is no need for the ALOM card to be connected to the network.

Pre-requisites

  • The server is running Sun Solaris 9 or 10
  • Sun Explorer has been properly installed (SUNWexplo package) and the system has been rebooted after the installation
  • There is no need for the SMC agent to be installed

Installation procedure

  • Install the PATROL Agent on the server, if this has not already been done (versions 3.5.00 and later are supported; version 3.7.00 or later is recommended).
  • Install Hardware Sentry KM for PATROL on the server (this can be done at the same time as the PATROL Agent). Please follow the instructions of the Installation Guide of Hardware Sentry.

Configuration

Some system utilities used by Hardware Sentry require root privileges. To ensure that Hardware Sentry can use these utilities to discover and monitor the hardware components of a Sun server (sun4v), you can either configure Hardware Sentry to execute all of its external commands as root or configure it to use the sudo utility for a specified list of commands.

To configure Hardware Sentry to run all of its external commands as root, [right-click] on the main “Hardware” icon › [KM Commands] › [This System’s Settings] › [Connection, Credentials and Connectors…] and enter the root login and password in the first step of the wizard.

To configure Hardware Sentry to use sudo, follow the same procedure but click “Sudo options” in the first step of the wizard. Then select the commands for which Hardware Sentry will use sudo.

The list of commands that will require root privileges depends on the platform being monitored:

  • /opt/SUNWexplo/bin/snapshot will be used to retrieve hardware information from the ALOM card “in-band”
  • cediag will be used on pre-Solaris 10 systems only if the SUNWcest package has been installed (cediag and cestat need to be added to the sudoers file)
  • dd will be used on all Solaris systems but requires root (actually “sys”) privileges only on pre-Solaris 10 systems. Pre-Solaris 10 systems can be configured not to require such privileges to access /dev/rdsk/cXtYdZsN devices.
  • ndd will be used on some models of Solaris systems equipped with network cards whose driver is dmfe, bge or e1000g. This list is not exhaustive and also depends on the version of Solaris. The best way to know for sure whether ndd will be used on a Solaris system is to check whether Hardware Sentry is able to collect the LinkStatus parameter without root privileges.

Please note that the sudo utility must have been installed on the system and configured to allow the PATROL Agent’s default account to execute the selected commands as root. This can be done in the /etc/sudoers file.
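
As an illustration of what such a sudoers entry could look like, the sketch below builds one user specification line from a list of commands. Several things here are assumptions rather than facts from this article: the account name patrol, the use of the NOPASSWD tag, and the paths /usr/sbin/ndd and /usr/bin/dd. Check the actual account and the location of each command on your system, and edit /etc/sudoers with visudo rather than pasting the output blindly.

```python
def sudoers_entry(account, commands, nopasswd=True):
    """Build one sudoers line letting `account` run `commands` as root."""
    if not commands:
        raise ValueError("at least one command is required")
    for command in commands:
        # sudoers command specifications must be absolute paths.
        if not command.startswith("/"):
            raise ValueError(f"sudoers needs absolute paths: {command!r}")
    tag = "NOPASSWD: " if nopasswd else ""
    return f"{account} ALL=(root) {tag}{', '.join(commands)}"


if __name__ == "__main__":
    # "patrol" is a hypothetical account name; the ndd path is an assumption.
    print(sudoers_entry("patrol", ["/opt/SUNWexplo/bin/snapshot", "/usr/sbin/ndd"]))
```

Only list the commands that were selected in the “Sudo options” step, so that the account gains no more privileges than Hardware Sentry needs.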

Discovered components and monitored parameters

Connectors

When configured properly, the following connectors should be detected by Hardware Sentry:

  • Sun Solaris - ALOM-SC in-band (snapshot)
  • Sun Solaris - Processors (psrinfo)
  • Sun Solaris - Sun Disks
  • Sun Solaris - Network

Components and monitored parameters

In turn, the following components and parameters are discovered and monitored:

  • Server model
  • Temperature sensors, actual temperature if available and status
  • Fans, speed if available and status of each fan
  • Voltage sensors, actual voltage if available and status
  • Current sensors, status
  • Power supplies, status
  • Processors, type and frequency, status
  • Physical disks, vendor, size, serial number, error count and status
  • Network cards, vendor, model, connection speed, status, link status and error percentage

Troubleshooting

Hardware Sentry does not report any environment information

If Hardware Sentry doesn’t report any environment information (temperature, fans, power supplies, etc.) about your Sun T1/T2-based system, it probably means that:

  • The SUNWexplo package hasn’t been installed
  • The system hasn’t been rebooted since the installation of the SUNWexplo package (which in turn installs a new device driver to communicate with the ALOM card “in-band”)
  • Hardware Sentry hasn’t been configured to use the root account or use sudo for the snapshot command
  • The ALOM driver is locked up by another snapshot process; in this case, you will need to kill any remaining snapshot process.
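
Two of the causes above can be checked from a script. The sketch below is one possible approach under stated assumptions: it relies on the Solaris pkginfo command (whose -q option reports through its exit status whether a package is installed) and on pgrep -x to list processes named exactly snapshot. The function names and the injectable run parameter are illustrative, and Python 3.7 or later is assumed.

```python
import subprocess


def package_installed(name, run=subprocess.run):
    """True if `pkginfo -q NAME` exits with status 0 (package installed)."""
    result = run(["pkginfo", "-q", name])
    return result.returncode == 0


def snapshot_pids(run=subprocess.run):
    """Return the PIDs of processes named exactly 'snapshot', per pgrep."""
    result = run(["pgrep", "-x", "snapshot"], capture_output=True, text=True)
    # pgrep prints one PID per line and nothing at all when no process matches.
    return [int(pid) for pid in result.stdout.split()]


if __name__ == "__main__":
    print("SUNWexplo installed:", package_installed("SUNWexplo"))
    print("snapshot processes:", snapshot_pids())
```

Neither check tells you whether the system was rebooted after the package was installed; that still has to be verified separately.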

Memory modules cannot be seen

If you cannot see the memory modules (or at least a general “Overall” instance), it means that:

  • The SUNWcest package hasn’t been installed (on pre-Solaris 10 systems)
  • Hardware Sentry hasn’t been configured to use the root account or use sudo for the cediag command
  • The server is running Solaris 10 and doesn’t report the status of memory modules individually. In this case, there is nothing that can be done to fix the problem. However, Sentry Software plans to release a new connector based on the fmstat Solaris utility for Solaris 10 systems.

Internal disks cannot be seen

If you cannot see internal disks, it may be that your server has non-Sun disks, which are excluded by default (this is done to ensure that Hardware Sentry only keeps internal disks). You can force Hardware Sentry to discover non-Sun disks by manually selecting the “Sun Solaris - Non-Sun Disks” connector through the graphical user interface of the KM. The potential drawback is that Hardware Sentry could mistake some external disks (in an external SAN array, for example) for real physical disks and thus create dozens of instances for all of these disks.

Hardware Sentry reports an "Unknown" disk status

If Hardware Sentry reports the status of the disks as “Unknown”, it probably means that the access rights on the /dev/rdsk/cXtYdZsN device files don’t grant read access to the PATROL Agent’s default account, and that Hardware Sentry hasn’t been configured to execute external commands as root or to use the sudo utility for the dd command.
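
To see which device files are affected, you can list the raw disk devices that the current account cannot read. This is a hedged sketch, not a Hardware Sentry feature: the function name and the glob pattern are illustrative, and the result is only meaningful when the script is run under the PATROL Agent’s default account (root can read everything).

```python
import glob
import os


def unreadable_devices(paths, can_read=lambda path: os.access(path, os.R_OK)):
    """Return the subset of `paths` that the current account cannot read."""
    return [path for path in paths if not can_read(path)]


if __name__ == "__main__":
    # Pattern mirrors the cXtYdZsN naming used above.
    devices = sorted(glob.glob("/dev/rdsk/c*t*d*s*"))
    for path in unreadable_devices(devices):
        print("no read access:", path)
```

An empty result with an “Unknown” status still showing suggests the cause lies elsewhere than in the device file permissions.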

Hardware Sentry reports an instance for each core of each processor

Hardware Sentry creates an instance of the MS_HW_PROCESSOR class for each of the cores in the SPARC T1 or T2 processors. In the console, it looks as if there were 24 individual processors while there is only one SPARC T1 processor. Since a status is reported for each core of each processor, Hardware Sentry has been designed to report these status parameters for each core instead of each physical processor package. This is a design choice and cannot be changed.
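
The per-core instances mirror what psrinfo itself reports: one line per processor as seen by Solaris, each with its own state. The sketch below parses that output into a mapping of processor ID to state. The sample text is constructed for illustration (it is not captured from a real server), and the function name is an assumption.

```python
def parse_psrinfo(output):
    """Map processor ID -> state from plain `psrinfo` output."""
    states = {}
    for line in output.splitlines():
        fields = line.split()
        # Expected shape: "<id>  <state>  since <date> <time>"
        if len(fields) >= 2 and fields[0].isdigit():
            states[int(fields[0])] = fields[1]
    return states


# Illustrative sample: 24 entries, as on the single-processor T1 system described above.
sample = "\n".join(
    f"{cpu}\ton-line   since 01/15/2008 09:30:00" for cpu in range(24)
)

if __name__ == "__main__":
    states = parse_psrinfo(sample)
    print(len(states), "processor entries reported")
```

Each entry in the resulting mapping corresponds to one MS_HW_PROCESSOR instance in the console.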

Setting up Hardware Sentry on Sun Solaris servers (sun4v with ALOM)

Principle

Hardware Sentry will be installed on the managed system. One instance of the KM will be configured to monitor the system “in-band” (processors, network cards and physical disks) and another instance of the KM will be configured to monitor the system “out-of-band”, i.e. by connecting to the ALOM card over the network and gathering the environment information.

Pre-requisites

  • The server is running Sun Solaris 9 or 10
  • The ALOM card is configured to operate on the IP network and accept telnet or SSH connections. The Solaris server must be able to communicate with the ALOM card through TCP/IP
  • A Java Run-time Environment (JRE) version 1.4 or greater has been installed on the server
  • There is no need for the SMC agent to be installed

Installation procedure

  • Install the PATROL Agent on the server, if this has not already been done (versions 3.5.00 and later are supported; version 3.7.00 or later is recommended).
  • Install Hardware Sentry KM for PATROL on the server (this can be done at the same time as the PATROL Agent). Please follow the instructions of the Installation Guide of Hardware Sentry.

Configuration

By default, once installed, Hardware Sentry starts monitoring the server through the available Solaris utilities: this is the in-band part of the hardware monitoring. Please read carefully the previous section about sun4u systems to learn more about how to configure Hardware Sentry to properly monitor the processors, physical disks and network cards of Sun Solaris systems.

Then, you need to configure another “instance” of Hardware Sentry to monitor the hardware of the server through its ALOM card:

  • Right-click on the main Hardware icon › KM Commands › Add a Remote System to Monitor
  • Enter the name of the ALOM card and its IP address (the name is what will be shown in the PATROL Console). Specify “Out-of-band Management Card” for the element type and click Next.
  • Select “Sun Advanced Lights-Out Management (ALOM) card” in the connector list and click Next
  • Select telnet or SSH as the connection protocol (depending on what has been configured on the ALOM card) and enter valid credentials to connect to the management card. Click Next and Finish.

The “Hardware” icon is renamed “Hardware on localhost” and represents the in-band part of the hardware monitoring. This is the “local” instance of Hardware Sentry. Under this icon, you will find the hardware components as seen through the Solaris utilities.

A second icon, “Hardware on ‹ALOM name›”, is created and represents the out-of-band part of the hardware monitoring. Under this icon, you will find the hardware components as seen through the ALOM card.

Discovered components and monitored parameters

Connectors

When configured properly, the following connectors should be detected by Hardware Sentry for the in-band part (under the “Hardware on localhost” icon):

  • Sun Solaris - ALOM-SC in-band (snapshot)
  • Sun Solaris - Processors (psrinfo)
  • Sun Solaris - Sun Disks
  • Sun Solaris - Network

The following connector should also appear under the “Hardware on ‹ALOM name›” icon:

  • “Sun Advanced Lights-Out Management (ALOM) card”

Components and monitored parameters

In turn, the following components and parameters are discovered and monitored:

  • Server model
  • Temperature sensors, actual temperature if available and status
  • Fans, speed if available and status of each fan
  • Voltage sensors, actual voltage if available and status
  • Current sensors, status
  • Power supplies, status
  • Processors, type and frequency, status
  • Physical disks, vendor, size, serial number, error count and status
  • Network cards, vendor, model, connection speed, status, link status and error percentage

Troubleshooting

Check the previous section about troubleshooting sun4u systems regarding the processors, internal disks and network cards.

Prompted for the path to Java

If the “Add a Remote System to Monitor” wizard asks you for the path to Java, it means that Hardware Sentry was not able to find a Java Runtime Environment by itself. Java is required for telnet and SSH connections, so a JRE (version 1.4 or higher) must be installed on the server. Specify the path to the Java bin directory when prompted.

This path can be changed later by setting the /SENTRY/HARDWARE/javaPath configuration variable in the PATROL Agent’s configuration (with WPCONFIG, xpconfig or PCM).

The ALOM card connector icon triggers an alarm

If the “Sun Advanced Lights-Out Management (ALOM) card” connector icon triggers an alarm, it could mean that:

  • The Java environment hasn’t been properly configured in Hardware Sentry
  • The telnet or SSH protocols haven’t been enabled on the ALOM card
  • Hardware Sentry is using telnet to connect to the ALOM card while only SSH is enabled
  • The user credentials are not valid on the ALOM card (you can check that very easily by connecting to the ALOM card manually with the telnet or ssh utility)
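
A quick way to narrow down the protocol-related causes is to test, from the Solaris server, whether the ALOM card accepts TCP connections on the telnet and SSH ports. The sketch below assumes the standard ports (23 and 22); the function names and the host name in the usage line are hypothetical. It only shows whether a port accepts connections, so it says nothing about the validity of the credentials.

```python
import socket


def port_open(host, port, timeout=3.0):
    """True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def alom_protocols(host, ports=None):
    """Return {protocol name: reachable?} for the given host."""
    ports = ports or {"telnet": 23, "ssh": 22}
    return {name: port_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    # Hypothetical host name; replace with the ALOM card's name or IP address.
    print(alom_protocols("alom-card.example.com"))
```

If only the SSH port answers while Hardware Sentry is configured for telnet (or the reverse), change the connection protocol in the remote system’s settings accordingly.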

Hardware Sentry reports an instance for each core of each processor

Hardware Sentry creates an instance of the MS_HW_PROCESSOR class for each of the cores in the SPARC T1 or T2 processors. In the console, it looks as if there were 24 individual processors while there is only one SPARC T1 processor. Since a status is reported for each core of each processor, Hardware Sentry has been designed to report these status parameters for each core instead of each physical processor package. This is a design choice and cannot be changed.