Extracting information from the cluster

The "Monitor" class runs on the cluster. It is a thread that calls two other classes every five seconds. The first class, "Status", checks which nodes are not operational by calling the pbsnodes -l command. This command outputs a two-column list: the left-hand column lists all nodes that are "down", and the right-hand column gives their exact status. The following is an example of the command's output:

\begin{lstlisting}[frame=trbl,caption=Sample pbsnodes -l output]
poprad    state-unknown,down
peanut    state-unknown,down
\end{lstlisting}
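As an illustration of how such two-column output could be turned into a lookup table, the following is a minimal sketch written for a current JDK. The class name DownNodeParser and its method are hypothetical and are not taken from Status.java; the sketch assumes that the node name never contains whitespace.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the original Status.java.
final class DownNodeParser {

    // Maps each node name (left column) to its state string (right column).
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> down = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            // Split on the first run of whitespace only, so the state
            // column is kept whole.
            String[] cols = trimmed.split("\\s+", 2);
            down.put(cols[0], cols.length > 1 ? cols[1] : "");
        }
        return down;
    }
}
```

Any node that appears as a key in the returned map is reported as down; a node that is absent from the map was not listed by pbsnodes -l.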

All of the command-line programs that are called and parsed in this project are called in the same way. This involves obtaining a Runtime object and calling its exec method with the command name as a parameter. The output of the command is read through an InputStream object and stored in an ArrayList object. The following is a code extract from "Status.java":

\begin{lstlisting}[frame=trbl,caption=Extract from Status.java]
// Launch cmd.... ...ay
String[] tempArray = (String [])list.toArray(new String[0]);
\end{lstlisting}
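Since the extract above is abbreviated, the general pattern it describes can be sketched as follows. This is not the original Status.java code: the class name CommandOutput and both method names are assumptions made for illustration, and the sketch uses generics and try-with-resources from later Java versions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the original project.
final class CommandOutput {

    // Reads every line from the stream into a list.
    static List<String> readLines(InputStream in) {
        List<String> list = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
            String line;
            while ((line = reader.readLine()) != null) {
                list.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return list;
    }

    // Launches the command and returns its standard output as an array.
    static String[] run(String[] cmds) throws IOException, InterruptedException {
        Process p = Runtime.getRuntime().exec(cmds);
        List<String> list = readLines(p.getInputStream());
        p.waitFor(); // wait for the process to finish before returning
        return list.toArray(new String[0]);
    }
}
```

Separating the stream reading from the process launch is a design choice of this sketch; it allows the parsing code to be exercised without a cluster by passing in any InputStream.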

The second class that the "Monitor" class calls, "Load", is the main class that extracts information from the cluster. The first thing this class does is get a list of the names of the node machines in the cluster. It does this by reading the /etc/bind/db.cluster file, which lists the node names in its left-hand column. This information is extracted using the following shell command, which is run from the Java program:
\begin{lstlisting}[frame=trbl,caption=Extract from Load.java]
String[] cmds = {"/bin/sh", "-c", "cat /etc/bind/db.cluster | cut -f1"};
\end{lstlisting}
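For comparison, the effect of cut -f1 (keep the text before the first tab on each line) could also be reproduced in Java once the file's lines have been read. The sketch below is only an illustration: the class FirstField is hypothetical, and, unlike cut, it drops lines whose first field is empty.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the original Load.java.
final class FirstField {

    // Returns the text before the first tab of each line, skipping lines
    // whose first field is empty (cut -f1 would print an empty line).
    static List<String> firstTabField(List<String> lines) {
        List<String> names = new ArrayList<>();
        for (String line : lines) {
            int tab = line.indexOf('\t');
            String field = tab < 0 ? line : line.substring(0, tab);
            if (!field.isEmpty()) {
                names.add(field);
            }
        }
        return names;
    }
}
```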

The next thing the "Load" class does is parse the output of the uptime command to find the 1-minute load average on each node. It goes through the list of node names extracted in the previous example and runs the uptime command on each of them using rsh, the remote shell command.
\begin{lstlisting}[frame=trbl,caption=Extract from Load.java]
String[] cmds = ... ...node, "uptime"};
Process p = Runtime.getRuntime().exec(cmds);
\end{lstlisting}
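One possible way to pull values out of a line of uptime output is with regular expressions, as sketched below. The class UptimeParser is hypothetical and is not the parsing code used in Load.java. It assumes the usual Linux layout, in which the line contains "N users" and ends with "load average:" followed by the 1-, 5- and 15-minute figures, and it assumes a decimal point rather than a decimal comma.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the original Load.java.
final class UptimeParser {

    // First number after "load average:" is the 1-minute figure.
    private static final Pattern LOAD =
            Pattern.compile("load averages?:\\s*(\\d+\\.\\d+)");
    // Matches both "1 user" and "2 users".
    private static final Pattern USERS =
            Pattern.compile("(\\d+)\\s+users?");

    static double oneMinuteLoad(String uptimeLine) {
        Matcher m = LOAD.matcher(uptimeLine);
        if (!m.find()) {
            throw new IllegalArgumentException("no load average in: " + uptimeLine);
        }
        return Double.parseDouble(m.group(1));
    }

    static int userCount(String uptimeLine) {
        Matcher m = USERS.matcher(uptimeLine);
        if (!m.find()) {
            throw new IllegalArgumentException("no user count in: " + uptimeLine);
        }
        return Integer.parseInt(m.group(1));
    }
}
```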

The next method that is called parses the output of the uptime command again, this time to get the number of users on each node and the uptime of each node, i.e. how long it has been since the node was last rebooted. To find the memory usage of each node, the following command is run on each node:
\begin{lstlisting}[frame=trbl,caption=Extract from Load.java]
rsh <node> cat /proc/meminfo
\end{lstlisting}
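The output of this command could be parsed along the following lines. This is a sketch under stated assumptions rather than the code in Load.java: the class MemInfoParser is hypothetical, and it keeps only lines of the form "Name: value kB", ignoring any other lines that a given kernel version may print.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the original Load.java.
final class MemInfoParser {

    // Collects every "Name:   12345 kB" line into a name -> kilobytes map.
    static Map<String, Long> parseKilobytes(List<String> lines) {
        Map<String, Long> values = new LinkedHashMap<>();
        for (String line : lines) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // not a "Name: value" line
            }
            String key = line.substring(0, colon).trim();
            String[] tokens = line.substring(colon + 1).trim().split("\\s+");
            // Keep only entries that carry an explicit kB unit.
            if (tokens.length == 2 && tokens[1].equals("kB")
                    && tokens[0].matches("\\d+")) {
                values.put(key, Long.parseLong(tokens[0]));
            }
        }
        return values;
    }
}
```

With such a map, memory in use could be estimated from entries such as MemTotal and MemFree, provided the kernel on the node reports them.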

The /proc/meminfo file is specific to Linux and may not exist on other versions of Unix, so if this program is run on an operating system other than Linux, this step might need to be changed. Another Linux-dependent file is /proc/cpuinfo, which contains information about the node's central processing unit. The contents of this file are parsed for each node to determine the processor model, its speed in megahertz, and its cache size. The next command the "Load" class issues finds the process using the most CPU time on each node. The output of the ps aux command on Linux is sorted by the percentage of CPU time each process is taking up, and the greatest entry is returned. This is done with the following command:
\begin{lstlisting}[frame=trbl,caption=Extract from Load.java]
rsh <node> /bin/sh -c "ps aux | sort +2n | tail -1"
\end{lstlisting}
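The same selection could also be made in Java on the raw ps aux output, instead of in the shell pipeline. The sketch below is an alternative shown for illustration, not the approach taken in Load.java; the class TopProcess is hypothetical, and it assumes that the third whitespace-separated column of ps aux is the CPU percentage.

```java
import java.util.List;

// Hypothetical helper for illustration; not part of the original Load.java.
final class TopProcess {

    // Returns the ps line with the largest value in the third column,
    // or null if no line has a numeric third column.
    static String highestCpuLine(List<String> psLines) {
        String best = null;
        double bestCpu = -1.0;
        for (String line : psLines) {
            String[] cols = line.trim().split("\\s+");
            if (cols.length < 3) {
                continue;
            }
            double cpu;
            try {
                cpu = Double.parseDouble(cols[2]);
            } catch (NumberFormatException e) {
                continue; // header line ("%CPU") or malformed row
            }
            // On a tie, the later line wins.
            if (cpu >= bestCpu) {
                best = line;
                bestCpu = cpu;
            }
        }
        return best;
    }
}
```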

The next command finds the list of users logged on to each node. This can be done by parsing the output of the who command on Unix, in the following way:
\begin{lstlisting}[frame=trbl,caption=Extract from Load.java]
rsh <node> /bin/sh -c 'who | cut -f1 -d" " | sort -u'
\end{lstlisting}
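The pipeline above keeps the first space-delimited field of each line and removes duplicates. An equivalent step in Java might look like the following sketch. The class LoggedInUsers is hypothetical, and the ordering is Java's natural String ordering, which may differ from the locale-dependent ordering of sort.

```java
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of the original Load.java.
final class LoggedInUsers {

    // Returns the distinct user names from who output, in sorted order.
    static SortedSet<String> fromWho(List<String> whoLines) {
        SortedSet<String> users = new TreeSet<>();
        for (String line : whoLines) {
            int space = line.indexOf(' ');
            // The user name is everything before the first space.
            String user = space < 0 ? line : line.substring(0, space);
            if (!user.isEmpty()) {
                users.add(user);
            }
        }
        return users;
    }
}
```

A user who is logged in on several terminals appears once in the result, because the set discards duplicates.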

The final command run by the "Load" class determines which of the users on the head node have their messages turned on or off. This is too complicated to do in Java. Instead, a program written in C is called and is passed the terminal number that each user is logged in from. This program is adapted from code written for the hey project by people on redbrick.

The "Spy" class is initialised by the "Monitor" class with a "Status" and a "Load" object, before being bound to a port using Remote Method Invocation. The "Spy" object is updated by the "Status" and "Load" classes whenever their data changes. The exact method is detailed in the Patterns section below. The "Spy" class provides methods, defined in "SpyInterface", to return various types of cluster information. The client can call these methods on the proxy object to obtain this information. The "Spy" class also parses the output of the Unix uname command to find the operating system name and version. This is done in the "Spy" class because the information does not change, so the command only needs to be run once when the server starts, rather than every five seconds.

Colm O hEigeartaigh 2003-05-30