The "Monitor" class runs on the cluster. It is a thread that calls two other classes every five seconds. The first class, "Status", checks which nodes are not operational by calling the pbsnodes -l command. This command outputs a two-column list: the left-hand column lists all nodes that are "down", and the right-hand column gives their exact status. An example of this command running is the following:
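A minimal sketch of how such two-column output could be parsed in Java, assuming each line holds a node name followed by its status. The class name DownNodeParser and the sample node names are hypothetical and are not taken from the project.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: turns the lines printed by "pbsnodes -l" into a map
// of node name -> reported status.
class DownNodeParser {

    static Map<String, String> parse(List<String> lines) {
        Map<String, String> down = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            // Left column: node name. Right column: status text.
            String[] cols = trimmed.split("\\s+", 2);
            down.put(cols[0], cols.length > 1 ? cols[1].trim() : "");
        }
        return down;
    }

    public static void main(String[] args) {
        // Illustrative input only; real node names and states will differ.
        List<String> sample = Arrays.asList(
                "node03               down",
                "node07               offline");
        System.out.println(parse(sample));
    }
}
```

Keeping the parser separate from the code that runs the command makes it easy to check against captured output without access to the cluster.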
All of the command-line programs that are called and parsed in this project are called in the same way.
This involves obtaining the Runtime object and calling its exec method with the
command name as a parameter. The output of the command is read through an
InputStream object and captured in an ArrayList object. The following is a code extract from "Status.java":
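A minimal sketch of this pattern, assuming the command writes line-oriented text to standard output. The class and method names (CommandRunner, readLines, run) are hypothetical and are not the contents of "Status.java".

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;

// Hypothetical helper illustrating the Runtime.exec pattern described above.
class CommandRunner {

    // Reads every line from the stream into an ArrayList.
    static ArrayList<String> readLines(InputStream in) {
        ArrayList<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    // Runs the command and returns its standard output, one entry per line.
    // Standard error is not read here; a command that writes a lot to it
    // would need that stream drained as well.
    static ArrayList<String> run(String... command) throws IOException, InterruptedException {
        Process process = Runtime.getRuntime().exec(command);
        ArrayList<String> output = readLines(process.getInputStream());
        process.waitFor(); // wait for the command to exit
        return output;
    }

    public static void main(String[] args) throws Exception {
        // Defaults to the command used by "Status"; any command can be passed instead.
        String[] command = args.length > 0 ? args : new String[] {"pbsnodes", "-l"};
        for (String line : run(command)) {
            System.out.println(line);
        }
    }
}
```

Passing the command as an array rather than a single string avoids surprises with how arguments containing spaces are split.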
The second class that the "Monitor" class calls, "Load", is the main
class which extracts information from the cluster. The first thing this
class does is get a list of the names of the node machines in the
cluster. It does this by querying the /etc/bind/db.cluster file.
This file contains a list of the node names in a column on the left. This
information is extracted using the following shell command, which
is run from the Java program:
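The same extraction can be sketched in Java instead of shell. This sketch assumes that node entries are lines whose first whitespace-separated field is the node name, and that blank lines, indented lines, comment lines starting with ";", and lines starting with "$" or "@" can be skipped. These filtering rules and the class name NodeNames are assumptions about the file layout, not details taken from the project.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: collects the left-hand column of a zone-style file.
class NodeNames {

    static List<String> firstColumn(List<String> lines) {
        List<String> names = new ArrayList<>();
        for (String line : lines) {
            // Blank or indented lines have nothing in the left-hand column.
            if (line.isEmpty() || Character.isWhitespace(line.charAt(0))) {
                continue;
            }
            // Assumed non-node lines: comments, directives, and the zone origin.
            if (line.startsWith(";") || line.startsWith("$") || line.startsWith("@")) {
                continue;
            }
            names.add(line.split("\\s+")[0]);
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get("/etc/bind/db.cluster"));
        System.out.println(firstColumn(lines));
    }
}
```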
The next thing the "Load" class does is parse the output of the uptime command
to find the one-minute load average on each node. This is done by going through the
list of node names extracted in the previous step and running the uptime command on each of them, using rsh, the remote
shell command.
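A minimal sketch of this step, assuming uptime prints the usual "load average: a, b, c" suffix with a period as the decimal separator. The class name UptimeLoad and its methods are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: builds the remote command and extracts the first
// (one-minute) figure from a line of uptime output.
class UptimeLoad {

    private static final Pattern LOAD =
            Pattern.compile("load averages?:\\s*([0-9]+\\.[0-9]+)");

    // Command array for running uptime on a node through rsh.
    static String[] remoteCommand(String node) {
        return new String[] {"rsh", node, "uptime"};
    }

    static double oneMinuteLoad(String uptimeLine) {
        Matcher m = LOAD.matcher(uptimeLine);
        if (!m.find()) {
            throw new IllegalArgumentException("no load average in: " + uptimeLine);
        }
        return Double.parseDouble(m.group(1));
    }

    public static void main(String[] args) {
        // Illustrative line only; the exact layout varies between systems.
        String sample = " 10:15:02 up 3 days,  2:41,  4 users,  load average: 0.52, 0.38, 0.30";
        System.out.println(oneMinuteLoad(sample));
    }
}
```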
The next method that is called parses the output of the uptime command again, this time to
get the number of users on each node and the uptime of each node, i.e. how long
it has been since the node was last rebooted. To find out the memory usage
of each node, the following command is run on each node:
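A minimal sketch of how memory figures could be pulled out of /proc/meminfo-style text, assuming lines of the form "MemTotal: 2055920 kB". The class name MemInfo is hypothetical, and computing usage as MemTotal minus MemFree is an assumption that counts buffers and cache as used memory.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: parses "Name:   value kB" lines into name -> value.
class MemInfo {

    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> fields = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length < 2 || !parts[0].endsWith(":")) {
                continue; // not a "Name: value" line
            }
            String key = parts[0].substring(0, parts[0].length() - 1);
            try {
                fields.put(key, Long.parseLong(parts[1]));
            } catch (NumberFormatException e) {
                // Non-numeric value (for example a header row); ignore it.
            }
        }
        return fields;
    }

    // Assumed definition of "used": total minus free, in the file's own units.
    static long used(Map<String, Long> fields) {
        return fields.get("MemTotal") - fields.get("MemFree");
    }

    public static void main(String[] args) {
        // Illustrative values only.
        Map<String, Long> fields = parse(List.of(
                "MemTotal:  2055920 kB",
                "MemFree:   1000000 kB"));
        System.out.println(used(fields));
    }
}
```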
The /proc/meminfo file is specific to Linux and may not exist on other
versions of Unix, so if this program is run on an operating system other than
Linux, this step might need to be changed. Another Linux-dependent file is the
/proc/cpuinfo file. This file contains information about the central
processing unit of the current machine. The contents of this file are
parsed on each node to determine the processor model, its speed in megahertz,
and its cache size. The next command the "Load" class issues finds out
which process is using the most CPU time on each node. The output of the ps aux command on Linux is sorted according to the percentage of CPU time each
process is taking up, and then the greatest entry is returned. This is done
with the following command:
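A minimal sketch of the selection step, done here in Java instead of with a shell sort. It assumes the usual eleven-column ps aux layout with %CPU in the third column and the command in the last. The class name TopProcess is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: returns the command with the highest %CPU value.
class TopProcess {

    static String busiest(List<String> psLines) {
        String best = null;
        double bestCpu = -1.0;
        for (String line : psLines) {
            // Limit of 11 keeps the command and its arguments in one field.
            String[] cols = line.trim().split("\\s+", 11);
            if (cols.length < 11 || cols[0].equals("USER")) {
                continue; // header or malformed line
            }
            double cpu;
            try {
                cpu = Double.parseDouble(cols[2]);
            } catch (NumberFormatException e) {
                continue; // %CPU column was not a number
            }
            if (cpu > bestCpu) {
                bestCpu = cpu;
                best = cols[10];
            }
        }
        return best; // null if no process line was found
    }

    public static void main(String[] args) {
        // Illustrative output only.
        List<String> sample = Arrays.asList(
                "USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND",
                "root         1  0.0  0.1   1520   512 ?        S    Jan01   0:04 init",
                "alice     4321 97.3  2.4  20480  9216 pts/1    R    10:02  12:44 ./simulate -n 8");
        System.out.println(busiest(sample));
    }
}
```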
The next command that is run finds the list of users logged on to each
node. This can be done by parsing the output of the who command on Unix, in the
following way:
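A minimal sketch of parsing who output, assuming the first column is the user name and the second is the terminal. The class name WhoParser is hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: maps each terminal to the user logged in on it.
class WhoParser {

    static Map<String, String> parse(List<String> whoLines) {
        Map<String, String> userByTerminal = new LinkedHashMap<>();
        for (String line : whoLines) {
            String[] cols = line.trim().split("\\s+");
            if (cols.length >= 2) {
                userByTerminal.put(cols[1], cols[0]); // terminal -> user
            }
        }
        return userByTerminal;
    }

    public static void main(String[] args) {
        // Illustrative output only.
        List<String> sample = Arrays.asList(
                "alice    pts/0        Mar  3 10:15 (10.0.0.5)",
                "bob      pts/1        Mar  3 11:02");
        System.out.println(parse(sample));
    }
}
```

Keying the result by terminal keeps one entry per login session, which suits a later step that needs the terminal each user is logged in from.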
The final command run by the "Load" class checks which of the users on the
head node have their messages turned on or off. This is too complicated to do
in Java. Instead, a program written in C is called, and the terminal
that each user is logged in from is passed to it. This program is
adapted from code written for the hey project by people
on redbrick.
The "Spy" class is initialised by the "Monitor" class with a "Status" and a "Load" object, before being bound to a port using Remote Method Invocation. The "Spy" class is updated by the "Status" and "Load" classes whenever these classes change. The exact method is detailed in the Patterns section below. The "Spy" class provides methods, defined in the "SpyInterface" interface, to return various types of cluster information. The client can call these methods on the proxy object to obtain this information. The "Spy" class also parses the output of the Unix uname command to find out the operating system name and version. This is done in the "Spy" class because it only needs to be run once at the start of the server, rather than every five seconds, as this information does not change.
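A minimal sketch of exporting such an object over RMI. The names SketchSpyInterface and SketchSpy, the two methods, the registry port 1099, and the binding name "Spy" are all assumptions made for illustration; the project's actual "SpyInterface" methods are not reproduced here.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical remote interface: every remote method declares RemoteException.
interface SketchSpyInterface extends Remote {
    String getOsName() throws RemoteException;
    double getLoad(String node) throws RemoteException;
}

// Hypothetical server object holding the latest values pushed by the monitor.
class SketchSpy implements SketchSpyInterface {

    private final String osName; // fixed at start-up, e.g. from uname output
    private final Map<String, Double> loads = new ConcurrentHashMap<>();

    SketchSpy(String osName) {
        this.osName = osName;
    }

    // Called on the server side whenever a new load figure is available.
    void setLoad(String node, double load) {
        loads.put(node, load);
    }

    @Override
    public String getOsName() {
        return osName;
    }

    @Override
    public double getLoad(String node) {
        return loads.getOrDefault(node, Double.NaN); // NaN for unknown nodes
    }

    public static void main(String[] args) throws RemoteException {
        // Placeholder string; the real server would parse uname once here.
        SketchSpy spy = new SketchSpy("unknown");
        SketchSpyInterface stub =
                (SketchSpyInterface) UnicastRemoteObject.exportObject(spy, 0);
        Registry registry = LocateRegistry.createRegistry(1099); // assumed port
        registry.rebind("Spy", stub); // assumed binding name
    }
}
```

A client would look up the same name in the registry and call the interface methods on the returned proxy. Because the operating system name is passed to the constructor, it is computed once at start-up and not on every five-second update.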