Some Quick Terminology
The following terms are used frequently in the remainder of this tutorial:
- Grid - The group of nodes that are running Starfish.
- Node - Each process that is a member of the grid is called a node.
- Algorithm - This is a program that can be run on Starfish. For example, the Mandelbrot Algorithm computes Mandelbrot Set images.
- Problem - This is a task that a user creates on the grid for processing.
It is an incarnation of an algorithm and some user-specified parameters.
- Library - A library is a set of algorithms bundled into a jar file.
These can be dynamically loaded by Starfish and used for problems.
Web Interface
Starfish can be conveniently controlled from a web browser. Simply point your browser at http://HOST:22222.
If you are running Starfish on the same machine as your browser, then this would be http://localhost:22222.
There are three main sections of the web interface. The first is the Home screen. This shows some statistics for the
grid and shows the problems that are currently loaded onto the grid. Results for completed problems are available here, and new
problems can be created from here.
Next is the Problems section. This shows the details of each problem, including the user-specified parameters for each.
Results for problems are also available here, and new problems can be created.
Last is the Grid section, which presents some statistics for the grid as a whole and allows the user to control remote nodes.
Starting a Problem
To start a new problem, click on the Add Problem... link on either the Home or Problems screen.
Next, select the algorithm you wish to use. Let's choose the Mandelbrot Algorithm.
The web interface now presents a list of parameters. Let's enter the following parameter values:
| Problem Name: | Mandelbrot Test |
| Width: | 800 |
| Height: | 800 |
| LeftX: | -2 |
| RightX: | 2 |
| TopY: | 2 |
| BottomY: | -2 |
| MaxIterations: | 1000 |
Then click Add Problem. The problem is now added to the grid, and its progress can be monitored using the web
interface. Click on either the Home or Problems link to do this.
Refresh the browser page to observe the progress. Once it is complete, there will be a link to download the
result. The result in this case will be a PNG image.
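To make the parameter values above more concrete, here is a minimal sketch of how an escape-time Mandelbrot renderer could use them. It is not Starfish's MandelbrotAlgorithm source; the class and method names (MandelbrotSketch, pixelToX, pixelToY, escapeTime) are hypothetical, and the mapping assumes that pixel row 0 corresponds to TopY.

```java
// Hypothetical sketch; not taken from the Starfish source code.
final class MandelbrotSketch {
    /** Maps a pixel column to a real coordinate between leftX and rightX. */
    static double pixelToX(int px, int width, double leftX, double rightX) {
        return leftX + px * (rightX - leftX) / width;
    }

    /** Maps a pixel row to an imaginary coordinate; row 0 is assumed to be topY. */
    static double pixelToY(int py, int height, double topY, double bottomY) {
        return topY - py * (topY - bottomY) / height;
    }

    /** Counts iterations of z = z^2 + c until |z| exceeds 2, capped at maxIterations. */
    static int escapeTime(double cx, double cy, int maxIterations) {
        double x = 0.0;
        double y = 0.0;
        int i = 0;
        while (i < maxIterations && x * x + y * y <= 4.0) {
            double nextX = x * x - y * y + cx;
            y = 2.0 * x * y + cy;
            x = nextX;
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        int width = 800, height = 800, maxIterations = 1000;
        // The centre pixel of the example problem maps to the origin of the plane.
        double cx = pixelToX(400, width, -2.0, 2.0);
        double cy = pixelToY(400, height, 2.0, -2.0);
        System.out.println("(" + cx + ", " + cy + ") -> " + escapeTime(cx, cy, maxIterations));
    }
}
```

With these values, a larger MaxIterations gives a sharper boundary, but points inside the set cost more to compute because they never escape.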
Controlling Problems
Controlling problems is fairly self-explanatory. Pause, Resume and Cancel links are available for
each problem. Problems are computed in the order they were added, so if a newly added problem is more important than existing
problems, the existing ones can be paused to make way for the high-priority problem.
It is a good idea to delete problems once their results have been obtained, so as to free up the resources they used.
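The ordering rule described above can be pictured with a small model. This is only an illustration of "first added, first computed, unless paused"; ProblemQueueSketch and its methods are hypothetical and say nothing about how Starfish schedules problems internally.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical model of the ordering rule; not Starfish's scheduler.
final class ProblemQueueSketch {
    // Insertion-ordered map: problem name -> paused flag.
    private final Map<String, Boolean> paused = new LinkedHashMap<>();

    void add(String name)    { paused.put(name, false); }
    void pause(String name)  { paused.replace(name, true); }
    void resume(String name) { paused.replace(name, false); }
    void cancel(String name) { paused.remove(name); }

    /** Returns the earliest-added problem that is not paused, if there is one. */
    Optional<String> nextToRun() {
        return paused.entrySet().stream()
                .filter(entry -> !entry.getValue())
                .map(Map.Entry::getKey)
                .findFirst();
    }

    public static void main(String[] args) {
        ProblemQueueSketch queue = new ProblemQueueSketch();
        queue.add("Mandelbrot Test");
        queue.add("Urgent Problem");
        queue.pause("Mandelbrot Test"); // make way for the newer problem
        System.out.println(queue.nextToRun().orElse("(nothing to run)"));
    }
}
```

In this model, pausing does not change a problem's position, so a resumed problem goes back ahead of anything added after it.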
Adding a Library
The above example uses the Mandelbrot Algorithm. This is an example algorithm that comes with Starfish.
However, users will typically have their own algorithms that they wish to run.
These are packaged into library jar files. See the section below on How to Implement Your Own Algorithm if you
don't yet have a library file.
To run a problem using an algorithm from a new library, click on Add Problem... just as was done in the above example.
Then, click on the Add Library... link. The web interface will ask you to upload the library file. Click Browse...
and then choose the library jar file. Then hit Submit. The library is then parsed and scanned. If the library loads
successfully, a brief report of which algorithms were added is displayed. Click OK and the algorithm selection screen
will return, this time including the newly loaded algorithms. This library-adding step does not need to be
repeated; subsequent problems can draw on these new algorithms.
Command-line Usage
Starfish can also be controlled from the command line if the web interface is not convenient.
The following commands can be entered into the terminal window:
- quit - Shut down the node.
- show - Show a list of all resources this node is storing, along with their IDs.
- view - View a list of grid members.
- problems - Show a list of problems currently being computed or waiting in the queue.
- state - Show the state of all running problems, including their segment completion status.
- result [name of problem] - Retrieve the result of the named problem and allow the grid to clear the resources used by that problem.
- restart - Restart all nodes in the grid.
An Example Command-Line Session
What follows is an example session of Starfish. In this example, one node has already been started with
the starfish.(sh/bat) file and is waiting to process segments. The second instance is started with runsleep.(sh/bat),
which starts a node and also triggers a SleepAlgorithm problem to start.
The output shown below is from the first node started; words in red
are commands entered by the user.
How To Implement Your Own Algorithm
The easiest way to understand what you need to do to run your Algorithm on Starfish is by going through an example.
We'll make use of the MandelbrotAlgorithm to go step by step through the
requirements of any Algorithm to be run on Starfish.
public class MandelbrotAlgorithm extends Algorithm {
Anything that is supposed to run on Starfish needs to extend the Algorithm class,
which then requires you to implement certain methods. These are, as implemented in the MandelbrotAlgorithm:
public String getName() {
    return "Mandelbrot Algorithm";
}
This method simply returns the name of the algorithm, which is how it will be identified in the GUI and web interface.
This method can be called at any time.
public void setParameters(ParameterSet ps) {
    ps.addParameter(new IntegerType(PARAM_WIDTH, DESC_WIDTH, true, 0,
                                    Integer.MAX_VALUE, null));
    ps.addParameter(new IntegerType(PARAM_HEIGHT, DESC_HEIGHT, true, 0,
                                    Integer.MAX_VALUE, null));
    ps.addParameter(new DoubleType(PARAM_LEFTX, DESC_LEFTX, true, null));
    ps.addParameter(new DoubleType(PARAM_RIGHTX, DESC_RIGHTX, true, null));
    ps.addParameter(new DoubleType(PARAM_TOPY, DESC_TOPY, true, null));
    ps.addParameter(new DoubleType(PARAM_BOTTOMY, DESC_BOTTOMY, true, null));
    ps.addParameter(new IntegerType(PARAM_MAX_ITERATIONS,
                                    DESC_MAX_ITERATIONS,
                                    true, 0,
                                    Integer.MAX_VALUE, null));
}
This method allows your algorithm to specify what parameters it requires. Parameters are added via the
ParameterType class. There are many convenient implementations of this abstract class, some of which are used by
the Mandelbrot algorithm above. In this case, the Mandelbrot algorithm is specifying the parameters that will
govern which part of the set will be generated, the maximum iterations, etc. This method can be called at any time.
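To illustrate what a bounded parameter declaration buys you, here is a hypothetical stand-in for an integer parameter type. It is not Starfish's IntegerType. Reading the constructor arguments as (name, description, required, minimum, maximum, default) is an assumption inferred from the call sites above, and the parse method is invented for this sketch.

```java
// Hypothetical stand-in; the real IntegerType may behave differently.
final class IntegerParamSketch {
    final String name;
    final String description;
    final boolean required;
    final int min;
    final int max;
    final Integer defaultValue; // null means "no default"

    IntegerParamSketch(String name, String description, boolean required,
                       int min, int max, Integer defaultValue) {
        this.name = name;
        this.description = description;
        this.required = required;
        this.min = min;
        this.max = max;
        this.defaultValue = defaultValue;
    }

    /** Parses raw user input; returns null only for an absent optional value. */
    Integer parse(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            if (defaultValue != null) {
                return defaultValue;
            }
            if (required) {
                throw new IllegalArgumentException(name + " is required");
            }
            return null;
        }
        int value = Integer.parseInt(raw.trim()); // NumberFormatException on non-numeric input
        if (value < min || value > max) {
            throw new IllegalArgumentException(
                    name + " must be between " + min + " and " + max);
        }
        return value;
    }

    public static void main(String[] args) {
        IntegerParamSketch width = new IntegerParamSketch(
                "Width", "Image width in pixels", true, 0, Integer.MAX_VALUE, null);
        System.out.println(width.parse("800"));
    }
}
```

Declaring the range up front means bad input can be rejected when the problem is created, before any segment is sent to the grid.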
public void initialize(Parameters parameters) {
    width = ((Integer)parameters.getParameter(PARAM_WIDTH)).intValue();
    height = ((Integer)parameters.getParameter(PARAM_HEIGHT)).intValue();
    leftX = ((Double)parameters.getParameter(PARAM_LEFTX)).doubleValue();
    rightX = ((Double)parameters.getParameter(PARAM_RIGHTX)).doubleValue();
    topY = ((Double)parameters.getParameter(PARAM_TOPY)).doubleValue();
    bottomY = ((Double)parameters.getParameter(PARAM_BOTTOMY)).doubleValue();
    maxIterations = ((Integer)parameters.getParameter(PARAM_MAX_ITERATIONS))
                        .intValue();
}
The parameters required by your Algorithm will be passed into the initialize method so you can set
up your instance variables with the correct values. This method will be called before any of the methods below.
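The casts in initialize fail with a bare ClassCastException or NullPointerException if a value is missing or has an unexpected type. One way to get clearer errors is a small typed lookup helper. The sketch below uses a plain Map<String, Object> as a stand-in for Starfish's Parameters class, and TypedParametersSketch and its get method are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; a Map stands in for Starfish's Parameters class.
final class TypedParametersSketch {
    /** Returns the named value as the requested type, or fails with a descriptive message. */
    static <T> T get(Map<String, Object> values, String name, Class<T> type) {
        Object value = values.get(name);
        if (value == null) {
            throw new IllegalArgumentException("Missing parameter: " + name);
        }
        if (!type.isInstance(value)) {
            throw new IllegalArgumentException("Parameter " + name + " is a "
                    + value.getClass().getSimpleName()
                    + ", expected " + type.getSimpleName());
        }
        return type.cast(value);
    }

    public static void main(String[] args) {
        Map<String, Object> values = new HashMap<>();
        values.put("Width", 800);   // autoboxed to Integer
        values.put("LeftX", -2.0);  // autoboxed to Double
        int width = get(values, "Width", Integer.class);
        double leftX = get(values, "LeftX", Double.class);
        System.out.println(width + " " + leftX);
    }
}
```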
public int totalNumberOfSegments() {
    return DEFAULT_SEGMENTS;
}
In this method you need to return how many segments your problem is going to break down into for distribution to
the grid. In this algorithm's case there is a fixed number of segments no matter the parameters, but other algorithms
might have a more complicated formula for determining how many segments are going to be required, for example:
public int totalNumberOfSegments() {
    return (int) Math.pow(baseChars.length, segmentStartCharacters);
}
In this case the algorithm is going to make a number of segments related to the number of characters that it
needs to search through to find the answer. In many cases you'll decide how many segments are required based on
the size of the input data or other parameters of the problem.
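As a worked example of the second formula, the sketch below counts segments for a search in which every segment covers one fixed prefix of characters, and shows one possible way to turn a segment number back into its prefix. The names baseChars and segmentStartCharacters come from the snippet above. The prefixForSegment helper is hypothetical, and the count uses integer multiplication instead of Math.pow so that an overflow fails loudly.

```java
// Hypothetical sketch of prefix-based segmentation; not taken from the Starfish source code.
final class PrefixSegmentsSketch {
    /** Computes baseChars.length raised to segmentStartCharacters with overflow checking. */
    static int totalNumberOfSegments(char[] baseChars, int segmentStartCharacters) {
        int total = 1;
        for (int i = 0; i < segmentStartCharacters; i++) {
            total = Math.multiplyExact(total, baseChars.length);
        }
        return total;
    }

    /** Treats the segment number as a base-N number whose digits index into baseChars. */
    static String prefixForSegment(int segmentNum, char[] baseChars, int segmentStartCharacters) {
        char[] prefix = new char[segmentStartCharacters];
        int remaining = segmentNum;
        for (int pos = segmentStartCharacters - 1; pos >= 0; pos--) {
            prefix[pos] = baseChars[remaining % baseChars.length];
            remaining /= baseChars.length;
        }
        return new String(prefix);
    }

    public static void main(String[] args) {
        char[] baseChars = "abcdefghijklmnopqrstuvwxyz".toCharArray();
        // 26 characters with a 2-character prefix gives 26 * 26 = 676 segments.
        System.out.println(totalNumberOfSegments(baseChars, 2));
        System.out.println(prefixForSegment(27, baseChars, 2)); // 27 = 1 * 26 + 1, so "bb"
    }
}
```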
public Serializable processSegment(int segmentNum, Object segmentParameters) {
This is the method that does the actual computation of a segment. You are given the number of the segment the
grid wants you to process and any specific parameters for that segment. In the case of the MandelbrotAlgorithm,
the segment number is simply used to calculate which part of the image to work on, as in:
int startY = segmentNum * height / DEFAULT_SEGMENTS;
int endY = (segmentNum + 1) * height / DEFAULT_SEGMENTS;
But other types of algorithms will need to do different things with this segment number. For example, a sequence
alignment algorithm would probably want to determine which part of the source data it should be doing its alignment
against based on the segment number.
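The two integer expressions above split the rows so that every row belongs to exactly one segment, even when the height is not a multiple of the segment count. The sketch below checks this for a small case. The value of DEFAULT_SEGMENTS is not given in this tutorial, so the segment count is passed in as a parameter, and the class name is hypothetical.

```java
// Hypothetical sketch; the segment count is a parameter because DEFAULT_SEGMENTS is not stated here.
final class SegmentRowsSketch {
    /** First row (inclusive) handled by the given segment. */
    static int startY(int segmentNum, int height, int segments) {
        return segmentNum * height / segments;
    }

    /** Row (exclusive) at which the given segment stops. */
    static int endY(int segmentNum, int height, int segments) {
        return (segmentNum + 1) * height / segments;
    }

    public static void main(String[] args) {
        int height = 10;
        int segments = 4;
        for (int s = 0; s < segments; s++) {
            System.out.println("segment " + s + ": rows ["
                    + startY(s, height, segments) + ", " + endY(s, height, segments) + ")");
        }
    }
}
```

For a height of 10 and 4 segments this gives the row ranges [0, 2), [2, 5), [5, 7) and [7, 10). Each segment's end is the next segment's start, so no row is skipped or computed twice.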
public ProblemResult processResults(Uuid[] results, Resources resources) {
Finally, we come to processing the data once the entire range of segments has been computed. This method will be called
on one of the nodes in the grid and is responsible for combining all the individual results into the final form of the
problem. For example, in the Mandelbrot case this method writes an image file. In the case of the
sequence alignment example, this method takes all the best matches found and orders them to show the top global matches.
The results are provided as an array of Uuid objects, which can be turned into actual data by requesting that the grid find
the data for you with a line like
int[][] data = (int[][]) resources.getResource(results[i]);
As the author of the algorithm, you know what format your results come back in, so in this case the Mandelbrot
algorithm knows to cast each result to a two-dimensional array of ints. Note that in most cases during normal operation
the grid will preload the results onto the node that will eventually process them. This means that most of the calls to
getResource() will load the data from the local cache of the node, making things run
much quicker than streaming the data over the network during processing.
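As a rough illustration of this combining step, the sketch below stacks per-segment blocks of rows into one image and encodes it as a PNG. It uses java.util.UUID and a plain Map as stand-ins for Starfish's Uuid and Resources types. The class name, the black-and-white colouring, and the assumption that each result holds the rows of one segment in order are all hypothetical.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import javax.imageio.ImageIO;

// Hypothetical sketch; UUID and Map stand in for Starfish's Uuid and Resources types.
final class CombineResultsSketch {
    /** Stacks the row blocks of each segment, in segment order, into a single image. */
    static BufferedImage combine(UUID[] results, Map<UUID, Object> resources,
                                 int width, int height, int maxIterations) {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        int y = 0;
        for (UUID id : results) {
            int[][] rows = (int[][]) resources.get(id); // same cast as in the text above
            for (int[] row : rows) {
                for (int x = 0; x < width; x++) {
                    // Points that never escaped are drawn black, all others white.
                    image.setRGB(x, y, row[x] >= maxIterations ? 0x000000 : 0xFFFFFF);
                }
                y++;
            }
        }
        return image;
    }

    /** Encodes the image as PNG bytes. */
    static byte[] toPng(BufferedImage image) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ImageIO.write(image, "png", out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        UUID first = UUID.randomUUID();
        UUID second = UUID.randomUUID();
        Map<UUID, Object> resources = new HashMap<>();
        resources.put(first, new int[][] {{1000, 1}});   // row 0
        resources.put(second, new int[][] {{3, 1000}});  // row 1
        BufferedImage image = combine(new UUID[] {first, second}, resources, 2, 2, 1000);
        System.out.println(toPng(image).length + " bytes of PNG data");
    }
}
```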
See the full documentation for more information about what happens to your Algorithm when it runs.