Please be sure to read the Tutorial section before reading this section.

This section contains detailed documentation regarding the configuration of Starfish and the implementation of Algorithms.

Configuration

For most hardware configurations, Starfish should work fine with its default settings. However, a few special situations may require changes.

The file starfish.properties contains five configuration properties:

| Property | Default Value | Description |
|---|---|---|
| channel.multicast.ip | 232.11.11.232 | This is the multicast IP address over which the grid will co-ordinate and communicate. This can be changed (along with channel.group.name) in order to have independent Starfish grids operating on one LAN. |
| channel.group.name | LarvaLab | This is the group name used by the underlying JGroups layer to co-ordinate the grid. It is not necessary to change this, but it can be used to run independent Starfish grids on one LAN. |
| resource.redundancy | 1 | This parameter sets the desired level of resource redundancy on the grid. With the default setting, each node will attempt to maintain a copy of all its resources on a single neighboring node. With a value of 2, it will attempt to maintain two copies, and so on. A value of 0 turns all resource redundancy off. This will reduce network traffic, but use this only in a very high-availability environment. For most normal situations, the default redundancy level will suffice. |
| channel.properties.pre | Large value omitted | JGroups uses a property string to configure its protocol stack. In Starfish, this string is the concatenation of the values channel.properties.pre, channel.multicast.ip and channel.properties.post. So, this parameter and channel.properties.post can be used to configure the behavior of the JGroups protocol stack. Visit the JGroups site to learn more about these configuration strings. |
| channel.properties.post | Large value omitted | See channel.properties.pre. |

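The concatenation described for channel.properties.pre can be sketched in Java as follows. This is only an illustration of the string-building rule, not Starfish's actual code: the class ChannelPropertiesSketch and its methods are hypothetical names, and the fallback values used when a key is missing are assumptions (only the multicast default comes from the table).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical helper, not part of the Starfish API.
final class ChannelPropertiesSketch {

    // Reads a properties file such as starfish.properties.
    static Properties load(Path file) throws IOException {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            props.load(in);
        }
        return props;
    }

    // Joins the three values in the documented order:
    // channel.properties.pre + channel.multicast.ip + channel.properties.post
    static String buildChannelString(Properties props) {
        // Empty-string fallbacks are an assumption made for this sketch.
        String pre = props.getProperty("channel.properties.pre", "");
        String ip = props.getProperty("channel.multicast.ip", "232.11.11.232");
        String post = props.getProperty("channel.properties.post", "");
        return pre + ip + post;
    }
}
```

Because the multicast address is spliced between the two halves, channel.properties.pre would normally end exactly where the address belongs in the JGroups string, and channel.properties.post would continue from there.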
A few other configuration options are available in the file launcher.xml. If a Starfish host has more than one Ethernet adapter, the user can specify which adapter to use by uncommenting the line:

<sysproperty key="bind.address" value="127.0.0.1"/>
Then replace 127.0.0.1 with the IP address to which the desired Ethernet adapter is bound.

Finally, the launcher.xml file configures Starfish for "headless" hosts (hosts with no video display) by default. Removing the following line will configure Starfish for hosts with video capability:

<jvmarg value="-Djava.awt.headless=true"/>

Lifecycle of an Algorithm

This section explains the details of how the Algorithm class is used by Starfish. This information should help developers create Algorithm implementations.

Algorithms are instantiated using reflection. They must have no-argument constructors. There are four different ways in which an Algorithm object will be instantiated and used:

Template Instances
When using the web interface, a user will select an Algorithm, and then enter the parameters for their particular problem. These will be validated before the problem is loaded on the grid and run. The Algorithm selection and parameter validation are done using a template instance of each Algorithm. A template instance will result in the following methods being called:

  • java.lang.String getName()
  • void setParameters(ParameterSet parameterSet)
These methods can be called any number of times and in any order. However, this is all that will happen to a template instance. It is otherwise unused.

Input Data Instance
Once the problem parameters have been validated by the template instance, a new instance is created to generate the segment input data. The following methods are called on this instance (in order):

  1. void setParameters(ParameterSet parameterSet)
  2. int totalNumberOfSegments()
  3. void initialize(Parameters parameters)
  4. void generateSegmentInputData(SegmentInputData segmentInputData)
Note that this instance is only created once per problem, and only on the node on which the user created the problem.

Processing Instances
Once a problem is operating on the grid, each node will create an instance of the Algorithm and process segments. The processing instance accomplishes this using the methods (called in order):

  1. void initialize(Parameters parameters)
  2. java.io.Serializable processSegment(int segmentNum, java.lang.Object segmentParams)

Co-ordinator Instance
Once all segments of a problem have completed processing, one of the processing instances will become the co-ordinating instance. The grid will ensure that it has all the segment results available to it. Then, it will process the results by calling the method:

  • ProblemResult processResults(Uuid[] resultIds, Resources resources)
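
The four roles described above can be walked through with a small, self-contained Java sketch. Everything in it is hypothetical: SumOfSquaresSketch only borrows the method names from the lifecycle, and it uses plain Java types in place of the real Starfish classes (ParameterSet, Parameters, SegmentInputData, Resources, ProblemResult), whose exact APIs are not shown here. The driver runs all roles in one JVM, whereas on a real grid they would be spread across nodes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for an Algorithm implementation.
// Argument and result types are plain Java types, NOT the Starfish classes.
class SumOfSquaresSketch {
    private long upperBound; // problem: sum of i*i for 0 <= i < upperBound
    private int segments;

    public SumOfSquaresSketch() { } // no-argument constructor, needed for reflection

    public String getName() { return "Sum of squares (sketch)"; }

    // Validates the user's parameters and rejects unusable ones.
    public void setParameters(long upperBound, int segments) {
        if (upperBound < 0 || segments < 1) {
            throw new IllegalArgumentException("need upperBound >= 0 and segments >= 1");
        }
        initialize(upperBound, segments);
    }

    public void initialize(long upperBound, int segments) {
        this.upperBound = upperBound;
        this.segments = segments;
    }

    public int totalNumberOfSegments() { return segments; }

    // Input for one segment: the half-open range [from, to) it covers.
    public long[] generateSegmentInputData(int segmentNum) {
        return new long[] {
            upperBound * segmentNum / segments,
            upperBound * (segmentNum + 1) / segments
        };
    }

    public Long processSegment(int segmentNum, long[] range) {
        long sum = 0;
        for (long i = range[0]; i < range[1]; i++) {
            sum += i * i;
        }
        return sum;
    }

    public long processResults(List<Long> segmentResults) {
        long total = 0;
        for (long partial : segmentResults) {
            total += partial;
        }
        return total;
    }
}

// Hypothetical driver that plays all four roles in a single JVM.
class LifecycleDriverSketch {
    private static SumOfSquaresSketch newInstance() {
        try {
            // Reflection-based instantiation, as the lifecycle requires.
            return SumOfSquaresSketch.class.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("no usable no-argument constructor", e);
        }
    }

    static long run(long upperBound, int segments) {
        // Template instance: naming and parameter validation only.
        SumOfSquaresSketch template = newInstance();
        template.getName();
        template.setParameters(upperBound, segments);

        // Input data instance: created once per problem.
        SumOfSquaresSketch input = newInstance();
        input.setParameters(upperBound, segments);
        int total = input.totalNumberOfSegments();
        input.initialize(upperBound, segments);
        List<long[]> segmentInputs = new ArrayList<>();
        for (int s = 0; s < total; s++) {
            segmentInputs.add(input.generateSegmentInputData(s));
        }

        // Processing instance: on a grid, every node would create its own.
        SumOfSquaresSketch worker = newInstance();
        worker.initialize(upperBound, segments);
        List<Long> results = new ArrayList<>();
        for (int s = 0; s < total; s++) {
            results.add(worker.processSegment(s, segmentInputs.get(s)));
        }

        // Co-ordinator: a processing instance that combines all results.
        return worker.processResults(results);
    }
}
```

The template instance is discarded after validation, which is why the input data instance has to receive the parameters again rather than reuse any state from it.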

Algorithm Implementation Notes
Here are a few additional pointers for Algorithm developers.

Starfish allows the developer to determine how to segment their algorithm. Too few segments will result in poor scalability, as there may be more nodes than segments available for much of the processing time. However, too many segments can introduce a lot of overhead and network traffic. The developer should aim for roughly the same number of segments regardless of the input to the Algorithm. Typically a good number is somewhere on the order of 1000.
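
One way to keep the segment count roughly constant is to derive the segment size from the input size. The following Java sketch assumes the work can be counted in equal-sized items; SegmentPlanSketch and its methods are hypothetical names, and the target of 1000 is simply the order of magnitude suggested above.

```java
// Hypothetical helper, not part of the Starfish API.
final class SegmentPlanSketch {
    static final int TARGET_SEGMENTS = 1000;

    // Items per segment, rounded up so the segment count never exceeds the target.
    static long itemsPerSegment(long totalItems) {
        long perSegment = (totalItems + TARGET_SEGMENTS - 1) / TARGET_SEGMENTS;
        return Math.max(1, perSegment);
    }

    // Number of segments needed to cover every item at that segment size.
    static int segmentCount(long totalItems) {
        long perSegment = itemsPerSegment(totalItems);
        return (int) ((totalItems + perSegment - 1) / perSegment);
    }
}
```

With this scheme, inputs smaller than the target yield one item per segment, while larger inputs grow the segment size instead of the segment count. A value like segmentCount(...) is what an implementation could return from totalNumberOfSegments().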

Starfish is designed to scale to very large problems. The expectation is that any one segment is manageable on an "average" computer with modest memory size. However, the results of all segments together may be larger than can fit into the main memory of any single host (specifically, the host that is running the co-ordinator instance). Starfish handles this by persisting segment results to disk when they are not in use. However, it is up to the developer to ensure that the processResults method does not hold all segment results in memory at once if they are very large objects. For example, in the Mandelbrot Fly-Through Algorithm, each segment result is a video frame. It is the job of the processResults method to combine these frames into a single video file. Instead of loading all frames into memory before processing, the method streams each frame individually through the video encoder. This allows Starfish to reclaim the memory for segment results that have already been processed.
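
The streaming pattern described above can be sketched in Java as a loop that fetches, consumes and then drops one result at a time. This is a minimal sketch under stated assumptions: StreamingResultsSketch is a hypothetical name, the loader function stands in for however a segment result is fetched by its ID (the real lookup through Starfish's Resources object is not shown here), and the sink stands in for something like a video encoder.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical helper, not part of the Starfish API.
final class StreamingResultsSketch {

    // Feeds each result to the sink individually and returns how many were processed.
    static <I, R> int streamResults(List<I> resultIds,
                                    Function<I, R> loader,
                                    Consumer<R> sink) {
        int processed = 0;
        for (I id : resultIds) {
            R result = loader.apply(id); // only this one result is held here
            sink.accept(result);         // e.g. append a frame to the output file
            processed++;
            // 'result' goes out of scope at the end of each iteration,
            // so nothing in this loop keeps earlier results reachable.
        }
        return processed;
    }
}
```

The pattern only helps if the sink does not retain the results itself; collecting them into a list inside the sink would bring back the original memory problem.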

Starfish uses the Apache Commons Logging framework. You can use the same logging framework in your own Algorithm implementations.

Complete Example Algorithm

For those who learn best from example, the source code for a complete Algorithm implementation is available here.

Credits

Starfish was developed by John Watkinson and Matt Hall.