### Using TreeSearch for Parallel Computation

#### by Derrick Stolee

The previously described process of combinatorial search was extremely abstract on purpose. All of the computational experiments I design are implemented using *TreeSearch*. *TreeSearch* is my own software library for running combinatorial search in a parallel environment. It abstracts the notion of a search tree into a few basic operations, which are then implemented for each individual experiment. This allows the interface for managing thousands of parallel jobs to be common among all of the other projects in my *SearchLib* software collection.

This library is a C++ implementation of the recursive search, including the augmentation step, pruning, and outputting solutions. In addition, the library manages tracking statistics and distributing independent jobs to a supercomputer. Today, we’ll discuss how to use *TreeSearch* for your own computational experiment. If you don’t want to use my code, you can use it as an example for how to manage a large number of parallel jobs.

The distributed nature of *TreeSearch* is somewhat superficial. There is no actual communication or parallel programming occurring in the library itself. Instead, the library is built to manage the input and output files from parallel jobs using an arbitrary scheduler. I specifically use the Condor scheduler, which works for local grids and distributed clusters. One of Condor’s specialities is using regular office machines (as in, the computer in a person’s office) as cluster nodes when the original users are not using them. In particular, the Open Science Grid, a collection of supercomputers around the country, has a running Condor scheduler. My access point is the UNL Campus Grid (designed by Derek Weitzel; see his blog on distributed computing) as part of the Holland Computing Center. Weitzel’s Campus Grid software made submitting jobs to the Open Science Grid much easier than it had previously been. I believe they are integrating his method throughout.

### Subtrees as Jobs

We discussed before how any subtree could be viewed as a “job” for a single computation node, as in Figure 1.

However, we don’t know what these jobs are before we get started, and then we need to solve the problem of *describing* that job to a computation node. It would be infeasible to store all of the data at a given search node, including any previously computed constraint propagations or data tables. Instead, we provide “turn-by-turn directions” for the algorithm to retrace its steps to the search node. Further, we don’t know how long jobs will take, so we’d like to be able to *pause* a job and pick up where we left off.

The tree structure allows for search nodes to be described via the list of children at each node. For instance, when generating graphs by edges, we can simply write an integer pair for which two vertices are the endpoints. Typically, the breadth of the search will be small and these descriptions take very little space. This allows for a method of describing a search node independently of what data is actually stored by the specific search application. Moreover, the application may require visiting the ancestor search nodes in order to have consistent internal data. With the assumption that each subtree is computationally independent of other subtrees at the same level, one can run each subtree in a different process in order to achieve parallelization. These path descriptions make up the input for the specific processes in this scheme.

Each path to a search node qualifies as a single job, where the goal is to expand the entire search tree below that node. A collection of nodes where no pair are in an ancestor-descendant relationship qualifies as a list of independent jobs. Recognizing that the amount of computation required to expand the subtree at a node is not always a uniform value, *TreeSearch* allows a maximum amount of time within a given job. In order to recover the state of the search when the execution times out, the concept of *partial jobs* is defined. A partial job describes the path from the root to the current search node. In addition, it describes which node in this path is the original job node. The goal of a partial job is to expand the remaining nodes in the subtree of the job node, without expanding any nodes to the left of the last node in this path. See Figure 2 for an example partial job and its position in the job subtree.

#### Job Descriptions

The descriptions of jobs and partial jobs are described using text files in order to minimize the I/O constraints on the distributed system. The first is the standard job, given by a line starting with the letter `J`

. Following this letter are a sequence of numbers in hexadecimal. The first two should be the same, corresponding to the depth of the node. The remaining numbers correspond to the child values at each depth from the root to the job node.

A partial job is described by the letter `P`

. Here, the format is the same as a standard job except the first number describes the depth of the job node and the second number corresponds to the depth of the current node. For example, the job and partial job given in Figure 1 are described by the strings below:

J 3 3 10 14 2 P 3 5 10 14 2 4 3

Finally, we can use this same strategy to quickly describe solutions! Start a job description with the letter `S`

and proceed with the typical description. In this way, we can describe in very little data how to reconstruct a solution by following these augmentation steps. If we are searching for a large number of solutions, where each is a relatively large object, this description may be more succint. However, we can also output a direct description of the object, such as an adjacency matrix or `graph6`

string.

#### Reporting Statistics

*TreeSearch* automatically reports and collects statistics. You can add your own custom statistics, if you follow this format:

T [SUM|MIN|MAX] VARIABLE_NAME [NUMERIC_VALUE]

So, lead with a `T`

, then follow with the *type* of statistic (sum, minimize, or maximize), a name for your variable, and a value. The statistics that *TreeSearch* reports include number of search nodes and total search time.

### The *TreeSearch* Algorithm

Algorithm 1 details the full algorithm for *TreeSearch* execution. By running this algorithm after loading a job description (into an array called “job”), it will run the combinatorial search until either:

- It completes and all objects are generated.
- The amount of time spent goes beyond the specified KILLTIME.
- The number of solutions is more than the maximum number of solutions: MAXNUMSOLS.

In the case of early termination, the position of the search is stored as a partial job to be run at a later date (or to be used as the start of a job generation mode).

**Algorithm 1.** `DoSearch()`

: The Recursive algorithm for *TreeSearch*.

Check if there are reasons to halt.ifmode GENERATEanddepth MAXDEPTHthencallwriteJob()return0else ifruntime KILLTIMEthencallSignal early termination.return-1else ifmode LOADJOBthencallpushTo(job[depth]) result doSearch()callpop()ifresult 0ordepth JOBDEPTHthenDo not continue with other augmentations.returnresultend ifIf we did not return, we are in a partial job and must continue augmenting.end ifAttempt all possible augmentations.whilepushNext()doprune()ifisSolution()thennumsols numsolscallwriteSolutionJob()callwriteSolution()ifnumsols MAXNUMSOLSthencallwritePartialJob()Signal early termination.return-1end ifend ifresult doSearch()callpop()ifresult 0thenEarly termination was signaled.returnresultend ifend whilereturn0

### Implementation

To integrate your own search into *TreeSearch*, create a new C++ class which extends the `SearchManager`

class. This class is meant to contain the interface for a few user-defined methods which appear in Algorithm 1. However, some methods from Algorithm 1 are defined already and do not need to be redefined. Depending on your situation, you may want to integrate your data within your new class, or create a new data-only object.

One feature in the `SearchManager`

class is a stack which contains `SearchNode`

objects. These `SearchNode`

objects store that node’s label as well as the current child label. You can use these (or an extension) if you want, and they can help track where you are in the search tree. However, you are responsible for pushing and popping the stack.

#### User-Defined Methods

The *TreeSearch* algorithm requires customization of the following methods:

`pushNext()`

and`pushTo(label)`

: these methods encode the action of augmenting.

The`pushNext()`

method performs the next available augmentation, while the`pushTo(label)`

method performs the augmentation encoded by the given label.`prune()`

: this method attempts to detect if the current object is not a sub-solution.

Returns 1 if it detects a non-sub-solution, 0 otherwise.`isSolution()`

: this method attempts to detect if the current object satisfies the required property.

Returns 1 if the object is a solution.`pop()`

: This method reverses the previous augmentation from a`pushNext()`

or`pushTo()`

method.

#### Optional Methods

`writeSolution()`

: this method writes the necessary data of a solution to standard output. This is typically in the form of a standard format of a graph output, such as an adjacency matrix or string encoding of an adjacency list. If the data is particularly long (and you want to avoid transmission over network), this method can be ignored during execution as the job description of that node is also output and the combinatorial object can be reconstructed from the augmentation steps.`writeStatistics()`

: At the end of a job execution, some basic statistics are automatically written. Using this method, you can output some statistics that fit your specific implementation. I frequently use this to report the amount of computation time used in `nauty`

.### Further Details

There are many other facets to using *TreeSearch*, including:

- Designing a Condor submission file.
- Job management using the included Python scripts.
- Watching jobs complete in the scheduler.
- Best practices for a tree-based search.

I reserve these topics for later updates. Until then, some of these topics are already described in the user guide.

### Example Projects

If you want to see a few examples of code using *TreeSearch*, see the other projects of *SearchLib*, including:

*EarSearch*: Generate 2-connected graphs by ear augmentations.*Saturation*: Generate uniquely -saturated graphs.

### See Also

A Visual Guide to Combinatorial Search

### References

R. Pordes, D. Petravick, B. Kramer, D. Olson, M. Livny, A. Roy, P. Avery, K. Blackburn, T. Wenaus, *et al.* The Open Science Grid. In *Journal of Physics: Conference Series*, 78 12–57. IOP Publishing, (2007).

D. Stolee, *TreeSearch* User Guide. *TreeSearch* Source Code.

D. Thain, T. Tannenbaum, M. Livny, Distributed Computing in Practice: The Condor Experience, *Concurrency and Computation: Practice and Experience* 17(2-4), 323-356 (2005).

D. Weitzel, Campus Grids: A Framework to Facilitate Resource Sharing, Masters Thesis *University of Nebraska-Lincoln* (2011).