Using TreeSearch for Parallel Computation
by Derrick Stolee
The previously described process of combinatorial search was extremely abstract on purpose. All of the computational experiments I design are implemented using TreeSearch. TreeSearch is my own software library for running combinatorial search in a parallel environment. It abstracts the notion of a search tree into a few basic operations, which are then implemented for each individual experiment. This allows the interface for managing thousands of parallel jobs to be common among all of the other projects in my SearchLib software collection.
This library is a C++ implementation of the recursive search, including the augmentation step, pruning, and outputting solutions. In addition, the library manages tracking statistics and distributing independent jobs to a supercomputer. Today, we’ll discuss how to use TreeSearch for your own computational experiment. If you don’t want to use my code, you can use it as an example for how to manage a large number of parallel jobs.
The distributed nature of TreeSearch is somewhat superficial. There is no actual communication or parallel programming occurring in the library itself. Instead, the library is built to manage the input and output files from parallel jobs using an arbitrary scheduler. I specifically use the Condor scheduler, which works for local grids and distributed clusters. One of Condor’s specialities is using regular office machines (as in, the computer in a person’s office) as cluster nodes when the original users are not using them. In particular, the Open Science Grid, a collection of supercomputers around the country, has a running Condor scheduler. My access point is the UNL Campus Grid (designed by Derek Weitzel; see his blog on distributed computing) as part of the Holland Computing Center. Weitzel’s Campus Grid software made submitting jobs to the Open Science Grid much easier than it had previously been. I believe they are integrating his method throughout.
Subtrees as Jobs
We discussed before how any subtree could be viewed as a “job” for a single computation node, as in Figure 1.
However, we don’t know what these jobs are before we get started, and then we need to solve the problem of describing that job to a computation node. It would be infeasible to store all of the data at a given search node, including any previously computed constraint propagations or data tables. Instead, we provide “turn-by-turn directions” for the algorithm to retrace its steps to the search node. Further, we don’t know how long jobs will take, so we’d like to be able to pause a job and pick up where we left off.
The tree structure allows for search nodes to be described via the list of children at each node. For instance, when generating graphs by edges, we can simply write an integer pair for which two vertices are the endpoints. Typically, the breadth of the search will be small and these descriptions take very little space. This allows for a method of describing a search node independently of what data is actually stored by the specific search application. Moreover, the application may require visiting the ancestor search nodes in order to have consistent internal data. With the assumption that each subtree is computationally independent of other subtrees at the same level, one can run each subtree in a different process in order to achieve parallelization. These path descriptions make up the input for the specific processes in this scheme.
Each path to a search node qualifies as a single job, where the goal is to expand the entire search tree below that node. A collection of nodes where no pair are in an ancestor-descendant relationship qualifies as a list of independent jobs. Recognizing that the amount of computation required to expand the subtree at a node is not always a uniform value, TreeSearch allows a maximum amount of time within a given job. In order to recover the state of the search when the execution times out, the concept of partial jobs is defined. A partial job describes the path from the root to the current search node. In addition, it describes which node in this path is the original job node. The goal of a partial job is to expand the remaining nodes in the subtree of the job node, without expanding any nodes to the left of the last node in this path. See Figure 2 for an example partial job and its position in the job subtree.
The descriptions of jobs and partial jobs are described using text files in order to minimize the I/O constraints on the distributed system. The first is the standard job, given by a line starting with the letter
J. Following this letter are a sequence of numbers in hexadecimal. The first two should be the same, corresponding to the depth of the node. The remaining numbers correspond to the child values at each depth from the root to the job node.
A partial job is described by the letter
P. Here, the format is the same as a standard job except the first number describes the depth of the job node and the second number corresponds to the depth of the current node. For example, the job and partial job given in Figure 1 are described by the strings below:
J 3 3 10 14 2 P 3 5 10 14 2 4 3
Finally, we can use this same strategy to quickly describe solutions! Start a job description with the letter
S and proceed with the typical description. In this way, we can describe in very little data how to reconstruct a solution by following these augmentation steps. If we are searching for a large number of solutions, where each is a relatively large object, this description may be more succint. However, we can also output a direct description of the object, such as an adjacency matrix or
TreeSearch automatically reports and collects statistics. You can add your own custom statistics, if you follow this format:
T [SUM|MIN|MAX] VARIABLE_NAME [NUMERIC_VALUE]
So, lead with a
T, then follow with the type of statistic (sum, minimize, or maximize), a name for your variable, and a value. The statistics that TreeSearch reports include number of search nodes and total search time.
The TreeSearch Algorithm
Algorithm 1 details the full algorithm for TreeSearch execution. By running this algorithm after loading a job description (into an array called “job”), it will run the combinatorial search until either:
- It completes and all objects are generated.
- The amount of time spent goes beyond the specified KILLTIME.
- The number of solutions is more than the maximum number of solutions: MAXNUMSOLS.
In the case of early termination, the position of the search is stored as a partial job to be run at a later date (or to be used as the start of a job generation mode).
DoSearch(): The Recursive algorithm for TreeSearch.
Check if there are reasons to halt. if mode GENERATE and depth MAXDEPTH then call writeJob() return 0 else if runtime KILLTIME then call Signal early termination. return -1 else if mode LOADJOB then call pushTo(job[depth]) result doSearch() call pop() if result 0 or depth JOBDEPTH then Do not continue with other augmentations. return result end if If we did not return, we are in a partial job and must continue augmenting. end if Attempt all possible augmentations. while pushNext() do prune() if isSolution() then numsols numsols call writeSolutionJob() call writeSolution() if numsols MAXNUMSOLS then call writePartialJob() Signal early termination. return -1 end if end if result doSearch() call pop() if result 0 then Early termination was signaled. return result end if end while return 0
To integrate your own search into TreeSearch, create a new C++ class which extends the
SearchManager class. This class is meant to contain the interface for a few user-defined methods which appear in Algorithm 1. However, some methods from Algorithm 1 are defined already and do not need to be redefined. Depending on your situation, you may want to integrate your data within your new class, or create a new data-only object.
One feature in the
SearchManager class is a stack which contains
SearchNode objects. These
SearchNode objects store that node’s label as well as the current child label. You can use these (or an extension) if you want, and they can help track where you are in the search tree. However, you are responsible for pushing and popping the stack.
The TreeSearch algorithm requires customization of the following methods:
pushTo(label): these methods encode the action of augmenting.
pushNext()method performs the next available augmentation, while the
pushTo(label)method performs the augmentation encoded by the given label.
prune(): this method attempts to detect if the current object is not a sub-solution.
Returns 1 if it detects a non-sub-solution, 0 otherwise.
isSolution(): this method attempts to detect if the current object satisfies the required property.
Returns 1 if the object is a solution.
pop(): This method reverses the previous augmentation from a
writeSolution(): this method writes the necessary data of a solution to standard output. This is typically in the form of a standard format of a graph output, such as an adjacency matrix or string encoding of an adjacency list. If the data is particularly long (and you want to avoid transmission over network), this method can be ignored during execution as the job description of that node is also output and the combinatorial object can be reconstructed from the augmentation steps.
writeStatistics(): At the end of a job execution, some basic statistics are automatically written. Using this method, you can output some statistics that fit your specific implementation. I frequently use this to report the amount of computation time used in
There are many other facets to using TreeSearch, including:
- Designing a Condor submission file.
- Job management using the included Python scripts.
- Watching jobs complete in the scheduler.
- Best practices for a tree-based search.
I reserve these topics for later updates. Until then, some of these topics are already described in the user guide.
If you want to see a few examples of code using TreeSearch, see the other projects of SearchLib, including:
- EarSearch: Generate 2-connected graphs by ear augmentations.
- Saturation: Generate uniquely -saturated graphs.
R. Pordes, D. Petravick, B. Kramer, D. Olson, M. Livny, A. Roy, P. Avery, K. Blackburn, T. Wenaus, et al. The Open Science Grid. In Journal of Physics: Conference Series, 78 12–57. IOP Publishing, (2007).
D. Thain, T. Tannenbaum, M. Livny, Distributed Computing in Practice: The Condor Experience, Concurrency and Computation: Practice and Experience 17(2-4), 323-356 (2005).
D. Weitzel, Campus Grids: A Framework to Facilitate Resource Sharing, Masters Thesis University of Nebraska-Lincoln (2011).