Using TreeSearch for Parallel Computation

by Derrick Stolee

The previously described process of combinatorial search was deliberately abstract. All of the computational experiments I design are implemented using TreeSearch, my own software library for running combinatorial searches in a parallel environment. It abstracts the notion of a search tree into a few basic operations, which are then implemented for each individual experiment. This allows the interface for managing thousands of parallel jobs to be common among all of the other projects in my SearchLib software collection.

This library is a C++ implementation of the recursive search, including the augmentation step, pruning, and outputting solutions. In addition, the library manages tracking statistics and distributing independent jobs to a supercomputer. Today, we’ll discuss how to use TreeSearch for your own computational experiment. Even if you don’t want to use my code, you can use it as an example of how to manage a large number of parallel jobs.

The distributed nature of TreeSearch is somewhat superficial: there is no actual communication or parallel programming occurring in the library itself. Instead, the library is built to manage the input and output files from parallel jobs run under an arbitrary scheduler. I specifically use the Condor scheduler, which works for local grids and distributed clusters. One of Condor’s specialties is using regular office machines (as in, the computer in a person’s office) as cluster nodes when their owners are not using them. In particular, the Open Science Grid, a collection of supercomputers around the country, has a running Condor scheduler. My access point is the UNL Campus Grid (designed by Derek Weitzel; see his blog on distributed computing) as part of the Holland Computing Center. Weitzel’s Campus Grid software made submitting jobs to the Open Science Grid much easier than it had previously been, and I believe his method is being integrated throughout the grid.

Subtrees as Jobs

We discussed before how any subtree could be viewed as a “job” for a single computation node, as in Figure 1.

Figure 1. Partitioning the search tree and executing in parallel.


However, we don’t know what these jobs are before we get started, and once we do, we need to describe each job to a computation node. It would be infeasible to store all of the data at a given search node, including any previously computed constraint propagations or data tables. Instead, we provide “turn-by-turn directions” for the algorithm to retrace its steps to the search node. Further, we don’t know how long jobs will take, so we’d like to be able to pause a job and pick up where we left off.

The tree structure allows a search node to be described by the list of child labels chosen at each node along the path from the root. For instance, when generating graphs by edges, each label can simply be the pair of vertices forming the added edge. Typically, the breadth of the search is small, so these descriptions take very little space. This gives a way of describing a search node independently of whatever data is actually stored by the specific search application. Moreover, the application may require visiting the ancestor search nodes in order to have consistent internal data, which it can do by replaying the path. With the assumption that each subtree is computationally independent of the other subtrees at the same level, one can run each subtree in a different process in order to achieve parallelization. These path descriptions make up the input for the individual processes in this scheme.
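As a concrete (and purely illustrative) sketch, an edge-augmentation application might encode each child as a single integer label. The names edgeToLabel, labelToEdge, and the bound MAXN below are assumptions for this example, not part of TreeSearch.

    // A sketch (not part of TreeSearch): encode each child label for an
    // edge-augmentation search as a single integer. MAXN is an assumed
    // bound on the number of vertices.
    #include <utility>
    #include <vector>

    const int MAXN = 64;

    // Encode the edge {i, j} with i < j as one child label.
    int edgeToLabel(int i, int j) { return i * MAXN + j; }

    // Decode a child label back into its endpoints.
    std::pair<int, int> labelToEdge(int label) {
        return std::make_pair(label / MAXN, label % MAXN);
    }

    // A search node is then described by the labels along its path from
    // the root, one per depth.
    typedef std::vector<int> SearchPath;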

Figure 2. A Partial Job Description.


Each path to a search node qualifies as a single job, where the goal is to expand the entire search tree below that node. A collection of nodes in which no pair is in an ancestor-descendant relationship qualifies as a list of independent jobs. Because the amount of computation required to expand the subtree at a node varies widely from node to node, TreeSearch enforces a maximum amount of time for a given job. In order to recover the state of the search when the execution times out, the concept of a partial job is defined. A partial job describes the path from the root to the current search node and, in addition, which node in this path is the original job node. The goal of a partial job is to expand the remaining nodes in the subtree of the job node, without expanding any nodes to the left of the last node in this path. See Figure 2 for an example partial job and its position in the job subtree.

Job Descriptions

Jobs and partial jobs are described in plain text files in order to minimize the I/O burden on the distributed system. The first type is the standard job, given by a line starting with the letter J. Following this letter is a sequence of numbers in hexadecimal. The first two should be the same, corresponding to the depth of the job node. The remaining numbers correspond to the child labels at each depth from the root to the job node.

A partial job is described by the letter P. Here, the format is the same as a standard job, except that the first number is the depth of the job node and the second number is the depth of the current node. For example, the job and partial job shown in Figure 2 are described by the strings below:

     J 3 3 10 14 2
     P 3 5 10 14 2 4 3

Finally, we can use this same strategy to quickly describe solutions! Start a job description with the letter S and proceed with the typical description. In this way, very little data describes how to reconstruct a solution by following these augmentation steps. If we are searching for a large number of solutions, where each is a relatively large object, this description may be more succinct than writing each object out in full. Alternatively, we can also output a direct description of the object, such as an adjacency matrix or graph6 string.
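To make the format concrete, here is a minimal sketch of how one might parse these lines. The JobDescription struct and parseJobLine function are hypothetical names for illustration, not part of the TreeSearch API.

    // A sketch of parsing job-description lines; not the library's own parser.
    #include <sstream>
    #include <string>
    #include <vector>

    struct JobDescription {
        char type;               // 'J', 'P', or 'S'
        int jobDepth;            // depth of the job node
        int currentDepth;        // depth of the described node (equals jobDepth for 'J')
        std::vector<int> labels; // child labels from the root to the node
    };

    bool parseJobLine(const std::string& line, JobDescription& job) {
        std::istringstream in(line);
        if (!(in >> job.type)) return false;
        if (job.type != 'J' && job.type != 'P' && job.type != 'S') return false;

        // The numbers are written in hexadecimal.
        if (!(in >> std::hex >> job.jobDepth >> job.currentDepth)) return false;

        int label;
        while (in >> std::hex >> label) {
            job.labels.push_back(label);
        }
        // The path should contain one label per depth from the root.
        return (int)job.labels.size() == job.currentDepth;
    }

Applied to the two example lines above, this would yield paths of lengths 3 and 5, respectively.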

Reporting Statistics

TreeSearch automatically collects and reports statistics. You can add your own custom statistics if you follow this format:

   T [SUM|MIN|MAX] VARIABLE_NAME [NUMERIC_VALUE]

So, lead with a T, then follow with the type of statistic (sum, minimize, or maximize), a name for your variable, and a numeric value. The statistics that TreeSearch reports by default include the number of search nodes and the total search time.
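For example (the variable names here are invented for illustration), a job might report custom lines such as:

     T SUM NAUTY_TIME 12.4
     T MAX MAX_DEPTH 18
     T SUM NUM_PRUNED 10456

As the SUM/MIN/MAX keywords suggest, the intent is that values sharing a name can be combined across jobs accordingly.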

The TreeSearch Algorithm

Algorithm 1 details the full algorithm for TreeSearch execution. When run after loading a job description (into an array called “job”), it executes the combinatorial search until one of the following occurs:

  1. It completes and all objects are generated.
  2. The amount of time spent goes beyond the specified KILLTIME.
  3. The number of solutions found reaches the maximum allowed, MAXNUMSOLS.

In the case of early termination, the position of the search is stored as a partial job to be run at a later date (or to be used as the start of a job generation mode).

Algorithm 1. doSearch(): the recursive algorithm for TreeSearch.

    Check if there are reasons to halt.
    if mode \equiv GENERATE and depth \geq MAXDEPTH then
        call writeJob()
        return 0
    else if runtime \geq KILLTIME then
        call writePartialJob()
        Signal early termination.
        return -1
    else if mode \equiv LOADJOB then
        call pushTo(job[depth])
        result \leftarrow doSearch()
        call pop()
        if result < 0 or depth < JOBDEPTH then
            Do not continue with other augmentations.
            return result
        end if
        If we did not return, we are in a partial job and must continue augmenting.
    end if
    Attempt all possible augmentations.
    while pushNext() \not\equiv -1 do
        if prune() \equiv 0 then
            if isSolution() \equiv 1 then
                numsols \leftarrow numsols + 1
                call writeSolutionJob()
                call writeSolution()
                if numsols \geq MAXNUMSOLS then
                    call writePartialJob()
                    Signal early termination.
                    return -1
                end if
            end if
            result \leftarrow doSearch()
        end if
        call pop()
        if result < 0 then
            Early termination was signaled.
            return result
        end if
    end while
    return 0
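For readers who prefer code, here is a rough C++ rendering of Algorithm 1. It is a sketch only, not the library’s actual source: it assumes the user-defined methods and the bookkeeping variables (mode, depth, runtime(), job, numsols, and the constants) are members of the surrounding class.

    // A sketch of Algorithm 1 in C++; names and signatures are assumptions
    // carried over from the pseudocode, not the library's actual code.
    int doSearch() {
        // Check if there are reasons to halt.
        if (mode == GENERATE && depth >= MAXDEPTH) {
            writeJob();
            return 0;
        } else if (runtime() >= KILLTIME) {
            writePartialJob();
            return -1;  // signal early termination
        } else if (mode == LOADJOB) {
            // Retrace the stored job path by one more step.
            pushTo(job[depth]);
            int result = doSearch();
            pop();
            if (result < 0 || depth < JOBDEPTH) {
                // Do not continue with other augmentations.
                return result;
            }
            // Otherwise we are inside a partial job and must keep augmenting.
        }

        // Attempt all possible augmentations.
        while (pushNext() != -1) {
            int result = 0;
            if (prune() == 0) {
                if (isSolution() == 1) {
                    numsols++;
                    writeSolutionJob();
                    writeSolution();
                    if (numsols >= MAXNUMSOLS) {
                        writePartialJob();
                        return -1;  // signal early termination
                    }
                }
                result = doSearch();
            }
            pop();
            if (result < 0) {
                // Early termination was signaled deeper in the tree.
                return result;
            }
        }
        return 0;
    }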

Implementation

To integrate your own search into TreeSearch, create a new C++ class which extends the SearchManager class. This class is meant to contain the interface for a few user-defined methods which appear in Algorithm 1. However, some methods from Algorithm 1 are defined already and do not need to be redefined. Depending on your situation, you may want to integrate your data within your new class, or create a new data-only object.

One feature in the SearchManager class is a stack which contains SearchNode objects. These SearchNode objects store that node’s label as well as the current child label. You can use these (or an extension) if you want, and they can help track where you are in the search tree. However, you are responsible for pushing and popping the stack.

User-Defined Methods

The TreeSearch algorithm requires customization of the following methods (a sketch of a subclass implementing them appears after this list):

  • pushNext() and pushTo(label): these methods encode the action of augmenting.
    The pushNext() method performs the next available augmentation, while the pushTo(label) method performs the augmentation encoded by the given label.
  • prune(): this method attempts to detect if the current object is not a sub-solution.
    Returns 1 if it detects a non-sub-solution, 0 otherwise.
  • isSolution(): this method attempts to detect if the current object satisfies the required property.
    Returns 1 if the object is a solution.
  • pop(): this method reverses the previous augmentation performed by a pushNext() or pushTo() call.
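As a sketch of what such a subclass might look like, consider an application that generates graphs by adding edges. The class name EdgeSearchManager and the method signatures below are assumptions; check the user guide for the exact interface.

    // A hypothetical SearchManager subclass; signatures are assumed for
    // illustration and may differ from the library's actual interface.
    class EdgeSearchManager : public SearchManager {
      public:
        // Perform the next untried augmentation at the current node and
        // return its child label, or -1 if no augmentations remain.
        long pushNext() {
            // ... add the next candidate edge, push a SearchNode carrying
            // its label onto the stack ...
            return -1;  // placeholder
        }

        // Perform the augmentation encoded by the given label; used when
        // retracing a stored job path.
        long pushTo(long label) {
            // ... add the edge encoded by label, push a SearchNode ...
            return label;
        }

        // Undo the most recent augmentation and pop the SearchNode stack.
        long pop() {
            // ... remove the most recently added edge ...
            return 0;
        }

        // Return 1 if the current object cannot be extended to a solution.
        int prune() { return 0; }

        // Return 1 if the current object satisfies the required property.
        int isSolution() { return 0; }
    };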

Optional Methods

  • writeSolution(): this method writes the necessary data of a solution to standard output, typically in a standard graph format such as an adjacency matrix or a string encoding of an adjacency list. If the data is particularly long (and you want to avoid transmitting it over the network), this method can be skipped during execution, since the job description of that node is also output and the combinatorial object can be reconstructed by replaying the augmentation steps.
  • writeStatistics(): At the end of a job execution, some basic statistics are automatically written. Using this method, you can output some statistics that fit your specific implementation. I frequently use this to report the amount of computation time used in nauty.
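Continuing the hypothetical subclass sketch above, these two methods might look roughly like the following; the nautySeconds member and the signatures are invented for illustration only.

    // Sketches of the optional methods, added to the hypothetical subclass
    // above; nautySeconds is an assumed member tracking time spent in nauty.
    void writeSolution() {
        // print the solution to standard output, e.g. as an adjacency
        // matrix or a graph6 string
    }

    void writeStatistics() {
        // emit a custom statistic in the T [SUM|MIN|MAX] NAME VALUE format
        printf("T SUM NAUTY_TIME %f\n", nautySeconds);
    }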

Further Details

There are many other facets to using TreeSearch, including:

  • Designing a Condor submission file.
  • Job management using the included Python scripts.
  • Watching jobs complete in the scheduler.
  • Best practices for a tree-based search.

I reserve these topics for later updates. Until then, some of these topics are already described in the user guide.

Example Projects

If you want to see a few examples of code using TreeSearch, see the other projects of SearchLib, including:

  • EarSearch: Generate 2-connected graphs by ear augmentations.
  • Saturation: Generate uniquely K_r-saturated graphs.
See Also

A Visual Guide to Combinatorial Search
References

R. Pordes, D. Petravick, B. Kramer, D. Olson, M. Livny, A. Roy, P. Avery, K. Blackburn, T. Wenaus, et al., The Open Science Grid, Journal of Physics: Conference Series 78, 012057 (2007).

D. Stolee, TreeSearch User Guide, TreeSearch Source Code.

D. Thain, T. Tannenbaum, M. Livny, Distributed Computing in Practice: The Condor Experience, Concurrency and Computation: Practice and Experience 17(2-4), 323–356 (2005).

D. Weitzel, Campus Grids: A Framework to Facilitate Resource Sharing, Master’s Thesis, University of Nebraska-Lincoln (2011).
