A Visual Guide to Combinatorial Search

Today, we investigate a way to visualize combinatorial search in a way that touches on most of the computational and mathematical concerns. Throughout the description, we shall refer to a common example of generating graphs using edge augmentations. Since we’ve already seen some combinatorial searches before (see Introduction to Canonical Deletion and Introduction to Orbital Branching for examples), some of this discussion will seem familiar.

Disclaimer: I adapted this post from a chapter of my thesis.

Suppose a combinatorial search is defined by searching for combinatorial objects from a family ${\mathcal L}$ of labeled objects. Under the appropriate definition of isomorphism for those objects, let $\sim$ be the isomorphism relation and ${\mathcal U}$ be the family of unlabeled objects: the equivalence classes under $\sim$. Let $P : {\mathcal L} \to \{0,1\}$ be a property, and we wish to generate all objects $X$ in ${\mathcal L}$ where $P(X) = 1$. We shall assume the property $P$ is invariant under isomorphism ($\sim$): for all unlabeled objects ${\mathcal X} \in {\mathcal U}$ and labeled objects $X, X' \in {\mathcal X}$, $P(X) = P(X')$. In this case, we can define $P({\mathcal X})$ for an unlabeled object ${\mathcal X}$ to be equal to $P(X)$ for any labeled object $X \in {\mathcal X}$.

Example. Let ${\mathcal L}$ be the set of graphs of order $n$. Then ${\mathcal U}$ is the family of unlabeled graphs where the standard relation of isomorphism ($\cong$) is used between graphs. The property $P$ could be $P(G) = 1$ if and only if $G$ is 4-regular and $G$ has chromatic number three. End Example.

Base Objects and Augmentations

A combinatorial search consists of a set $B \subset {\mathcal L}$ of base objects and an augmentation.

Let $B \subset {\mathcal L}$ be a set of labeled objects For a labeled object $X \in {\mathcal L}$, the augmentation defines a set ${\mathcal A}(X)$ of augmented objects. Let ${\mathcal D}(X)$ be the set of deleted objects, defined as ${\mathcal D}(X) = \{ Y \in {\mathcal L} : X \in {\mathcal A}(Y)\}.$ We shall consider our objects as being built up, so the set of augmented objects ${\mathcal A}(X)$ can be called the above objects while the deleted objects ${\mathcal D}(X)$ are the downward objects.

For isomorphism concerns, we shall assume that for an unlabeled object ${\mathcal X} \in {\mathcal U}$, any two labeled objects $X, X' \in {\mathcal X}$ have a bijection $\pi_{X,X'} : {\mathcal A}(X) \to {\mathcal A}(X')$ so that for all $Y \in {\mathcal A}(X)$, $Y \sim \pi_{X,X'}(Y)$. This allows us to define the augmented and deleted objects ${\mathcal A}({\mathcal X})$ and ${\mathcal D}({\mathcal X})$ for unlabeled objects ${\mathcal X} \in {\mathcal U}$ as well.

Example. For enumerating all graphs of order $n$, the empty graph of order $n$ can serve as a base object. The augmentation step from a graph $G \in {\mathcal L}$ may be adding an edge to $E(G)$. Therefore, ${\mathcal A}(G) = \{ G + e : e \in E(\overline{G}) \}$ and ${\mathcal D}(G) = \{ G - e : e \in E(G) \}$. End Example.

Search as a Poset

This relationship between augmented and deleted objects defines a partial order $\preceq$ on objects in ${\mathcal L}$. For $X, Y \in {\mathcal L}$, let $X \preceq Y$ be a cover relation if and only if $Y \in {\mathcal A}(X)$ (equivalently, $X \in {\mathcal D}(Y)$). Extending these cover relations by transitivity makes $\preceq$ a partial order over ${\mathcal L}$. By our assumptions on unlabeled objects, $\preceq$ defines a partial order over ${\mathcal U}$.

The combinatorial search is complete if every unlabeled object ${\mathcal X} \in {\mathcal U}$ with $P({\mathcal X}) = 1$, there is a base object $Y \in B$ so that $Y \preceq X$ for some $X \in {\mathcal X}$. A complete search ensures that every unlabeled object is visited at least once.

Figure 1 visualizes the poset $({\mathcal U}, \preceq)$ and shows the unlabeled objects with property $P$ as dots.

Figure 1. The search space as a poset.

Example. When generating graphs by edge augmentations, two unlabeled graphs $G$ and $H$ have $G \preceq H$ if and only if $G$ is isomorphic to a subgraph of $H$. Since the empty graph is a subgraph of every graph, this search is complete. End Example.

Algorithm Structure

Consider the search space as a directed graph ${\mathcal H}$ with vertex set $V({\mathcal H}) = {\mathcal L}$ (every vertex is a labeled object) and edge set $E({\mathcal H})$ given by an edge from every $X \in {\mathcal L}$ to every $Y \in {\mathcal A}(X)$. This graph ${\mathcal H}$ is also the Hasse diagram of the poset $({\mathcal L}, \preceq)$.

The combinatorial search algorithm is a depth-first search on ${\mathcal H}$ starting at every base object in $B$. This very basic view of the process is given as Algorithm 1.

Algorithm 1. CombinatorialSearch1($X$)

    if $P(X) \equiv 1$ then
output $X$
end if
for all $Y \in {\mathcal A}(X)$
call CombinatorialSearch1($Y$)
end for

Example. Algorithm 2 demonstrates a specific instance of Algorithm 1 where the objects are graphs, the augmentation involves adding edges, and the property $P(G) = 1$ if and only if $G$ is 4-regular and has chromatic number three. End Example.

Algorithm 2. GraphSearch($G$)

    if $\delta(G) \equiv \Delta(G) \equiv 4$ and $\chi(G) \equiv 3$} then
output $G$
end if
for all edges $e \in E(\overline{G})$
call GraphSearch1($G + e$)
end for

Note that this process generates all labeled objects regardless of whether they can eventually lead to solutions. Further, the algorithm must operate on labeled graphs, so it currently does not remove multiple representatives of the same unlabeled object.

Sub-solutions and Pruning

A labeled object $X$ (or unlabeled object ${\mathcal X}$) is a sub-solution if there exists a labeled object $Y$ (or unlabeled object ${\mathcal Y}$) with $X \preceq Y$ and $P(Y) = 1$ (or ${\mathcal X} \preceq {\mathcal Y}$ and $P({\mathcal Y}) = 1$). In the poset, the sub-solutions form a down-set generated by the objects with property $P$. Since these objects have some sequence of augmentations which lead to a solution, we must generate every unlabeled sub-solution at least once.

We also hope that we could immediately detect when our current object is not a sub-solution (there is no sequence of augmentations which leads to a solution). If we could immediately determine if the object is not a sub-solution, we could ignore these objects and generate only the sub-solutions. Unfortunately, the space of objects where we can efficiently detect that the object is not a sub-solution is not the complement of the sub-solution space. Figure 2 shows the regions of sub-solutions and those that are detectably not sub-solutions and there is a gap.

Figure 2. Sub-solutions and pruning space.

Suppose we create a procedure called Detect($X$) which takes a labeled object $X \in {\mathcal L}$ and returns $1$ only if $X$ is not a sub-solution. We can modify the combinatorial search to utilize this procedure, as in Algorithm 3.

Algorithm 3. CombinatorialSearch2($X$)

    if $P(X) \equiv 1$ then
output $X$
end if
if Detect($X$) $\equiv 1$ then
return
end if
for all $Y \in {\mathcal A}(X)$
call CombinatorialSearch2($X$)
end for

Example. For our example of generating 4-regular graphs with chromatic number three, a graph $G$ with maximum degree $\Delta(G)$ at least five cannot extend to a 4-regular graph by adding edges. Similarly, a graph with chromatic number at least four cannot extend to a three-chromatic graph. Algorithm 4 demonstrates a specific instance of Algorithm 3 by detecting non-sub-solutions using $\Delta(G)$ and $\chi(G)$. End Example.

Algorithm 4. GraphSearch2($G$)

    if $\delta(G) \equiv \Delta(G) \equiv 4$ and $\chi(G) \equiv 3$ then
output $G$
end if
if $\Delta(G) \geq 5$ or $\chi(G) \geq 4$ then
return
end if
for all edges $e \in E(\overline{G})$
call GraphSearch1($G + e$)
end for

An ideal path through the search space is to always remain within the sub-solutions and always hit a solution no matter what path is taken. However, this is not always possible. Sometime the augmentation will lead to an object which is not a sub-solution, but it takes a few more steps before reaching an object which is detectably not a sub-solution. Figure 3 demonstrates this issue.

Figure 3. Ideal and non-ideal paths in the search space.

In a non-ideal path, the number of steps between reaching a non-sub-solution and actually detecting that there is no solution greatly changes the run time of an algorithm. In this region, the algorithm is thrashing: augmenting in all possible ways for several steps before backtracking, with no hope of finding a solution. To reduce the run time, the gap between sub-solutions and detectably non-sub-solutions must be reduced. This step requires modifying the augmentation step or proving a theorem.

Number of Paths to Each Unlabeled Object

Consider an unlabeled object ${\mathcal X}$ and a base object $Y \in B$. Let ${\mathcal Y}$ be the unlabeled object for $Y$. The interval between ${\mathcal Y}$ and ${\mathcal X}$ contains all unlabeled object ${\mathcal Z}$ so that ${\mathcal Y} \preceq {\mathcal Z} \preceq {\mathcal X}$ (Figure 4 shows such an interval). As stated before, every object ${\mathcal Z}$ in this interval must be generated at least once in order to guarantee that the search is complete.

Figure 4. An interval of partial solutions.

However, if we are not careful, the Hasse diagram on this interval can be a tightly woven network. The number of times a labeled object $X \in {\mathcal X}$ is generated is equal to the number of paths in this graph.

Example. When generating a 4-regular graph $G$ on $n$ vertices by edge augmentations, there are a total of $2n$ total edges in $G$. If we assume that almost every subgraph of $G$ is distinct up to isomorphism (which is not that much of an assumption, since almost all graphs have no non-trivial automorphisms), then there are $2^{2n}$ objects in the down-set generated by $G$. Not only is that a large number of subobjects, but there are up to $(2n)!$ different ways to order the edges and build $G$ by adding edges in that order. Since $(2n)! = 2^{\theta(n\log n)}$, this is asymptotically worse than just generating all subobjects. Further, this is based on the number of ways to build $G$ as a labeled object. If $G$ is a rigid graph, there are $n!$ different labeled graphs isomorphic to $G$, and hence $n! \cdot (2n)!$ possible ways to generate this isomorphism class. To avoid such drastic costs, something must be done to reduce isomorphic copies of $G$. End Example.

Figure 5. The search space as a tree.

In an ideal world, every unlabeled object is generated at most once. Thus, there is at most one path in the Hasse diagram from any base object to any unlabeled object. This creates a tree structure to the poset, as seen in Figure 5.

There is an important tradeoff to consider when designing computational searches. The total computation time can be modeled as the amount of computation per object: $C(X)$. Thus, Total Time = Number of Objects $\times$ Avg. Computation per Object} Depending on the problem considered, different augmentation steps can change either the number of objects in the space or the average computation per object. Figure 6 models computation time as the amount of black in the figure. In Figure 6.a, there are many nodes but they require very little cost per node. In contrast, Figure 6.b has fewer nodes but each requires more cost. Which has less computation time? How could you tell before implementing and executing the search?

Figure 6. Comparing the tradeoffs for number of nodes and computation per node.

Similar tradeoffs occur with different methods to reduce isomorphic duplicates. One technique may remove all duplicates, but be overwhelmingly expensive to perform the computations. Another technique may be very quick per object, but suffers from combinatorial explosion in the number of isomorphic duplicates that appear. Finding the balance between these costs requires experience and experimentation.

Example. When generating graphs, it is almost always the case that generating isomorphic duplicates leads to combinatorial explosion. Using a technique such as canonical deletion to remove isomorphic duplicates is a well-studied technique. However, this techniques works for augmenting by edges or augmenting by vertices. It is almost always the case that augmenting by edges requires at least as much computation per node as augmenting by vertices, except the number of objects that are visited is significantly more for edge augmentations. This is because edge augmentations visit every subgraph while vertex augmentations only visit induced subgraphs. Since induced subgraphs include knowledge about non-edges, more is known about the supergraphs that can be generated later, and non-sub-solutions may be detected earlier.

However, there are always counterexamples to the common theme: In a later post, we will find that augmenting by ears is very similar to augmenting by edges, but it allows a certain invariant to be monotonic in these augmentations where it would not necessarily be for vertex augmentations. Further, in this example the ear augmentations are intimately tied to the structure of the target graphs and greatly assists the detection of non-sub-solutions.

When we generate uniquely $K_r$-satuated graphs, we also go against the typical pattern by allowing isomorphic duplicates in favor of a shorter computation time per object. While this increases the number of labeled objects in total, the reduced amount of computation is significant and allows the search to expand to graphs of order $20$ while other techniques became intractable around $14$ vertices. End Example.

Partitioning the Search Space and Parallelization

If we are given a tree-like search space, we can partition the search space by sub-trees.

Given a base object $Y \in B$, the objects at the $i$th level (denoted ${\mathcal L}_i$) are those which are generated from $Y$ by performing $i$ augmentations. The objects in ${\mathcal L}_i$ then partition the space above the $i$th level: for ${\mathcal X} \in {\mathcal L}_i$, let ${\mathcal P}({\mathcal X}) = \{ {\mathcal Z} : {\mathcal X} \preceq {\mathcal Z} \}$. That is, the up-sets ${\mathcal P}({\mathcal X})$ are disjoint and hence partition the space.

Figure 7. Partitioning the search tree and executing in parallel.

From this partition, since the recursive algorithm does not preserve information between recursive calls at the same level, these parts of the search space can be run independently. With appropriate communication protocols, every part ${\mathcal P}({\mathcal X})$ can be run in a different process or even a different computation node. This allows for arbitrary parallelism by generating all objects in ${\mathcal L}_i$ for some $i$ then running the search starting at every ${\mathcal X} \in {\mathcal L}_i$.

In a later post, I will discuss the TreeSearch library, which is my own abstract implementation of a combinatorial search tree. The library handles all of the parallelization of the subtrees, so any implementation using TreeSearch has access to these tools.

Introduction to Canonical Deletion

Small Graphs are Reconstructible

Introduction to Orbital Branching

Uniquely $K_r$-Saturated Graphs: Experiment #1

References

D. Stolee, TreeSearch User Guide. Software available with the SearchLib collection.