On Reproducibility

by Derrick Stolee

Recently, Stephen Hartke spoke about our work on uniquely K_r-saturated graphs, and afterwards he spoke to a very famous mathematician, who said, roughly:

“Why did you talk about the computation? You should just talk about the results and the proofs and hide the fact that you used computation.”

While this cannot be an exact quote, I believe it is a common attitude among mathematicians. While the focus is on solving problems, the use of computation is seen as a negative, so the final research papers make little mention of their computational methods. This is a huge problem in my opinion. While computations can report exact results, and sometimes even prove results, every execution is an experiment. It is unknown before the execution what the results will be, or how long it will take.

Computational combinatorics is a combination of mathematics, engineering, and science: We prove things, we build things, and we experiment. Since computer proofs are experiments, it is important that they be reproducible. Today, I want to discuss a bit about how we can improve our presentation of algorithms and computation in order to make results more reproducible.

Reproducibility and Repeatability

There is a subtle, but important difference between reproducibility and repeatability. An experiment is repeatable if it can be executed a second time. This includes someone downloading source code, compiling it, and running the same commands as the authors. This is the absolute minimum I require of papers using computation. To make an experiment reproducible, the paper should include enough detail about the algorithm or method such that someone else could create software that performs similar operations, with similar computation time, as the original code. This ability to create similar software as an independent project is important for several reasons:

  • An independent implementation could reveal different results (and hence an error in the original software).
  • Using different tools, languages, or basic algorithms could provide optimizations that make more cases tractable.
  • An independent implementation can be used to test performance directly against a new method.

Why Reproducibility?

The most important reason to deeply discuss your computational methods is to convey the limits of your experiments. That is, why did you not continue to the next case? What it a questions of resources? Computation time or memory? Did you parallelize? Could you parallelize? Is your method the problem, or is there a mathematical barrier that seems unsurmountable?

An example of a terrible habit I see many authors use is a sentence such as the following:

“We used integer programming to find examples.”

I’m sure you didn’t create your own integer programming solver, so it would be out-of-place to discuss branch-and-bound or other algorithms here. However, the minimum should be a description of the integer programming formulation. What solvers did you use? Did you use any special settings? How long did each test take? Why did you stop there? Did you try anything else? These questions are important to answer if anyone else can hope to attempt extending your work. If you answer these questions, then another researcher can make an educated decision on whether your results can be extended by other methods, or more resources, or other formulations, etc.

Reproducibility Done Right

The computational proof of the Four Color Theorem is an excellent example of reproducibility. In the Robertson, Sanders, Seymour, and Thomas proof, they reproduced the method of Appel and Haken with a completely independent implementation (and some simplifications to the method). Further, there were multiple implementations of the computer proofs. They even included a discussion about the choices they made in making the discharging argument as simple as possible. Someday, I will discuss their proof in more detail.

Reproducing Experiments

It would be good to have a place for people to reproduce computational experiments, or even slightly improve on them. There are not enough reports of people reproducing computer proofs and algorithms, and there is not enough incentive for people to do this! I am guilty of this, too, as pressures to create new, original mathematics trumps any motivation to double-check interesting computer experiments.

What can be done to help? For starters, we should be writing our papers to improve the ability to reproduce our experiments. As for creating a venue for reproduced (and slightly extended) experiments, I’m not sure what to do. Apparently, this is a problem even for experiment-driven sciences such as biology and chemistry. How can we possibly change it for a field as “pure” as mathematics?

Further Reading

K. Appel and W. Haken, “Every planar map is four colorable. Part I: Discharging”, Illinois J. Math. 21(3) (1977) pp 429-490.

A. Casadevall and F. C. Fang, Reproducible Science. Infection and Immunity, 78(12) (2010) 4972-4975.

N. Roberston, D. Sanders, P. Seymour, R. Thomas, The Four-Colour Theorem, J. Comb. Th. Ser. B. 70(1) (1997) pp 2-44.

M. Shuttleworth, Reproducibility Explorable.com (2009).

Wikipedia, Reproducibility.