# Sebastian Pokutta's Blog

Mathematics and related topics

## indirect proofs: contrapositives vs. proofs by contradiction

Last week I read a rather interesting discussion on contrapositives vs. proofs by contradiction as part of Timothy Gowers’ Cambridge Math Tripos, mathoverflow, and Terry Tao’s blog. At first sight these two concepts, the contrapositive and the reductio ad absurdum (proof by contradiction) might appear to be very similar. Suppose we want to prove $A \Rightarrow B$ for some statements $A$ and $B$. Then this is equivalent to showing $\neg B \Rightarrow \neg A$ (at least in classical logic). The latter is the contrapositive and often it is easier to go with the contrapositive. In the case of the indirect proof we do something similar, however there is a slight difference: we assume that $A \wedge \neg B$ and deduce a contradiction. So what the big deal? The difference seems to be more of a formal character. However this is not true. In the first case we remain in the space of “true statements”, i.e., any deduction from $\neg B$ is a consequence of $A$ that we can use later “outside of the proof”. In the case of the proof by contradiction we move in a “contradictory space” (as $A \wedge \neg B$  is contradictory) and everything that we derive in this space is potentially garbage. Its sole purpose is to derive a contradiction however as we work in a contradictory system we cannot guarantee that the statement derived within the proof are true statement; in fact they are likely not be true as they should result in a final contradiction.

Interestingly a similar phenomenon is known for cutting-plane procedures or cutting-plane proof systems (both terms essentially mean the same thing; it is just a different perspective) . Let me give you an ultra-brief introduction of cutting-plane procedures. Given a polytope $P \subseteq [0,1]^n$ we are often interested in the integral hull of that polytope which is defined to be $P_I = \text{conv}(P \cap \{0,1\}^n)$. A cutting-plane procedure $M$ is now a map that assigns to $P$ a new polytope $M(P)$ such that $P_I \subseteq M(P) \subseteq P$ and $M(P)$ hopefully provides a tighter approximation of $P_I$. So what the cutting-plane procedure does, is to derive new valid inequalities for $P_I$ by examining $P$ and usually the derivation is computationally bounded (otherwise we could just guess the integral hull); the exact technical details are not too important at this point.

Now any well-defined cutting-plane procedure $M$ satisfies $M(P \cap \{cx \le \delta\}) \subseteq M(P) \cap \{cx \le \delta\}$. Or put differently, giving the cutting-plane procedure access to an additional inequality can potentially increase the strength of the procedure as compared to let it work on $P$ and then intersect with the half-space $cx \leq \delta$ afterwards. Now what does this have to do with indirect proofs and contrapositives? The connection arises from the following trivial insight: an inequality $cx \leq \delta$ (with integral coefficients and right-hand side) is valid for $P_I$ if and only if $(P \cap \{cx \geq \delta + 1 \})_I = \emptyset$. In particular a sufficient condition for the validity of $cx \leq \delta$ for $P_I$ is $M(P \cap \{cx \geq \delta + 1 \}) = \emptyset$. The key point is that $M(P \cap \{cx \geq \delta + 1 \})$ can be strictly contained in $M(P) \cap \{cx \geq \delta + 1 \}$. The first one is the indirect proof, whereas the second one is the contrapositive, as we verify the validity of $cx \leq \delta$ by testing if  $M(P) \cap \{cx \geq \delta + 1 \} = \emptyset$. However we do not use the inequality $cx \leq \delta$ in the cutting-plane procedure, i.e., the procedure has no a priori knowledge about what to prove, whereas in the case of indirect proofs, we add the negation of $cx \leq \delta$ and the procedure can use this information.

So how much do can you gain? Suppose we have a graph $G= (V,E)$ and we consider the associated fractional stable set polytope $FSTAB(G) = \{ x \in [0,1]^V \mid x_u + x_v \leq 1 \ \forall\; (u,v) \in E\}$. Typically (there are a few exceptions), for a classical cutting-plane procedure the derivation of clique inequalities is involved and we need $\Omega(\log k)$ applications of the cutting-plane procedure to derive the clique inequalities for a clique $C$ of size $k$, i.e., $\sum_{v \in C} x_v \leq 1$. However an indirect proof of the clique inequalities takes only a single application of the most basic cutting-plane operator: Consider

$FSTAB(G) \cap \{\sum_{v \in C} x_v \geq 2\} =: Q$

for a clique $C$. It is not hard to see that $Q \cap \{x_v = 1\} = \emptyset$ for all $v \in C$. A basic derivation that any sensible cutting-plane operator $M$ supports is to derive that $x_i \leq 0$, i.e., $x_i \leq 0$ is valid for $M(Q)$ whenever $x_i < 1$ is valid for $P$. Therefore we obtain that $M(Q) \subseteq \bigcap_{v \in C} \{ x_v = 0\}$. On the other hand $M(Q) \subseteq \{\sum_{v \in C} x_v \geq 2\}$ and so $M(Q) = \emptyset$ holds and thus the indirect proof derived $\sum_{v \in C} x_v \leq 1$.

So what one can see from this example is that indirect proofs (at least in the context of cutting-plane proof systems) can derive strong valid inequalities in rather few rounds and outperform their direct counterpart drastically (constant number of rounds vs. log(n) rounds). However a priori knowledge of what we want to prove is needed in order to apply the indirect proof paradigm. This makes it hard to exploit the power of indirect proofs in cutting-plane algorithms. After all, you need to know the “derivation” before you did the actual “derivation”. Nonetheless, in some cases we can use indirect proofs by guessing good candidates for strong valid inequalities and then verify their validity using an indirect proof.

Written by Sebastian

October 18, 2011 at 8:01 pm

## Mastermind – five questions do suffice

Today I would like to talk about the Mastermind game and related (recreational?!) math problems – the references that I provide in the following are probably not complete. Most of you might know this game from the 70s and 80s. The first player is making up a secret sequence of colored pebbles (of a total of 6 colors) and the other player has to figure out the sequence by asking questions about the code by proposing potential solutions. The first player then indicates the number of color matches.

Mastermind (source: Wikipedia)

More precisely, Wikipedia says:

The codebreaker tries to guess the pattern, in both order and color, within twelve (or ten, or eight) turns. Each guess is made by placing a row of code pegs on the decoding board. Once placed, the codemaker provides feedback by placing from zero to four key pegs in the small holes of the row with the guess. A colored (often black) key peg is placed for each code peg from the guess which is correct in both color and position. A white peg indicates the existence of a correct color peg placed in the wrong position.

If there are duplicate colours in the guess, they cannot all be awarded a key peg unless they correspond to the same number of duplicate colours in the hidden code. For example, if the hidden code is white-white-black-black and the player guesses white-white-white-black, the codemaker will award two colored pegs for the two correct whites, nothing for the third white as there is not a third white in the code, and a colored peg for the black. No indication is given of the fact that the code also includes a second black.

Once feedback is provided, another guess is made; guesses and feedback continue to alternate until either the codebreaker guesses correctly, or twelve (or ten, or eight) incorrect guesses are made.

In a slightly more formal way, we have a string in $\{1,...,6\}^4$ and the “decoder” wants to reconstruct this string by inferring from the provided feedback. One of the natural questions that arise is of course how many questions do suffice. Knuth [Knuth76] then showed that five questions suffice to be able to always reconstruct the secret string. What is interesting about the proof is that it is a “table” – basically output of a computer program. This lookup table can be used so find a next question at any given point. The table is a greedy optimization in some sense: “Figure 1 [the lookup table] was found by choosing at every stage a test pattern that minimizes the maximum number of remaining possibilities, over all 15 responses by the codemaker”.

Later in 1983, Vasicek Chvátal dedicated a paper on the Mastermind game to Paul Erdős for his 70th birthday. Chvátal looked at generalized admissible Mastermind vectors denoted by $V(n,k)$ of vectors of length n with k different colors. It is not too hard to see that the minimum number of questions $f(n,k)$ needed to correctly identify any string in $V(n,k)$ is bounded from below by

$f(n,k) \geq \frac{n \log k}{\log \binom{n+2}{2}}$

which arises from the fact that there are only $\binom{n+2}{2}$ different answers and $n^k$ different strings have to be distinguished. Complementing this bound, Chvátal showed that the number of questions needed to be asked without waiting for the answer (i.e., the questions are asked in one go, then the answers to all questions are provided at once, and then the code has to be uniquely identified) can be bounded from above as follows: the number of questions needed for this static case will be denoted by $g(n,k)$ and for any $\epsilon > 0$ there exists $n(\epsilon)$ so that for all $n > n(\epsilon)$ and $k < n^{1-\epsilon}$ we have

$g(n,k) \leq (2+ \epsilon) n \frac{1+2 \log k}{\log n - \log k}$

and clearly we have $f(n,k) \leq g(n,k)$. The proof uses the probabilistic method in a nice way. Moreover, Chvátal also provides some upper and lower bounds for special cases. Those of you guys that know about my addiction to the Chvátal-Gomory closure and its friends might have already guessed that this is exactly how I came across the problem…

The latter problem where we do not wait for the answers is usually called the static mastermind problem whereas the classical version is called the dynamic mastermind problem. Later in 2003 and 2004 Goddard (see [Godd03,04]) provided optimal values for the minimal number of questions to be asked both in the dynamic as well as static case and also for the average number (denoted by $r(n,k)$) of questions needed whenever the secret string is uniformly picked at random. With the notation from above we have the following number of questions (tables taken from [Godd03,04]):

For the average number of queries needed ($r(n,k)$) we obtain:

 Positions 2 3 4 5 6 7 Colors 2 – 2 2.250 2.750 3.031 3.500 3.875 3 – 2.333 2.704 3.037 3.358 4 – 2.813 3.219 3.535 5 – 3.240 3.608 3.941 6 – 3.667 3.954 4.340 7 – 4.041 4.297 8 – 4.438 9 – 4.790 10- 5.170

and similarly for the dynamic case we have the following minimum number of queries $f(n,k)$:

 Positions 2 3 4 5 6 7 8 Colors 2 – 3 3 4 4 5 5 6 3 – 4 4 4 4 5 <= 6 4 – 4 4 4 5 <= 6 5 – 5 5 5 <= 6 6 – 5 5 5 7 – 6 6 <= 6 8 – 6 9 – 7 10- 7

and for the static case $g(n,k)$ we have  the following table. Note that in the table below the final “query” that states the recovered string is not counted as in comparison to the ones above. Therefore in order to compare the values with the ones above you need to add “1” to each entry.

 Positions 2 3 4 5 6 7 8 Colors 2 – 2 2 3 3 4 5 5 3 – 2 3 3 4 4 <= 5 4 – 3 4 4 5 5 – 4 4 5 6 – 4 5 6 7 – 5 6 <= 7 8 – 6 7 <= 8 9 – 6 8 10– 7 9

(there seems to be a typo for n = 2 and k = 3 in one of the tables, as the static case has a better performance than the dynamic case which is not possible).

In order to be able to actually check (with a computer) whether a certain number of questions suffices, we have to exclude symmetries in a smart way. Otherwise the space of potential candidates is too large. In this context, in particular the orderly generation framework of [McKay98] is very powerful. The idea behind that framework is to incrementally extend the considered structures in such a way that we only add a canonical candidate per orbit. Moreover, after having extended our structure to the next “size” we need to check whether it is isomorphic to one of the previously explored structures. In this case we do not consider it. For each candidate we check whether the number of distinct answers is equal to the total number of possible secret codes. In this case there is a bijection between the two and therefore we can decode the code. However it is not clear that this bijection needs to have a “nice” structure or that it is “compact” in some sense.

References:

1. [Knuth76]: Knuth, D.E. 1976. “The computer as a master mind.” Journal of Recreational Mathematics. http://colorcode.laebisch.com/links/Donald.E.Knuth.pdf (Accessed June 9, 2011).
2. [Chvátal83]: Chvátal, V. 1983. “Mastermind.” Combinatorica 3: 325-329.
3. [McKay98]: McKay, B.D. 1998. “Isomorph-free exhaustive generation.” Journal of Algorithms 26(2): 306–324.
4. [Good03]: Goddard, W. 2003. “Static Mastermind.” Journal of Combinatorial Mathematics and Combinatorial Computing 47: 225-236
5. [Godd04]: Goddard, W. 2004. “Mastermind Revisited.”  Journal of Combinatorial Mathematics and Combinatorial Computing 51: 215-220

Written by Sebastian

October 13, 2011 at 4:20 pm

## Steve Jobs 1955-2011

Thank you for providing us not just with different tools but with a different way to think.

You inspired all of us.

We will miss you very much!

Written by Sebastian

October 6, 2011 at 8:41 am

Posted in things that make me think

Tagged with ,