# Sebastian Pokutta's Blog

Mathematics and related topics

## indirect proofs: contrapositives vs. proofs by contradiction

Last week I read a rather interesting discussion on contrapositives vs. proofs by contradiction as part of Timothy Gowers’ Cambridge Math Tripos, mathoverflow, and Terry Tao’s blog. At first sight these two concepts, the contrapositive and the reductio ad absurdum (proof by contradiction) might appear to be very similar. Suppose we want to prove $A \Rightarrow B$ for some statements $A$ and $B$. Then this is equivalent to showing $\neg B \Rightarrow \neg A$ (at least in classical logic). The latter is the contrapositive and often it is easier to go with the contrapositive. In the case of the indirect proof we do something similar, however there is a slight difference: we assume that $A \wedge \neg B$ and deduce a contradiction. So what the big deal? The difference seems to be more of a formal character. However this is not true. In the first case we remain in the space of “true statements”, i.e., any deduction from $\neg B$ is a consequence of $A$ that we can use later “outside of the proof”. In the case of the proof by contradiction we move in a “contradictory space” (as $A \wedge \neg B$  is contradictory) and everything that we derive in this space is potentially garbage. Its sole purpose is to derive a contradiction however as we work in a contradictory system we cannot guarantee that the statement derived within the proof are true statement; in fact they are likely not be true as they should result in a final contradiction.

Interestingly a similar phenomenon is known for cutting-plane procedures or cutting-plane proof systems (both terms essentially mean the same thing; it is just a different perspective) . Let me give you an ultra-brief introduction of cutting-plane procedures. Given a polytope $P \subseteq [0,1]^n$ we are often interested in the integral hull of that polytope which is defined to be $P_I = \text{conv}(P \cap \{0,1\}^n)$. A cutting-plane procedure $M$ is now a map that assigns to $P$ a new polytope $M(P)$ such that $P_I \subseteq M(P) \subseteq P$ and $M(P)$ hopefully provides a tighter approximation of $P_I$. So what the cutting-plane procedure does, is to derive new valid inequalities for $P_I$ by examining $P$ and usually the derivation is computationally bounded (otherwise we could just guess the integral hull); the exact technical details are not too important at this point.

Now any well-defined cutting-plane procedure $M$ satisfies $M(P \cap \{cx \le \delta\}) \subseteq M(P) \cap \{cx \le \delta\}$. Or put differently, giving the cutting-plane procedure access to an additional inequality can potentially increase the strength of the procedure as compared to let it work on $P$ and then intersect with the half-space $cx \leq \delta$ afterwards. Now what does this have to do with indirect proofs and contrapositives? The connection arises from the following trivial insight: an inequality $cx \leq \delta$ (with integral coefficients and right-hand side) is valid for $P_I$ if and only if $(P \cap \{cx \geq \delta + 1 \})_I = \emptyset$. In particular a sufficient condition for the validity of $cx \leq \delta$ for $P_I$ is $M(P \cap \{cx \geq \delta + 1 \}) = \emptyset$. The key point is that $M(P \cap \{cx \geq \delta + 1 \})$ can be strictly contained in $M(P) \cap \{cx \geq \delta + 1 \}$. The first one is the indirect proof, whereas the second one is the contrapositive, as we verify the validity of $cx \leq \delta$ by testing if  $M(P) \cap \{cx \geq \delta + 1 \} = \emptyset$. However we do not use the inequality $cx \leq \delta$ in the cutting-plane procedure, i.e., the procedure has no a priori knowledge about what to prove, whereas in the case of indirect proofs, we add the negation of $cx \leq \delta$ and the procedure can use this information.

So how much do can you gain? Suppose we have a graph $G= (V,E)$ and we consider the associated fractional stable set polytope $FSTAB(G) = \{ x \in [0,1]^V \mid x_u + x_v \leq 1 \ \forall\; (u,v) \in E\}$. Typically (there are a few exceptions), for a classical cutting-plane procedure the derivation of clique inequalities is involved and we need $\Omega(\log k)$ applications of the cutting-plane procedure to derive the clique inequalities for a clique $C$ of size $k$, i.e., $\sum_{v \in C} x_v \leq 1$. However an indirect proof of the clique inequalities takes only a single application of the most basic cutting-plane operator: Consider

$FSTAB(G) \cap \{\sum_{v \in C} x_v \geq 2\} =: Q$

for a clique $C$. It is not hard to see that $Q \cap \{x_v = 1\} = \emptyset$ for all $v \in C$. A basic derivation that any sensible cutting-plane operator $M$ supports is to derive that $x_i \leq 0$, i.e., $x_i \leq 0$ is valid for $M(Q)$ whenever $x_i < 1$ is valid for $P$. Therefore we obtain that $M(Q) \subseteq \bigcap_{v \in C} \{ x_v = 0\}$. On the other hand $M(Q) \subseteq \{\sum_{v \in C} x_v \geq 2\}$ and so $M(Q) = \emptyset$ holds and thus the indirect proof derived $\sum_{v \in C} x_v \leq 1$.

So what one can see from this example is that indirect proofs (at least in the context of cutting-plane proof systems) can derive strong valid inequalities in rather few rounds and outperform their direct counterpart drastically (constant number of rounds vs. log(n) rounds). However a priori knowledge of what we want to prove is needed in order to apply the indirect proof paradigm. This makes it hard to exploit the power of indirect proofs in cutting-plane algorithms. After all, you need to know the “derivation” before you did the actual “derivation”. Nonetheless, in some cases we can use indirect proofs by guessing good candidates for strong valid inequalities and then verify their validity using an indirect proof.

Written by Sebastian

October 18, 2011 at 8:01 pm

## Mastermind – five questions do suffice

Today I would like to talk about the Mastermind game and related (recreational?!) math problems – the references that I provide in the following are probably not complete. Most of you might know this game from the 70s and 80s. The first player is making up a secret sequence of colored pebbles (of a total of 6 colors) and the other player has to figure out the sequence by asking questions about the code by proposing potential solutions. The first player then indicates the number of color matches.

Mastermind (source: Wikipedia)

More precisely, Wikipedia says:

The codebreaker tries to guess the pattern, in both order and color, within twelve (or ten, or eight) turns. Each guess is made by placing a row of code pegs on the decoding board. Once placed, the codemaker provides feedback by placing from zero to four key pegs in the small holes of the row with the guess. A colored (often black) key peg is placed for each code peg from the guess which is correct in both color and position. A white peg indicates the existence of a correct color peg placed in the wrong position.

If there are duplicate colours in the guess, they cannot all be awarded a key peg unless they correspond to the same number of duplicate colours in the hidden code. For example, if the hidden code is white-white-black-black and the player guesses white-white-white-black, the codemaker will award two colored pegs for the two correct whites, nothing for the third white as there is not a third white in the code, and a colored peg for the black. No indication is given of the fact that the code also includes a second black.

Once feedback is provided, another guess is made; guesses and feedback continue to alternate until either the codebreaker guesses correctly, or twelve (or ten, or eight) incorrect guesses are made.

In a slightly more formal way, we have a string in $\{1,...,6\}^4$ and the “decoder” wants to reconstruct this string by inferring from the provided feedback. One of the natural questions that arise is of course how many questions do suffice. Knuth [Knuth76] then showed that five questions suffice to be able to always reconstruct the secret string. What is interesting about the proof is that it is a “table” – basically output of a computer program. This lookup table can be used so find a next question at any given point. The table is a greedy optimization in some sense: “Figure 1 [the lookup table] was found by choosing at every stage a test pattern that minimizes the maximum number of remaining possibilities, over all 15 responses by the codemaker”.

Later in 1983, Vasicek Chvátal dedicated a paper on the Mastermind game to Paul Erdős for his 70th birthday. Chvátal looked at generalized admissible Mastermind vectors denoted by $V(n,k)$ of vectors of length n with k different colors. It is not too hard to see that the minimum number of questions $f(n,k)$ needed to correctly identify any string in $V(n,k)$ is bounded from below by

$f(n,k) \geq \frac{n \log k}{\log \binom{n+2}{2}}$

which arises from the fact that there are only $\binom{n+2}{2}$ different answers and $n^k$ different strings have to be distinguished. Complementing this bound, Chvátal showed that the number of questions needed to be asked without waiting for the answer (i.e., the questions are asked in one go, then the answers to all questions are provided at once, and then the code has to be uniquely identified) can be bounded from above as follows: the number of questions needed for this static case will be denoted by $g(n,k)$ and for any $\epsilon > 0$ there exists $n(\epsilon)$ so that for all $n > n(\epsilon)$ and $k < n^{1-\epsilon}$ we have

$g(n,k) \leq (2+ \epsilon) n \frac{1+2 \log k}{\log n - \log k}$

and clearly we have $f(n,k) \leq g(n,k)$. The proof uses the probabilistic method in a nice way. Moreover, Chvátal also provides some upper and lower bounds for special cases. Those of you guys that know about my addiction to the Chvátal-Gomory closure and its friends might have already guessed that this is exactly how I came across the problem…

The latter problem where we do not wait for the answers is usually called the static mastermind problem whereas the classical version is called the dynamic mastermind problem. Later in 2003 and 2004 Goddard (see [Godd03,04]) provided optimal values for the minimal number of questions to be asked both in the dynamic as well as static case and also for the average number (denoted by $r(n,k)$) of questions needed whenever the secret string is uniformly picked at random. With the notation from above we have the following number of questions (tables taken from [Godd03,04]):

For the average number of queries needed ($r(n,k)$) we obtain:

 Positions 2 3 4 5 6 7 Colors 2 – 2 2.250 2.750 3.031 3.500 3.875 3 – 2.333 2.704 3.037 3.358 4 – 2.813 3.219 3.535 5 – 3.240 3.608 3.941 6 – 3.667 3.954 4.340 7 – 4.041 4.297 8 – 4.438 9 – 4.790 10- 5.170

and similarly for the dynamic case we have the following minimum number of queries $f(n,k)$:

 Positions 2 3 4 5 6 7 8 Colors 2 – 3 3 4 4 5 5 6 3 – 4 4 4 4 5 <= 6 4 – 4 4 4 5 <= 6 5 – 5 5 5 <= 6 6 – 5 5 5 7 – 6 6 <= 6 8 – 6 9 – 7 10- 7

and for the static case $g(n,k)$ we have  the following table. Note that in the table below the final “query” that states the recovered string is not counted as in comparison to the ones above. Therefore in order to compare the values with the ones above you need to add “1” to each entry.

 Positions 2 3 4 5 6 7 8 Colors 2 – 2 2 3 3 4 5 5 3 – 2 3 3 4 4 <= 5 4 – 3 4 4 5 5 – 4 4 5 6 – 4 5 6 7 – 5 6 <= 7 8 – 6 7 <= 8 9 – 6 8 10– 7 9

(there seems to be a typo for n = 2 and k = 3 in one of the tables, as the static case has a better performance than the dynamic case which is not possible).

In order to be able to actually check (with a computer) whether a certain number of questions suffices, we have to exclude symmetries in a smart way. Otherwise the space of potential candidates is too large. In this context, in particular the orderly generation framework of [McKay98] is very powerful. The idea behind that framework is to incrementally extend the considered structures in such a way that we only add a canonical candidate per orbit. Moreover, after having extended our structure to the next “size” we need to check whether it is isomorphic to one of the previously explored structures. In this case we do not consider it. For each candidate we check whether the number of distinct answers is equal to the total number of possible secret codes. In this case there is a bijection between the two and therefore we can decode the code. However it is not clear that this bijection needs to have a “nice” structure or that it is “compact” in some sense.

References:

1. [Knuth76]: Knuth, D.E. 1976. “The computer as a master mind.” Journal of Recreational Mathematics. http://colorcode.laebisch.com/links/Donald.E.Knuth.pdf (Accessed June 9, 2011).
2. [Chvátal83]: Chvátal, V. 1983. “Mastermind.” Combinatorica 3: 325-329.
3. [McKay98]: McKay, B.D. 1998. “Isomorph-free exhaustive generation.” Journal of Algorithms 26(2): 306–324.
4. [Good03]: Goddard, W. 2003. “Static Mastermind.” Journal of Combinatorial Mathematics and Combinatorial Computing 47: 225-236
5. [Godd04]: Goddard, W. 2004. “Mastermind Revisited.”  Journal of Combinatorial Mathematics and Combinatorial Computing 51: 215-220

Written by Sebastian

October 13, 2011 at 4:20 pm

## Steve Jobs 1955-2011

Thank you for providing us not just with different tools but with a different way to think.

You inspired all of us.

We will miss you very much!

Written by Sebastian

October 6, 2011 at 8:41 am

Posted in things that make me think

Tagged with ,

## Cambridge Mathematical Tripos

with one comment

Timothy Gowers just started a new series of blog posts for first-year mathematics students. While the blog posts will be centered around Cambridge’s courses I am pretty sure that the discussed topics and hints will be valuable to other students as well. In fact, what I find most impressive is the goal of the series: to teach people how to do mathematics! We all learned what mathematics is and results have been presented to us in a nice, cleaned-up fashion. However only very few of us were taught how to solve/approach problems – most of us learned it the hard way at some point. It is as if you go to a restaurant to get great food: this does not teach you how to cook yourself! In particular it does not teach you that the nice result is a product of quite a mess in the kitchen. When doing math, everybody will reach her or his limit sooner or later (as compared to math in school which was easy for many math students) and it is precisely this point in time, when students start to doubt their own potential. In fact some kind of a bias is bound to take place: every math problem that can be solved is “easy” and every problem that is not solved is a small personal crisis – “am I good enough”? As you did not see the mess in the kitchen, one might think that things come easy in a nice form or not at all. In the end there is no positive feedback available anymore, only negative feedback.

I am very much looking forward to this series and I am sure that Tim has some valuable insights to share!

Written by Sebastian

September 25, 2011 at 9:15 pm

## Does IBM care for CPLEX at all?

I just got an email from IBM asking me to participate in the Academic Initiative Survey. I participated with the aim to address a few shortcomings with respect to IBM’s support for, and interest in their optimization products, e.g., that it is quite a hassle to download cplex as one has to go through an uncountably infinite number of pages before one actually reaches the download page – if one reaches it at all. Also there were a few other things that I wanted to address.

But guess what. One of the first question was to what academic field one belongs to. Operations Research? Mathematics? nada. That was already a bad omen. And in fact. There were only two references to optimization at all “Linear Programming” and “Integer Programming” in the courses that I teach / want to teach (out of a gazzillion listed including a lot of voodoo stuff). Effectively, optimization and the optimization products were virtually not present at all. Cplex, OPL, OPL Studio and none of the other optimization tools were even mentioned.

This apparent lack of interest raises serious questions about IBM’s future plans for cplex and their optimization products. In particular, questions about continuity and support. Who knows… 10 years ago I would have been really scared as cplex was the strongest industrial strength solver and therefore choice number one in many applications – however times have changed and fortunately there are alternatives now.

Written by Sebastian

March 11, 2011 at 5:02 pm

## Long time no see

with one comment

It has been quite a while that I wrote my last blog post; the last one that really counts (at least to me) was back in February. As pointed out at some point it was not that I was lacking something to write about but more that I did not want to “touch” certain topics. That in turn made me wonder what a blog is good for when, in fact, one is still concerned about whether to write about certain topics. So I got the feeling that in the end, all this web 2.0 authenticity, all this being really open, direct, authentic, etc. is nothing else but a (self-) deception. On the other hand, I also did not feel like writing about yet another conference. I have to admit that I have been to some really crappy conferences lately and since I did not have anything positive to say I preferred to not say anything at all. There were a few notable exceptions, e.g., the MIP or IPCO. Another thing that bothered me (and still does) is the dilution of real information with non-sense. In fact I have the feeling that the signal-to-noise ratio considerably dropped over the last two years and I didn’t want to add to this further. This feeling of over-stimulation with web 2.0 noise seems to be a global trend (at least this is my perception). Many people gave up their blogs or have been somewhat neglecting them. Also maintaining a blog with say weekly posts (apart from passing on a few links or announcements) takes up a lot of time. Time that arguably could be better invested into doing research and writing papers.

Despite those issues or concerns I do believe that the web with all its possibilities can really enhance the way we do science. As with all new technologies one has to find a modus operandi that provides positive utility. In principle the web can provide an information democracy/diversification, however any “democratic endeavor” on the web has a huge enemy. The Matthew effect (or commonly known as “more gains more”). This term, coined by R.K. Merton, derives its name from the following biblical Gospel of Matthew (see also wikipedia):

For to all those who have, more will be given, and they will have an abundance; but from those who have nothing, even what they have will be taken away. — Matthew 25:29, New Revised Standard Version

In principle it states that the “rich get richer” while the “poor get poorer”. If we think of the different social networks (facebook, myspace, friendster) it refers to the effect that the one that has the largest user basis is going to attract more people than the one with a smaller one. In the next “round” this effect is then even more pronounced until the smaller competitor virtually ceases to exist. In the real-world this effect is often limited due to various kinds of “friction”. There might be geographic limitations, cultural barriers etc., that do wash out the advantage of the larger one so that the compounding nature of the effect is slowed down or non-existent (this hold even true in the highly globalized world we live in). That is the reason why dry cleaners, bakeries, and other forms of local business are not outperformed by globalized companies (ok, some are). In the context of the internet however there is often no inhibitor to the Matthew effect. It often translates into some type of preferential attachment although with the difference that the overall user basis is limited so that the gain of one party is the loss of another (preferential attachment processes are usually not zero-sum).

So what does this mean in the context of blogs? Blog reading is to a certain extent zero-sum. There is some limited amount of time that we are willing to spend reading blogs. Those with a large user basis will have more active discussions and move higher in the priority list for reading. In the end the smaller ones might only have a handful of readers making it hard to justify the amount of time spent writing the posts. Downscaling the frequency of posts might even pronounce the effect as it might be perceived as inactivity. One way out of this dilemma could be any form of joining the smaller units to larger ones, i.e., either “digesting” several blogs to a larger one or alternatively “shared blogging”. I haven’t made up my mind yet what (if!) I am going to do about this. But I guess, in the end some type of consolidation is inevitable.

Having bothered you with this abstruse mixture of folklore, economics, and internet, I actually intended to write about something else but somewhat related today: About deciding whether and when to dump a project. This problem is very much inspired by my previous experiences as a consultant and recent decisions about academic projects. More precisely, suppose that you have a project and you have an estimate for the overall time of the project. At some point you want to review the progress and based on what you see at this point you want to make a call whether or not you will abandon the project. The longer you wait with your review the better your information is that you gain from the review. On the other hand you potentially wasted too much time and resources to increase the confidence in your decision. In fact it might make even sense you not start a project at all. Suppose that you have an a priori estimate for the probability of success of your project, say p. Further let r(t) denote our function of erring, i.e., r(0) = 1/2 and r(1) = 0 which means that at time t= 0 we do not have any information yet and so we can only guess leading to guessing wrong with probability 50% and at time t = 1 we have perfect information. Let t denote the point in time at which we review the project (as a fraction of the overall time, here assumed to be 1). We have four cases to consider (one might opt for a different payoff function; the following one resembles my particular choice):

1. The project is going to be successful and at the point of reviewing we guessed right, i.e., we went through with it. In this case the benefit is s. This happens with probability (1-r(t)) p and expected payoff for this scenario is: (1-r(t)) p s. [alternatively one could consider the benefit s – t; or something else]
2. The project is going to be successful and at the point of reviewing we guessed wrong, i.e., we dropped the project. In this case the benefit is – (t + s), i.e., we lose our investment up to that point (here with unit value) and the overall benefit. Probability is r(t) p and expected payoff – r(t) p (t+s).
3. The project is going to fail and we guessed right: Benefit -t, i.e., the investment so far. Expected payoff – (1-r(t)) (1-p) t.
4. The project is going to fail and we guessed wrong, i.e., we went through with it: Benefit -T, were T is some cost for this scenario. Expected payoff – r(t) (1-p) T.

All in all we have the following expected overall payoff as a function of t:

$\mathbb E(t) = -[(1-r(t))p (-s) + (1-r(t))(1-p) t + r(t)p(t+s) + r(t)(1-p) T]$

Next we have to define our function which models our increase in confidence. I opted for a function that gains information in a logarithm fashion, i.e., in the beginning we gain confidence fast and then we have a tailing-off effect:

$r_k(t) := \frac{1}{2} \frac{(1 + \log(k)}{(-\log(k) + \log(1 + k)))} - \frac{\log(k + t)}{(2 (-\log(k) + \log(1 + k)))}$

The parameter k can be understood as the rate of learning. For example for k = 0.01 it looks like this:

Assuming s = 1 and T = 1, i.e., the payoffs are basically the invested time and p = 30%, the expected payoff as function of the time of review t looks like this (payoff: blue line, error rate: red line):

The maximum payoff is reached for a review after roughly 20% of the estimated overall time. However it is still negative. This suggests that we do not learn fast enough to perform a well-informed decision. For example for k = 0.001, the situation looks different:

The optimal point for a review is after 14% of the estimated project time. Having once estimated your rate of learning, one can also determine which projects one should not get involved with at all. For k = 0.001 this is the case when the probability of success p is less than roughly 27%.

Although this model is somewhat very simple it provides some nice qualitative (and partly quantitative) insights. For example that there are indeed projects that you should not even get involved with; this is somewhat clear from intuition but I was surprised that the probability of success of those is still quite high. Also, when over time your rate of learning increases (due to experience with other projects) you can get involved with more risky endeavors because your higher review confidence allows you to purge more effectively. For example when k goes down to, say, k = 0.00001 (which is not unrealistic as in this case shortly after the beginning of the project you would only err with around 20%) you could get involved with projects that only have a probability of success of 19%.

And no complaints about the abrupt ending – I consumed my allocated blogging time.

Written by Sebastian

September 6, 2010 at 5:28 am

## GLPK 4.44 released

with one comment

It has been a while since my last post… sorry ’bout that. I am already feeling like just announcing new glpk releases etc. It is not that there are not enough issues that deserve attention but it seems to me like each one of them has such a political or controversial component that I am not sure that I want to touch them (e.g., editors forcing citations of their journal, ethical misconduct to sustain growth). Some of them have been lurking in my drafts folder for a while now and still haven’t made up my mind. I hope I will find some more time soon to write more elaborate posts more frequently.

Anyways, a new version of glpk has been released. The new version now allows for explicit querying of dual values within the GMPL language.

GLPK 4.44 Release Information
*****************************

Release date: Jun 03, 2010

GLPK (GNU Linear Programming Kit) is intended for solving large-scale
linear programming (LP), mixed integer linear programming (MIP), and
other related problems. It is a set of routines written in ANSI C and
organized as a callable library.

The following suffixes for variables and constraints were
implemented in the MathProg language:

.lb     (lower bound),
.ub     (upper bound),
.status (status in the solution),
.val    (primal value), and
.dual   (dual value).

Thanks to Xypron <xypron.glpk@gmx.de> for draft implementation
and testing.

Now the MathProg language allows comment records (marked by
‘#’ in the very first position) in CSV data files read with the
table statements. Note that the comment records may appear only
in the beginning of a CSV data file.

The API routine glp_cpp to solve the Critical Path Problem was

See GLPK web page at <http://www.gnu.org/software/glpk/glpk.html>.

GLPK distribution can be ftp’ed from <ftp://ftp.gnu.org/gnu/glpk/> or
from some mirror ftp sites; see <http://www.gnu.org/order/ftp.html>.

MD5 check-sum is the following:

f2ac7013bc0420d730d052e7ba24bdb1 *glpk-4.44.tar.gz

GLPK is also available as a Debian GNU/Linux package. See its web page
at <http://packages.debian.org/etch/glpk>.

Precompiled GLPK binaries (lib, dll, exe) for 32- and 64-bit MS Windows
can be found at <http://winglpk.sourceforge.net/>. Thanks to Xypron
<xypron.glpk@gmx.de>.

For MS Windows users there is also available GLPK Lab, a set of free
software tools and libraries based on the GLPK package. Its web page
can be found at <http://glpklabw.sourceforge.net/>. Thanks to Xypron
<xypron.glpk@gmx.de> and Luiz Bettoni <bettoni@cpgei.ct.utfpr.edu.br>
for development.

Written by Sebastian

June 4, 2010 at 11:13 am

Posted in Software

Tagged with