Sebastian Pokutta's Blog

Mathematics and related topics

Archive for the ‘Mathematics and Optimization’ Category

On the polyhedral inapproximability of the SDP cone

with 2 comments

I was at HPOPT in Delft last week, which was absolutely great, and I would like to thank Mirjam Dür and Frank Vallentin for the organization and for providing us with such an inspirational atmosphere. I gave a talk, "On the complexity of polytopes and extended formulations", which was based on two recent papers (see [1] and [2] below) that are joint work with Gábor Braun, Samuel Fiorini, Serge Massar, David Steurer, Hans Raj Tiwary, and Ronald de Wolf. One of the questions that came up repeatedly was the statement about the inapproximability of the SDP cone via linear programming and its consequences. This is what this brief post will be about; I simplify and am purposely imprecise about a few things for the sake of exposition.

For a closed convex body K containing 0 there is a natural notion of approximating K within a factor of (1+ \epsilon) with \epsilon > 0: we can simply define (1+\epsilon) K = \{ (1+\epsilon) x \mid x \in K\}. This notion is compatible with approximating, from the outside, the maximum value of a (say, nonnegative) linear objective over K within a factor of 1+\epsilon. For this observe that

\frac{\max_{x \in (1+\epsilon) K} cx}{\max_{x \in K} cx} = \frac{\max_{x \in K} (1+\epsilon) cx}{\max_{x \in K} cx} \leq (1+\epsilon)
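
To make this concrete, here is a tiny numerical sketch in Python (the toy polytope, the objective, and the use of scipy are illustrative choices of mine, not part of the argument above): for K = \{x \geq 0 \mid Ax \leq b\} with b \geq 0, so that 0 \in K, the scaled body (1+\epsilon)K is obtained by scaling the right-hand side, and the maximum of a nonnegative linear objective scales by exactly 1+\epsilon.

import numpy as np
from scipy.optimize import linprog

# toy polytope K = {x >= 0 : Ax <= b} containing 0, and a nonnegative objective c
A = np.array([[1.0, 2.0], [3.0, 1.0]])
b = np.array([4.0, 6.0])
c = np.array([1.0, 1.0])
eps = 0.1

# linprog minimizes, so we maximize cx by minimizing -cx; bounds default to x >= 0
opt_K = -linprog(-c, A_ub=A, b_ub=b).fun                 # max over K
opt_eps = -linprog(-c, A_ub=A, b_ub=(1 + eps) * b).fun   # max over (1+eps)K
print(opt_eps / opt_K)                                   # equals 1 + eps up to numerical error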

What one can show now is that there exists a pair of polyhedra P \subseteq Q where P = COR(n) = \text{conv}\{bb^T \mid b \in \{0,1\}^n\} is the correlation polytope (also called the boolean quadric polytope) and Q is given by the valid inequalities \langle 2\,\text{diag}(a) - aa^T, x\rangle \leq 1 for all a \in \{0,1\}^n. This pair of polyhedra has the property that for any polyhedron T with P \subseteq T \subseteq (1+\epsilon) Q, any extended formulation of T needs a super-polynomial number of inequalities, and \epsilon can be as large as O(n^\beta) with \beta < 1/2 here. This follows from a modification of Razborov’s well-known rectangle corruption lemma.
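
As a quick sanity check (a small script of my own, for n = 3 only; it is not taken from the papers), one can verify that every vertex bb^T of COR(n) satisfies all of the above inequalities, since \langle 2\,\text{diag}(a) - aa^T, bb^T\rangle = 2(a^Tb) - (a^Tb)^2 = 1 - (1 - a^Tb)^2 \leq 1:

import itertools
import numpy as np

n = 3
vertices = [np.outer(b, b) for b in itertools.product([0, 1], repeat=n)]  # vertices of COR(n)
for a in itertools.product([0, 1], repeat=n):
    a = np.array(a)
    M = 2 * np.diag(a) - np.outer(a, a)                # coefficient matrix of the inequality for a
    assert all((M * X).sum() <= 1 for X in vertices)   # every vertex of P satisfies it, so P is in Q
print("all 2^n inequalities defining Q are valid for COR(%d)" % n)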

On the other hand, there exists a spectrahedron S \subseteq \mathbb R^{n \times n} sandwiched between P and Q, i.e., P \subseteq S \subseteq Q, where S is the projection of an affine space intersected with the cone of (n+1) \times (n+1) psd matrices. In particular P \subseteq S \subseteq (1+\epsilon)S \subseteq (1+\epsilon)Q. Now, if K is a polyhedron so that S \subseteq K \subseteq (1+\epsilon) S, it follows that K needs a super-polynomial number of inequalities in any extended formulation for any \epsilon = O(n^\beta) with \beta < 1/2. In consequence, there exists a spectrahedron S with small SDP extension complexity that cannot be approximated well by any polyhedron. If the SDP cone could be approximated well (in a sense similar to the one for the SOCP cone, see [3]), then so could this particular spectrahedron. This inapproximability is in contrast to the case of the SOCP cone, which can be approximated arbitrarily well with a polynomial number of inequalities (see [3] for a precise statement). (Note that we can add nonnegativity constraints to obtain the same statements for polytopes in order to resolve unboundedness issues.)

Concerning the consequences of this statement: first of all, it suggests that the expressive power of SDPs can be exponentially stronger than that of linear programs, in the sense that while any linear programming formulation needs an exponential number of inequalities, a polynomial-size SDP formulation can exist. Moreover, it might suggest that for some optimization problems the optimal approximation ratio is not attainable via linear programming but is attainable via semidefinite programming. It should also be mentioned that in practice many SDPs can typically be solved rather well with first-order methods (there were several amazing talks at HPOPT discussing recent developments).

References:

  1. Linear vs. Semidefinite Extended Formulations: Exponential Separation and Strong Lower Bounds
    Samuel Fiorini, Serge Massar, Sebastian Pokutta, Hans Raj Tiwary, Ronald de Wolf
    Proceedings of STOC (2012)
  2. Approximation Limits of Linear Programs (Beyond Hierarchies)
    Gábor Braun, Samuel Fiorini, Sebastian Pokutta, David Steurer
    To appear in Proceedings of FOCS (2012)
  3. On polyhedral approximations of the second-order cone
    Aharon Ben-Tal, Arkadi Nemirovski
    Mathematics of Operations Research, vol. 26 no. 2, 193-205 (2001)

Written by Sebastian

June 22, 2012 at 7:58 am

The Scottish Café revisited

leave a comment »

This is an attempt to bring together mathematicians and the like in a virtual spot to discuss mathematics and research.

The Scottish Café (Polish: Kawiarnia Szkocka) was the café in Lwów (now Lviv) where, in the 1930s and 1940s, mathematicians from the Lwów School collaboratively discussed research problems, particularly in functional analysis and topology.

Stanislaw Ulam recounts that the tables of the café had marble tops, so they could write in pencil, directly on the table, during their discussions. To keep the results from being lost, and after becoming annoyed with their writing directly on the table tops, Stefan Banach's wife provided the mathematicians with a large notebook, which was used for writing the problems and answers and eventually became known as the Scottish Book. The book—a collection of solved, unsolved, and even probably unsolvable problems—could be borrowed by any of the guests of the café. Solving any of the problems was rewarded with prizes, with the most difficult and challenging problems having expensive prizes (during the Great Depression and on the eve of World War II), such as a bottle of fine brandy.[1]

For problem 153, which was later recognized as being closely related to Stefan Banach's "basis problem", Stanislaw Mazur offered the prize of a live goose. This problem was solved only in 1972 by Per Enflo, who was presented with the live goose in a ceremony that was broadcast throughout Poland.[2]

I have been thinking about "the Scottish Café" for a while and I was wondering whether it is possible to have "something" like this in a virtual, decentralized fashion. Last night I accidentally participated in a Google Hangout and I thought that this might be a good platform for it.

At this point I do not have a clear idea of what a good setup, rules, etc. would be. But you are most welcome to join and provide your ideas and feedback.

A proposed initial setup could be either a classic seminar talk plus discussion or a free discussion about a pre-specified topic. The first might be easier in terms of getting things started, whereas the second is more in the original spirit of the café. While I would personally prefer the second in the mid-term, it might be better to get things started with the first.

The technological platform will be Google+ with its post/hangout features.
The Scottish Café will be maintained by Lars Schewe and myself for the moment. However, if you are interested in helping us get things started, please drop us a line.

Here are the links:

Written by Sebastian

February 3, 2012 at 12:45 pm

Informs Annual Meeting 2011

leave a comment »

I plan to update this post over the course of the following days. I would have also loved to send out a few more tweets, especially now that I have this fancy Twitter ribbon, but I have a hard time connecting to the internet at the convention center.

Day 1: So the first day of the Informs Annual Meeting is almost over… What were my personal highlights from today? I listened to quite a lot of talks and most of them were really good, so it would not be fair to mention only a few selected ones. One personal highlight that I would like to mention, though, is the Student Paper Prize that Dan Dadush from Georgia Tech received for his work with Santanu Dey and Juan Pablo Vielma on the Chvátal-Gomory closure of compact convex sets. This result is really amazing as it provides deep insights into how cutting-plane procedures work when applied to non-linear relaxations. Congratulations Dan! Below is a picture of the three taken at the session.


Day 2: The second day of the Informs meeting was great. In particular I loved the integer optimization session (for apparent reasons) and had some extremely intense and deep discussions with some of my colleagues – for me this is the most important reason to go to a conference: dialog. One such discussion was with David Goldberg from GATech about the fundamental concepts in optimization/math/reasoning that completely redefine your thinking once fully understood/appreciated (this includes concepts such as “equality”, “proof by contradiction”, “induction”, or “diagonalization”). I will have a separate blog post together with David on that. If you think there is a concept that completely changed your thinking, please email us or drop us a line in the comment section. I am off to my talk now – more later…

Day 3: I actually did not see too much of the third day as I had to leave early.

All in all, Informs 2011 was a great event, in particular for meeting all the people who are usually spread all over the US. I was actually quite happy to see that there were also quite a lot of Europeans at the conference.

Written by Sebastian

November 13, 2011 at 9:09 pm

indirect proofs: contrapositives vs. proofs by contradiction

leave a comment »

Last week I read a rather interesting discussion on contrapositives vs. proofs by contradiction in Timothy Gowers' blog posts for the Cambridge Math Tripos, on mathoverflow, and on Terry Tao's blog. At first sight these two concepts, the contrapositive and the reductio ad absurdum (proof by contradiction), might appear to be very similar. Suppose we want to prove A \Rightarrow B for some statements A and B. This is equivalent to showing \neg B \Rightarrow \neg A (at least in classical logic). The latter is the contrapositive, and often it is easier to go with the contrapositive. In the case of the indirect proof we do something similar; however, there is a slight difference: we assume A \wedge \neg B and deduce a contradiction. So what's the big deal? The difference seems to be merely formal, but this is not true. In the first case we remain in the space of "true statements", i.e., any deduction from \neg B is a genuine consequence of \neg B that we can use later "outside of the proof". In the case of the proof by contradiction we move in a "contradictory space" (as A \wedge \neg B is contradictory) and everything that we derive in this space is potentially garbage. Its sole purpose is to derive a contradiction; as we work in a contradictory system we cannot guarantee that the statements derived within the proof are true statements; in fact they are likely not to be true, as they should result in a final contradiction.
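
Schematically (my own sequent-style rendering, not taken from the linked discussions), the contrapositive argues from \neg B alone, while the reductio argues from A together with \neg B:

\frac{\neg B \;\vdash\; \neg A}{\vdash\; A \Rightarrow B} \quad \text{(contrapositive)} \qquad\qquad \frac{A,\; \neg B \;\vdash\; \bot}{\vdash\; A \Rightarrow B} \quad \text{(reductio ad absurdum)}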

Interestingly, a similar phenomenon is known for cutting-plane procedures or cutting-plane proof systems (both terms essentially mean the same thing; it is just a different perspective). Let me give you an ultra-brief introduction to cutting-plane procedures. Given a polytope P \subseteq [0,1]^n we are often interested in the integral hull of that polytope, which is defined to be P_I = \text{conv}(P \cap \{0,1\}^n). A cutting-plane procedure M is now a map that assigns to P a new polytope M(P) such that P_I \subseteq M(P) \subseteq P, and M(P) hopefully provides a tighter approximation of P_I. So what the cutting-plane procedure does is derive new valid inequalities for P_I by examining P, and usually the derivation is computationally bounded (otherwise we could just guess the integral hull); the exact technical details are not too important at this point.
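
As a toy illustration (my own example; the discussion above does not commit to a specific operator M), the classical Chvátal-Gomory procedure derives a cut by rounding down the right-hand side of a valid inequality with integral coefficients. For P = \{x \in [0,1]^2 \mid x_1 + x_2 \leq 3/2\} a single round already yields x_1 + x_2 \leq 1, which is valid for P_I:

import itertools
import math

a, beta = (1, 1), 1.5            # valid inequality a.x <= beta for P, with integral coefficients
cg_rhs = math.floor(beta)        # Chvátal-Gomory cut: a.x <= floor(beta)
print("CG cut: x1 + x2 <=", cg_rhs)

# sanity check: every 0/1 point of P satisfies the cut, so it is valid for P_I
points_in_P = [x for x in itertools.product([0, 1], repeat=2) if sum(x) <= beta]
assert all(sum(x) <= cg_rhs for x in points_in_P)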

Now any well-defined cutting-plane procedure M satisfies M(P \cap \{cx \le \delta\}) \subseteq M(P) \cap \{cx \le \delta\}. Put differently, giving the cutting-plane procedure access to an additional inequality can potentially increase the strength of the procedure compared to letting it work on P and intersecting with the half-space cx \leq \delta afterwards. Now what does this have to do with indirect proofs and contrapositives? The connection arises from the following trivial insight: an inequality cx \leq \delta (with integral coefficients and right-hand side) is valid for P_I if and only if (P \cap \{cx \geq \delta + 1 \})_I = \emptyset. In particular a sufficient condition for the validity of cx \leq \delta for P_I is M(P \cap \{cx \geq \delta + 1 \}) = \emptyset. The key point is that M(P \cap \{cx \geq \delta + 1 \}) can be strictly contained in M(P) \cap \{cx \geq \delta + 1 \}. The first corresponds to the indirect proof, whereas the second corresponds to the contrapositive, as we verify the validity of cx \leq \delta by testing whether M(P) \cap \{cx \geq \delta + 1 \} = \emptyset. However, in the latter case we do not use the inequality cx \leq \delta in the cutting-plane procedure, i.e., the procedure has no a priori knowledge about what to prove, whereas in the case of the indirect proof we add the negation of cx \leq \delta and the procedure can use this information.

So how much can you gain? Suppose we have a graph G = (V,E) and we consider the associated fractional stable set polytope FSTAB(G) = \{ x \in [0,1]^V \mid x_u + x_v \leq 1 \ \forall\; (u,v) \in E\}. Typically (there are a few exceptions), for a classical cutting-plane procedure the derivation of clique inequalities is involved and we need \Omega(\log k) applications of the cutting-plane procedure to derive the clique inequality \sum_{v \in C} x_v \leq 1 for a clique C of size k. However, an indirect proof of the clique inequalities takes only a single application of the most basic cutting-plane operator: consider

FSTAB(G) \cap \{\sum_{v \in C} x_v \geq 2\} =: Q

for a clique C. It is not hard to see that Q \cap \{x_v = 1\} = \emptyset for all v \in C. A basic derivation that any sensible cutting-plane operator M supports is the following: x_i \leq 0 is valid for M(Q) whenever x_i < 1 is valid for Q. Therefore we obtain that M(Q) \subseteq \bigcap_{v \in C} \{ x_v = 0\}. On the other hand, M(Q) \subseteq \{\sum_{v \in C} x_v \geq 2\}, and so M(Q) = \emptyset holds; thus the indirect proof derived \sum_{v \in C} x_v \leq 1.
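
Here is a small numerical sketch (my own, for the triangle K_3, using LP feasibility via scipy as a stand-in for the emptiness checks; the argument above is purely combinatorial): it verifies that Q \cap \{x_v = 1\} = \emptyset for every vertex v, which is the only ingredient the single application of M needs.

import numpy as np
from scipy.optimize import linprog

edges = [(0, 1), (0, 2), (1, 2)]                  # the triangle K_3, a clique of size 3
A_ub, b_ub = [], []
for u, v in edges:                                # edge inequalities x_u + x_v <= 1
    row = np.zeros(3); row[u] = row[v] = 1.0
    A_ub.append(row); b_ub.append(1.0)
A_ub.append(-np.ones(3)); b_ub.append(-2.0)       # negated clique inequality: x_1 + x_2 + x_3 >= 2

for v in range(3):                                # check that Q intersected with {x_v = 1} is empty
    e_v = np.zeros((1, 3)); e_v[0, v] = 1.0
    res = linprog(np.zeros(3), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=e_v, b_eq=[1.0], bounds=[(0, 1)] * 3)
    print("Q with x_%d = 1 is infeasible:" % v, res.status == 2)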

So what one can see from this example is that indirect proofs (at least in the context of cutting-plane proof systems) can derive strong valid inequalities in rather few rounds and outperform their direct counterpart drastically (a constant number of rounds vs. \Omega(\log k) rounds). However, a priori knowledge of what we want to prove is needed in order to apply the indirect proof paradigm. This makes it hard to exploit the power of indirect proofs in cutting-plane algorithms; after all, you need to know the "derivation" before you do the actual "derivation". Nonetheless, in some cases we can use indirect proofs by guessing good candidates for strong valid inequalities and then verifying their validity using an indirect proof.

Check out the links for further reading:

  1. http://gowers.wordpress.com/2011/10/05/basic-logic-relationships-between-statements-converses-and-contrapositives/
  2. http://mathoverflow.net/questions/12342/reductio-ad-absurdum-or-the-contrapositive
  3. http://terrytao.wordpress.com/2009/11/05/the-no-self-defeating-object-argument/

Written by Sebastian

October 18, 2011 at 8:01 pm

Long time no see

with one comment

It has been quite a while since I wrote my last blog post; the last one that really counts (at least to me) was back in February. As I pointed out at some point, it was not that I was lacking something to write about but rather that I did not want to "touch" certain topics. That in turn made me wonder what a blog is good for when, in fact, one is still concerned about whether to write about certain topics. So I got the feeling that in the end all this web 2.0 authenticity, all this being really open, direct, authentic, etc., is nothing but a (self-)deception. On the other hand, I also did not feel like writing about yet another conference. I have to admit that I have been to some really crappy conferences lately, and since I did not have anything positive to say I preferred not to say anything at all. There were a few notable exceptions, e.g., MIP or IPCO. Another thing that bothered me (and still does) is the dilution of real information with nonsense. In fact I have the feeling that the signal-to-noise ratio dropped considerably over the last two years and I did not want to add to this further. This feeling of over-stimulation with web 2.0 noise seems to be a global trend (at least this is my perception). Many people gave up their blogs or have been somewhat neglecting them. Also, maintaining a blog with, say, weekly posts (apart from passing on a few links or announcements) takes up a lot of time. Time that arguably could be better invested in doing research and writing papers.

Despite those issues and concerns I do believe that the web with all its possibilities can really enhance the way we do science. As with all new technologies, one has to find a modus operandi that provides positive utility. In principle the web can provide an information democracy/diversification; however, any "democratic endeavor" on the web has a huge enemy: the Matthew effect (commonly known as "more gains more"). This term, coined by R.K. Merton, derives its name from the following passage in the biblical Gospel of Matthew (see also wikipedia):

For to all those who have, more will be given, and they will have an abundance; but from those who have nothing, even what they have will be taken away. — Matthew 25:29, New Revised Standard Version

In principle it states that the "rich get richer" while the "poor get poorer". If we think of the different social networks (facebook, myspace, friendster), it refers to the effect that the one with the largest user base attracts more people than the one with a smaller base. In the next "round" this effect is then even more pronounced, until the smaller competitor virtually ceases to exist. In the real world this effect is often limited by various kinds of "friction". There might be geographic limitations, cultural barriers, etc., that wash out the advantage of the larger one, so that the compounding nature of the effect is slowed down or non-existent (this holds true even in the highly globalized world we live in). That is the reason why dry cleaners, bakeries, and other forms of local business are not outperformed by globalized companies (ok, some are). In the context of the internet, however, there is often no inhibitor to the Matthew effect. It often translates into some type of preferential attachment, although with the difference that the overall user base is limited so that the gain of one party is the loss of another (preferential attachment processes are usually not zero-sum).

So what does this mean in the context of blogs? Blog reading is to a certain extent zero-sum. There is some limited amount of time that we are willing to spend reading blogs. Those with a large reader base will have more active discussions and move higher in the priority list for reading. In the end the smaller ones might only have a handful of readers, making it hard to justify the amount of time spent writing the posts. Downscaling the frequency of posts might even amplify the effect, as it might be perceived as inactivity. One way out of this dilemma could be some form of joining the smaller units into larger ones, i.e., either "digesting" several blogs into a larger one or, alternatively, "shared blogging". I haven't made up my mind yet what (if!) I am going to do about this. But I guess in the end some type of consolidation is inevitable.

Having bothered you with this abstruse mixture of folklore, economics, and the internet, I actually intended to write about something else, though somewhat related, today: about deciding whether and when to dump a project. This problem is very much inspired by my previous experiences as a consultant and by recent decisions about academic projects. More precisely, suppose that you have a project and an estimate for its overall duration. At some point you want to review the progress, and based on what you see at this point you want to make a call whether or not to abandon the project. The longer you wait with your review, the better the information you gain from the review. On the other hand, by then you might have wasted too much time and resources just to increase the confidence in your decision. In fact it might even make sense not to start a project at all. Suppose that you have an a priori estimate for the probability of success of your project, say p. Further let r(t) denote our function of erring, i.e., r(0) = 1/2 and r(1) = 0, which means that at time t = 0 we do not have any information yet and can only guess, leading to guessing wrong with probability 50%, and at time t = 1 we have perfect information. Let t denote the point in time at which we review the project (as a fraction of the overall time, here assumed to be 1). We have four cases to consider (one might opt for a different payoff function; the following one reflects my particular choice):

  1. The project is going to be successful and at the point of review we guessed right, i.e., we went through with it. In this case the benefit is s. This happens with probability (1-r(t)) p, and the expected payoff for this scenario is (1-r(t)) p s. [Alternatively one could consider the benefit s - t, or something else.]
  2. The project is going to be successful and at the point of review we guessed wrong, i.e., we dropped the project. In this case the benefit is -(t + s), i.e., we lose our investment up to that point (here with unit value) and the overall benefit. The probability is r(t) p and the expected payoff is -r(t) p (t+s).
  3. The project is going to fail and we guessed right: benefit -t, i.e., the investment so far. Expected payoff: -(1-r(t)) (1-p) t.
  4. The project is going to fail and we guessed wrong, i.e., we went through with it: benefit -T, where T is some cost for this scenario. Expected payoff: -r(t) (1-p) T.

All in all we have the following expected overall payoff as a function of t:

\mathbb E(t) = -[(1-r(t))p (-s) + (1-r(t))(1-p) t + r(t)p(t+s) + r(t)(1-p) T]

Next we have to define the function which models our increase in confidence. I opted for a function that gains information in a logarithmic fashion, i.e., in the beginning we gain confidence fast and then there is a tailing-off effect:

r_k(t) := \frac{\log(1+k) - \log(k+t)}{2\,(\log(1+k) - \log(k))}

The parameter k can be understood as the rate of learning. For example for k = 0.01 it looks like this:


Assuming s = 1 and T = 1, i.e., the payoffs are basically the invested time, and p = 30%, the expected payoff as a function of the review time t looks like this (payoff: blue line, error rate: red line):

The maximum payoff is reached for a review after roughly 20% of the estimated overall time. However, it is still negative. This suggests that we do not learn fast enough to make a well-informed decision. For example, for k = 0.001 the situation looks different:

The optimal point for a review is after 14% of the estimated project time. Having once estimated your rate of learning, one can also determine which projects one should not get involved with at all. For k = 0.001 this is the case when the probability of success p is less than roughly 27%.
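
For completeness, here is a short Python sketch of the model (my own implementation, using the r_k above with s = T = 1 and p = 0.3; it is meant as an illustration, and the optima it reports may differ slightly from the figures quoted above):

import numpy as np

def r(t, k):
    # error rate at review time t, with r(0) = 1/2 and r(1) = 0
    return (np.log(1 + k) - np.log(k + t)) / (2 * (np.log(1 + k) - np.log(k)))

def expected_payoff(t, p, k, s=1.0, T=1.0):
    rt = r(t, k)
    return (1 - rt) * p * s - rt * p * (t + s) - (1 - rt) * (1 - p) * t - rt * (1 - p) * T

ts = np.linspace(1e-6, 1.0, 10000)
for k in (0.01, 0.001):
    payoffs = expected_payoff(ts, p=0.3, k=k)
    t_star = ts[np.argmax(payoffs)]
    print("k = %g: best review time t = %.2f, expected payoff = %.3f" % (k, t_star, payoffs.max()))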

Although this model is admittedly very simple, it provides some nice qualitative (and partly quantitative) insights. For example, there are indeed projects that you should not even get involved with; this is somewhat clear from intuition, but I was surprised that the probability of success of those can still be quite high. Also, when your rate of learning increases over time (due to experience with other projects), you can get involved with riskier endeavors, because your higher review confidence allows you to purge more effectively. For example, when k goes down to, say, k = 0.00001 (which is not unrealistic, as in this case shortly after the beginning of the project you would only err with probability around 20%) you could get involved with projects that only have a probability of success of 19%.

And no complaints about the abrupt ending – I consumed my allocated blogging time.

GLPK Lab for Windows released

with 2 comments

Yesterday the first version of GLPK Lab for Windows, a new modeling GUI for GLPK, was released. The maintainers of the package are Luiz Bettoni, Andrew Makhorin, and Heinrich Schuchardt. I have not tried it out yet as I am not a Windows user, so any feedback is welcome. From the release notes:

We are pleased to announce the very first release of GLPK Lab for
Windows.

GLPK Lab is a set of software tools and libraries based on the GLPK
(GNU Linear Programming Kit) package and intended for solving
large-scale linear programming (LP), mixed integer programming (MIP),
and other related problems.

Currently GLPK Lab includes the following main components:

*  GLPK Lab Editor, which is the SciTE editor adapted to be used along
with GLPK Lab (based on the Gusek project)

*  Stand-alone GLPK LP/MIP solver

*  GLPK API library compiled with Microsoft Visual C++ 2008 and
intended for developing GLPK-based applications in C or C++

*  Java binding to GLPK API library (from GLPK-Java) intended for
developing GLPK-based applications in Java

GLPK Lab is free, public-domain software hosted on SourceForge. Its
webpage is here: <http://glpklabw.sourceforge.net/>. (Please note that
GLPK Lab is *not* a GNU package.)

Happy Modeling,

The GLPK Lab Development Team

More information on GLPK including tutorials, examples, and tools can be found here.

Written by Sebastian

April 4, 2010 at 10:13 pm

Posted in Mathematics and Optimization, Software


CPLEX free for academic use

with 6 comments

I am probably not the first one to report this, but CPLEX is now available free of charge to academics as part of IBM's academic initiative (see here):

Full-version CPLEX now available free-of-charge to academics

Posted: Feb 18, 2010 08:30:03 PM

Effective February 15, 2010, IBM is offering no-charge full-version ILOG Optimization products via IBM’s Academic Initiative program (http://www.ibm.com/academicinitiative). This move builds on the availability of limited Teaching Editions available since August 2009, and now provides registered members with renewable one-year access to IBM ILOG OPL-CPLEX, CPLEX, CP Optimizer and CP for both teaching and research use. Registered members can access ILOG Optimization software at: https://www.ibm.com/developerworks/university/software/get_software.html, where they can search for ILOG Optimization or individual solvers by name. Depending on the search criteria, electronic assemblies will be retrieved for IBM ILOG OPL Development Studio, CPLEX, CP Optimizer or CP on all commercially available platforms. To run the software, members will also need to download a key from: https://www.ibm.com/developerworks/university/support/ilog.html, but are no longer required to install ILM. Note that as part of Academic Initiative, IBM also makes its professionally-developed training materials available for use in classrooms and labs at: https://www14.software.ibm.com/webapp/devtool/scholar/web/coursewarePickPage.do?source=ai-course-websphere.

The application process is pretty straightforward and the verification of eligibility is supposed to take "3-5 business days". But it seems that accounts are approved much faster (at least in my case).

Written by Sebastian

February 22, 2010 at 2:21 pm