Archive for the ‘Mathematics and Optimization’ Category
The Scottish Café revisited
The Scottish Café (Polish: Kawiarnia Szkocka) was the café in Lwów (now Lviv) where, in the 1930s and 1940s, mathematicians from the Lwów School collaboratively discussedresearch problems, particularly in functional analysis and topology.
Stanislaw Ulam recounts that the tables of the café had marble tops, so they could write in pencil, directly on the table, during their discussions. To keep the results from being lost, and after becoming annoyed with their writing directly on the table tops, Stefan Banach‘s wife provided the mathematicians with a large notebook, which was used for writing the problems and answers and eventually became known as the Scottish Book. The book—a collection of solved, unsolved, and even probably unsolvable problems—could be borrowed by any of the guests of the café. Solving any of the problems was rewarded with prizes, with the most difficult and challenging problems having expensive prizes (during the Great Depression and on the eve of World War II), such as a bottle of fine brandy.[1]
For problem 153, which was later recognized as being closely related to Stefan Banach’s “basis problem“, Stanislaw Mazur offered the prize of a live goose. This problem was solved only in 1972 by Per Enflo, who was presented with the live goose in a ceremony that was broadcast throughout Poland.[2]
I have been thinking about “the scottish café” for a while and i was wondering whether it is possible to have “something” like this in a virtual, decentralized fashion. Last night I accidentally participated in a “google hangout” and I thought that this might be a good platform for this.
At this point I do not have any idea what a good setup and rules etc. is. But you are most welcome to join and provide your ideas and feedback.
A proposed initial setup could be either a classic seminar talk + discussion or free discussion about a pre-specified topic. The first one might be easier in terms of getting things started whereas the second one is more in the original spirit of the café. While I personally would prefer the second one in the mid-term it might be better to get things started with the first one.
- The Café: https://plus.google.com/b/106979538736138847732/
- Lars Schewe’s google+ page: https://plus.google.com/118353189915712144351/
- My own google+ page: https://plus.google.com/104806249797641578404/
Informs Annual Meeting 2011
I plan to update this post in the course of the following days and I would have also loved to send out a few more tweets, especially now that I have this fancy twitter ribbon, but I have hard time connecting to the internet at the convention center.
Day 1: So the first day of the Informs Annual Meeting is almost over… What were my personal highlights from today? I listened to quite a lot of talks and most of them were really good so that I feel that it would not be fair to mention only a few selected ones. One personal highlight that I would like to mention though is the Student Paper Prize that Dan Dadush from Georgia Tech got for his work together with Santanu Dey and Juan Pablo Vielma on the Chvátal-Gomory closure for compact convex sets. This result is really amazing as it provides deep insights into the way of how cutting-plane procedures work when applied to non-linear relaxations. Congratulations Dan! Below is a picture of the three taken at the session.
Day 2: The second day of the Informs meeting was great. In particular I loved the integer optimization session (for apparent reasons) and had some extremely intense and deep discussions with some of my colleagues – for me this is the most important reason to go to a conference: dialog. One such discussion was with David Goldberg from GATech about the fundamental concepts in optimization/math/reasoning that completely redefine your thinking once fully understood/appreciated (this includes concepts such as “equality”, “proof by contradiction”, “induction”, or “diagonalization”). I will have a separate blog post together with David on that. If you think there is a concept that completely changed your thinking, please email us or drop us a line in the comment section. I am off to my talk now – more later…
Day 3: I actually did not see too much of the third day as I had to leave early.
All in all, Informs 2011 was a great event, in particular to meet all the people that are usually spread all over the US. I was actually quite happy to see that there were also quite a lot of Europeans at the conference
indirect proofs: contrapositives vs. proofs by contradiction
Last week I read a rather interesting discussion on contrapositives vs. proofs by contradiction as part of Timothy Gowers’ Cambridge Math Tripos, mathoverflow, and Terry Tao’s blog. At first sight these two concepts, the contrapositive and the reductio ad absurdum (proof by contradiction) might appear to be very similar. Suppose we want to prove for some statements
and
. Then this is equivalent to showing
(at least in classical logic). The latter is the contrapositive and often it is easier to go with the contrapositive. In the case of the indirect proof we do something similar, however there is a slight difference: we assume that
and deduce a contradiction. So what the big deal? The difference seems to be more of a formal character. However this is not true. In the first case we remain in the space of “true statements”, i.e., any deduction from
is a consequence of
that we can use later “outside of the proof”. In the case of the proof by contradiction we move in a “contradictory space” (as
is contradictory) and everything that we derive in this space is potentially garbage. Its sole purpose is to derive a contradiction however as we work in a contradictory system we cannot guarantee that the statement derived within the proof are true statement; in fact they are likely not be true as they should result in a final contradiction.
Interestingly a similar phenomenon is known for cutting-plane procedures or cutting-plane proof systems (both terms essentially mean the same thing; it is just a different perspective) . Let me give you an ultra-brief introduction of cutting-plane procedures. Given a polytope we are often interested in the integral hull of that polytope which is defined to be
. A cutting-plane procedure
is now a map that assigns to
a new polytope
such that
and
hopefully provides a tighter approximation of
. So what the cutting-plane procedure does, is to derive new valid inequalities for
by examining
and usually the derivation is computationally bounded (otherwise we could just guess the integral hull); the exact technical details are not too important at this point.
Now any well-defined cutting-plane procedure satisfies
. Or put differently, giving the cutting-plane procedure access to an additional inequality can potentially increase the strength of the procedure as compared to let it work on
and then intersect with the half-space
afterwards. Now what does this have to do with indirect proofs and contrapositives? The connection arises from the following trivial insight: an inequality
(with integral coefficients and right-hand side) is valid for
if and only if
. In particular a sufficient condition for the validity of
for
is
. The key point is that
can be strictly contained in
. The first one is the indirect proof, whereas the second one is the contrapositive, as we verify the validity of
by testing if
. However we do not use the inequality
in the cutting-plane procedure, i.e., the procedure has no a priori knowledge about what to prove, whereas in the case of indirect proofs, we add the negation of
and the procedure can use this information.
So how much do can you gain? Suppose we have a graph and we consider the associated fractional stable set polytope
. Typically (there are a few exceptions), for a classical cutting-plane procedure the derivation of clique inequalities is involved and we need
applications of the cutting-plane procedure to derive the clique inequalities for a clique
of size
, i.e.,
. However an indirect proof of the clique inequalities takes only a single application of the most basic cutting-plane operator: Consider
for a clique . It is not hard to see that
for all
. A basic derivation that any sensible cutting-plane operator
supports is to derive that
, i.e.,
is valid for
whenever
is valid for
. Therefore we obtain that
. On the other hand
and so
holds and thus the indirect proof derived
.
So what one can see from this example is that indirect proofs (at least in the context of cutting-plane proof systems) can derive strong valid inequalities in rather few rounds and outperform their direct counterpart drastically (constant number of rounds vs. log(n) rounds). However a priori knowledge of what we want to prove is needed in order to apply the indirect proof paradigm. This makes it hard to exploit the power of indirect proofs in cutting-plane algorithms. After all, you need to know the “derivation” before you did the actual “derivation”. Nonetheless, in some cases we can use indirect proofs by guessing good candidates for strong valid inequalities and then verify their validity using an indirect proof.
Check out the links for further reading:
Long time no see
It has been quite a while that I wrote my last blog post; the last one that really counts (at least to me) was back in February. As pointed out at some point it was not that I was lacking something to write about but more that I did not want to “touch” certain topics. That in turn made me wonder what a blog is good for when, in fact, one is still concerned about whether to write about certain topics. So I got the feeling that in the end, all this web 2.0 authenticity, all this being really open, direct, authentic, etc. is nothing else but a (self-) deception. On the other hand, I also did not feel like writing about yet another conference. I have to admit that I have been to some really crappy conferences lately and since I did not have anything positive to say I preferred to not say anything at all. There were a few notable exceptions, e.g., the MIP or IPCO. Another thing that bothered me (and still does) is the dilution of real information with non-sense. In fact I have the feeling that the signal-to-noise ratio considerably dropped over the last two years and I didn’t want to add to this further. This feeling of over-stimulation with web 2.0 noise seems to be a global trend (at least this is my perception). Many people gave up their blogs or have been somewhat neglecting them. Also maintaining a blog with say weekly posts (apart from passing on a few links or announcements) takes up a lot of time. Time that arguably could be better invested into doing research and writing papers.
Despite those issues or concerns I do believe that the web with all its possibilities can really enhance the way we do science. As with all new technologies one has to find a modus operandi that provides positive utility. In principle the web can provide an information democracy/diversification, however any “democratic endeavor” on the web has a huge enemy. The Matthew effect (or commonly known as “more gains more”). This term, coined by R.K. Merton, derives its name from the following biblical Gospel of Matthew (see also wikipedia):
For to all those who have, more will be given, and they will have an abundance; but from those who have nothing, even what they have will be taken away. — Matthew 25:29, New Revised Standard Version
In principle it states that the “rich get richer” while the “poor get poorer”. If we think of the different social networks (facebook, myspace, friendster) it refers to the effect that the one that has the largest user basis is going to attract more people than the one with a smaller one. In the next “round” this effect is then even more pronounced until the smaller competitor virtually ceases to exist. In the real-world this effect is often limited due to various kinds of “friction”. There might be geographic limitations, cultural barriers etc., that do wash out the advantage of the larger one so that the compounding nature of the effect is slowed down or non-existent (this hold even true in the highly globalized world we live in). That is the reason why dry cleaners, bakeries, and other forms of local business are not outperformed by globalized companies (ok, some are). In the context of the internet however there is often no inhibitor to the Matthew effect. It often translates into some type of preferential attachment although with the difference that the overall user basis is limited so that the gain of one party is the loss of another (preferential attachment processes are usually not zero-sum).
So what does this mean in the context of blogs? Blog reading is to a certain extent zero-sum. There is some limited amount of time that we are willing to spend reading blogs. Those with a large user basis will have more active discussions and move higher in the priority list for reading. In the end the smaller ones might only have a handful of readers making it hard to justify the amount of time spent writing the posts. Downscaling the frequency of posts might even pronounce the effect as it might be perceived as inactivity. One way out of this dilemma could be any form of joining the smaller units to larger ones, i.e., either “digesting” several blogs to a larger one or alternatively “shared blogging”. I haven’t made up my mind yet what (if!) I am going to do about this. But I guess, in the end some type of consolidation is inevitable.
Having bothered you with this abstruse mixture of folklore, economics, and internet, I actually intended to write about something else but somewhat related today: About deciding whether and when to dump a project. This problem is very much inspired by my previous experiences as a consultant and recent decisions about academic projects. More precisely, suppose that you have a project and you have an estimate for the overall time of the project. At some point you want to review the progress and based on what you see at this point you want to make a call whether or not you will abandon the project. The longer you wait with your review the better your information is that you gain from the review. On the other hand you potentially wasted too much time and resources to increase the confidence in your decision. In fact it might make even sense you not start a project at all. Suppose that you have an a priori estimate for the probability of success of your project, say p. Further let r(t) denote our function of erring, i.e., r(0) = 1/2 and r(1) = 0 which means that at time t= 0 we do not have any information yet and so we can only guess leading to guessing wrong with probability 50% and at time t = 1 we have perfect information. Let t denote the point in time at which we review the project (as a fraction of the overall time, here assumed to be 1). We have four cases to consider (one might opt for a different payoff function; the following one resembles my particular choice):
- The project is going to be successful and at the point of reviewing we guessed right, i.e., we went through with it. In this case the benefit is s. This happens with probability (1-r(t)) p and expected payoff for this scenario is: (1-r(t)) p s. [alternatively one could consider the benefit s - t; or something else]
- The project is going to be successful and at the point of reviewing we guessed wrong, i.e., we dropped the project. In this case the benefit is - (t + s), i.e., we lose our investment up to that point (here with unit value) and the overall benefit. Probability is r(t) p and expected payoff - r(t) p (t+s).
- The project is going to fail and we guessed right: Benefit -t, i.e., the investment so far. Expected payoff - (1-r(t)) (1-p) t.
- The project is going to fail and we guessed wrong, i.e., we went through with it: Benefit -T, were T is some cost for this scenario. Expected payoff - r(t) (1-p) T.
All in all we have the following expected overall payoff as a function of t:
Next we have to define our function which models our increase in confidence. I opted for a function that gains information in a logarithm fashion, i.e., in the beginning we gain confidence fast and then we have a tailing-off effect:
The parameter k can be understood as the rate of learning. For example for k = 0.01 it looks like this:

Assuming s = 1 and T = 1, i.e., the payoffs are basically the invested time and p = 30%, the expected payoff as function of the time of review t looks like this (payoff: blue line, error rate: red line):
The maximum payoff is reached for a review after roughly 20% of the estimated overall time. However it is still negative. This suggests that we do not learn fast enough to perform a well-informed decision. For example for k = 0.001, the situation looks different:
The optimal point for a review is after 14% of the estimated project time. Having once estimated your rate of learning, one can also determine which projects one should not get involved with at all. For k = 0.001 this is the case when the probability of success p is less than roughly 27%.
Although this model is somewhat very simple it provides some nice qualitative (and partly quantitative) insights. For example that there are indeed projects that you should not even get involved with; this is somewhat clear from intuition but I was surprised that the probability of success of those is still quite high. Also, when over time your rate of learning increases (due to experience with other projects) you can get involved with more risky endeavors because your higher review confidence allows you to purge more effectively. For example when k goes down to, say, k = 0.00001 (which is not unrealistic as in this case shortly after the beginning of the project you would only err with around 20%) you could get involved with projects that only have a probability of success of 19%.
And no complaints about the abrupt ending – I consumed my allocated blogging time.
GLPK Lab for Windows released
Yesterday the first version of GLPK Lab for Windows, a new modeling GUI for GLPK has been released. Maintainers of the package are Luiz Bettoni, Andrew Makhorin, and Heinrich Schuchardt. I have not tried it out yet as I am not a Windows user. So any feedback is welcome. From the release notes:
We are pleased to announce the very first release of GLPK Lab for
Windows.GLPK Lab is a set of software tools and libraries based on the GLPK
(GNU Linear Programming Kit) package and intended for solving
large-scale linear programming (LP), mixed integer programming (MIP),
and other related problems.Currently GLPK Lab includes the following main components:
* GLPK Lab Editor, which is the SciTE editor adapted to be used along
with GLPK Lab (based on the Gusek project)* Stand-alone GLPK LP/MIP solver
* GLPK API library compiled with Microsoft Visual C++ 2008 and
intended for developing GLPK-based applications in C or C++* Java binding to GLPK API library (from GLPK-Java) intended for
developing GLPK-based applications in JavaGLPK Lab is free, public-domain software hosted on SourceForge. Its
webpage is here: <http://glpklabw.sourceforge.net/>. (Please note that
GLPK Lab is *not* a GNU package.)Happy Modeling,
The GLPK Lab Development Team
More information on GLPK including tutorials, examples, and tools can be found here.
CPLEX free for academic use
Probably I am not the first one to report this but CPLEX is now available for free for academics as part of IBM’s academic initiative (see here):
Full-version CPLEX now available free-of-charge to academics
Posted: Feb 18, 2010 08:30:03 PM
Effective February 15, 2010, IBM is offering no-charge full-version ILOG Optimization products via IBM’s Academic Initiative program (http://www.ibm.com/academicinitiative). This move builds on the availability of limited Teaching Editions available since August 2009, and now provides registered members with renewable one-year access to IBM ILOG OPL-CPLEX, CPLEX, CP Optimizer and CP for both teaching and research use. Registered members can access ILOG Optimization software at: https://www.ibm.com/developerworks/university/software/get_software.html, where they can search for ILOG Optimization or individual solvers by name. Depending on the search criteria, electronic assemblies will be retrieved for IBM ILOG OPL Development Studio, CPLEX, CP Optimizer or CP on all commercially available platforms. To run the software, members will also need to download a key from: https://www.ibm.com/developerworks/university/support/ilog.html, but are no longer required to install ILM. Note that as part of Academic Initiative, IBM also makes its professionally-developed training materials available for use in classrooms and labs at: https://www14.software.ibm.com/webapp/devtool/scholar/web/coursewarePickPage.do?source=ai-course-websphere.
The application is pretty forward and the verification of eligibility is supposed to take “3-5 business days”. But it seems that the accounts are approved much faster (at least in my case).
Sikuli – a scripting language based on screenshots
There is a pretty cool, new scripting language, Sikuli which is a “research project developed by User Interface Design Group, MIT Computer Science and Artificial Intelligence Laboratory (CSAIL)“. The cool thing is that you can automate GUI interactions by using screenshots of buttons, sliders, input boxes, etc. But rather before I start to explain how it works, just check out the video.
The language itself is based on Jython and thus you can use Jython to do almost everything you like and it is fairly simple to use:
So now you finally can automate your optimal FarmVille production schedule by having Sikuli interact with the applet
. Just put your production schedule into Sikuli and let it do the rest. Here is a GMPL model (based on the AMPL model from O.R. by the Beach – I just changed “;” and removed the “option” lines) for determining the optimal production schedule that maximizes cash. Can be solved to optimality with GLPK in 0.1 secs – the Sikuli code is left as an exercise
param T;
set Horizon := 0..T;
set Crops;
param TotalArea;
param InitialArea;
param InitialMoney;
param PlowCost;
param Growth{Crops};
param Cost{Crops};
param Revenue{Crops};
var Plant{Crops,Horizon} integer >= 0;
var Area{Horizon} >= 0, <= TotalArea;
var Money{Horizon} >= 0;
maximize z: Money[T] + 4*PlowCost;
subject to
area0: Area[0] = InitialArea + sum {i in Crops} Plant[i,0];
area{t in 1..T}: Area[t] = Area[t-1]
+ sum {i in Crops} Plant[i,t]
- sum {i in Crops : t >= Growth[i]} Plant[i,t-Growth[i]];
money0: Money[0] = InitialMoney - sum {i in Crops} (PlowCost + Cost[i])*Plant[i,0];
money{t in 1..T}: Money[t] = Money[t-1]
- sum {i in Crops} (PlowCost + Cost[i])*Plant[i,t]
+ sum {i in Crops : t >= Growth[i]} Revenue[i]*Plant[i,t-Growth[i]];
solve;
display z;
display {i in Crops, t in Horizon : Plant[i,t] > 0} Plant[i,t];
display Area,Money;
data;
param T := 36;
set Crops := SB EP WH SY;
param TotalArea := 144;
param InitialArea := 0;
param InitialMoney := 323;
param PlowCost := 15;
param:
Growth Cost Revenue :=
SB 1 10 35
EP 12 25 88
WH 18 35 115
SY 6 15 63;
end;
A polyhedral characterization of border bases
Tomorrow I will give a talk on a recent paper which is joint work with Gábor Braun from the Alfréd Rényi Institute of Mathematics. It deals with the polyhedral characterization of all admissible order ideals (and hence border bases) that exist for a given zero-dimensional ideal. As a hardness result, it also follows that determining an optimal order ideal (w.r.t. to a linear objective function) is NP-hard. This is in particular interesting as it shows that not only computing the L-stable span (the vector space approximation of the ideal) is hard but also choosing a ‘nice’ basis which is a mere basis transformation from a linear algebra point of view (see also my previous post for some more details). Here are the slides:
The GNU Linear Programming Kit (GLPK) : Resources, Tutorials…
I just put together some more resources, tutorials, and information about the GNU Linear Programming Kit (GLPK). The page can be found here.
If you have some information, links, etc., that you would like to see there, just drop me a line.
Information asymmetry and beating the market
Recently, I was wondering how much money you can effectively gain by investing, given a certain information advantage: Suppose that you want to invest some money, for the sake of simplicity say $10,000. Can you assume to be able to extract an average-exceeding return from the market given that you have an information advantage? If you believe in the strong form of the efficient market hypothesis then the answer is no of course. If not, then is it at least theoretically possible?
Let us consider a simplified setting. Suppose that we can invest (long/short) in a digital security (e.g., digital options) with payouts 0 and 1 (with a price of 0.5) and let us further suppose that it pays out 1 with a probability of 50%. Now assume that we have a certain edge over the market, i.e., we can predict the outcome slightly more accurately, say with accuracy. If we have a good estimate of our edge, we can use the Kelly Criterion to allocate our money. The Kelly Criterion, named after John L. Kelly, Jr determines the proportional amount of money to bet from own bankroll so that the overall utility is maximized – this criterion is provably optimal. It was presented by Kelly in his seminal 1956 paper “A New Interpretation of Information Rate“. In this paper Kelly links the channel capacity of a private wire (inside information) to the maximum amount of return that one can extract from a bet. While this bound is a theoretical upper bound, it is rather strong in its negative interpretation: If you do not have any inside information (which includes being just smarter than everybody else or other intangible edges) you cannot extract any cash. The Kelly Criterion arises as an optimal money management strategy derived from the link to Shannon‘s Information Theory and in its simplest form it can be stated as:
where are the odds,
the probability to win, and
the probability to lose. So in our setting, where we basically consider fair coin tosses whose outcomes we can predict with
accuracy, an edge of 1% or 100bps is considerable. Using the money management strategy from above (neglecting taxes, transaction fees, etc.), we obtain:
with an initial bankroll of $10,000, y-axis is log10(bankroll), x-axis is #bets. The five lines belong to the %5, 25%, 50%, 75%, and 95% percentiles computed on the basis of 5,000 Monte-Carlo runs. So even the 5% percentile sees a ten-fold increase of the bankroll after roughly 4,100 bets, whereas the 95% percentile is already at a 100-fold increase. In terms of real deals the number of bets is already considerable though — after all, which private investor does 4,000 transactions??
Unfortunately, an edge of 100bp is very optimistic and for, say, for 50bp edge the situation already looks a quite different: the 50% percentile barely reaches a ten-fold increase after 10,000 bets.
And now let us come to the more realistic scenario when considering financial markets. Here an edge of 10bp is already considered significant. Given all the limitations as a private investor, i.e., being further down the information chain, sub-optimal market access, etc., assuming an edge of 10bp would be still rather optimistic. In this case, using an optimal allocation of funds, we have the following:
Here the 25% percentile actually lost money and even the 50% percentile barely gained anything over 10,000 bets. In the long run also here a strictly positive growth occurs, but for 10bp it takes extremely long: While you might be able do 4,000 deals over the course of say 10 – 30 years. Here even after 100,000 bets the 5% percentile barely reaches a 29% gain (over 100,000 bets!!). Given transaction costs, taxes, fees, etc., in reality the situation looks worse (especially when considered more complicated financial structures). So it comes all down to the question, how large your edge is.
Although extremely simplified here, a similar behavior can be shown for more complicated structures (using e.g., random walks).




