Archive for the ‘what else is out there’ Category
Christmas time is reading time… and I actually read two quite amazing books:
- Jacques Hadamard: The Psychology of Invention in the Mathematical Field
A classic. Very nice introspection and discussion about “how” mathematics is done. In particular it discusses a lot of potential theories of mathematical innovation and invention (as one might have guessed from the title).
- Norbert Wiener: I Am a Mathematician
Part two of his autobiography. A very interesting read with a lot of big names and a lot of interesting insights into various aspects in and around mathematics. (thanks to Dave Goldberg for the pointer)
Today I would like to talk about the Mastermind game and related (recreational?!) math problems – the references that I provide in the following are probably not complete. Most of you might know this game from the 70s and 80s. The first player is making up a secret sequence of colored pebbles (of a total of 6 colors) and the other player has to figure out the sequence by asking questions about the code by proposing potential solutions. The first player then indicates the number of color matches.
More precisely, Wikipedia says:
The codebreaker tries to guess the pattern, in both order and color, within twelve (or ten, or eight) turns. Each guess is made by placing a row of code pegs on the decoding board. Once placed, the codemaker provides feedback by placing from zero to four key pegs in the small holes of the row with the guess. A colored (often black) key peg is placed for each code peg from the guess which is correct in both color and position. A white peg indicates the existence of a correct color peg placed in the wrong position.
If there are duplicate colours in the guess, they cannot all be awarded a key peg unless they correspond to the same number of duplicate colours in the hidden code. For example, if the hidden code is white-white-black-black and the player guesses white-white-white-black, the codemaker will award two colored pegs for the two correct whites, nothing for the third white as there is not a third white in the code, and a colored peg for the black. No indication is given of the fact that the code also includes a second black.
Once feedback is provided, another guess is made; guesses and feedback continue to alternate until either the codebreaker guesses correctly, or twelve (or ten, or eight) incorrect guesses are made.
In a slightly more formal way, we have a string in and the “decoder” wants to reconstruct this string by inferring from the provided feedback. One of the natural questions that arise is of course how many questions do suffice. Knuth [Knuth76] then showed that five questions suffice to be able to always reconstruct the secret string. What is interesting about the proof is that it is a “table” – basically output of a computer program. This lookup table can be used so find a next question at any given point. The table is a greedy optimization in some sense: “Figure 1 [the lookup table] was found by choosing at every stage a test pattern that minimizes the maximum number of remaining possibilities, over all 15 responses by the codemaker”.
Later in 1983, Vasicek Chvátal dedicated a paper on the Mastermind game to Paul Erdős for his 70th birthday. Chvátal looked at generalized admissible Mastermind vectors denoted by of vectors of length n with k different colors. It is not too hard to see that the minimum number of questions needed to correctly identify any string in is bounded from below by
which arises from the fact that there are only different answers and different strings have to be distinguished. Complementing this bound, Chvátal showed that the number of questions needed to be asked without waiting for the answer (i.e., the questions are asked in one go, then the answers to all questions are provided at once, and then the code has to be uniquely identified) can be bounded from above as follows: the number of questions needed for this static case will be denoted by and for any there exists so that for all and we have
and clearly we have . The proof uses the probabilistic method in a nice way. Moreover, Chvátal also provides some upper and lower bounds for special cases. Those of you guys that know about my addiction to the Chvátal-Gomory closure and its friends might have already guessed that this is exactly how I came across the problem…
The latter problem where we do not wait for the answers is usually called the static mastermind problem whereas the classical version is called the dynamic mastermind problem. Later in 2003 and 2004 Goddard (see [Godd03,04]) provided optimal values for the minimal number of questions to be asked both in the dynamic as well as static case and also for the average number (denoted by ) of questions needed whenever the secret string is uniformly picked at random. With the notation from above we have the following number of questions (tables taken from [Godd03,04]):
For the average number of queries needed () we obtain:
and similarly for the dynamic case we have the following minimum number of queries :
|3 -||4||4||4||4||5||<= 6|
|4 -||4||4||4||5||<= 6|
|5 -||5||5||5||<= 6|
|7 -||6||6||<= 6|
and for the static case we have the following table. Note that in the table below the final “query” that states the recovered string is not counted as in comparison to the ones above. Therefore in order to compare the values with the ones above you need to add “1″ to each entry.
|3 -||2||3||3||4||4||<= 5|
|7 -||5||6||<= 7|
|8 -||6||7||<= 8|
(there seems to be a typo for n = 2 and k = 3 in one of the tables, as the static case has a better performance than the dynamic case which is not possible).
In order to be able to actually check (with a computer) whether a certain number of questions suffices, we have to exclude symmetries in a smart way. Otherwise the space of potential candidates is too large. In this context, in particular the orderly generation framework of [McKay98] is very powerful. The idea behind that framework is to incrementally extend the considered structures in such a way that we only add a canonical candidate per orbit. Moreover, after having extended our structure to the next “size” we need to check whether it is isomorphic to one of the previously explored structures. In this case we do not consider it. For each candidate we check whether the number of distinct answers is equal to the total number of possible secret codes. In this case there is a bijection between the two and therefore we can decode the code. However it is not clear that this bijection needs to have a “nice” structure or that it is “compact” in some sense.
- [Knuth76]: Knuth, D.E. 1976. “The computer as a master mind.” Journal of Recreational Mathematics. http://colorcode.laebisch.com/links/Donald.E.Knuth.pdf (Accessed June 9, 2011).
- [Chvátal83]: Chvátal, V. 1983. “Mastermind.” Combinatorica 3: 325-329.
- [McKay98]: McKay, B.D. 1998. “Isomorph-free exhaustive generation.” Journal of Algorithms 26(2): 306–324.
- [Good03]: Goddard, W. 2003. “Static Mastermind.” Journal of Combinatorial Mathematics and Combinatorial Computing 47: 225-236
- [Godd04]: Goddard, W. 2004. “Mastermind Revisited.” Journal of Combinatorial Mathematics and Combinatorial Computing 51: 215-220
Timothy Gowers just started a new series of blog posts for first-year mathematics students. While the blog posts will be centered around Cambridge’s courses I am pretty sure that the discussed topics and hints will be valuable to other students as well. In fact, what I find most impressive is the goal of the series: to teach people how to do mathematics! We all learned what mathematics is and results have been presented to us in a nice, cleaned-up fashion. However only very few of us were taught how to solve/approach problems – most of us learned it the hard way at some point. It is as if you go to a restaurant to get great food: this does not teach you how to cook yourself! In particular it does not teach you that the nice result is a product of quite a mess in the kitchen. When doing math, everybody will reach her or his limit sooner or later (as compared to math in school which was easy for many math students) and it is precisely this point in time, when students start to doubt their own potential. In fact some kind of a bias is bound to take place: every math problem that can be solved is “easy” and every problem that is not solved is a small personal crisis – “am I good enough”? As you did not see the mess in the kitchen, one might think that things come easy in a nice form or not at all. In the end there is no positive feedback available anymore, only negative feedback.
I am very much looking forward to this series and I am sure that Tim has some valuable insights to share!
I just got an email from IBM asking me to participate in the Academic Initiative Survey. I participated with the aim to address a few shortcomings with respect to IBM’s support for, and interest in their optimization products, e.g., that it is quite a hassle to download cplex as one has to go through an uncountably infinite number of pages before one actually reaches the download page – if one reaches it at all. Also there were a few other things that I wanted to address.
But guess what. One of the first question was to what academic field one belongs to. Operations Research? Mathematics? nada. That was already a bad omen. And in fact. There were only two references to optimization at all “Linear Programming” and “Integer Programming” in the courses that I teach / want to teach (out of a gazzillion listed including a lot of voodoo stuff). Effectively, optimization and the optimization products were virtually not present at all. Cplex, OPL, OPL Studio and none of the other optimization tools were even mentioned.
This apparent lack of interest raises serious questions about IBM’s future plans for cplex and their optimization products. In particular, questions about continuity and support. Who knows… 10 years ago I would have been really scared as cplex was the strongest industrial strength solver and therefore choice number one in many applications – however times have changed and fortunately there are alternatives now.
This xkcd comic is rather old and most of you might have seen it already… Anyways, I found it pretty hilarious (yes, geek humor) and so I wanted to share it with you:
The European Capital of Culture for 2010 is Essen in Germany, on behalf of the Ruhr Area (“Ruhrpott” or “Ruhrgebiet” as we say) and its several cities (e.g. Bochum, Dortmund, Gelsenkirchen, Duisburg, Oberhausen, …). The official homepage can be found here . As I grew up in Essen and practically lived there the first “part” of my life I thought I take the opportunity and provide you with some background information and links, just in case you are in Germany in 2010 and you have a chance to visit that diverse and really interesting area. Unfortunately, some of the articles are in German only, but I tried to provide English resources whenever possible.
According to wikipedia, the Ruhr Area is the fourth largest urban area in Europe with some 7.3 million inhabitants. What is special about this area is its unique industry culture and the transformation in the 90′s away from heavy industry to a more service oriented area (see , , , German only). As you can imagine, this transition brought a lot of initial problems and desperation to people heavily relying on heavy industry for their income and the fact that those people could not be easily moved to different jobs (see , German only). Now that this transition is (almost) complete, the Ruhr Area presents its distinguished face to the world as the European Capital of Culture. Also check out the special issue on Spiegel (, German only) about Ruhr.2010.
Events and must-sees:
An event planer with events scheduled in 2010 can be found in  and following the same link, you can also find the Top-10 attractions in the Ruhrpott (according to Spiegel and I kinda agree).
Impressions from the Ruhr Area:
In  you can listen to some “typical” sounds of the Ruhr Area. These sound recordings were taken by Richard Ortmann and represent some of the day to day noises you would have heard (and partly still would hear today). Also check out his sound archive with a more comprehensive collection of sound nostalgia . Probably one of the most famous songs about the Ruhr Area is Herbert Grönemeyer’s “Bochum” (about one of the big cities in this area):
Right now, 09.01.2010 15:47, there is a live show about Ruhr.2010 as the European Capital of Culture on ZDF (one of the German main tv stations). You can find it also in the ZDF mediathek  together with many other shows and pictures about the Ruhr Area where you can watch it for free.
Here are a few pictures from various photographers depicting the Ruhr Area:
: “Wir sind die Dinosaurier – und wir wollen nicht aussterben” – Spiegel 11.02.2007 (German only)
: Von der stinkenden Brühe zum Lebenselexier – Spiegel 29.08.2007 (German only)
: Kulturhauptstadt Ruhr.2010 – Spiegel 08.01.2010 (German only)
: Industrie in XXL – Spiegel 22.06.2009 (German only)
: Der Pulsschlag aus Stahl verklingt – Spiegel 20.10.2007 (German only – but for the sounds it does not matter)
: Richard Ortmann – Sound Archive (German only – but for the sounds it does not matter)
: ZDF mediathek’s Ruhr.2010 collection (tv shows and pictures)
: Spiegel Ruhr.2010 event planer including Top 10 attractions (German only – but rather self-explaining)
: Ruhr.2010 – official homepage
Sorry for the long inactivity, but I am totally caught up in end-of-year wrapping up. I have to admit that I find it quite dissatisfying when you realize the year is almost over and there are still so many things on your list that should have been done… So the last months of the year somehow always end up in total chaos…. anyways…
I came across a very interesting, recent paper (via Mike’s blog post – read also his excellent post) by Arora, Barak, Brunnermeier, and Ge with the title “Computational Complexity and Information Asymmetry in Financial Products” – the authors also provide an informal discussion of the relevance for derivative pricing in practice. As I worked with structured products myself for some time, this paper raised my interest and if you are interested in the trade-off between, say, full rationality and bounded-rationality when it comes to pricing, you should definitely give it a look as well.
The paper deals with the effect of information asymmetry between the structuring entity (which is often also the seller) and the buyer of a structure. The considered derivatives in the paper are CDO like structures and, running the risk of over-simplification, the main points are as follows:
- Having a set of assets that should be structured into derivatives, here CDOs, a malicious structurer can hide a significant amount of junk assets when assigning the assets to the derivatives. More precisely, the structurer can ensure that the junk assets are overrepresented in a certain subset of the derivatives to be structured which significantly deteriorates their value.
- A buyer with full-rationality (which can here perform exponential time computations) can actually detect this tampering by testing all possible assignment subsets and verifying that there is actually an/no over-representation.
- On the other hand, a buyer with limited computational resources, say which is only capable of performing polynomial time computations (the standard assumption when considering efficient algorithms that behave well in the size of the input) cannot detect that the assignments of the assets to the derivatives has been tampered.
- Under some additional assumptions, the tampering is even ex post undetectable.
- The authors propose different payoff function that are more resistant to tampering in the sense that heavy, detectable tampering is needed to skew the payoff profile significantly.
Now the authors devise a model similar to Akerlof’s lemons. Stated in a simplified way, the buyer, knowing that he cannot detect the tampering, will assume that a tampering has been performed and is only willing to pay the adjusted price factoring in the potential tampering of the structure – adverse selection. The honest structurer is not willing to sell his derivatives for the reduced price and leaves the market. This effect, based on the information asymmetry between buyer and seller (which was exemplified in Akerlof’s paper using the market of used cars) in the classical setting would lead to a complete collapse of the market as it would repeat ad infinitum until nobody would be left willing to trade. Countermeasures stopping this vicious circle are warranties in the case of the cars. The variant considered here for the structured products will likely converge to the point where the maximum amount of tampering has been performed and buyers and sellers expectations or levels of information are aligned.
What particularly fascinated me is the type of problem encoded to establish intractability. Contrary to the classical NP-hard problems known in optimization that mostly ask for some kind of an optimal combinatorial solution, the authors use the densest subgraph problem/assumption which asserts that deciding between two random distributions (here the fair one and the tampered one) cannot be done in polynomial time (provided that the tampering is not too obvious). In particular:
Densest subgraph problem. Let be a bipartite graph with out-degree for all vertices in . The densest subgraph problem for is to distinguish between the two distributions:
- which is obtained by choosing for every vertex in an amount of neighbors in randomly. (what would be the fair assignment)
- which is obtained by first choosing and with and , and then choosing neighbors for every vertex outside of , and random neighbors for every vertex in . Then we choose random additional neighbors in for every vertex in . (which means that we choose some assets and some derivatives a priori and we prefer to add edges between those sets — slightly simplified. On the rest we do random assignments)
Then the densest subgraph assumption states that whenever, as functions of are chosen sufficiently moderate, then we cannot distinguish between those two distributions, i.e., we cannot detect the tampering with a polynomial time algorithm:
Densest subgraph assumption. Let be such that , then there is no and poly-time algorithm that distinguishes between and with advantage .
Note that the vertices correspond to the structures and the to the underlyings/assets. Although asymptotically intractable, what would be interesting to know is what one can do in practice for reasonable instance sizes, i.e, up to which degree one would be actually able to detect tampering. As Mike already said:
In particular, if a group put out 1000 groupings of financial instruments, and I needed to solve the densest subgraph problem on the resulting instance, I would work very hard at getting an integer program, constraint program, dynamic program, or other program to actually solve the instance (particularly if someone is willing to pay me millions to do so). If the group then responded with 10,000 groupings, I would then simply declare that they are tampering and invoke whatever level of adverse selection correction you like (including just refusing to have anything to do with them). Intractable does not mean unsolvable, and not every size instance needs more computing than “the fastest computers on earth put together”.
Another point might be that there are potentially billions of ways of tampering structured products. Especially when the payoff profiles are highly non-linear (e.g., FX-ratchet swaps with compounding coupons) deliberate over-/underestimation of parameters might completely change the valuation of the structures. The proposed framework highlights that there might be ways of tampering that we cannot detect in the worst case, even ex-post (under additional assumptions). But before we can actually detect tampering we have to be aware of this kind of tampering and we have a real problem if tampering is undetectable ex post – how to prove it? This is in some sense related to the stated open question 3: Is there an axiomatic way of showing that there are no tamper-proof derivatives – slightly weakened: with respect to ex-post undetectability.
I could also very well imagine that when giving a closer look to traded structures (especially the nasty OTC ones), that there will be more pricing problems that are essentially intractable. It is almost like one of the main hurdles so far to establish intractability was the more stochastical character of prizing problems while hardness is often stated in terms of some kind of combinatorial problem. An approach like the one proposed in the article might overcome this issue by establishing hardness via distinguishing two distributions.
Check this out:
We present a system that composes a realistic picture from a simple
freehand sketch annotated with text labels. The composed picture
is generated by seamlessly stitching several photographs in agreement
with the sketch and text labels; these are found by searching
the Internet. Although online image search generates many inappropriate
results, our system is able to automatically select suitable
photographs to generate a high quality composition, using a filtering
scheme to exclude undesirable images. We also provide a novel
image blending algorithm to allow seamless image composition.
Each blending result is given a numeric score, allowing us to find
an optimal combination of discovered images. Experimental results
show the method is very successful; we also evaluate our system using
the results from two user studies.
Here is the video:
And the link to the paper — they actually use dynamic programming for parts of their algorithm.
Rick Bookstaber recently argued that the arms race in high frequency trading, a form of quantitative trading where effectively time = money , results in a net drain of social welfare:
A second reason is that high frequency trading is embroiled in an arms race. And arms races are negative sum games. The arms in this case are not tanks and jets, but computer chips and throughput. But like any arms race, the result is a cycle of spending which leaves everyone in the same relative position, only poorer. Put another way, like any arms race, what is happening with high frequency trading is a net drain on social welfare.
It is all about milliseconds and being a tiny little bit faster:
In terms of chips, I gave a talk at an Intel conference a few years ago, when they were launching their newest chip, dubbed the Tigerton. The various financial firms who had to be as fast as everyone else then shelled out an aggregate of hundreds of millions of dollar to upgrade, so that they could now execute trades in thirty milliseconds rather than forty milliseconds – or whatever, I really can’t remember, except that it is too fast for anyone to care were it not that other people were also doing it. And now there is a new chip, code named Nehalem. So another hundred million dollars all around, and latency will be dropped a few milliseconds more.
In terms of throughput and latency, the standard tricks are to get your servers as close to the data source as possible, use really big lines, and break data into little bite-sized packets. I was speaking at Reuters last week, and they mentioned to me that they were breaking their news flows into optimized sixty byte packets for their arms race-oriented clients, because that was the fastest way through network. (Anything smaller gets queued by some network algorithms, so sixty bytes seems to be the magic number).
Although high-frequency trading is basically about being fast and thus time is the critical resource, in quantitative trading, in general, it is all about computational resources and having the best/smartest ideas and strategies. The best strategy is worthless if you lack the computational resources to crunch the numbers and, vice versa, if you do have the computational power but no smart strategies this does not get you anywhere either.
Jasmina Hasanhodzic, Andrew W. Lo, Emanuele Viola argue in their latest paper “A Computational View of Market Efficiency” that efficiency in markets has to be considered with respect to the level of computational sophistication, i.e., as market can (appear to) be efficient for those participants which use only a low level of computational resources, whereas it can be inefficient for those participants that invest a higher amount of computational resources.
In this paper we suggest that a reinterpretation of market efficiency in computational terms might be the key to reconciling this theory with the possibility of making profits based on past prices alone. We believe that it does not make sense to talk about market efficiency without taking into account that market participants have bounded resources. In other words, instead of saying that a market is “efficient” we should say, borrowing from theoretical computer science, that a market is efficient with respect to resources S, e.g., time, memory, etc., if no strategy using resources S can generate a substantial profit. Similarly, we cannot say that investors act optimally given all the available information, but rather they act optimally within their resources. This allows for markets to be efficient for some investors, but not for others; for example, a computationally powerful hedge fund may extract profits from a market which looks very efficient from the point of view of a day-trader who has less resources at his disposal—arguably the status quo.
More precisely, it is even argued that the high-complexity traders gain from the low-complexity traders (of course, within the studied, simplified market model – but nonetheless!!):
The next claim shows a pattern where a high-memory strategy can make a bigger profit after a low-memory strategy has acted and modified the market pattern. This profit is bigger than the profit that is obtainable by a high-memory strategy without the low-memory strategy acting beforehand, and even bigger than the profit obtainable after another high- memory strategy acts beforehand. Thus it is precisely the presence of low-memory strategies that creates opportunities for high-memory strategies which were not present initially. This example provides explanation for the real-life status quo which sees a growing quantitative sophistication among asset managers.
Informally, the proof of the claim exhibits a market with a certain “symmetry.” For high-memory strategies, the best choice is to maintain the symmetry by profiting in multiple points. But a low-memory strategy will be unable to do. Its optimal choice will be to “break the symmetry,” creating new profit opportunities for high-memory strategies.
So although in pure high-frequency trading, the relevance of smart strategies might be smaller and thus it is more (almost only?) about speed, in general quantitative trading it seems like (again in the considered model) that the combination of strategy and high computational resources might generate a (longer-term) edge. This edge cannot necessarily be compensated with increased computational resources only, as you still need to have access to the strategy. The considered model considers memory as a the main computational/limiting resource. One might argue that it reflects the sophistication of the strategy along with the real computational resources implicitly, as limited memory might not be able to hold a complex strategy. On the other hand a lot of memory is pointless without a strategy using it. So both might be considered to be intrinsically linked.
An easy example illustrating this point is maybe the following. Consider the sequence “MDMD” and suppose that you can only store, say these 4 letters. A 4-letter-strategy might predict something like “MD” for the next two letters. If those letters though represent the initial of the weekdays (in German), the next 3 letters will be “FSS”. It is impossible though to predict this sequence solely using information about the past on the last 4 letters. The situation changes if we can store up to 7 letters “FSSMDMD”. Then a prediction is possible.
One point of the paper is now that the high-complexity traders might fuel their profits by the shortsightedness of the low-complexity traders. And thus an arms race might be a consequence (to exploit this asymmetry on the one hand and to protect against exploitation on the other). To some extent this is exactly what we are seeing already when traders with “sophisticated” models, that for example are capable of accounting for volatility skew, arbitrage out less sophisticated traders. On the other hand, it does not help to use a sophisticated model (i.e., more computational resources) if one doesn’t know how to use it, e.g., a Libor market model without an appropriate calibration (non-trivial) is worthless.