Archive for the ‘Finance’ Category
Recently, I was wondering how much money you can effectively gain by investing, given a certain information advantage: Suppose that you want to invest some money, for the sake of simplicity say $10,000. Can you expect to extract an above-average return from the market given that you have an information advantage? If you believe in the strong form of the efficient market hypothesis then the answer is no, of course. If not, then is it at least theoretically possible?
Let us consider a simplified setting. Suppose that we can invest (long/short) in a digital security (e.g., digital options) with payouts 0 and 1 (with a price of 0.5) and let us further suppose that it pays out 1 with a probability of 50%. Now assume that we have a certain edge over the market, i.e., we can predict the outcome slightly more accurately, say with accuracy $50\% + \epsilon$, where $\epsilon$ is our edge. If we have a good estimate of our edge, we can use the Kelly Criterion to allocate our money. The Kelly Criterion, named after John L. Kelly, Jr., determines the proportion of one’s own bankroll to bet so that the long-run growth rate of the bankroll (i.e., the expected logarithmic utility) is maximized; in this sense the criterion is provably optimal. It was presented by Kelly in his seminal 1956 paper “A New Interpretation of Information Rate”. In this paper Kelly links the channel capacity of a private wire (inside information) to the maximum amount of return that one can extract from a bet. While this bound is a theoretical upper bound, it is rather strong in its negative interpretation: If you do not have any inside information (which includes being just smarter than everybody else or other intangible edges) you cannot extract any cash. The Kelly Criterion arises as an optimal money management strategy derived from the link to Shannon’s Information Theory and in its simplest form it can be stated as:
$$f^* = \frac{bp - q}{b},$$
where $f^*$ is the fraction of the bankroll to bet, $b$ are the odds, $p$ the probability to win, and $q = 1 - p$ the probability to lose. So in our setting, where we basically consider fair coin tosses whose outcomes we can predict with accuracy $50\% + \epsilon$, an edge of 1% or 100bps is considerable. Using the money management strategy from above (neglecting taxes, transaction fees, etc.), we obtain:
with an initial bankroll of $10,000, y-axis is log10(bankroll), x-axis is #bets. The five lines belong to the 5%, 25%, 50%, 75%, and 95% percentiles computed on the basis of 5,000 Monte-Carlo runs. So even the 5% percentile sees a ten-fold increase of the bankroll after roughly 4,100 bets, whereas the 95% percentile is already at a 100-fold increase. In terms of real deals the number of bets is already considerable though; after all, which private investor does 4,000 transactions?
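The experiment described above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the win probability is taken to be 0.5 + edge, the bet is treated as even money (odds 1, since the security costs 0.5 and pays 1), and the function names `kelly_fraction`, `simulate_bankroll` and `percentile` are made up for illustration. It is not the code behind the plots, and the exact percentile values depend on these conventions.

```python
import random


def kelly_fraction(odds: float, p_win: float) -> float:
    """Kelly fraction f* = (b*p - q) / b, clipped at 0 (no bet without an edge)."""
    q_lose = 1.0 - p_win
    return max(0.0, (odds * p_win - q_lose) / odds)


def simulate_bankroll(edge, n_bets, n_runs, bankroll=10_000.0, seed=0):
    """Return the sorted final bankrolls of n_runs independent Kelly bettors."""
    rng = random.Random(seed)
    p_win = 0.5 + edge               # assumption: accuracy = 50% + edge
    f = kelly_fraction(1.0, p_win)   # assumption: even-money bet (odds b = 1)
    finals = []
    for _ in range(n_runs):
        wealth = bankroll
        for _ in range(n_bets):
            if rng.random() < p_win:
                wealth *= 1.0 + f    # win: the staked fraction is doubled
            else:
                wealth *= 1.0 - f    # loss: the staked fraction is gone
        finals.append(wealth)
    return sorted(finals)


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted list."""
    idx = min(len(sorted_values) - 1, int(pct / 100.0 * len(sorted_values)))
    return sorted_values[idx]


if __name__ == "__main__":
    finals = simulate_bankroll(edge=0.01, n_bets=2_000, n_runs=200)
    for pct in (5, 25, 50, 75, 95):
        print(pct, round(percentile(finals, pct), 2))
```

Increasing `n_bets` and `n_runs` moves the sketch closer to the scale of the experiment in the text, at the cost of running time in pure Python.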
Unfortunately, an edge of 100bp is very optimistic and for, say, a 50bp edge the situation already looks quite different: the 50% percentile barely reaches a ten-fold increase after 10,000 bets.
And now let us come to the more realistic scenario when considering financial markets. Here an edge of 10bp is already considered significant. Given all the limitations of a private investor, i.e., being further down the information chain, sub-optimal market access, etc., assuming an edge of 10bp would still be rather optimistic. In this case, using an optimal allocation of funds, we have the following:
Here the 25% percentile actually lost money and even the 50% percentile barely gained anything over 10,000 bets. In the long run a strictly positive growth occurs here as well, but for 10bp it takes extremely long: while you might be able to do 4,000 deals over the course of, say, 10 – 30 years, here even after 100,000 bets the 5% percentile barely reaches a 29% gain (over 100,000 bets!!). Given transaction costs, taxes, fees, etc., in reality the situation looks worse (especially when considering more complicated financial structures). So it all comes down to the question of how large your edge is.
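Why a smaller edge hurts so much can be seen from a short back-of-the-envelope derivation. It is only a sketch and assumes an even-money bet (odds $b = 1$) won with probability $p = 1/2 + \epsilon$; the plots above may rest on a slightly different convention, so this explains the qualitative behavior rather than the exact numbers.

```latex
% Expected log-growth per bet when betting a fraction f of the bankroll:
g(f) = p \ln(1 + f) + (1 - p) \ln(1 - f)

% Kelly fraction for b = 1 and p = 1/2 + \epsilon:
f^* = \frac{bp - q}{b} = p - (1 - p) = 2\epsilon

% Second-order expansion \ln(1 \pm x) \approx \pm x - x^2/2 with x = 2\epsilon:
g(f^*) \approx \left(\tfrac12 + \epsilon\right)\left(2\epsilon - 2\epsilon^2\right)
        + \left(\tfrac12 - \epsilon\right)\left(-2\epsilon - 2\epsilon^2\right)
        \approx 2\epsilon^2

% Number of bets until the typical bankroll has grown by a factor K:
n \approx \frac{\ln K}{2\epsilon^2}
```

Under these assumptions the growth rate is quadratic in the edge: an edge that is ten times smaller needs roughly a hundred times as many bets to reach the same multiple.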
Although extremely simplified here, a similar behavior can be shown for more complicated structures (using e.g., random walks).
The concept of securitization is very versatile. From Wikipedia:
Securitization is a structured finance process that distributes risk by aggregating debt instruments in a pool, then issues new securities backed by the pool. The term “Securitisation” is derived from the fact that the form of financial instruments used to obtain funds from the investors are securities. As a portfolio risk backed by amortizing cash flows – and unlike general corporate debt – the credit quality of securitized debt is non-stationary due to changes in volatility that are time- and structure-dependent. If the transaction is properly structured and the pool performs as expected, the credit risk of all tranches of structured debt improves; if improperly structured, the affected tranches will experience dramatic credit deterioration and loss. All assets can be securitized so long as they are associated with cash flow. Hence, the securities which are the outcome of Securitisation processes are termed asset-backed securities (ABS). From this perspective, Securitisation could also be defined as a financial process leading to an issue of an ABS.
The cash flows of the initial assets are paid according to seniority of the tranches in a waterfall-like structure: First the claims of the most senior tranche are satisfied and if there are remaining cash flows, the claims of the following tranche are satisfied. This continues as long as there are cash-flows left to cover claims:
Individual securities are often split into tranches, or categorized into varying degrees of subordination. Each tranche has a different level of credit protection or risk exposure than another: there is generally a senior (“A”) class of securities and one or more junior subordinated (“B,” “C,” etc.) classes that function as protective layers for the “A” class. The senior classes have first claim on the cash that the SPV receives, and the more junior classes only start receiving repayment after the more senior classes have repaid. Because of the cascading effect between classes, this arrangement is often referred to as a cash flow waterfall. In the event that the underlying asset pool becomes insufficient to make payments on the securities (e.g. when loans default within a portfolio of loan claims), the loss is absorbed first by the subordinated tranches, and the upper-level tranches remain unaffected until the losses exceed the entire amount of the subordinated tranches. The senior securities are typically AAA rated, signifying a lower risk, while the lower-credit quality subordinated classes receive a lower credit rating, signifying a higher risk.
In more mathematical terms, securitization basically works as follows: take your favorite set of random variables (for the sake of simplicity say binary ones) and consider the joint distribution of these variables (pooling). In a next step determine percentiles of the joint distribution (of default, i.e., 0) that you sell off separately (tranching). The magic happens via the law of large numbers and the central limit theorem (and variants of it): although each variable can have a high probability of default, the probability that more than, say, x% of those default at the same time decreases (almost) exponentially. Thus the resulting x-percentile can have a low probability of default already for small x. That is the magic behind securitization which is called credit enhancement.
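The tail behavior described above can be checked numerically. The following is a minimal sketch that assumes fully independent loans with one common default probability, which is the most favorable case; the pool size, the default probability and the function name `prob_more_than` are illustrative assumptions.

```python
from math import comb


def prob_more_than(n_loans, p_default, threshold):
    """P(number of defaults > threshold) for n independent loans (binomial tail)."""
    return sum(
        comb(n_loans, k) * p_default**k * (1.0 - p_default) ** (n_loans - k)
        for k in range(threshold + 1, n_loans + 1)
    )


if __name__ == "__main__":
    n, p = 100, 0.10  # hypothetical pool: 100 loans, each defaulting with 10%
    for pct in (10, 15, 20, 25, 30):
        print(f"more than {pct}% defaults: {prob_more_than(n, p, pct * n // 100):.2e}")
```

With correlated defaults the tail decays far more slowly, which is exactly where the estimation problems discussed in the other posts come in.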
So given that this process of risk mitigation and tailoring of risks to the risk appetite of potential investors is rather versatile, why not apply the same concept to other cash flows that bear a certain risk of default and turn them into structured products 😉
(a) Rents: Landlords face the problem that the tenant’s credit quality is basically unknown. Often, a statement about the tenant’s income and liabilities is supposed to help estimate the risk of default. But this procedure can, at best, serve as an indicator. So why not use the same process to securitize the rent cash flows and sell the corresponding tranches back to the landlords? This would have several upsides. First of all, the landlord obtains a significantly more stable cash flow and, depending on the risk appetite, could even invest in the more subordinated tranches. This could potentially reduce rents, as the risk premium charged by the landlord due to his/her potentially risk-averse preference could be reduced to the risk-neutral amount (plus some spreads, e.g., operational and structuring costs). The probability of default could be estimated significantly more easily for the pooled rent cash flows, as due to diversification it is well approximated by the expected value (maybe categorized into subclasses according to credit ratings). Of course, one would have to deal with problems such as adverse selection and the potentially hard task of estimating the correlation, which can have a severe impact on the value of the tranches (see my post here).
(b) Sport bets: Often these bets, as random variables, have a high probability of default (e.g., roughly 50% for a balanced win/loss bet). In order to reduce the risk through diversification, a rather large amount of cash has to be invested to obtain a reasonable risk profile. Again, securitizing those cash flows could create securities with more tailored risk profiles that could be of interest to rather risk-averse people on the one hand and risk-seeking gamblers on the other hand.
That is the wonderful world of structured finance 😉
I am on my way to the 22nd Australasian Finance and Banking Conference 2009 in Sydney. So what the hell is a mathematician doing at a finance conference? Well, basically mathematics and in particular optimization and operations research. I am thrilled to see the current developments in economics and finance that take computational aspects, which ultimately limit the amount of rationality that we can get, into account (I wrote about this before here, here, and here). In fact, I am convinced that these aspects will play an important role in the future, especially for structured products. After all, who is going to buy a structure where it is impossible to compute the value? Not to mention other complications such as bad data or dangerous model assumptions (such as static volatilities and correlations which are still used today!). Most valuation problems though can be cast as optimization problems, and especially the more complex structured products (e.g., mean variance optimizer) explicitly ask for a solution to an optimization problem in order to be valued. For the easier structures, Monte Carlo based approaches (or bi-/trinomial trees) are sufficient for pricing. As Arora, Barak, Brunnermeier, and Ge show in their latest paper, for more complex structures (e.g., CDOs) these approaches might fall short of capturing the real value of the structures, due to, e.g., deliberate tampering.
I am not going to talk about the aspect of computational resources though: I will be talking about my paper “Optimal Centralization of Liquidity Management” which is joint work with Christian Schmaltz from the Frankfurt School of Finance and Management. The problem that we are considering is basically a facility location problem: In a large banking network, where and how do you manage liquidity? In a centralized liquidity hub or rather in smaller liquidity centers spread all over the network? Being short on liquidity is a very expensive matter: either one has to borrow money via the interbank market (which is usually dried up or at least tight in tougher economic conditions) or one has to borrow via the central bank. If neither is available, the bank goes into a liquidity default. The important aspect here is that the decision on the location and the amount of liquidity produced is driven to a large extent by the liquidity demand volatility. In this sense a liquidity center turns into an option on cheap liquidity and in fact, the value of a liquidity center can actually be captured in an option framework. The value of the liquidity center is the price of the exact demand information: the more volatility we have, the higher this price will be and the more we save when we have this information in advance. The derived liquidity center location problem implicitly computes the prices of the options which arise as marginal costs in the optimization model. Here are the slides:
Another interesting, nicely written paper about valuating and pricing CDOs is “The Economics of Structured Finance” from Coval, Jurek, and Stafford which just appeared in the Journal of Economic Perspectives. It nicely complements the paper of Arora, Barak, Brunnermeier, and Ge titled “Computational Complexity and Information Asymmetry in Financial Products” (see also here). The authors argue that already small estimation errors in correlation and probability of default (of the underlying loans) can have a devastating effect on the overall performance of a tranche. Whereas the senior tranches remain quite stable in the presence of estimation errors, the overall rating of the junior and mezzanine tranches can be greatly affected. Intuitively this is clear, as the junior and the mezzanine tranches act as a cushion for the senior tranches (and in turn the junior tranches are a protection of the mezzanine tranches). What is not so clear at first though is that this effect is so pronounced, i.e., even the smallest estimation errors lead to a rapid decline in credit quality of these tranches. In fact, what happens here is that the junior and mezzanine tranches pay the price for the credit enhancement of the senior tranches. And the stability of the latter with respect to estimation errors comes at the expense of highly sensitive junior and mezzanine tranches.
This effect becomes even more severe when considering CDO^2, where the loans of the junior and mezzanine tranches are repackaged again. These structures possess a very high sensitivity to the slightest variations or estimation errors in the probability of default or correlation.
In both cases, slight imprecisions in the estimation can have severe impacts. But also, considering it the other way around, slight changes in the probability of default or the correlation due to changed economic conditions can have a devastating effect on the value of the lower prioritized tranches.
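A toy calculation makes this sensitivity tangible. The sketch below is not the model from the paper: it assumes independent loans (so it only captures errors in the probability of default, not in the correlation), and the pool size, attachment points and the function name `prob_tranche_hit` are hypothetical choices for illustration.

```python
from math import comb


def prob_tranche_hit(n_loans, p_default, attachment):
    """P(defaults > attachment), i.e. the tranche sitting above `attachment` takes losses."""
    return sum(
        comb(n_loans, k) * p_default**k * (1.0 - p_default) ** (n_loans - k)
        for k in range(attachment + 1, n_loans + 1)
    )


if __name__ == "__main__":
    n_loans = 100
    # hypothetical attachment points: defaults the lower layers absorb first
    tranches = {"mezzanine": 10, "senior": 25}
    for name, attachment in tranches.items():
        assumed = prob_tranche_hit(n_loans, 0.08, attachment)  # estimated default probability
        actual = prob_tranche_hit(n_loans, 0.10, attachment)   # slightly higher true value
        print(f"{name}: assumed {assumed:.4f}, actual {actual:.4f}")
```

In this setup, moving the default probability from 8% to 10% changes the chance that the mezzanine tranche is hit substantially, while the senior tranche stays far away from any loss in both cases.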
So if you are interested in CDOs, credit enhancement, and structured finance you should give it a look.
Sorry for the long inactivity, but I am totally caught up in end-of-year wrapping up. I have to admit that I find it quite dissatisfying when you realize the year is almost over and there are still so many things on your list that should have been done… So the last months of the year somehow always end up in total chaos…. anyways…
I came across a very interesting, recent paper (via Mike’s blog post – read also his excellent post) by Arora, Barak, Brunnermeier, and Ge with the title “Computational Complexity and Information Asymmetry in Financial Products” – the authors also provide an informal discussion of the relevance for derivative pricing in practice. As I worked with structured products myself for some time, this paper raised my interest and if you are interested in the trade-off between, say, full rationality and bounded-rationality when it comes to pricing, you should definitely give it a look as well.
The paper deals with the effect of information asymmetry between the structuring entity (which is often also the seller) and the buyer of a structure. The derivatives considered in the paper are CDO-like structures and, running the risk of over-simplification, the main points are as follows:
- Having a set of assets that should be structured into derivatives, here CDOs, a malicious structurer can hide a significant amount of junk assets when assigning the assets to the derivatives. More precisely, the structurer can ensure that the junk assets are overrepresented in a certain subset of the derivatives to be structured which significantly deteriorates their value.
- A buyer with full rationality (who can here perform exponential time computations) can actually detect this tampering by testing all possible assignment subsets and verifying whether or not there is an over-representation.
- On the other hand, a buyer with limited computational resources, say one who is only capable of performing polynomial time computations (the standard assumption when considering efficient algorithms that behave well in the size of the input), cannot detect that the assignment of the assets to the derivatives has been tampered with.
- Under some additional assumptions, the tampering is even ex post undetectable.
- The authors propose different payoff functions that are more resistant to tampering in the sense that heavy, detectable tampering is needed to skew the payoff profile significantly.
Now the authors devise a model similar to Akerlof’s lemons. Stated in a simplified way, the buyer, knowing that he cannot detect the tampering, will assume that a tampering has been performed and is only willing to pay the adjusted price factoring in the potential tampering of the structure – adverse selection. The honest structurer is not willing to sell his derivatives for the reduced price and leaves the market. This effect, based on the information asymmetry between buyer and seller (which was exemplified in Akerlof’s paper using the market of used cars), would in the classical setting lead to a complete collapse of the market, as it would repeat ad infinitum until nobody would be left willing to trade. Countermeasures stopping this vicious circle are warranties in the case of the cars. The variant considered here for the structured products will likely converge to the point where the maximum amount of tampering has been performed and buyers’ and sellers’ expectations or levels of information are aligned.
What particularly fascinated me is the type of problem encoded to establish intractability. Contrary to the classical NP-hard problems known in optimization that mostly ask for some kind of an optimal combinatorial solution, the authors use the densest subgraph problem/assumption which asserts that deciding between two random distributions (here the fair one and the tampered one) cannot be done in polynomial time (provided that the tampering is not too obvious). In particular:
Densest subgraph problem. Let $G$ be a bipartite graph on vertex sets $M$ and $N$ with out-degree $D$ for all vertices in $M$. The densest subgraph problem for $(M, N, D, n, m, d)$ is to distinguish between the two distributions:
- $\mathcal{R}$, which is obtained by choosing for every vertex in $M$ an amount of $D$ neighbors in $N$ randomly (what would be the fair assignment).
- $\mathcal{P}$, which is obtained by first choosing $S \subseteq N$ and $T \subseteq M$ with $|S| = n$ and $|T| = m$, and then choosing $D$ random neighbors for every vertex outside of $T$, and $D - d$ random neighbors for every vertex in $T$. Then we choose $d$ random additional neighbors in $S$ for every vertex in $T$ (which means that we choose some assets and some derivatives a priori and we prefer to add edges between those sets, slightly simplified; on the rest we do random assignments).
Then the densest subgraph assumption states that whenever $n, m, d$, as functions of $N, M, D$, are chosen sufficiently moderate, then we cannot distinguish between those two distributions, i.e., we cannot detect the tampering with a polynomial time algorithm:
Densest subgraph assumption. Let $(N, M, D, n, m, d)$ be such that $N = o(MD)$ and $(md^2/n)^2 = o(MD^2/N)$ (writing $M$ and $N$ also for the sizes of the two sides), then there is no $\varepsilon > 0$ and poly-time algorithm that distinguishes between $\mathcal{R}$ and $\mathcal{P}$ with advantage $\varepsilon$.
Note that the vertices in $M$ correspond to the structures and those in $N$ to the underlyings/assets. Although asymptotically intractable, what would be interesting to know is what one can do in practice for reasonable instance sizes, i.e., up to which degree one would actually be able to detect tampering. As Mike already said:
In particular, if a group put out 1000 groupings of financial instruments, and I needed to solve the densest subgraph problem on the resulting instance, I would work very hard at getting an integer program, constraint program, dynamic program, or other program to actually solve the instance (particularly if someone is willing to pay me millions to do so). If the group then responded with 10,000 groupings, I would then simply declare that they are tampering and invoke whatever level of adverse selection correction you like (including just refusing to have anything to do with them). Intractable does not mean unsolvable, and not every size instance needs more computing than “the fastest computers on earth put together”.
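To make the fair and the tampered assignment from the densest subgraph problem concrete, and to see what "solving the instance" means at toy scale, here is a minimal sketch. It simplifies the setup (neighbors are drawn without repetition, and the parameters are tiny), the function names `sample_fair`, `sample_planted` and `densest_score` are made up for illustration, and the exhaustive search is precisely the exponential-time check that stops being feasible for realistic sizes.

```python
import random
from itertools import combinations


def sample_fair(M, N, D, rng):
    """Fair assignment: each of the M derivatives gets D distinct random assets out of N."""
    return [set(rng.sample(range(N), D)) for _ in range(M)]


def sample_planted(M, N, D, n, m, d, rng):
    """Tampered assignment: m derivatives get d of their D assets from n chosen junk assets."""
    S = rng.sample(range(N), n)        # the junk assets
    T = set(rng.sample(range(M), m))   # the derivatives that get overloaded with junk
    graph = []
    for v in range(M):
        if v in T:
            forced = set(rng.sample(S, d))                         # d neighbors inside S
            others = [a for a in range(N) if a not in forced]
            graph.append(forced | set(rng.sample(others, D - d)))  # D - d further random ones
        else:
            graph.append(set(rng.sample(range(N), D)))
    return graph, set(S), T


def densest_score(graph, N, n, m):
    """Brute force over all asset subsets of size n: the largest number of edges
    that the m best-connected derivatives send into one such subset."""
    best = 0
    for subset in combinations(range(N), n):
        A = set(subset)
        overlaps = sorted((len(assets & A) for assets in graph), reverse=True)
        best = max(best, sum(overlaps[:m]))
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    M, N, D, n, m, d = 8, 12, 4, 4, 3, 3
    fair = sample_fair(M, N, D, rng)
    planted, S, T = sample_planted(M, N, D, n, m, d, rng)
    print("fair   :", densest_score(fair, N, n, m))
    print("planted:", densest_score(planted, N, n, m), "(at least m*d =", m * d, ")")
```

The planted instance always scores at least `m * d`, whereas a fair instance reaches that value only by chance; the number of subsets to test grows like N choose n, which is why a polynomial-time buyer is out of luck.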
Another point might be that there are potentially billions of ways of tampering with structured products. Especially when the payoff profiles are highly non-linear (e.g., FX-ratchet swaps with compounding coupons), deliberate over-/underestimation of parameters might completely change the valuation of the structures. The proposed framework highlights that there might be ways of tampering that we cannot detect in the worst case, even ex post (under additional assumptions). But before we can actually detect tampering we have to be aware of this kind of tampering, and we have a real problem if tampering is undetectable ex post – how to prove it? This is in some sense related to the stated open question 3: Is there an axiomatic way of showing that there are no tamper-proof derivatives – slightly weakened: with respect to ex-post undetectability.
I could also very well imagine that, when taking a closer look at traded structures (especially the nasty OTC ones), there will be more pricing problems that are essentially intractable. It is almost as if one of the main hurdles so far to establishing intractability was the more stochastic character of pricing problems, while hardness is often stated in terms of some kind of combinatorial problem. An approach like the one proposed in the article might overcome this issue by establishing hardness via distinguishing two distributions.
Rick Bookstaber recently argued that the arms race in high frequency trading, a form of quantitative trading where effectively time = money ;-), results in a net drain of social welfare:
A second reason is that high frequency trading is embroiled in an arms race. And arms races are negative sum games. The arms in this case are not tanks and jets, but computer chips and throughput. But like any arms race, the result is a cycle of spending which leaves everyone in the same relative position, only poorer. Put another way, like any arms race, what is happening with high frequency trading is a net drain on social welfare.
It is all about milliseconds and being a tiny little bit faster:
In terms of chips, I gave a talk at an Intel conference a few years ago, when they were launching their newest chip, dubbed the Tigerton. The various financial firms who had to be as fast as everyone else then shelled out an aggregate of hundreds of millions of dollar to upgrade, so that they could now execute trades in thirty milliseconds rather than forty milliseconds – or whatever, I really can’t remember, except that it is too fast for anyone to care were it not that other people were also doing it. And now there is a new chip, code named Nehalem. So another hundred million dollars all around, and latency will be dropped a few milliseconds more.
In terms of throughput and latency, the standard tricks are to get your servers as close to the data source as possible, use really big lines, and break data into little bite-sized packets. I was speaking at Reuters last week, and they mentioned to me that they were breaking their news flows into optimized sixty byte packets for their arms race-oriented clients, because that was the fastest way through network. (Anything smaller gets queued by some network algorithms, so sixty bytes seems to be the magic number).
Although high-frequency trading is basically about being fast and thus time is the critical resource, in quantitative trading, in general, it is all about computational resources and having the best/smartest ideas and strategies. The best strategy is worthless if you lack the computational resources to crunch the numbers and, vice versa, if you do have the computational power but no smart strategies this does not get you anywhere either.
Jasmina Hasanhodzic, Andrew W. Lo, and Emanuele Viola argue in their latest paper “A Computational View of Market Efficiency” that efficiency in markets has to be considered with respect to the level of computational sophistication, i.e., a market can (appear to) be efficient for those participants who use only a low level of computational resources, whereas it can be inefficient for those participants who invest a higher amount of computational resources.
In this paper we suggest that a reinterpretation of market efficiency in computational terms might be the key to reconciling this theory with the possibility of making profits based on past prices alone. We believe that it does not make sense to talk about market efficiency without taking into account that market participants have bounded resources. In other words, instead of saying that a market is “efficient” we should say, borrowing from theoretical computer science, that a market is efficient with respect to resources S, e.g., time, memory, etc., if no strategy using resources S can generate a substantial profit. Similarly, we cannot say that investors act optimally given all the available information, but rather they act optimally within their resources. This allows for markets to be efficient for some investors, but not for others; for example, a computationally powerful hedge fund may extract profits from a market which looks very efficient from the point of view of a day-trader who has less resources at his disposal—arguably the status quo.
More precisely, it is even argued that the high-complexity traders gain from the low-complexity traders (of course, within the studied, simplified market model – but nonetheless!!):
The next claim shows a pattern where a high-memory strategy can make a bigger profit after a low-memory strategy has acted and modified the market pattern. This profit is bigger than the profit that is obtainable by a high-memory strategy without the low-memory strategy acting beforehand, and even bigger than the profit obtainable after another high- memory strategy acts beforehand. Thus it is precisely the presence of low-memory strategies that creates opportunities for high-memory strategies which were not present initially. This example provides explanation for the real-life status quo which sees a growing quantitative sophistication among asset managers.
Informally, the proof of the claim exhibits a market with a certain “symmetry.” For high-memory strategies, the best choice is to maintain the symmetry by profiting in multiple points. But a low-memory strategy will be unable to do. Its optimal choice will be to “break the symmetry,” creating new profit opportunities for high-memory strategies.
So although in pure high-frequency trading the relevance of smart strategies might be smaller and thus it is more (almost only?) about speed, in general quantitative trading it seems (again in the considered model) that the combination of strategy and high computational resources might generate a (longer-term) edge. This edge cannot necessarily be compensated with increased computational resources only, as you still need to have access to the strategy. The model considers memory as the main computational/limiting resource. One might argue that it reflects the sophistication of the strategy along with the real computational resources implicitly, as limited memory might not be able to hold a complex strategy. On the other hand, a lot of memory is pointless without a strategy using it. So both might be considered to be intrinsically linked.
An easy example illustrating this point is maybe the following. Consider the sequence “MDMD” and suppose that you can only store, say, these 4 letters. A 4-letter-strategy might predict something like “MD” for the next two letters. If those letters though represent the initials of the weekdays (in German), the next 3 letters will be “FSS”. It is impossible though to predict this sequence solely using information about the last 4 letters. The situation changes if we can store up to 7 letters “FSSMDMD”. Then a prediction is possible.
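The weekday example can be turned into a small experiment. The sketch below uses a deliberately naive toy predictor, not the memory-bounded strategies from the paper: a trader who stores only the last `memory` letters and assumes that exactly this stored pattern repeats. The function name `accuracy_with_memory` is hypothetical.

```python
from itertools import cycle, islice

WEEKDAYS = "MDMDFSS"  # initials of Montag, Dienstag, ..., Sonntag


def accuracy_with_memory(memory: int, n_steps: int = 700) -> float:
    """Fraction of correct guesses by a trader who stores only the last `memory`
    letters and assumes the stream simply repeats what is stored."""
    stream = list(islice(cycle(WEEKDAYS), memory + n_steps))
    hits = 0
    for t in range(memory, memory + n_steps):
        window = stream[t - memory:t]  # everything the trader remembers
        guess = window[0]              # assumption: the stored pattern repeats
        hits += guess == stream[t]
    return hits / n_steps


if __name__ == "__main__":
    for memory in (2, 4, 7):
        print(memory, accuracy_with_memory(memory))
```

In this toy model the 7-letter trader is always right on the weekday stream, while the 4-letter trader, who keeps extrapolating "MDMD", is never right.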
One point of the paper is now that the high-complexity traders might fuel their profits by the shortsightedness of the low-complexity traders. And thus an arms race might be a consequence (to exploit this asymmetry on the one hand and to protect against exploitation on the other). To some extent this is exactly what we are seeing already when traders with “sophisticated” models, that for example are capable of accounting for volatility skew, arbitrage out less sophisticated traders. On the other hand, it does not help to use a sophisticated model (i.e., more computational resources) if one doesn’t know how to use it, e.g., a Libor market model without an appropriate calibration (non-trivial) is worthless.
Recently, browsing the internet, I found Google Insights, something like the bigger brother of Google Trends. There you can not only compare the trends of certain words but also split the results into various time / location buckets and compare them. For example you can compare the searches run in the US for “carribean” to the ones for “recession” from Jan 2007 until today resulting in something like this (blue -> carribean / red -> recession):
One can see that queries for “carribean” already started to drop in Jan 2008 and dropped significantly further in Sep 2008 while the ones for recession started to significantly rise in Aug/Sep 2008. In hindsight it is easy to see patterns – just search long enough – and it is not clear if they constitute any correlation.
Further, while interesting for a lot of applications, historical information is not well suited for making predictions. But there are also other services such as twitter and facebook out there where users pour in tons of data in real time. Especially the latter can be easily searched in real time for trends and phrases as well. New information is quickly propagated through the network and made available to millions of people combing for specific phrases such as, e.g., “oil” or “oil price”. The following trend search is from Twist, a twitter trend service. For any point in time a click reveals the post written – everything updated in real time:
Now, having information on price changes and “market research” available even faster and more immediately than ever before (and not only for those with Bloomberg or Reuters access), one might suspect that the volatility in the market increases as people might act more impulsively and emotionally (as often claimed, e.g., in behavioral finance), especially if prices go down. Having a delay in the information processing chains smooths the trading behavior, effectively reducing volatility. If these delays are reduced to instantaneous information availability, (short-term) volatility increases.
Another, maybe even more critical problem could be that, using twitter and other mass-publication-of-micro-information services, the pump-and-dump strategies for microcaps, which are illegal (under most jurisdictions), can be performed even more effectively than ever before. As spam filters got more and more effective, pump-and-dump via spamming got harder and harder. But with micro-blogging the whole story changes. By definition there are no spammers, as you follow somebody and you do not get unsolicited emails/spam. Due to this there might be some special legal issues here that deserve extra attention: When somebody is writing a tweet to spread wrong information concealed as “personal opinion” and millions eavesdrop, can the person be held responsible if wrong information leads, e.g., to a fire sale? The story goes even a bit further: other people might re-tweet or copy the story, multiplying the number of readers and adding credibility to it as more and more versions of it are out there (a turbo-charged version of the Matthew effect and its generalizations) – who is going to reconstruct the time line when money is at stake and a decision has to be taken now?
Probably soon hedge funds will pop up trading these noises by mining millions of tweets for signals trying to extract some cash from the market.