Sebastian Pokutta's Blog

Mathematics and related topics

Archive for September 2010

Long time no see

with one comment

It has been quite a while that I wrote my last blog post; the last one that really counts (at least to me) was back in February. As pointed out at some point it was not that I was lacking something to write about but more that I did not want to “touch” certain topics. That in turn made me wonder what a blog is good for when, in fact, one is still concerned about whether to write about certain topics. So I got the feeling that in the end, all this web 2.0 authenticity, all this being really open, direct, authentic, etc. is nothing else but a (self-) deception. On the other hand, I also did not feel like writing about yet another conference. I have to admit that I have been to some really crappy conferences lately and since I did not have anything positive to say I preferred to not say anything at all. There were a few notable exceptions, e.g., the MIP or IPCO. Another thing that bothered me (and still does) is the dilution of real information with non-sense. In fact I have the feeling that the signal-to-noise ratio considerably dropped over the last two years and I didn’t want to add to this further. This feeling of over-stimulation with web 2.0 noise seems to be a global trend (at least this is my perception). Many people gave up their blogs or have been somewhat neglecting them. Also maintaining a blog with say weekly posts (apart from passing on a few links or announcements) takes up a lot of time. Time that arguably could be better invested into doing research and writing papers.

Despite those issues or concerns I do believe that the web with all its possibilities can really enhance the way we do science. As with all new technologies one has to find a modus operandi that provides positive utility. In principle the web can provide an information democracy/diversification, however any “democratic endeavor” on the web has a huge enemy. The Matthew effect (or commonly known as “more gains more”). This term, coined by R.K. Merton, derives its name from the following biblical Gospel of Matthew (see also wikipedia):

For to all those who have, more will be given, and they will have an abundance; but from those who have nothing, even what they have will be taken away. — Matthew 25:29, New Revised Standard Version

In principle it states that the “rich get richer” while the “poor get poorer”. If we think of the different social networks (facebook, myspace, friendster) it refers to the effect that the one that has the largest user basis is going to attract more people than the one with a smaller one. In the next “round” this effect is then even more pronounced until the smaller competitor virtually ceases to exist. In the real-world this effect is often limited due to various kinds of “friction”. There might be geographic limitations, cultural barriers etc., that do wash out the advantage of the larger one so that the compounding nature of the effect is slowed down or non-existent (this hold even true in the highly globalized world we live in). That is the reason why dry cleaners, bakeries, and other forms of local business are not outperformed by globalized companies (ok, some are). In the context of the internet however there is often no inhibitor to the Matthew effect. It often translates into some type of preferential attachment although with the difference that the overall user basis is limited so that the gain of one party is the loss of another (preferential attachment processes are usually not zero-sum).

So what does this mean in the context of blogs? Blog reading is to a certain extent zero-sum. There is some limited amount of time that we are willing to spend reading blogs. Those with a large user basis will have more active discussions and move higher in the priority list for reading. In the end the smaller ones might only have a handful of readers making it hard to justify the amount of time spent writing the posts. Downscaling the frequency of posts might even pronounce the effect as it might be perceived as inactivity. One way out of this dilemma could be any form of joining the smaller units to larger ones, i.e., either “digesting” several blogs to a larger one or alternatively “shared blogging”. I haven’t made up my mind yet what (if!) I am going to do about this. But I guess, in the end some type of consolidation is inevitable.

Having bothered you with this abstruse mixture of folklore, economics, and internet, I actually intended to write about something else but somewhat related today: About deciding whether and when to dump a project. This problem is very much inspired by my previous experiences as a consultant and recent decisions about academic projects. More precisely, suppose that you have a project and you have an estimate for the overall time of the project. At some point you want to review the progress and based on what you see at this point you want to make a call whether or not you will abandon the project. The longer you wait with your review the better your information is that you gain from the review. On the other hand you potentially wasted too much time and resources to increase the confidence in your decision. In fact it might make even sense you not start a project at all. Suppose that you have an a priori estimate for the probability of success of your project, say p. Further let r(t) denote our function of erring, i.e., r(0) = 1/2 and r(1) = 0 which means that at time t= 0 we do not have any information yet and so we can only guess leading to guessing wrong with probability 50% and at time t = 1 we have perfect information. Let t denote the point in time at which we review the project (as a fraction of the overall time, here assumed to be 1). We have four cases to consider (one might opt for a different payoff function; the following one resembles my particular choice):

  1. The project is going to be successful and at the point of reviewing we guessed right, i.e., we went through with it. In this case the benefit is s. This happens with probability (1-r(t)) p and expected payoff for this scenario is: (1-r(t)) p s. [alternatively one could consider the benefit s – t; or something else]
  2. The project is going to be successful and at the point of reviewing we guessed wrong, i.e., we dropped the project. In this case the benefit is – (t + s), i.e., we lose our investment up to that point (here with unit value) and the overall benefit. Probability is r(t) p and expected payoff – r(t) p (t+s).
  3. The project is going to fail and we guessed right: Benefit -t, i.e., the investment so far. Expected payoff – (1-r(t)) (1-p) t.
  4. The project is going to fail and we guessed wrong, i.e., we went through with it: Benefit -T, were T is some cost for this scenario. Expected payoff – r(t) (1-p) T.

All in all we have the following expected overall payoff as a function of t:

\mathbb E(t) = -[(1-r(t))p (-s) + (1-r(t))(1-p) t + r(t)p(t+s) + r(t)(1-p) T]

Next we have to define our function which models our increase in confidence. I opted for a function that gains information in a logarithm fashion, i.e., in the beginning we gain confidence fast and then we have a tailing-off effect:

r_k(t) := \frac{1}{2} \frac{(1 + \log(k)}{(-\log(k) + \log(1 + k)))} - \frac{\log(k + t)}{(2 (-\log(k) + \log(1 + k)))}

The parameter k can be understood as the rate of learning. For example for k = 0.01 it looks like this:

Assuming s = 1 and T = 1, i.e., the payoffs are basically the invested time and p = 30%, the expected payoff as function of the time of review t looks like this (payoff: blue line, error rate: red line):

The maximum payoff is reached for a review after roughly 20% of the estimated overall time. However it is still negative. This suggests that we do not learn fast enough to perform a well-informed decision. For example for k = 0.001, the situation looks different:

The optimal point for a review is after 14% of the estimated project time. Having once estimated your rate of learning, one can also determine which projects one should not get involved with at all. For k = 0.001 this is the case when the probability of success p is less than roughly 27%.

Although this model is somewhat very simple it provides some nice qualitative (and partly quantitative) insights. For example that there are indeed projects that you should not even get involved with; this is somewhat clear from intuition but I was surprised that the probability of success of those is still quite high. Also, when over time your rate of learning increases (due to experience with other projects) you can get involved with more risky endeavors because your higher review confidence allows you to purge more effectively. For example when k goes down to, say, k = 0.00001 (which is not unrealistic as in this case shortly after the beginning of the project you would only err with around 20%) you could get involved with projects that only have a probability of success of 19%.

And no complaints about the abrupt ending – I consumed my allocated blogging time.