Sebastian Pokutta's Blog

Mathematics and related topics

Archive for July 2009

GLPK 4.39 released / Gusek update available

leave a comment »

A new version of the GNU Linear Programming Kit (GLPK) has been released yesterday. From the release notes:

GLPK 4.39 — Release Information

Release date: Jul 26, 2009

GLPK (GNU Linear Programming Kit) is intended for solving large-scale
linear programming (LP), mixed integer linear programming (MIP), and
other related problems. It is a set of routines written in ANSI C and
organized as a callable library.

In this release:

The following new API routines were added:

glp_warm_up           “warm up” LP basis
glp_set_vertex_name   assign (change) vertex name
glp_create_v_index    create vertex name index
glp_find_vertex       find vertex by its name
glp_delete_v_index    delete vertex name index
glp_read_asnprob      read assignment problem data in DIMACS
glp_write_asnprob     write assignment problem data in DIMACS
glp_check_asnprob     check correctness of assignment problem
glp_asnprob_lp        convert assignment problem to LP
glp_asnprob_okalg     solve assignment problem with the
out-of-kilter algorithm
glp_asnprob_hall      find bipartite matching of maxumum
cardinality with Hall’s algorithm

Also were added some API routines to read plain data files.

The API routines glp_read_lp and glp_write_lp to read/write
files in CPLEX LP format were re-implemented. Now glp_write_lp
correctly writes double-bounded (ranged) rows by introducing
slack variables rather than by duplicating the rows.

Also a new version of Gusek including GLPK 4.39 has been released.


Written by Sebastian

July 27, 2009 at 8:38 pm

Posted in Software

The Netflix Prize – shootout on the finish line

leave a comment »

From a New York times article:

After nearly three years and entries from more than 50,000 contestants, a multinational team says that it has met the requirements to win the million-dollar Netflix Prize: It developed powerful algorithms that improve the movie recommendations made by Netflix’s existing software by more than 10 percent.

The online movie rental service uses its Cinematch software to analyze each customer’s film-viewing habits and recommends other movies that customer might enjoy. Because accurate recommendations increase Netflix’s appeal to its customers, the movie rental company started a contest in October 2006, offering $1 million to the first contestant that could improve the predictions by at least 10 percent.

On June 26th, 2009 the team “BellKor’s Pragmatic Chaos” were the first to submit a solution that improves more than 10% over the cinematch algorithm used by Netflix to match customers and movies. This triggered a final 30 days period in which other teams have the chance to beat that submission – the final winners will be the one with the best improvement. It looked very much like the BellKor’s Pragmatic Chaos would be the winners until yesterday… but then, suddenly a new team “The Ensemble” popped up out of nothingness (more or less – it is actually a collaborative efforts of several other teams) and made a submission on July 25th, 18:32:29 which outperforms the one of  BellKor’s Pragmatic Chaos by a tiny fraction. Snapshot of the leaderboard:

Picture 2

Given that the 30 days period was triggered on June 26th and depending on day counting convention this looks very much like a shootout on the finish line. Maybe, who knows, there is another team lurking in the dark making a last minute submission? Stay tuned!

Update 26.07.2009: We have new submissions and the match continues:

Picture 2

Update 26.07.2009: Game over

Contest Closed

Thank you for your interest in the Netflix Prize.

We are delighted to report that, after almost three years and more than 43,000 entries from over 5,100 teams in over 185 countries, the Netflix Prize Contest stopped accepting entries on 2009-07-26 18:42:37 UTC. The closing of the contest is in accordance with the Rules — thirty (30) days after a submitted prediction set achieved the Grand Prize qualifying RMSE on the quiz subset.

Team registration, team updates and the dataset download are also closed. The Contest Forum and Leaderboard remain open.

Qualified entries will be evaluated as described in the Rules. We look forward to awarding the Grand Prize, which we expect to announce in a few weeks. However if a Grand Prize cannot be awarded because no submission can be verified by the judges, the Contest will reopen. We will make an announcement on the Forum after the Contest judges reach a decision.

Once the Grand Prize is awarded, the ratings for the qualifying set will be released and the combined training data and qualifying sets will become available upon request at the Machine Learning Archive at UC Irvine.

Thank you again for your interest in the Netflix Prize. Keep checking this site for updates in the coming weeks.

Update 26.07.2009: There are several rumors spreading that the final winner is not yet determined as the score posted online is the one for a data set that is used for reporting the performance (only), whereas netflix uses a different one internally to do the real performance judgment. From the FAQ:

Why this whole quiz/test subset structure? Why not reveal a submission’s RMSE on the test subset?

We wanted a way of informing you and your competitive colleagues about your progress toward a prize while making it difficult for you to simply train and optimize against “the answer oracle”. We also wanted a way for the judges to determine how robust your algorithm is. So we have you supply nearly 3 million predictions, then tell you and the world how you did on one half (the “quiz” subset) while we judge you on how you did on the other half (the “test” subset), without telling you that score or which prediction you make applies to which subset.

So it is possible that the story looks different on the “test” subset, especially given that both teams were so close together.

If you are interested in the math behind it, then have a look here! At the end of that article you will find additional links.

Written by Sebastian

July 26, 2009 at 10:11 am

Are twitter and friends increasing volatility in the market?

leave a comment »

Recently browsing the internet I found google insights, somewhat the bigger brother of google trends. There you can compare not only the trends of certain words but you can also split the results into various time / location buckets and compare them. For example you can compare the searches run in the US for “carribean” to the ones for “recession” from Jan 2007 until today resulting in something like this (blue -> carribean / red -> recession):

Picture 2

One can see that queries for “carribean” already started to drop in Jan 2008 and dropped significantly further in Sep 2008 while the ones for recession started to significantly rise in Aug/Sep 2008. In hindsight it is easy to see patterns – just search long enough – and it is not clear if they constitute any correlation.

Further, while interesting for a lot of applications, historical information is not well suited for making predictions. But there are also other services such as twitter and facebook out there where users pour in tons of data in real time. Especially the latter can be easily searched in real time for trends and phrases as well. New information is quickly propagated through the network and made available to millions of people combing for specific phrases such as, e.g., “oil” or “oil price”. The following trend search is from Twist, a twitter trend service. For any point in time a click reveals the post written – everything updated in real time:

Picture 3

Now, having information on price changes and “market research” available even faster and more immediate than ever before (and not only for those with Bloomberg or Reuters access) one might suspect that the volatility in the market increases as people might act more impulsively and emotionally (as often claimed e.g., in behavorial finance) especially if prices go down. Having a delay in the information processing chains smoothens the trading behavior, effectively reducing volatility. If these delays are reduced to instanteneous information availability (short term) volatility increases.

Another, maybe even more critical problem could be that using twitter and other mass-publication-of-micro-information services the pump-and-dump strategies for microcaps, which are illegal (under most jurisdictions), can be performed even more effectively than ever before. As spam filters got more and more effective pump-and-dump via spamming got harder and harder. But with micro-blogging the whole story changes. By definition there are no spammers as you follow somebody and you do not get unsolicitated emails/spam. Due to this there might be some special legal issues here that deserve extra attention: When somebody is writing a tweet to spread wrong information concealed as “personal opinion” and millions eavesdrop, can the person be held responsibility if wrong information leads, e.g., to a fire sale? The story goes even a bit further: other people might re-tweet or copy the story multiplying the number of readers and adding credibility to it as more and more versions of it are out there (a turbo-charged version of the Matthew effect and its generalizations) – who is going to reconstruct the time line when money is at stake and a decision has to be taken now?

Probably soon hedge funds will pop up trading these noises by mining millions of tweets for signals trying to extract some cash from the market.

Written by Sebastian

July 24, 2009 at 8:53 pm

Mendeley revisited

with 5 comments

mendeleyAs promised roughly half a year ago, here is my opinion on Mendeley. I was reminded of writing my review two days ago after I downloaded the latest version of Mendeley. After the update I fired up Mendeley and I was shocked (at first) – ALL MY DATA WAS GONE. Although I started out to test Mendeley only, I got so used to it that I couldn’t believe that my data was gone. The desktop client just asked me to log in but all the categories, groups etc were gone. More or less out of desperation I logged in (again) and the software immediately synchronized with the web service which has a backup of all the references (I chose to just save the references and not the pdfs). After less than a minute all the references were transferred back to my machine and even the links to the files on my hard drive worked. I was quite impressed with this backup/synchronization feature so that I decided that it was time to keep my promise and write a few lines about it.

So first of all, what is Mendeley? From the FAQ:

Mendeley is a combination of a desktop application and a website which helps you manage, share and discover both content and contacts in research.

Our software, Mendeley Desktop, offers you:

  • Automatic extraction of document details (authors, title, journal etc.) from academic papers into a library database, which saves you a lot of manual typing! As more people use Mendeley, the quality of the data extraction improves.
  • Super-efficient management of your papers: “Live” full-text search across all your papers – the results start to appear as you type! Mendeley Desktop also lets you filter your library by authors, journals or keywords. You can also use document groups, notes and tags to organize your knowledge, and export the document details in different citation styles.
  • Sharing and synchronisation of your library (or parts of it) with selected colleagues. This is perfect for jointly managing all the papers in your lab!
  • More great features: A plug-in for citing your articles in Microsoft Word, OCR (image-to-text conversion, so you can full-text search all your scanned PDFs) and lots more new features being worked upon.

Our website, Mendeley Web, complements Mendeley Desktop by offering you these features:

  • An online back up of your library: Store your documents in your account and access them from anywhere through your browser.
  • Detailed statistics of all things interesting: You can upload your own publications to your research profiles, then track the evolution of your readership. How often are your papers downloaded? How often are they read? From which academic disciplines and geographic regions do your readers come from? Additionally, there are detailed statistics for each academic discipline and research topic. Who are the up-and-coming authors in your discipline? Is the interest in a research topic growing or declining? What are the most widely read papers on a specific subject?
  • A research network that allows you to keep track of your colleagues’ publications, conference participations, awards etc., and helps you discover people with research interests similar to yours.
  • A recommendation engine for papers that might interest you, but are not yet in your library! Based on what you know already, what should you read next? Coming soon

To be honest, in the beginning when I first started to use Mendeley in Jan 2009 it did not quite convince me. The interface of the desktop software was slow, the meta-data extraction was quite crappy, etc. For short, I just didn’t like the user experience. I knew Papers from before but it didn’t convince me either and so I stuck with the one that was free – Mendeley. Then, over time more and more updates for Mendeley were released. The user interface got better and also the meta-data extraction improved significantly – nobody wants to enter all the meta-data by hand… The current version of Mendeley is 0.9.xx. It is still beta but the latest version is really a great piece of software. Some of the features:

  1. Grouping and categorizing of your papers – you can slice and dice it in almost any way.
  2. Shared groups – you are collaborating with somebody and you want to share your papers. Mendeley can do that for you. One guy adds a new paper, the other guy gets it automatically the next time (s)he uses Mendeley.
  3. Full text search – search within all your papers. fast.
  4. Integrated pdf viewer that allows annotation, notes, etc.
  5. DOI / arxiv support – Add the corresponding references and Mendeley updates meta-data automatically. No need to enter anything.
  6. Statistics functions (via the web service) – you can get detailed statistics about the papers in your library. Most-read journals, more-read authors, etc.
  7. Every entry can have one or more pdfs (or other formats) attached to it – One click opens the paper in the built-in pdf viewer.
  8. It is free – according to the faq, the current service level will remain free. There might be some premium features coming up at some point that will be charged for extra.
  9. Backup/synchronization with the online service – all our references and notes are synchronized with the web service so that you can use your library on different machines and you always have a backup.
  10. You can log-in to your Mendeley account from all over the world and access your references on the web (part of the synchronization feature).
  11. Meta-data extraction – The extraction works quite well, conditional on the pdfs being somewhat reasonably tagged.

So if you are still looking for a great software for organizing your papers with a great overall concept then you should definitely give it a try!!

Written by Sebastian

July 18, 2009 at 9:16 pm

Some updates from the EURO (updated on-the-go)

leave a comment »

Being on the EURO in Bonn, I decided to post a few pictures / impressions from the conference.

Monday, 06.07.2009:

Here are a few pictures from the opening session to start with. The morning streams were pretty empty so far (as expected?) but I suppose today’s plenary talks will be pretty well attended.

Here are two pictures from the plenary talk which were given by Reinhard Selten and Christos Papadimitriou. The first one is from Reinhard Selten’s talk on “Experimental Results on the Process of Goal Formation and Aspiration Adaptation”.

Reinhard SeltenThe second one is from Christos Papadimitriou’s talk on “Computing Equilibria”. He also talked about a current research project that compared genetic algorithms to simulated annealing and that came with its very own message (see the picture) 😉 :

Christos Papadimitriou

I highly recommend the post by Mike about Christos’ talk – including the details on Christos’ question

Why do Simulated Annealing algorithms work, while Genetic Algorithms don’t?

and “Darwin’s subtle mistake”.

Tuesday, 07.07.2009:

IMG_0165I just learned a hell-lot about revenue management in hotels: Every day you get these small little Nik Nak’s for free. You will find them conveniently placed next to your bed so that, when waking up after a nap, you are tempted to eat them.

They look extremely innocent but they are not. Once eaten – especially if you eat those kept from the days before – you get really thirsty. I do not know how they manage to make them that salty.

You start looking around in your room and what do you find?

A bottle of water.

A bottle of water. Nothing fancy. Less than a liter. Non-carbonated. The complementary bottle of water I thought. So I took the bottle from the desk and right before opening it I saw that the lovely tag around the neck of the bottle was actually not a “welcome tag” but a price tag:



For less than a liter of water. What is that? A mock-up of 5000%? That is the warmest welcome I had in ages!!

Hey guys, we are facing this “condition” right now. It is called recession!! That means that people do not buy water for  7 Euros when they can get a whole meal for that much money!

Today Noam Nisan (see also his report on the EURO on his blog) gave a great talk about google’s tv ad business and how they allocate the slots using game theory and ascending auctions (see his paper).

Wednesday, 08.07.2009:

The last day of the EURO 2009. Today I gave my presentation on the membership problem over the \{0,\frac{1}{2}\}-closure for 0/1 polytopes (joint work with A.N. Letchford and A.S. Schulz). By extending Caprara and Fischetti’s 1996 result one can show that the problem remains (strongly) coNP-complete even for 0/1 polytopes. That is kind of surprising as over all the other well-known closure operators acting on 0/1 polytopes (e.g., lift and project, Sherali-Adams hierarchy, Lovász-Schrijver closures (N_0, N, N^+), and Lasserre hierarchy) one can separate in polynomial time. Further Bienstock and Zuckerberg showed in 2003 that under some reasonable assumptions the more general Gomory-Chvátal closure (of fixed rank) can be approximated well in polynomial time for covering problems.

Mike Trick also gave an excellent talk today on scheduling the baseball league. The underlying scheduling problems seem to be very hard indeed. Mike told us that even for small instances – say with 12 teams – the optimal solution is still unknown and although significant research efforts the progress seems to be slow. In fact Mike is maintaining a set of instances of these scheduling problems so that you can try your luck.

Written by Sebastian

July 6, 2009 at 9:09 am

A great lecture series on “Computational Complexity and Quantum Computation”

leave a comment »

A great lecture series on “Computational Complexity and Quantum Computation” by Timothy Gowers is available online at the University of Cambridge. The lecture is actually available in several formats including one suitable for iPhones and iPods so that you could even watch them on the go ;-). The next time you are bored while waiting in line you know what to do… (that is probably the natural extension of listening to audio books while driving)

Especially the part on Razborov’s result is interesting. The known exponential size lower bounds on the size of Gomory-Chvátal cutting plane proofs are based on this result and monotone interpolation, i.e., splitting up a proof of infeasibility into two for certain subsystems using only monotone operations. Timothy Gowers tries to provide a motivation *how* one actually can come up with a construction as Razborov’s. See also his great overview article on Razborov’s proof and its motivation (and his blog post on both).  I bet that the part on quantum computing is great as well but I have to admit that I haven’t been waiting in lines often enough recently…

Computational complexity is the study of what resources, such as time and memory, are needed to carry out given computational tasks, with a particular focus on lower bounds for the amount needed of these resources. Proving any result of this kind is notoriously difficult, and includes the famous problem of whether P = N P . This course will be focused on two major results in the area. The first is a lower bound, due to Razborov, for the number of steps needed to determine whether a graph contains a large clique, if only “monotone” computations are allowed. This is perhaps the strongest result in the direction of showing that P and N P are distinct (though there is unfortunately a very precise sense in which the proof cannot be developed to a proof of the whole conjecture). The second is Peter Shor’s remarkable result that a quantum computer can factorize large integers in
polynomial time. In order to present these two results, it will be necessary to spend some time discussing some of the basic concepts of computational complexity, such as the relationship between Turing machines and the more obviously mathematical notion of circuit complexity, and an introduction to what a quantum computation actually is. For the latter, no knowledge of quantum mechanics will be expected, and scarcely any will be imparted during the course: it is possible to understand quantum computation in a very “pure mathematics” way. The reason this is a graduate course rather than a Part III course is that I intend to give several lectures in an informal style that would be hard to examine. It is not because the material will be more advanced: indeed, my aim will be to make allowances for the fact that people will not be working on it with an exam in mind, and to make the course as easy to follow as I can. Having said that, the main results will be proved in full: the informal discussion will be with a view to making these proofs more comprehensible.

The collection will have 12 graduate level lectures which are currently being given during the Easter term 2009. Many thanks to Adrian Callum-Hinshaw for his help with these video lectures.

Written by Sebastian

July 1, 2009 at 11:19 pm