Hi there, IOI folks!

As you may or may not know, the major issue with day 1 of IOI 2013 was a system malfunction with an unlucky consequence: for most of the contest, the contestants were receiving virtually no feedback.

Yes, we are aware that the results of day 1 have been significantly influenced by these issues. The problem we had to solve this evening was not to come up with a fair solution to these issues. There is no fair solution. All we can hope for is a solution that minimizes the damage.

After a long and thorough discussion in the IOI General Assembly, and as a result of a series of votes, the decision was taken that the results of day 1 will be kept. That is, the final result will be the sum of day 1 and day 2, as always.

(However, note that there is a theoretical possibility that some results of day 1 will change as a result of individual appeals. This will only be known after the appeals session of day 2.)

On the rest of this page, I would like to give you some more insight into why the International Scientific Committee recommended keeping the scores intact. Our main concern was to determine how much damage was caused by not giving out feedback. Only once you know that can you start arguing whether changing the results (e.g., giving a smaller weight to day 1, or cancelling day 1 completely) would cause more or less damage.

For two of the tasks (Dreaming and Wombats) we were not too concerned. The only bias for these tasks was that the contestants who submitted during the first hour (or so, before the system went down) did get feedback. Still, for both of these tasks the contestants were given complete information -- in the sense that, given enough time, they should have been able to test their solutions on their own machines. Yes, it's harder, yes, it's not what we originally intended, but at least in theory it can be done. Also, the clearly defined subtasks made it possible to estimate one's score reasonably well.

The Art Class problem was a different story. In this problem, the feedback for submissions was supposed to be a part of the solving process, giving the contestants information on what works and what does not, and allowing them to tweak their submissions. And the reason this problem felt different was the special nature of its test data: the contestants had no way of determining their scores on their own; they could only see how their solutions performed on the sample pictures.

Initially, we felt that this might introduce a significant bias into the contest: the scores for Art Class could very well turn out to be pure random noise.

Knowing this, the ISC decided to take a closer look at the actual submissions. More precisely, we looked at the correlation between their performance on the samples and on the real tests. In other words, we asked the question: "If we used the sample tests instead of the real ones, would the measured accuracies change significantly?"

Perhaps surprisingly, the answer is "NO".

And what this "NO" means is that, basically, whatever a solution did on the 4*9 sample images, roughly the same thing happened on the real test data.
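For those who want to see what such a check looks like in practice, here is a rough sketch in Python. It is not the script the ISC used; the file name, the CSV layout, and the choice of the Pearson coefficient as the measure of agreement are my own assumptions, made purely for illustration.

```python
# A minimal sketch of the sample-vs-real comparison, NOT the actual ISC tooling.
# Assumed input: a header-less CSV with one row per Art Class submission:
#   contestant_id, accuracy_on_samples, accuracy_on_real_tests   (both in percent)
import csv


def load_accuracies(path):
    """Read (contestant, sample_accuracy, real_accuracy) rows from a CSV file."""
    with open(path, newline="") as f:
        return [(c, float(s), float(r)) for c, s, r in csv.reader(f)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def best_per_contestant(rows):
    """Keep only each contestant's best submission (judged by real-test accuracy)."""
    best = {}
    for contestant, sample_acc, real_acc in rows:
        if contestant not in best or real_acc > best[contestant][1]:
            best[contestant] = (sample_acc, real_acc)
    return list(best.values())


if __name__ == "__main__":
    rows = load_accuracies("artclass_accuracies.csv")  # hypothetical file name
    all_pairs = [(s, r) for _, s, r in rows]
    best_pairs = best_per_contestant(rows)

    print("all submissions:     r =", round(pearson(*zip(*all_pairs)), 3))
    print("best per contestant: r =", round(pearson(*zip(*best_pairs)), 3))
```

The two plots described below are, in effect, scatter plots of exactly these (sample accuracy, real accuracy) pairs: first for all submissions, then for the best submission of each contestant.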

To illustrate that this is indeed the case, below we show two plots of test results. Both were produced by hand by ISC members and may differ slightly from what the grader reported (for reasons such as different random seeds). In each plot, the x axis is the accuracy (in percent) of a submission on the sample images, and the y axis is its accuracy on the real test cases. The first plot shows all submissions; the second plot shows only the best submission of each contestant.

--- Michal `misof' Forisek, ISC member

(Disclaimer: This text contains my personal opinions; it is not an official document in any sense.)

One final bit of trivia: the only "unlucky" contestant you can see as the mark in the middle of the right side of the second plot is actually a submission that had hard-wired the sizes of the sample test cases and answered based on those. No wonder it got the samples right ;)