## On suitability of tasks for the IOI competition

Michal Forišek,
Faculty of Mathematics, Physics and Informatics,
Comenius University, Slovakia

Abstract. As GA members probably know, there is almost no chance that a task presented at the IOI will be rejected (the only exception being its prior appearance in some national contest). Still, I do feel that some of the tasks selected at IOI 2004 were not suitable. In this article I will argue why this is the case, and I will also try to show how to avoid this type of task. Moreover, I will briefly mention the effect of the new 50% rule.

### 1. Overview

The usual competition rules approved at the beginning of each IOI define the nature of competition tasks as follows:

"All of the tasks in IOI (year) are designed to be algorithmic in nature. Efficiency plays an important role in some tasks; whenever efficiency of algorithmic computations is important, there will be grading inputs for which some correct but inefficient approaches can also score some points. It is important, therefore, for contestants to attempt a task even if the contestant does not know how to solve the hardest possible test cases."

Basically, the point is: the more efficient the implemented algorithm, the more points it should score. One possibility to achieve this would be the IMO style -- the contestant submits his solution written on paper. This approach would make it possible to score the algorithm according to its efficiency. However, it has several drawbacks. For example, it is much easier to spot a bug in a (usually short) mathematical proof than to spot a bug in a program. Therefore the idea of testing programs on several inputs was introduced.

Usually the notion of the efficiency of an algorithm coincides with the notion of its asymptotic time complexity. From the theoretical computer science point of view, if two algorithms are compared, usually the one with the better time complexity is considered to be better. In my opinion, this view should also be adopted by the IOI. For a given task, if two contestants implement correct solutions with different time complexities, the one with the better time complexity should score more points. Moreover, any correct solution should score more points than any incorrect solution (i.e. a solution based on an incorrect idea).

When programs are evaluated by testing on various inputs, this is usually achieved by careful selection of the input files -- several are small (to test correctness in general and in special cases), and the sizes of the others increase uniformly to test efficiency.
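
As an illustration, a graded test plan of this shape can be sketched as follows; the size limit and the concrete numbers are hypothetical, not taken from any actual IOI task:

```python
# Hypothetical graded test plan for a task with size limit n_max = 60000.
# A few small inputs probe correctness and special cases; the remaining
# sizes grow uniformly, so each jump in asymptotic complexity is expected
# to time out at a predictable point in the sequence.
n_max = 60000
small_sizes = [5, 10, 10, 20]                        # correctness and edge cases
large_sizes = [n_max * k // 6 for k in range(1, 7)]  # uniformly increasing sizes
```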

It would seem that in this way we are able to achieve the goals stated above. Sadly, as I will show later, this is not always the case.

In my view, one particular type of task causes trouble. Such tasks can be characterised by one or more of the following properties:

1. The set of correct solutions is large.
2. The value of a random solution is close to the optimal value.
3. There is a (theoretically incorrect) heuristic algorithm scoring well on random data.

I will show the exact nature of the trouble on two such tasks, explaining what causes the problems. In fact, you are already familiar with both of them -- they appeared at IOI 2004!

### 2. The tasks

#### 2.1. Artemis

Problem statement at http://www.ioi2004.org/html/competition_test1.html.

• It is hard to create input files where the optimum is not achieved. There are not many such input files in the test data.
• The trivial O(N^3) solution (one that stops as soon as an optimal solution is found) works for surprisingly high values of N; therefore the test data has to be fairly large.

I completely agree with these statements. Note that it follows that this task clearly has property #1 mentioned above. Later I will show that property #2 also applies.

As we already know, most of the inputs were fairly large. I know that some contestants found and correctly implemented an O(N^2 log N) solution. Sadly, this solution needed some time to preprocess the input. The result: the trivial O(N^3) algorithm found a nearly-optimal solution very quickly, and it could terminate before the better algorithm produced any solution at all. As a consequence, implementations of the better algorithm actually scored fewer points (usually around 40) than the trivial O(N^3) solution did.

The other point is that trivial algorithms could easily score far more than the expected and intended 50 points. During the "analysis mode" after day 1 I had the opportunity to access my contestants' computers. In 5 minutes I wrote and submitted the following trivial O(N^3) solution:

```
for each tree (corner #1):
    for each other tree (corner #2):
        for each tree:
            if the tree is inside the rectangle, increment a counter
            if too many trees are already inside, try the next rectangle
```

Moreover, the algorithm contained one more instruction: after 0.99 seconds, output the best solution found so far and terminate. Note that this algorithm does not always give a correct answer. You may even add a step "randomly permute the trees" at the beginning; then we can clearly see that this algorithm has virtually no idea behind it. It simply tries a small subset of all possible solutions and outputs the best one found in the given time.
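
The whole trivial solution, including the random permutation and the time cutoff, can be sketched as follows. This is my reconstruction in Python, not the code actually submitted; the input representation (a list of (x, y) tree coordinates and the threshold t) and the function name are assumptions:

```python
import itertools
import random
import time

def artemis_bruteforce(trees, t, deadline=0.99):
    """Anytime O(N^3) brute force: try every pair of trees as opposite
    corners of an axis-parallel rectangle, count the trees inside, and
    keep the best valid rectangle found before the deadline expires."""
    start = time.monotonic()
    pts = list(trees)
    random.shuffle(pts)                 # the "randomly permute the trees" step
    best = None                         # (trees inside, corner a, corner b)
    for a, b in itertools.combinations(range(len(pts)), 2):
        lox, hix = sorted((pts[a][0], pts[b][0]))
        loy, hiy = sorted((pts[a][1], pts[b][1]))
        inside, pruned = 0, False
        for (x, y) in pts:
            if lox <= x <= hix and loy <= y <= hiy:
                inside += 1
                if best is not None and inside >= best[0]:
                    pruned = True       # already no better: try next rectangle
                    break
        if not pruned and inside >= t and (best is None or inside < best[0]):
            best = (inside, pts[a], pts[b])
        if time.monotonic() - start > deadline:
            break                       # output the best solution found so far
    return best
```

On small inputs the loop simply finishes and the result is exact; the cutoff only matters on large inputs, which is exactly why this non-solution scored so well.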

Imagine my surprise when I found out that this trivial program would score 80 points! Compared with the 40 points for a correct O(N^2 log N) solution, this seems pretty unfair to me.

Note that the value of N was theoretically large enough to make a difference between the O(N^2 log N) and O(N^3) algorithms. If both algorithms were forced to try all possible solutions and select the best one, the O(N^2 log N) algorithm would be much faster. But the nature of this task (e.g. properties #1 and #2) meant that this goal probably could not be achieved. In other words, there was probably no way to select test data so that the points for an algorithm would correspond to its time complexity.

#### 2.2. Farmer

Problem statement at http://www.ioi2004.org/html/competition_test2.html.

This is an almost obvious knapsack problem. However, there are finer points that are not considered in the solution published on the IOI 2004 web page. Let's take a closer look at the problem's properties:

1. Suppose there are no more than Q cypress trees in the fields. Then the optimal solution can be achieved without solving the knapsack, by simply taking all the fields and several of the longest strips.
2. If there are more than Q cypress trees in the fields, there are only two possible answers: if there is a subset of the fields with exactly Q cypress trees, the answer is Q; otherwise the answer is Q-1.
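
Point 2 reduces the type 2 case to subset sum over the field sizes. A minimal sketch of that reduction in Python, using a big integer as a subset-sum bitset; the representation of the input as a list of field sizes (and the function name) is my assumption, not part of the task:

```python
def farmer_answer(q, fields):
    """Decide the type 2 case (more than Q cypress trees in the fields):
    the answer is Q if some subset of the field sizes sums to exactly Q,
    and Q - 1 otherwise. A Python big integer serves as the bitset."""
    reachable = 1                  # bit i set <=> some subset of fields sums to i
    for k in fields:
        reachable |= reachable << k
    return q if (reachable >> q) & 1 else q - 1
```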

From the properties of the task we can deduce the following facts:

• A solution flipping a coin and answering Q or Q-1 would solve half of the type 2 inputs. Note that for type 2 inputs this problem is essentially a YES-NO problem (either output YES, the result is Q, or NO, the result is Q-1). For obvious reasons this type of problem is avoided at the IOI.
• An incorrect greedy solution taking the fields first (in any order) would solve at least all type 2 inputs where the answer is Q-1.
• An incorrect greedy solution taking the fields first (again, in any order) and then the strips sorted by length (longest first) would solve all type 1 inputs.
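
The greedy heuristics from the last two bullets can be sketched in a few lines of Python. This is a hedged reconstruction, assuming fields and strips are given as lists of cypress-tree counts, a whole field of k cypresses yields k olive trees, and a strip (or a partially taken field) of k cypresses yields k - 1:

```python
def farmer_greedy(q, fields, strips):
    """Incorrect greedy heuristic: take whole fields first (a field of k
    cypresses yields k olive trees), then strips longest-first (a strip
    of k cypresses yields k - 1 olive trees). Returns the number of
    olive trees obtained for q chosen cypress trees."""
    olives, remaining = 0, q
    for k in fields:                        # any order works for the fields
        if remaining == 0:
            break
        take = min(k, remaining)
        # a partially taken field loses its "cycle" bonus and acts like a strip
        olives += take if take == k else max(0, take - 1)
        remaining -= take
    for k in sorted(strips, reverse=True):  # longest strips first
        if remaining == 0:
            break
        take = min(k, remaining)
        olives += max(0, take - 1)
        remaining -= take
    return olives
```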

As far as I was able to find out during the "analysis mode", 18 of the 20 test inputs were of type 1. (The Scientific Committee probably did not make the observations above.)

The result: an incorrect heuristic greedy solution scored at least 90 points. I'm aware that such solutions were submitted by contestants.

Even if the input data had been more balanced, there would still be simple but incorrect solutions scoring far too many points.

### 3. Conclusion

While I agree that a clever heuristic algorithm should score some points, I'm strongly convinced that ANY (well, almost any) correct algorithm is better and should score more points. I'm also convinced that for correct algorithms, the score should roughly correspond to the time complexity of the algorithm. In my opinion the Scientific Committee should select the competition tasks (and prepare the test data) in such a way that this would be accomplished.

As I argued above, for the tasks "Artemis" and "Farmer" this wasn't done and probably it wasn't even possible. Therefore I see these tasks as unsuitable and I think that similar tasks should not appear on future IOIs.

Here I will repeat the properties of tasks that are algorithmic in nature, but that I still consider unsuitable for the IOI:

1. The set of correct solutions is large.
2. The value of a random solution is close to the optimal value.
3. There is a (theoretically incorrect) heuristic algorithm scoring well on random data.

Note that these properties are related, and an unsuitable task will usually have more than one of them.

I have many years of experience as a problem setter. Several times I have encountered tasks with properties similar to the two tasks discussed here, and there were always problems with selecting proper test data for them. Usually we knew about at least one well-performing heuristic algorithm for such a task. We implemented these heuristics and selected the test data so that they wouldn't score many points --- only to find out that the contestants wrote similar heuristics that scored too well. My insight from this experience: "If there is a known good heuristic algorithm for the problem, better throw the task away."

This was exactly the case for the task "Farmer". I know that the Scientific Committee was aware that good heuristic algorithms do exist. Even if they didn't know about the other problems I mention, this alone should have been enough to reject the task.

I think that the ISC should become aware of the problems I mentioned in this article and should make sure that such tasks won't occur at future IOIs.

### Appendix A. The 50% rule

Probably sooner or later we will see some statistics on how the scores changed in comparison with past IOIs. It is true that the cutoff for bronze was much higher than in past years. But in my view, a major cause was that incorrect solutions could score far too many points. In fact, I showed two such algorithms that take 10 minutes to implement and score 80 + 90 = 170 points. Therefore we still know nothing about the real effect of the 50% rule. Perhaps the next IOI will tell.

### Appendix B. Approximation algorithms

I feel I should emphasise that in my view, there is nothing wrong with approximation algorithms in general. If (for example) finding an optimal solution is NP-hard and the contestants are required to write a good approximation algorithm, the task can be pretty good. It is still algorithmic in nature, and the better the approximation ratio, the more points the algorithm will score. Excellent examples of such tasks are XOR (IOI 2002) and iShongololo (IOI 1997).

The point I want to stress is that if there are fast (usually polynomial-time) algorithms solving a given task exactly, no heuristic algorithm (i.e. an algorithm that is not guaranteed to find an optimal solution) should be allowed to score many points.