Tossaporn Saengja

o1-ioi and o3 in competitive programming

OpenAI released a preprint, “Competitive Programming with Large Reasoning Models”¹, describing o1-ioi and o3. I found three points interesting.

I got a bronze medal (71st percentile) at IOI 2013 and have coached students to the IOI. I’m not confident I would earn a medal at IOI 2024. For IOI 2024, the report says o1-ioi could reach gold (362.14 points) when allowed 10K submissions, whereas o3 does better (395.64 points) within the competition’s 50-submission limit.

o1-ioi is o1 with additional reinforcement learning (RL) training targeted at coding tasks. At test time, it samples 10K solutions for each subtask, then uses clustering and reranking to choose which sampled solutions to submit. It also creates “one version of the document for each subtask”. Submitting only its top 50 sampled solutions would score 213 points (49th percentile, Honourable Mention).
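To make “clustering and reranking” a bit more concrete, here is a minimal Python sketch. It is not the paper’s pipeline: I assume each candidate is a plain function from input to output, that test inputs are already available, and that ranking clusters by size and then by a placeholder `score` function is a reasonable stand-in for the real reranker. The names `cluster_by_behavior` and `pick_submissions` are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

# Assumption: a candidate solution is modelled as a Python function from a test
# input to an output. Real contest solutions would be programs run in a sandbox.
Solution = Callable[[int], int]


def cluster_by_behavior(
    solutions: Sequence[Solution], test_inputs: Sequence[int]
) -> List[List[int]]:
    """Group candidate indices whose outputs agree on every test input."""
    clusters: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for idx, solve in enumerate(solutions):
        signature = tuple(solve(x) for x in test_inputs)
        clusters[signature].append(idx)
    return list(clusters.values())


def pick_submissions(
    solutions: Sequence[Solution],
    test_inputs: Sequence[int],
    score: Callable[[int], float],
    budget: int,
) -> List[int]:
    """Return up to `budget` candidate indices, one representative per cluster."""
    clusters = cluster_by_behavior(solutions, test_inputs)
    # Assumed ranking: bigger clusters first (more candidates agree), ties broken
    # by the best placeholder score inside the cluster.
    ranked = sorted(
        clusters,
        key=lambda members: (len(members), max(score(i) for i in members)),
        reverse=True,
    )
    # Submit the highest-scoring member of each cluster, in ranked order.
    picks = [max(members, key=score) for members in ranked]
    return picks[:budget]


if __name__ == "__main__":
    candidates = [lambda x: x * 2, lambda x: x + x, lambda x: x**2, lambda x: 2 * x]
    scores = [0.9, 0.5, 0.7, 0.95]
    print(pick_submissions(candidates, [0, 1, 3], lambda i: scores[i], budget=2))  # [3, 2]
```

Candidates that behave identically on every test input land in the same cluster, so submitting one representative per cluster avoids spending the submission budget on programs that would all pass or fail together.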

o3 explores the limits of RL. The preprint gives no training details, but the method is probably similar to DeepSeek-R1’s (RL with verifiable rewards). In contrast to o1-ioi, o3 samples from a single prompt containing the “original problem,” generating 1K solutions per problem. The 50 solutions that used the most test-time compute are selected for submission.
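Two pieces of that paragraph can be sketched in Python, again only under assumptions: a binary “verifiable reward” in the spirit of DeepSeek-R1 (pass every test or get nothing), and the final selection step, where I assume “test-time compute” is measured by a hypothetical per-sample `reasoning_tokens` count. Neither function reflects OpenAI’s actual implementation, which the preprint does not describe.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


def verifiable_reward(
    solve: Callable[[int], int], tests: Sequence[Tuple[int, int]]
) -> float:
    """Binary reward: 1.0 only if the candidate passes every (input, expected) test."""
    try:
        return 1.0 if all(solve(x) == expected for x, expected in tests) else 0.0
    except Exception:
        return 0.0  # a crashing candidate earns no reward


@dataclass
class Sample:
    """One sampled solution; the field names are hypothetical."""
    code: str
    reasoning_tokens: int  # assumed proxy for test-time compute


def select_top_by_compute(samples: Sequence[Sample], k: int = 50) -> List[Sample]:
    """Keep the k samples that spent the most test-time compute."""
    return sorted(samples, key=lambda s: s.reasoning_tokens, reverse=True)[:k]


if __name__ == "__main__":
    print(verifiable_reward(lambda x: x * 2, [(1, 2), (3, 6)]))  # 1.0
    pool = [Sample(code=f"candidate_{i}", reasoning_tokens=i) for i in range(1024)]
    top = select_top_by_compute(pool)
    print(len(top), top[0].reasoning_tokens)  # 50 1023
```

The reward is “verifiable” because it comes from running tests rather than from a learned judge, which is what makes it hard for the policy to game.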

First, attacking each subtask separately is a common technique in competitive programming. o1-ioi seems to adopt this approach, whereas o3 attacks each problem holistically.
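Why attacking subtasks pays off is easiest to see from the scoring rule. The sketch below assumes the IOI convention, as I understand it for recent IOIs, that a task’s score is the sum over subtasks of the best result any submission achieved on that subtask. The function name, subtask names, and point values are made up for illustration.

```python
from typing import Dict, List


def task_score(
    submissions: List[Dict[str, float]], subtask_points: Dict[str, float]
) -> float:
    """Sum over subtasks of the best fraction any submission achieved, times its points.

    Each submission maps a subtask name to the fraction of it solved (0.0 to 1.0).
    """
    total = 0.0
    for name, points in subtask_points.items():
        best = max((sub.get(name, 0.0) for sub in submissions), default=0.0)
        total += best * points
    return total


if __name__ == "__main__":
    points = {"s1": 10, "s2": 30, "s3": 60}  # hypothetical subtasks
    # Three specialised submissions, none of which solves the whole task.
    subs = [{"s1": 1.0}, {"s2": 1.0}, {"s1": 1.0, "s3": 0.5}]
    print(task_score(subs, points))  # 70.0
    print(max(task_score([s], points) for s in subs))  # 40.0 (best single submission)
```

Each specialised submission only needs to be right on its own subtask, so the combined score can exceed what any single submission earns.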

Second, I think o3 would not have happened if o1-ioi had not shown promising results under a relaxed constraint (10K submissions). o1-ioi shows that the model “knows” a solution, but “very deep” inside its weights, since it takes a 10K-sample search to find it. o3 shows that RL can adjust the weights so the solution surfaces with far less search, thus enhancing the model’s ability.
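One rough way to read “very deep”: if each sample were independently correct with some fixed probability p (a big simplification, since real samples are correlated and vary by problem), the chance that at least one of k samples is correct is 1 - (1 - p)^k. The function name and the illustrative p = 1e-4 below are my own assumptions, not numbers from the preprint.

```python
def prob_at_least_one(p: float, k: int) -> float:
    """Chance that at least one of k independent samples is correct,
    assuming each sample is correct with the same probability p."""
    return 1.0 - (1.0 - p) ** k


if __name__ == "__main__":
    p = 1e-4  # illustrative per-sample success rate, not a measured number
    for k in (50, 1_000, 10_000):
        print(k, prob_at_least_one(p, k))
```

With p = 1e-4, 50 samples find a correct solution only about 0.5% of the time, while 10K samples do so about 63% of the time. Under this toy model, making the solution show up reliably within 50 submissions requires raising p substantially, which is how I read what RL did for o3.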

Third, RL can push capabilities, but supervised fine-tuning could probably reach the same point much more easily. I’m not sure whether o3-mini-high (supposedly a distillation of o3) would get a gold. Having a good mentor, either human or GPT, is huge leverage.

Bonus note from ChatGPT-4o itself: “This mirrors how humans improve: extensive problem-solving exposure helps identify ‘deeply embedded’ knowledge, but structured learning (mentorship, targeted training) improves efficiency.”

¹ https://arxiv.org/abs/2502.06807