Tossaporn Saengja
OpenAI released a preprint on o1-ioi/o3, “Competitive Programming with Large Reasoning Models” [1]. Three things in it stood out to me.
I got bronze (71st percentile) in IOI 2013 and coached students to IOI.
I’m not confident I would get a medal at IOI 2024. According to the report, o1-ioi reaches gold at IOI 2024 (362.14 points) when allowed 10K submissions, while o3 scores higher (395.64 points) within the 50-submission limit.
o1-ioi continues reinforcement learning (RL) training targeted at coding tasks. At test time, it samples 10K solutions for each subtask, then uses clustering and reranking to pick the best sampled solutions to submit. It also creates “one version of the document for each subtask”. Submitting only the top 50 sampled solutions gives a score of 213 (49th percentile, Honourable Mention).
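To make the "cluster, then rerank" step concrete, here is a minimal sketch. It is not the preprint's implementation. It assumes candidates are clustered by identical outputs on a handful of test inputs and reranked by a per-candidate score that is simply given. The names `cluster_by_behavior`, `select_submissions`, and `scores` are hypothetical, and programs are modelled as plain Python callables.

```python
from collections import defaultdict
from typing import Callable, Sequence


def cluster_by_behavior(programs: Sequence[Callable[[int], int]],
                        test_inputs: Sequence[int]) -> list[list[int]]:
    """Group candidate indices whose outputs agree on every test input.

    Assumption: identical behaviour on the test inputs is the clustering key.
    """
    clusters: dict[tuple, list[int]] = defaultdict(list)
    for idx, program in enumerate(programs):
        signature = tuple(program(x) for x in test_inputs)
        clusters[signature].append(idx)
    return list(clusters.values())


def select_submissions(programs: Sequence[Callable[[int], int]],
                       test_inputs: Sequence[int],
                       scores: Sequence[float],
                       budget: int = 50) -> list[int]:
    """Keep the best-scoring candidate of each cluster, then rank the keepers.

    `scores` is a hypothetical reranking signal, one value per candidate.
    """
    clusters = cluster_by_behavior(programs, test_inputs)
    representatives = [max(c, key=lambda i: scores[i]) for c in clusters]
    representatives.sort(key=lambda i: scores[i], reverse=True)
    return representatives[:budget]
```

The point of clustering first is that thousands of samples often collapse into a few distinct behaviours, so the limited submission budget is spent on genuinely different candidates rather than on near-duplicates.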
o3 is an exploration of the limits of RL. The preprint gives no details, but the method is probably similar to DeepSeek-R1 (RL with verifiable rewards). Unlike o1-ioi, o3 samples from a single prompt on the “original problem,” generating 1K solutions per problem. The 50 solutions with the highest test-time compute are then selected for submission.
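The selection rule is simple enough to sketch in a few lines. This is an illustration only: how test-time compute is measured is an assumption here, and the sketch uses a hypothetical `reasoning_tokens` count per sample as the proxy. `Sample` and `top_by_compute` are made-up names.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    code: str
    reasoning_tokens: int  # assumed proxy for test-time compute


def top_by_compute(samples: list[Sample], k: int = 50) -> list[Sample]:
    """Return the k samples that used the most compute, largest first."""
    return sorted(samples, key=lambda s: s.reasoning_tokens, reverse=True)[:k]
```

Notably, this rule needs no test cases and no learned reranker; it only looks at how long the model thought.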
The first note is that attacking every subtask separately is a common technique in competitive programming. o1-ioi seems to adopt this approach, whereas o3 attacks the problem holistically.
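A small worked example shows why per-subtask attacks pay off. It assumes the scoring rule used at recent IOIs, where a task's score is the sum over subtasks of the best score any submission achieved on that subtask. The subtask weights and the helper names `task_score` and `best_single` are made up for illustration.

```python
def task_score(submissions: list[list[int]]) -> int:
    """Sum, over subtasks, of the best score any submission got on it.

    Each submission is a list of per-subtask scores (assumed scoring rule).
    """
    return sum(max(column) for column in zip(*submissions))


def best_single(submissions: list[list[int]]) -> int:
    """Score if only the single best submission counted."""
    return max(sum(sub) for sub in submissions)


# Three specialised submissions, each solving exactly one subtask
# of a hypothetical task with subtasks worth 10, 30 and 60 points.
specialised = [[10, 0, 0], [0, 30, 0], [0, 0, 60]]
print(task_score(specialised))   # 100
print(best_single(specialised))  # 60
```

Under this rule, three narrow solutions add up to a full score even though no single submission solves the whole task.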
The second note is that I think o3 would not have happened if o1-ioi had not shown promising results under a relaxed constraint (10K submissions). o1-ioi shows that the model “knows” a solution “very deep” inside its weights (since it takes a 10K-sample search to find it). o3 shows that RL can tweak the weights to surface that solution, thus enhancing the model’s ability.
The third note is that RL can push capabilities, but supervised fine-tuning could probably reach the same point much more easily. I’m not sure whether o3-mini-high (supposedly a distillation of o3) would get a gold. Having a good mentor, whether human or GPT, is huge leverage.
Bonus note from ChatGPT-4o itself: “This mirrors how humans improve: extensive problem-solving exposure helps identify ‘deeply embedded’ knowledge, but structured learning (mentorship, targeted training) improves efficiency.”