Tossaporn Saengja
Muennighoff et al. [1] curate a very high-quality, small reasoning dataset, s1K (N=1,000). Supervised fine-tuning of Qwen2.5-32B-Instruct on s1K takes <30 minutes on 16 H100 GPUs and yields math capability competitive with o1.
#paper #ai
They first gather a very high-quality dataset (N=59,000) at roughly the level of math olympiads and PhD qualifying exams, then filter it down in three stages: quality, difficulty, and diversity.
Budget forcing, a simple length manipulation on the reasoning trace: to extend thinking, suppress the end-of-thinking delimiter </think> and append "Wait" to encourage reflection; to stop early, force-append "</think> Final Answer:" so the model commits to an answer.
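The budget-forcing loop can be sketched roughly as below. This is a minimal illustration, not the paper's implementation: `generate` is a stub standing in for a real LLM call, and only the delimiter strings ("</think>", "Wait", "Final Answer:") come from the paper.

```python
THINK_END = "</think>"

def generate(prompt: str) -> str:
    """Stub LLM: returns a canned continuation (hypothetical, for illustration)."""
    if "Wait" in prompt:
        return " On reflection, 6 * 7 = 42." + THINK_END + " Final Answer: 42"
    return " 6 * 7 is about 40." + THINK_END + " Final Answer: 40"

def budget_force(prompt: str, num_extensions: int = 1) -> str:
    """Extend thinking: strip the end-of-thinking delimiter and append 'Wait'
    so the model keeps reasoning instead of emitting its final answer."""
    out = generate(prompt)
    for _ in range(num_extensions):
        head = out.split(THINK_END)[0]          # suppress </think>
        out = head + " Wait," + generate(prompt + head + " Wait,")
    return out

def cap_thinking(thinking: str, max_chars: int) -> str:
    """Stop early: truncate the trace and force the answer delimiter."""
    return thinking[:max_chars] + THINK_END + " Final Answer:"

answer = budget_force("<think> Compute 6 * 7.")
```

With a real model, the same effect would be achieved at the token level (banning the end-of-thinking token and appending "Wait" to the context) rather than by string surgery.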
[1] https://arxiv.org/abs/2501.19393v2