Tossaporn Saengja

simple, efficient reasoning dataset

s1: Simple test-time scaling

Muennighoff et al. [1] curate s1K, a small, very high-quality reasoning dataset (N=1,000). Supervised fine-tuning of Qwen2.5-32B-Instruct on s1K takes under 30 minutes on 16 H100 GPUs and makes the model competitive with o1-preview on competition math.

#paper #ai

Method

They first gather a high-quality pool of questions (N=59,000) at roughly the level of math olympiads and PhD qualifying exams, then filter it down to the 1,000 examples of s1K in three stages: quality, difficulty, and diversity.
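A minimal sketch of what such a three-stage filter could look like. The record fields (`well_formatted`, `solved_by_baseline`, `domain`) and the round-robin selection are assumptions made for illustration, not the authors' actual pipeline.

```python
import random
from dataclasses import dataclass


@dataclass
class Example:
    question: str
    trace: str                # reasoning trace attached to the question
    domain: str               # hypothetical topic label
    well_formatted: bool      # hypothetical quality flag
    solved_by_baseline: bool  # hypothetical: a smaller model already solves it


def filter_pool(pool, target_size, seed=0):
    """Reduce a large pool to at most `target_size` examples in three stages."""
    rng = random.Random(seed)

    # Stage 1 (quality): drop malformed examples.
    kept = [ex for ex in pool if ex.well_formatted]

    # Stage 2 (difficulty): drop questions a baseline model already solves.
    kept = [ex for ex in kept if not ex.solved_by_baseline]

    # Stage 3 (diversity): group by domain, then pick round-robin across
    # domains, preferring longer reasoning traces within each domain.
    by_domain = {}
    for ex in kept:
        by_domain.setdefault(ex.domain, []).append(ex)
    for items in by_domain.values():
        items.sort(key=lambda ex: len(ex.trace), reverse=True)

    domains = sorted(by_domain)
    rng.shuffle(domains)
    selected = []
    while len(selected) < target_size and domains:
        for d in list(domains):  # iterate over a copy; `domains` may shrink
            if len(selected) >= target_size:
                break
            selected.append(by_domain[d].pop(0))
            if not by_domain[d]:
                domains.remove(d)
    return selected
```

Calling `filter_pool(pool, 1000)` on a list of `Example` records would return at most 1,000 examples spread across the available domains.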

Budget forcing

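Budget forcing is a simple way to control the length of the reasoning trace at test time: to cap it, the end-of-thinking delimiter is appended early; to extend it, the delimiter is suppressed and "Wait" is appended so the model keeps reasoning.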
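The idea can be sketched as a small decoding loop. This is an illustration under stated assumptions, not the paper's implementation: `generate` is a hypothetical callable that returns the model's next stretch of reasoning and stops where the model would have ended its thinking, `"</think>"` is a placeholder delimiter, and tokens are approximated by whitespace-separated words.

```python
from typing import Callable

END_OF_THINKING = "</think>"  # placeholder; real models use their own delimiter
WAIT = "Wait"                 # appended to nudge the model to keep reasoning


def budget_force(
    prompt: str,
    generate: Callable[[str, int], str],  # hypothetical: (context, budget) -> text
    max_tokens: int,
    num_waits: int = 0,
) -> str:
    """Return a reasoning trace whose length is controlled from outside.

    max_tokens: hard cap on the trace; it is cut off once the cap is reached.
    num_waits:  how many times the model's own stop is overridden with WAIT.
    """
    words = []
    waits_left = num_waits
    while len(words) < max_tokens:
        remaining = max_tokens - len(words)
        context = prompt + " " + " ".join(words) if words else prompt
        chunk = generate(context, remaining).split()
        words.extend(chunk[:remaining])  # enforce the cap even if generate overshoots

        # Stop if no extensions remain or there is no room to continue after WAIT.
        if waits_left == 0 or len(words) + 1 >= max_tokens:
            break

        # The model wanted to stop: suppress that and ask it to continue.
        words.append(WAIT)
        waits_left -= 1

    # Close the thinking section so the model can move on to its answer.
    return " ".join(words + [END_OF_THINKING])
```

With `num_waits=0` the loop only enforces the cap; raising `num_waits` spends more test-time compute on the same question.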


  1. https://arxiv.org/abs/2501.19393v2