Step-3.5-Flash
Described as the most advanced agentic model, Step-3.5-Flash is a 196B-parameter Mixture-of-Experts (MoE) model with 11B parameters active per token. It uses MTP-3 (multi-token prediction) and hybrid attention, is optimized for speed (a reported 350 tok/s), and supports a 256k-token context. It draws on **StepCrawl**, a proprietary high-signal data acquisition system whose URL selection layer prioritizes information-dense documents (especially PDFs) instead of relying on standard web-scale crawls alone. It is released alongside a 1.6M-row instruction-tuning dataset.
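The listing does not say how StepCrawl's URL selection layer scores candidates. As a purely hypothetical sketch, assuming each candidate URL already carries an information-density estimate (`est_density`, an invented field) and that PDFs receive a fixed bonus (`PDF_BONUS`, an invented weight), a budgeted priority selector could look like this. It is not StepCrawl's actual method.

```python
import heapq
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical weight: StepCrawl's real scoring is not described in the source.
PDF_BONUS = 0.3


@dataclass
class Candidate:
    url: str
    est_density: float  # hypothetical precomputed information-density estimate in [0, 1]


def score(c: Candidate) -> float:
    """Density estimate plus a bonus when the URL path points at a PDF."""
    path = urlparse(c.url).path.lower()
    bonus = PDF_BONUS if path.endswith(".pdf") else 0.0
    return c.est_density + bonus


def select_urls(candidates: list, budget: int) -> list:
    """Return up to `budget` URLs with the highest scores, best first."""
    top = heapq.nlargest(budget, candidates, key=score)
    return [c.url for c in top]


if __name__ == "__main__":
    example = [
        Candidate("https://example.org/report.pdf", 0.5),
        Candidate("https://example.org/blog", 0.6),
        Candidate("https://example.org/tag?page=2", 0.1),
    ]
    print(select_urls(example, 2))
    # ['https://example.org/report.pdf', 'https://example.org/blog']
```

Using a heap keeps selection at O(n log k) for a crawl budget of k URLs; a real system would likely learn the density estimate rather than take it as given.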
Outputs (3)
Step-3.5-Flash
Type: Model
Architecture: Mixture of Experts (MoE)
Parameters: 196B total
Active parameters: 11B per token
Context window: 256,000 tokens
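To make the sparsity figures concrete, the arithmetic below computes the share of parameters active per token and a rough forward-pass compute estimate. It assumes the common rule of thumb of about 2 FLOPs per parameter per token; that approximation is generic, not a figure from the source, and it ignores attention cost (which grows with context length) and router overhead.

```python
TOTAL_PARAMS = 196e9   # total parameters (from the listing)
ACTIVE_PARAMS = 11e9   # parameters active per token (from the listing)


def active_fraction(total: float, active: float) -> float:
    """Share of the model's parameters used for each token."""
    return active / total


def forward_flops_per_token(n_params: float) -> float:
    # Rule of thumb: one multiply and one add per parameter per token.
    return 2 * n_params


frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
print(f"active fraction: {frac:.1%}")  # active fraction: 5.6%
print(f"MoE: {forward_flops_per_token(ACTIVE_PARAMS) / 1e9:.0f} GFLOPs/token")  # MoE: 22 GFLOPs/token
print(f"dense 196B: {forward_flops_per_token(TOTAL_PARAMS) / 1e9:.0f} GFLOPs/token")  # dense 196B: 392 GFLOPs/token
```

Under this approximation, routing each token through about 5.6% of the weights cuts per-token compute roughly 18-fold versus a dense model of the same size, although all 196B parameters still have to be held in memory.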
Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters
Type: Paper
Details the reinforcement learning (RL) pipeline and the MTP-3 (multi-token prediction) acceleration used for Step 3.5 Flash.
arXiv: 2602.10604
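The listing names MTP-3 acceleration without describing it. One common way multi-token prediction speeds up decoding is self-speculative decoding: extra prediction heads draft several future tokens, and the main model verifies them in a single forward pass. The sketch below shows greedy verification only, assuming that "MTP-3" means three drafted tokens per step. It is a generic illustration, not the paper's method, and `verify_drafts` is a hypothetical helper.

```python
from typing import List


def verify_drafts(draft: List[int], target: List[int]) -> List[int]:
    """Greedy speculative-decoding verification.

    draft:  k token ids proposed by the MTP heads (k = 3 assumed for "MTP-3").
    target: k + 1 token ids the main model predicts in one forward pass;
            target[i] is its greedy choice given the prefix plus draft[:i].
    Returns the tokens committed this step (between 1 and k + 1 of them).
    """
    if len(target) != len(draft) + 1:
        raise ValueError("target must contain exactly one more token than draft")
    accepted: List[int] = []
    for d, t in zip(draft, target):
        if d != t:
            break  # first disagreement: discard this draft and all later ones
        accepted.append(d)
    # The main model's own prediction at the stopping point is always valid,
    # so every step commits at least one token.
    accepted.append(target[len(accepted)])
    return accepted


print(verify_drafts([5, 7, 9], [5, 7, 2, 4]))  # [5, 7, 2]
```

With three drafts, one verification pass commits between one and four tokens; the realized speedup depends on how often the drafts match the main model's predictions.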
Step-3.5-Flash-SFT
Type: Dataset
A 1.6M-row instruction-tuning dataset released to the community.