Poolside's compact agentic-coding model and its first open-weight release (Apache 2.0, April 2026): a 33.4B-total / 3.0B-active Mixture-of-Experts transformer. It shares M.1's expert layout (256 experts, top-8 + shared) and adds interleaved sliding-window and global attention at a 3:1 ratio, along with grouped-query attention (8 KV heads). Together these keep the KV-cache footprint small enough to run on a single GPU. The model was trained from scratch on more than 30T tokens using 2,048 NVIDIA H200 GPUs, following the same "Model Factory" recipe as M.1 (AutoMixer data mixtures, Muon optimization, CISPO online RL). FP8, NVFP4, and INT4 quantizations are published for low-VRAM deployment.
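
To see why interleaving sliding-window layers with global layers shrinks the KV cache, the following is a rough back-of-the-envelope sketch, not the model's actual configuration. The 8 KV heads, the 3:1 interleave (read here as three sliding-window layers per global layer), and the 262,144-token context come from this page; the layer count (48), head dimension (128), window size (4,096), and 2-byte cache precision are hypothetical placeholders chosen only to make the arithmetic concrete.

```python
def kv_cache_bytes(context_len, n_layers, n_kv_heads, head_dim,
                   window, local_per_global=3, bytes_per_value=2):
    """Estimate the KV-cache size in bytes for a single sequence.

    Assumes every group of (local_per_global + 1) layers contains exactly
    one global-attention layer; the rest use a sliding window.
    """
    group = local_per_global + 1
    n_global = n_layers // group
    n_local = n_layers - n_global

    # Each cached token stores one key and one value vector per KV head.
    per_token = 2 * n_kv_heads * head_dim * bytes_per_value

    # Global layers cache the whole context; local layers only the window.
    global_bytes = n_global * context_len * per_token
    local_bytes = n_local * min(window, context_len) * per_token
    return global_bytes + local_bytes


GIB = 2 ** 30

# Hypothetical shape: 48 layers, head_dim 128, 4,096-token window.
interleaved = kv_cache_bytes(262_144, n_layers=48, n_kv_heads=8,
                             head_dim=128, window=4_096)

# Same shape, but with every layer global (local_per_global=0).
all_global = kv_cache_bytes(262_144, n_layers=48, n_kv_heads=8,
                            head_dim=128, window=4_096, local_per_global=0)

print(f"interleaved 3:1: {interleaved / GIB:.2f} GiB")
print(f"all global:      {all_global / GIB:.2f} GiB")
```

Under these placeholder numbers the global layers dominate the cache at full context, which is why reducing their share (and the number of KV heads) matters more than the window size itself.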

Reported benchmarks (technical report, May 2026): SWE-bench Verified 69.9, SWE-bench Multilingual 67.2, SWE-bench Pro 35.7, Terminal-Bench 2.0 42.9. In the report's figures, this leads its open-weight class (Devstral Small 2, Gemma 4 31B, Qwen3.5/3.6 35B-A3B) on SWE-bench Verified. By adoption it is one of the strongest open-weight launches of 2026, climbing to ~219k Hugging Face downloads. The model is not currently scored on Artificial Analysis, so the numbers above are self-reported.

Model Details

Architecture | MoE (Mixture-of-Experts)
Parameters | 33.4B total
Active params | 3.0B
Experts | 256 (top-8 + shared)
Context window | 262,144 tokens
Training tokens | >30T
Training hardware | 2,048 NVIDIA H200 GPUs
Optimizer | Muon (Moonlight variant)
License | Apache 2.0
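
The expert configuration above (256 experts, 8 selected per token, plus shared capacity) can be illustrated with a minimal routing sketch. This is not Poolside's implementation: the gating function (softmax over the selected logits), the single always-on shared expert, and the toy scalar "experts" are all assumptions made for illustration, and the names `route_token` and `moe_forward` are hypothetical.

```python
import math


def route_token(router_logits, top_k=8):
    """Return (expert_index, weight) pairs for the top_k experts of one token."""
    # Indices of the k largest router logits.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:top_k]
    # Assumed gating: softmax over the selected logits, so weights sum to 1.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, router_logits, experts, shared_expert, top_k=8):
    """Combine the shared expert with the weighted top_k routed experts."""
    out = shared_expert(x)  # assumed to run for every token
    for idx, weight in route_token(router_logits, top_k):
        out += weight * experts[idx](x)
    return out


# Toy setup: 256 scalar "experts" that scale their input by their index.
experts = [lambda x, scale=i: x * scale for i in range(256)]
shared = lambda x: x

# Experts 0..7 get the highest logits, so only they are evaluated.
logits = [1.0] * 8 + [0.0] * 248
print(moe_forward(2.0, logits, experts, shared))
```

Only 8 of the 256 routed experts run for a given token, which is how a 33.4B-parameter model ends up with roughly 3.0B active parameters per token.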

Benchmark Scores

Benchmark | Score | Mode
SWE-bench Verified | 69.9 |
SWE-bench Multilingual | 67.2 |
SWE-bench Pro | 35.7 |
Terminal-Bench 2.0 | 42.9 |

Tags: open-weight, coding, agentic, moe, efficiency

Related