A 560B-parameter MoE model that activates ~27B parameters per token. Meituan's foundational LLM, featuring PID-controller-based dynamic expert allocation and a "Zero-computation Experts" mechanism. 128K context window; 100+ tokens/sec inference on H800 GPUs.
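The two mechanisms named above can be sketched together: a top-k router whose expert pool includes identity ("zero-computation") experts, so per-token compute varies with routing, plus a PI-style controller that nudges a routing bias to hold the average number of real FFN experts per token near a target. Everything below (sizes, gains, the `moe_layer` helper, uniform top-k averaging in place of learned gate weights) is an illustrative assumption, not the actual LongCat-Flash implementation:

```python
import numpy as np

# Illustrative sketch only, NOT the LongCat-Flash implementation.
# Zero-computation experts are modeled as identity functions; a PI
# controller (derivative term omitted) biases their logits so the
# average number of real FFN experts activated per token tracks a target.

rng = np.random.default_rng(0)

N_FFN, N_ZERO, TOP_K, DIM = 8, 4, 2, 16
TARGET_REAL = 1.5          # desired avg. real FFN experts per token
KP, KI = 0.5, 0.1          # controller gains (assumed values)

W_router = rng.standard_normal((DIM, N_FFN + N_ZERO)) * 0.1
ffn = [rng.standard_normal((DIM, DIM)) * 0.05 for _ in range(N_FFN)]

def moe_layer(x, bias):
    """Route each token to TOP_K experts; zero experts cost no compute."""
    logits = x @ W_router
    logits[:, N_FFN:] += bias            # controller shifts zero-expert odds
    topk = np.argsort(-logits, axis=1)[:, :TOP_K]
    out = np.zeros_like(x)
    real_count = 0
    for t, experts in enumerate(topk):
        for e in experts:
            if e < N_FFN:                # real FFN expert: do the matmul
                out[t] += x[t] @ ffn[e]
                real_count += 1
            else:                        # zero-computation expert: identity
                out[t] += x[t]
    return out / TOP_K, real_count / len(x)

bias, integral = 0.0, 0.0
for step in range(50):
    x = rng.standard_normal((32, DIM))
    y, avg_real = moe_layer(x, bias)
    err = avg_real - TARGET_REAL         # too much compute -> raise bias
    integral += err
    bias = KP * err + KI * integral      # PI update toward the target

print(f"avg real experts/token ~ {avg_real:.2f} (target {TARGET_REAL})")
```

In the real model the routed experts are combined with learned gate weights rather than a uniform average; the sketch keeps only the control-loop idea, where the bias settles so that average per-token compute stays near the budget.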

Outputs (2)

LongCat-Flash-Chat

model
Architecture: MoE
Parameters: 560B
Active params: 27B
Context window: 128,000

LongCat-Flash Technical Report

paper

Details the Zero-computation Experts mechanism and PID-controller routing.

arXiv: 2509.01322

Tags: moe, open-weight, scaling
