A 560B-parameter MoE model that activates ~27B parameters per token. Meituan's foundational LLM, featuring PID-controller-based dynamic expert allocation and a "Zero-computation Experts" mechanism. 128K context window; 100+ tokens/sec inference on H800 GPUs.
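The two mechanisms named above can be sketched together: a top-k router whose expert pool includes identity ("zero-computation") experts, so per-token compute varies with routing, plus a PI-style controller that nudges a routing bias to hold the average number of real FFN experts per token near a target. Everything below (sizes, gains, the `moe_layer` helper, uniform top-k averaging in place of learned gate weights) is an illustrative assumption, not the actual LongCat-Flash implementation:

```python
import numpy as np

# Illustrative sketch only, NOT the LongCat-Flash implementation.
# Zero-computation experts are modeled as identity functions; a PI
# controller (derivative term omitted) biases their logits so the
# average number of real FFN experts activated per token tracks a target.

rng = np.random.default_rng(0)

N_FFN, N_ZERO, TOP_K, DIM = 8, 4, 2, 16
TARGET_REAL = 1.5          # desired avg. real FFN experts per token
KP, KI = 0.5, 0.1          # controller gains (assumed values)

W_router = rng.standard_normal((DIM, N_FFN + N_ZERO)) * 0.1
ffn = [rng.standard_normal((DIM, DIM)) * 0.05 for _ in range(N_FFN)]

def moe_layer(x, bias):
    """Route each token to TOP_K experts; zero experts cost no compute."""
    logits = x @ W_router
    logits[:, N_FFN:] += bias            # controller shifts zero-expert odds
    topk = np.argsort(-logits, axis=1)[:, :TOP_K]
    out = np.zeros_like(x)
    real_count = 0
    for t, experts in enumerate(topk):
        for e in experts:
            if e < N_FFN:                # real FFN expert: do the matmul
                out[t] += x[t] @ ffn[e]
                real_count += 1
            else:                        # zero-computation expert: identity
                out[t] += x[t]
    return out / TOP_K, real_count / len(x)

bias, integral = 0.0, 0.0
for step in range(50):
    x = rng.standard_normal((32, DIM))
    y, avg_real = moe_layer(x, bias)
    err = avg_real - TARGET_REAL         # too much compute -> raise bias
    integral += err
    bias = KP * err + KI * integral      # PI update toward the target

print(f"avg real experts/token ~ {avg_real:.2f} (target {TARGET_REAL})")
```

In the real model the routed experts are combined with learned gate weights rather than a uniform average; the sketch keeps only the control-loop idea, where the bias settles so that average per-token compute stays near the budget.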

Outputs (2)

LongCat-Flash-Chat

model
Architecture: MoE
Parameters: 560B
Active params: 27B
Context window: 128,000

LongCat-Flash Technical Report

paper

Details the Zero-computation Experts mechanism and PID-controller routing.

arXiv: 2509.01322

Tags: moe, open-weight, scaling
