Hunyuan Turbo S
"Fast-thinking" model built on a hybrid Mamba-Transformer architecture, delivering near-instant replies while retaining complex reasoning. 56B activated / 560B total parameters (hybrid MoE). 256K context, 16T pre-training tokens. (A toy sketch of the hybrid layout follows this entry.)
Hunyuan-TurboS: Mamba-Transformer Synergy
arXiv: 2505.15431
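The "56B activated / 560B total" split and the hybrid layout are the two ideas worth unpacking: Mamba-style blocks scan the sequence in linear time, attention blocks are interleaved sparsely, and an MoE feed-forward routes each token to a few experts, so only a fraction of the total parameters fires per token. The PyTorch sketch below is a minimal illustration of that interleaving, not Tencent's implementation; the block design, the gated-recurrence stand-in for a real Mamba/SSM block, and all sizes (d=64, 8 experts, top-2 routing) are assumptions chosen for demonstration.

```python
# Toy sketch of a hybrid Mamba-Transformer MoE stack (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedRecurrentBlock(nn.Module):
    """Stand-in for a Mamba/SSM block: a simple input-gated linear recurrence, O(T)."""
    def __init__(self, d):
        super().__init__()
        self.in_proj = nn.Linear(d, 2 * d)
        self.decay = nn.Parameter(torch.zeros(d))  # per-channel retention logits
        self.out_proj = nn.Linear(d, d)
        self.norm = nn.LayerNorm(d)

    def forward(self, x):                          # x: (B, T, d)
        u, g = self.in_proj(self.norm(x)).chunk(2, dim=-1)
        a = torch.sigmoid(self.decay)              # (d,) forget gate per channel
        h, outs = torch.zeros_like(u[:, 0]), []
        for t in range(u.size(1)):                 # recurrence: h_t = a*h_{t-1} + u_t
            h = a * h + u[:, t]
            outs.append(h)
        y = torch.stack(outs, dim=1) * torch.sigmoid(g)
        return x + self.out_proj(y)

class AttentionBlock(nn.Module):
    """Standard self-attention block (bidirectional for brevity; a causal LM would pass attn_mask)."""
    def __init__(self, d, heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(d)
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)

    def forward(self, x):
        h = self.norm(x)
        y, _ = self.attn(h, h, h, need_weights=False)
        return x + y

class MoEFFN(nn.Module):
    """Top-k routed experts: total params grow with n_experts, active params with k."""
    def __init__(self, d, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
            for _ in range(n_experts)
        )
        self.norm = nn.LayerNorm(d)

    def forward(self, x):
        h = self.norm(x)
        scores = self.router(h)                        # (B, T, n_experts)
        w, idx = torch.topk(scores, self.k, dim=-1)    # pick k experts per token
        w = F.softmax(w, dim=-1)
        y = torch.zeros_like(h)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., slot] == e             # tokens routed to expert e
                if mask.any():
                    y[mask] += w[..., slot][mask].unsqueeze(-1) * expert(h[mask])
        return x + y

d = 64
stack = nn.Sequential(  # mostly linear-time blocks, occasional attention, MoE FFNs
    GatedRecurrentBlock(d), MoEFFN(d),
    GatedRecurrentBlock(d), MoEFFN(d),
    AttentionBlock(d),      MoEFFN(d),
)
print(stack(torch.randn(2, 16, d)).shape)  # torch.Size([2, 16, 64])
```

At scale, the routing step is what lets total parameters (560B here) dwarf activated parameters (56B): every token passes through the shared recurrent and attention blocks, but only its top-k experts in each MoE layer.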