Nemotron 3.5 Content Safety

4B-parameter content-safety classifier built on Google Gemma 3 4B IT (LoRA fine-tuned). Multimodal: takes a user prompt, an optional image, and an optional assistant response as a single context window and produces a coherent safety verdict. Multilingual: 12 explicitly trained languages plus ~140 via zero-shot.

Two novel capabilities over the March 2026 Nemotron 3 Content Safety predecessor: custom policy enforcement (accepts a policy spec alongside the input) and an optional THINK mode emitting step-by-step reasoning before the verdict. Same latency profile as the predecessor in default mode; 3× lower end-to-end latency than alternative multimodal safety models.

Headline scores: 96.5% Multilingual Aegis classification accuracy (12 languages), 88.8% on RTP-LX, 92.7% combined average, ~85% on multimodal benchmarks. NVIDIA Open Model License (research + commercial). Companion datasets: Nemotron-3.5-Content-Safety-Dataset (primary; 99% real photographs, not synthetic), Nemotron-Safety-Guard-Dataset-v3, Nemotron-VLM-Dataset-v2, CantTalkAboutThis-Topic-Control-Dataset.

HuggingFace blog HuggingFace (model)HuggingFace (dataset)HuggingFace (predecessor)

Model Details

Parameters 4B

License NVIDIA Open Model License

safetymoderationmultimodalopen-weight

Nemotron 3.5 Content Safety

Your notes

Model Details

Related