Early-fusion dense Transformer for open-vocabulary grounding and segmentation from natural-language prompts. Image patches and text tokens share a single parameter space from the first layer, under a hybrid attention mask (bidirectional over image tokens, causal over text/task tokens). A "Chain-of-Perception" autoregressive interface has each detected instance emit <coord> → <size> → <seg>: coordinates are encoded with Fourier features, and segmentation masks are produced by a dot product against upsampled image features.
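
The hybrid mask and the Fourier coordinate encoding described above can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the released implementation. The sequence layout (image tokens first, then text/task tokens), the function names `hybrid_attention_mask` and `fourier_features`, the octave-spaced frequency schedule, and the band count are all hypothetical choices made for the example.

```python
import math


def hybrid_attention_mask(num_image, num_text):
    """Return mask[q][k] == True where query q may attend to key k.

    Assumed layout (not confirmed by the source): image tokens occupy
    positions [0, num_image), followed by num_text text/task tokens.
    """
    n = num_image + num_text
    mask = []
    for q in range(n):
        row = []
        for k in range(n):
            # Image tokens attend to every other image token (bidirectional).
            both_image = q < num_image and k < num_image
            # Every token may attend to itself and to earlier positions (causal).
            causal = k <= q
            row.append(both_image or causal)
        mask.append(row)
    return mask


def fourier_features(x, num_bands=4):
    """Encode a scalar coordinate, assumed normalised to [0, 1], as sin/cos pairs.

    The frequencies 2**b * pi are an illustrative assumption; the source
    only states that coordinates are encoded via Fourier features.
    """
    feats = []
    for b in range(num_bands):
        angle = (2 ** b) * math.pi * x
        feats.append(math.sin(angle))
        feats.append(math.cos(angle))
    return feats


# Three image tokens followed by two text/task tokens.
mask = hybrid_attention_mask(num_image=3, num_text=2)
for row in mask:
    print("".join("1" if allowed else "." for allowed in row))
```

With three image tokens and two text tokens, the printed rows are `111..`, `111..`, `111..`, `1111.`, `11111`: the image block is fully connected, image tokens never see text, and each text token sees all image tokens plus the text up to and including itself.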

Two open-weight checkpoints: Falcon Perception (0.6B) for full grounding + segmentation, and Falcon OCR (0.3B) specialised for document layout + OCR. Trained on 54M images with 195M positive expressions and 488M hard negatives in three stages, using multi-teacher distillation from DINOv3 + SigLIP2 and ensemble consensus validation (SAM 3 + Qwen3-VL-30B + Moondream3) — 700 GPU-hours total.

Benchmark headline: SA-Co 68.0 Macro-F1 (vs SAM 3's 62.3); on the new PBench diagnostic the lead widens with prompt complexity — +13.4 on OCR-guided, +21.9 on spatial, +15.8 on relations, +14.2 on dense scenes. Falcon OCR reports olmOCR 80.3 and OmniDocBench 88.6; full layout+OCR throughput is 2.9 images/s on H100.

Companion release: PBench, a capability-stratified diagnostic benchmark (L0–L4 + dense). Docker / vLLM server / MLX (Apple Silicon) shipped alongside. CC-BY 4.0.

Model Details

Architecture: Dense
Parameters: 0.6B

Benchmark Scores

Benchmark                        Score   Mode
SA-Co Macro-F1                   68.0
PBench L2 OCR-guided             38.0
PBench L3 spatial                53.5
PBench Dense (100s instances)    72.6
olmOCR (Falcon OCR)              80.3
OmniDocBench (Falcon OCR)        88.6

Variants

Name                     Parameters   Notes
Falcon Perception        0.6B         Full grounding + segmentation; SA-Co 68.0 Macro-F1 (ahead of SAM 3's 62.3)
Falcon Perception 300M   300M         Detection-only (bounding boxes); no segmentation head
Falcon OCR               0.3B         Document layout + OCR; olmOCR 80.3, OmniDocBench 88.6

Paper

Authors: Aviraj Bevli · Sofian Chaybouti · Yasser Dahou · Hakim Hacid · Ngoc Dung Huynh · Phuc H. Le Khac · Sanath Narayan · Wamiq Reyaz Para · Ankit Singh
Tags: frontier, multimodal, vision, open-weight, foundational