Falcon Perception | Lab Index

Early-fusion dense Transformer for open-vocabulary grounding and segmentation from natural-language prompts. Processes image patches and text tokens in a shared parameter space from the first layer with a hybrid attention mask (bidirectional over image tokens, causal over text/task tokens) and a "Chain-of-Perception" autoregressive interface — each detected instance emits <coord> → <size> → <seg>, with coordinates encoded via Fourier features and segmentation produced from a dot-product against upsampled image features.
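The hybrid attention mask described above can be illustrated with a minimal sketch. It assumes a sequence layout with image tokens first and text/task tokens after, and it assumes image queries do not attend to text tokens; neither detail is confirmed by this page. The function name `hybrid_attention_mask` is hypothetical and is not part of the released code.

```python
def hybrid_attention_mask(num_image: int, num_text: int) -> list[list[bool]]:
    """Return mask[q][k] == True where query position q may attend to key position k.

    Assumed layout (not confirmed by the source): image tokens occupy
    positions 0..num_image-1, text/task tokens follow.
    """
    total = num_image + num_text
    mask = [[False] * total for _ in range(total)]
    for q in range(total):
        for k in range(total):
            if q < num_image:
                # Image query: bidirectional, but only over image tokens (assumption).
                mask[q][k] = k < num_image
            else:
                # Text/task query: sees every image token plus text up to itself (causal).
                mask[q][k] = k <= q
    return mask


# Visualise a tiny case: 3 image tokens followed by 2 text tokens.
for row in hybrid_attention_mask(3, 2):
    print("".join("x" if allowed else "." for allowed in row))
# xxx..
# xxx..
# xxx..
# xxxx.
# xxxxx
```

The top-left block is fully dense (bidirectional image attention), while the bottom-right block is lower-triangular (causal text attention), which is what lets a single Transformer stack serve both as an image encoder and as an autoregressive decoder.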

Two open-weight checkpoints: Falcon Perception (0.6B) for full grounding + segmentation, and Falcon OCR (0.3B) specialised for document layout + OCR. Trained on 54M images with 195M positive expressions and 488M hard negatives in three stages, using multi-teacher distillation from DINOv3 + SigLIP2 and ensemble consensus validation (SAM 3 + Qwen3-VL-30B + Moondream3) — 700 GPU-hours total.

Benchmark headline: SA-Co 68.0 Macro-F₁ (vs SAM 3's 62.3); on the new PBench diagnostic the lead widens with prompt complexity — +13.4 on OCR-guided, +21.9 on spatial, +15.8 on relations, +14.2 on dense scenes. Falcon OCR reports olmOCR 80.3 and OmniDocBench 88.6; full layout+OCR throughput is 2.9 images/s on H100.
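As a rough illustration of what the quoted throughput means in practice, the sketch below converts 2.9 images/s into wall-clock time for a batch of pages. It assumes one page per image, a single H100, and a sustained rate with no loading or queueing overhead; the helper name `seconds_for` is hypothetical.

```python
IMAGES_PER_SECOND = 2.9  # layout+OCR throughput quoted above for one H100


def seconds_for(num_pages: int, rate: float = IMAGES_PER_SECOND) -> float:
    """Estimate processing time in seconds, assuming a constant sustained rate."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return num_pages / rate


# Estimated minutes to process 10,000 pages under the assumptions above.
print(round(seconds_for(10_000) / 60, 1))  # 57.5
```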

Companion release: PBench, a capability-stratified diagnostic benchmark (L0–L4 + dense). Docker / vLLM server / MLX (Apple Silicon) shipped alongside. CC-BY 4.0.

Paper (arXiv) · HuggingFace blog · HuggingFace (Falcon Perception) · HuggingFace (Falcon Perception 300M) · HuggingFace (Falcon OCR) · GitHub · Playground

Model Details

Architecture: Dense

Parameters: 0.6B

Benchmark Scores

Benchmark	Score	Mode
SA-Co Macro-F1	68.0	—
PBench L2 OCR-guided	38.0	—
PBench L3 spatial	53.5	—
PBench Dense (100s instances)	72.6	—
olmOCR (Falcon OCR)	80.3	—
OmniDocBench (Falcon OCR)	88.6	—
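The table scores can be combined with the leads quoted in the benchmark headline to back out the implied baseline on each PBench slice. This is a small arithmetic sketch, assuming the quoted leads are measured against SAM 3 and on the same metric as the table; the page does not list the baseline scores directly. The relations lead (+15.8) is omitted because the table has no matching row.

```python
# Falcon Perception scores from the table above.
falcon_scores = {
    "PBench L2 OCR-guided": 38.0,
    "PBench L3 spatial": 53.5,
    "PBench Dense": 72.6,
}

# Leads quoted in the benchmark headline (assumed to be relative to SAM 3).
headline_leads = {
    "PBench L2 OCR-guided": 13.4,
    "PBench L3 spatial": 21.9,
    "PBench Dense": 14.2,
}

# Implied baseline = Falcon score minus quoted lead, rounded to one decimal.
implied_baseline = {
    name: round(score - headline_leads[name], 1)
    for name, score in falcon_scores.items()
}

for name, value in implied_baseline.items():
    print(f"{name}: {value}")
# PBench L2 OCR-guided: 24.6
# PBench L3 spatial: 31.6
# PBench Dense: 58.4

# SA-Co is stated explicitly: 68.0 vs 62.3.
print(round(68.0 - 62.3, 1))  # 5.7
```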

Variants

Name	Parameters	Notes
Falcon Perception	0.6B	Full grounding + segmentation; SA-Co 68.0 Macro-F1 (ahead of SAM 3's 62.3)
Falcon Perception 300M	300M	Detection-only (bounding boxes); no segmentation head
Falcon OCR	0.3B	Document layout + OCR; olmOCR 80.3, OmniDocBench 88.6

Paper

arXiv · HTML

Authors: Aviraj Bevli · Sofian Chaybouti · Yasser Dahou · Hakim Hacid · Ngoc Dung Huynh · Phuc H. Le Khac · Sanath Narayan · Wamiq Reyaz Para · Ankit Singh

Tags: frontier · multimodal · vision · open-weight · foundational
