LocateAnything-3B | Lab Index

3B vision-language grounding model from NVIDIA's LPR group. Introduces Parallel Box Decoding, a non-autoregressive box-output head that improves throughput roughly 2.5× over standard VLM detection. A single model handles object detection, phrase grounding, GUI element grounding, and OCR.

Trained on 12M images / 785M boxes. Open weights; project page includes demos and benchmarks.
