LingBot-Map

Hugging Face

Feed-forward 3D foundation model for real-time scene reconstruction from streaming video or image sequences. Uses a Geometric Context Transformer that combines an anchor context, a pose-reference window, and a trajectory memory to correct drift. Runs at ~20 FPS on 518x378 inputs and remains stable over sequences of 10k+ frames by using paged KV-cache attention and keyframe strategies.
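To illustrate why a streaming design like this can keep its attention context bounded over very long sequences, here is a minimal Python sketch of the bookkeeping only. It is not LingBot-Map's code or API: the class `StreamingContext`, its parameters (`n_anchor`, `window`, `keyframe_stride`, `max_memory`), and the fixed-stride keyframe rule are all hypothetical, chosen only to mirror the three components named above (anchor context, pose-reference window, trajectory memory).

```python
from collections import deque


class StreamingContext:
    """Toy tracker of which frame indices stay in the attention context.

    Hypothetical illustration: names and the keyframe rule are assumptions,
    not taken from the LingBot-Map implementation.
    """

    def __init__(self, n_anchor=1, window=4, keyframe_stride=10, max_memory=8):
        self.n_anchor = n_anchor
        self.keyframe_stride = keyframe_stride
        self.anchor = []                         # first frames, kept for the whole run
        self.window = deque(maxlen=window)       # most recent frames
        self.memory = deque(maxlen=max_memory)   # sparse keyframes from older history

    def push(self, frame_idx):
        # The first n_anchor frames become the permanent anchor context.
        if len(self.anchor) < self.n_anchor:
            self.anchor.append(frame_idx)
            return
        # When the window is full, the oldest frame is about to be evicted;
        # keep it in long-term memory only if it falls on the keyframe stride.
        if len(self.window) == self.window.maxlen:
            evicted = self.window[0]
            if evicted % self.keyframe_stride == 0:
                self.memory.append(evicted)
        self.window.append(frame_idx)

    def context(self):
        # Frame indices a new frame would attend to, oldest first.
        return self.anchor + list(self.memory) + list(self.window)


ctx = StreamingContext()
for i in range(10_000):
    ctx.push(i)
print(len(ctx.context()), ctx.context())
```

With these illustrative defaults the context never exceeds 13 entries (1 anchor, 8 keyframes, 4 recent frames) no matter how many frames are streamed. A real system would store per-frame key/value tensors rather than indices and would likely pick keyframes by a geometric criterion instead of a fixed stride.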

3D Local Latest LingBot Family