Next Best View (NBV) algorithms aim to maximize 3D scene acquisition quality using minimal resources, e.g., the number of acquisitions, time taken, or distance traversed. Prior methods often rely on coverage maximization as a proxy for reconstruction quality, but for complex scenes with occlusions and finer details this proxy is not always sufficient and leads to poor reconstructions. Our key insight is to train an acquisition policy that directly optimizes for reconstruction quality rather than coverage alone. To achieve this, we introduce the View Introspection Network (VIN): a lightweight neural network that predicts the Relative Reconstruction Improvement (RRI) of a potential next viewpoint without making any new acquisitions. We use this network to power a simple yet effective sequential sampling-based greedy NBV policy. Our approach, VIN-NBV, generalizes to unseen object categories, operates without prior scene knowledge, adapts to resource constraints, and handles occlusions. We show that our RRI fitness criterion leads to a ~30% gain in reconstruction quality over a coverage-based criterion using the same greedy strategy. Furthermore, VIN-NBV also outperforms the deep reinforcement learning methods Scan-RL and GenNBV by ~40%.
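For intuition, one natural way to formalize RRI (our reading, not necessarily the paper's exact definition) is as the relative drop in a reconstruction error E, e.g., Chamfer distance against ground truth, that a candidate view v would bring given the current acquisition set V:

    \mathrm{RRI}(v \mid \mathcal{V}) \;=\; \frac{E(\mathcal{V}) - E(\mathcal{V} \cup \{v\})}{E(\mathcal{V})}

The VIN's role is to regress this quantity from the acquisitions made so far, so a candidate can be scored without actually being captured.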
Overview of the VIN-NBV policy and the VIN architecture. The VIN is trained to predict the reconstruction improvement of a query view given a set of prior acquisitions. The VIN-NBV policy uses the VIN to select the next best view to acquire. The design of our policy makes it easy to modify with custom termination criteria and decision-making logic.
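To make the policy structure concrete, here is a minimal Python sketch of a sampling-based greedy loop of this kind. All names (predict_rri, sample_candidate_views, should_stop, scene.acquire) are hypothetical placeholders for illustration, not the released implementation.

    # Hypothetical sketch of a sampling-based greedy NBV loop.
    # `vin` is assumed to be a trained network that scores a candidate
    # view's predicted reconstruction improvement (RRI) given the
    # acquisitions made so far; none of these names come from the paper's code.
    def vin_nbv_policy(vin, scene, sample_candidate_views, should_stop,
                       max_acquisitions=20):
        acquisitions = [scene.acquire(scene.initial_view())]
        while not should_stop(acquisitions) and len(acquisitions) < max_acquisitions:
            candidates = sample_candidate_views(acquisitions)
            # Score every candidate with the VIN; no new images are captured here.
            scores = [vin.predict_rri(acquisitions, v) for v in candidates]
            best = candidates[max(range(len(candidates)), key=scores.__getitem__)]
            # Only the highest-scoring candidate is actually acquired.
            acquisitions.append(scene.acquire(best))
        return acquisitions

Because termination is isolated in the should_stop callback, constraints such as a capture budget or a motion-time limit can be swapped in without changing the selection logic.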
We report the final average Chamfer distance of our method compared to prior works, evaluated on the OmniObject3D houses category over 20 captures, and plot how the average Chamfer distance evolves as more acquisitions are made. Our method outperforms all prior works, and its Chamfer distance continues to improve with each additional acquisition.
An interactive comparison of the final reconstruction after 10 total acquisitions using our method (VIN-NBV) and our coverage baseline (Cov-NBV). Click the different object names to visualize more objects.
We provide an interactive comparison of the final reconstruction under different time-in-motion limits, which force the robot to complete all acquisitions within a fixed motion budget of 15, 30, 45, or 60 seconds. We compare our method (VIN-NBV) with the coverage baseline (Cov-NBV) and show the final results.
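As an illustration of how such a motion budget could plug into the policy sketch above, a hypothetical should_stop criterion might look like the following (robot.time_in_motion is an assumed interface, not part of the paper's code):

    # Hypothetical time-in-motion termination criterion for the sketch above.
    def make_time_budget_stop(robot, budget_seconds):
        def should_stop(acquisitions):
            # Stop once the robot's accumulated motion time exhausts the budget
            # (assumes the robot tracks time spent moving between viewpoints).
            return robot.time_in_motion() >= budget_seconds
        return should_stop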
@misc{frahm2025vinnbvviewintrospectionnetwork,
  title={VIN-NBV: A View Introspection Network for Next-Best-View Selection for Resource-Efficient 3D Reconstruction},
  author={Noah Frahm and Dongxu Zhao and Andrea Dunn Beltran and Ron Alterovitz and Jan-Michael Frahm and Junier Oliva and Roni Sengupta},
  year={2025},
  eprint={2505.06219},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2505.06219},
}