Leveraging Camera Triplets for Efficient and Accurate Structure-from-Motion

Lalit Manam, Venu Madhav Govindu

[Paper] [Supp] [Poster] [Video] [Code] (Updated Jun 2, 2024)

Sensitive Triangles

In Structure-from-Motion (SfM), the underlying viewgraphs of unordered image collections generally have a highly redundant set of edges that can be sparsified for efficiency without significant loss of reconstruction quality. Often, there are also false edges due to incorrect image retrieval and repeated structures (symmetries) that give rise to ghosting and superimposed reconstruction artifacts. We present a unified method to simultaneously sparsify the viewgraph and remove false edges. We propose a scoring mechanism based on camera triplets that identifies edge redundancy as well as false edges. Our edge selection is formulated as an optimization problem which can be provably solved using a simple thresholding scheme. This results in a highly efficient algorithm which can be incorporated as a pre-processing step into any SfM pipeline, making it practically usable. We demonstrate the utility of our method on generic and ambiguous datasets that cover the range of small, medium and large-scale datasets, all with different statistical properties. Sparsification of generic datasets using our method significantly reduces reconstruction time while maintaining the accuracy of the reconstructions as well as removing ghosting artifacts. For ambiguous datasets, our method removes false edges, thereby avoiding incorrect superimposed reconstructions.

Objective

We present a unified method to sparsify the viewgraph and disambiguate repeated structures. The assumptions underlying the two tasks are largely opposed, so they have mostly been dealt with independently. Table 1 summarizes the viewgraph scenarios that methods designed for each specific task can handle.

Methods	Mostly True Edges	Many False Edges
Graph Sparsification	✔	✖
Disambiguation	✖	✔
Ours	✔	✔

Table 1. Summary of different scenarios in viewgraphs handled by methods designed for specific tasks.

Epipolar Inliers in Camera Triplets

The top figure shows two scenarios in camera triplets. The left subfigure shows a generic scenario in which the green edge has a low number of epipolar inliers due to low visual overlap and can therefore potentially be removed to sparsify the viewgraph. The right subfigure shows an ambiguous scenario in which two different facades of a building are matched, giving false edges (marked in green and blue). The false edges have a low number of inliers compared to the true edge (marked in orange), which gives a cue about ambiguity.

Our Method

For each edge in a camera triplet, compute the edge score as the ratio of the number of epipolar inliers on that edge to the maximum number of epipolar inliers over all three edges of the triplet.
For every edge, the final edge score is the average of its edge scores over all the triplets that contain it.
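The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' released code: the input format (a dictionary mapping camera-index pairs to epipolar inlier counts) and the function names `find_triplets` and `triplet_edge_scores` are assumptions made for this example. Edges that belong to no triplet receive no score here; how such edges are treated in the full method is not specified on this page.

```python
from collections import defaultdict


def find_triplets(edges):
    """Enumerate camera triplets (3-cliques); edges are (i, j) pairs with i < j."""
    neighbors = defaultdict(set)
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    triplets = []
    for i, j in edges:
        # Requiring k > j reports each triangle exactly once (i < j < k).
        for k in neighbors[i] & neighbors[j]:
            if k > j:
                triplets.append((i, j, k))
    return triplets


def triplet_edge_scores(inliers):
    """Average, over all triplets, of inliers(edge) / max inliers in the triplet.

    `inliers` maps an edge (i, j) to its epipolar inlier count
    (hypothetical input format chosen for this sketch).
    """
    # Normalise edge keys so that the smaller camera index comes first.
    inliers = {(min(i, j), max(i, j)): n for (i, j), n in inliers.items()}
    per_edge = defaultdict(list)
    for i, j, k in find_triplets(inliers):
        tri_edges = [(i, j), (i, k), (j, k)]
        max_inliers = max(inliers[e] for e in tri_edges)
        if max_inliers == 0:
            continue  # degenerate triplet: no inliers on any edge
        for e in tri_edges:
            per_edge[e].append(inliers[e] / max_inliers)
    # Final score: mean of the per-triplet scores of each edge.
    return {e: sum(s) / len(s) for e, s in per_edge.items()}


# Toy viewgraph with two triplets, (0, 1, 2) and (1, 2, 3).
example_inliers = {(0, 1): 100, (1, 2): 50, (0, 2): 20, (1, 3): 80, (2, 3): 40}
example_scores = triplet_edge_scores(example_inliers)
```

In this toy graph, edge (0, 2) has far fewer inliers than the strongest edge of its only triplet and scores 0.2, while edge (1, 2) averages its scores from two triplets (0.5 and 0.625).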

Analysing Edge Scores

To analyse edge scores, we use datasets from Doppelgangers, which have ground truth labels for ambiguous edges. The resulting histograms are shown in Figure 1.

Histogram of Edge Scores

Figure 1. Histograms of edge quality scores over all datasets from Doppelgangers.

Our edge scoring mechanism leads to low scores for ambiguous edges.
Non-ambiguous edges are scored more uniformly, which aids uniform edge removal for graph sparsification.

Results

We use the following notations in this section:

$G$ : Original viewgraph.
$G_{LCT}$ : Graph containing edges contributing to the largest connected component of the triplet graph $G_{T}$ (see Sec. 3 of the paper).
$G_{Dopp}$ : Graph obtained after applying Doppelgangers on the original graph $G$ .
$G_{F}(m)$ : Graph obtained after applying our method with minimum edge score $m$ .
$\#N_{CR}$ : Number of cameras reconstructed.
$t_{R}$ : Reconstruction time using COLMAP.
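As a rough illustration of the $G_{F}(m)$ notation above, the sketch below keeps only the edges whose final score reaches a minimum value $m$. The function name `filter_viewgraph`, the score dictionary format, and the use of `>=` rather than `>` at the threshold are assumptions made for this example. The actual edge selection is formulated in the paper as an optimization problem that is solved by thresholding, and its details may differ.

```python
def filter_viewgraph(edge_scores, min_score):
    """Return the set of edges whose score is at least `min_score`.

    `edge_scores` maps an edge (i, j) to its final edge score in [0, 1]
    (hypothetical input format chosen for this sketch).
    """
    return {edge for edge, score in edge_scores.items() if score >= min_score}


# Toy scores: edge (0, 2) is weak relative to the triplets it belongs to.
toy_scores = {(0, 1): 1.0, (0, 2): 0.2, (1, 2): 0.5625}
kept_edges = filter_viewgraph(toy_scores, min_score=0.5)
```

With `min_score=0.5`, the low-scoring edge (0, 2) is dropped and the other two edges are retained. Raising the threshold removes more edges, trading redundancy for reconstruction speed.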

Graph Sparsification

Reconstruction results on generic datasets

Figure 2. Reconstructions obtained with different graphs on generic datasets.

Top row: Dataset that produces ghost artifacts in reconstructions (marked in blue). Applying our method removes these artifacts while leaving the other parts of the reconstructions intact.
Bottom row: Dataset with many redundant cameras and edges. Our method sparsifies the graphs, giving visually similar reconstructions in reduced reconstruction time.

Disambiguation

Reconstruction results on ambiguous datasets

Figure 3. Reconstructions obtained with different graphs on ambiguous datasets.

Datasets resulting in superimposed reconstructions (marked in blue) are corrected after applying our method.
Reconstruction is faster than with Doppelgangers because our method sparsifies the viewgraphs in addition to removing ambiguous edges.

Publication

Leveraging Camera Triplets for Efficient and Accurate Structure-from-Motion (Lalit Manam and Venu Madhav Govindu), IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024. [bibtex]