Fusing Directions and Displacements in Translation Averaging

Lalit Manam, Venu Madhav Govindu

[Paper] [Supp] [Poster] [Code] (Updated Mar 15, 2024)

Sensitive Triangles

Translation averaging solves for 3D camera translations given many pairwise relative translation directions. The mismatch between inputs (directions) and output estimates (absolute translations) makes translation averaging a challenging problem, which is often addressed by comparing either directions or displacements using relaxed cost functions that are relatively easy to optimize. However, the distinctly different nature of the cost functions leads to varied behaviour under different baselines and noise conditions. In this paper, we argue that translation averaging can benefit from a fusion of the two approaches. Specifically, we recursively fuse the individual updates suggested by direction- and displacement-based methods using their uncertainties. The uncertainty of each estimate is modelled by the inverse of the Hessian of the corresponding optimization problem. As a result, our method utilizes the advantages of both methods in a principled manner. The superiority of our translation averaging scheme is demonstrated via the improved accuracies of camera translations on benchmark datasets compared to the state-of-the-art methods.
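To make the idea of uncertainty-weighted fusion concrete, the following is a minimal sketch of the standard information-form fusion of two Gaussian estimates whose covariances are taken to be inverse Hessians. It is an illustration only and is not claimed to be the exact recursion used in the paper; the function name `fuse_estimates`, the toy Hessians, and the assumption that the Hessians are positive definite are all choices made for this example.

```python
import numpy as np

def fuse_estimates(x_a, H_a, x_b, H_b):
    """Fuse two estimates x_a, x_b with covariances H_a^{-1}, H_b^{-1}.

    Information form: H = H_a + H_b and x = H^{-1} (H_a x_a + H_b x_b).
    Assumes (for this sketch) that H_a + H_b is positive definite.
    """
    H = H_a + H_b
    x = np.linalg.solve(H, H_a @ x_a + H_b @ x_b)
    return x, H

# Hypothetical updates for one camera translation from the two approaches.
x_direction = np.array([1.0, 0.0, 0.0])
H_direction = np.diag([4.0, 1.0, 1.0])     # more certain along x

x_displacement = np.array([0.0, 1.0, 0.0])
H_displacement = np.diag([1.0, 4.0, 1.0])  # more certain along y

x_fused, H_fused = fuse_estimates(x_direction, H_direction,
                                  x_displacement, H_displacement)
print(x_fused)  # each coordinate is pulled towards the more certain estimate
```

The fused Hessian is the sum of the two input Hessians, so the fused estimate is never less certain than either input under this model.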

Notations

Underlying viewgraph: \( \mathcal{G}=\left(\mathcal{V},\mathcal{E}\right) \).
Global rotation and translation: \(\mathbf{R}_{i} \in \mathbb{SO}(3), \mathbf{T}_{i} \in \mathbb{R}^{3}, \forall i \in \mathcal{V} \).
Relative rotation and translation direction: \( \mathbf{R}_{ij}= \mathbf{R}_j \mathbf{R}_i^{-1}, \mathbf{t}_{ij}= \frac{\mathbf{R}_j(\mathbf{T}_i-\mathbf{T}_j)}{\|\mathbf{R}_j(\mathbf{T}_i-\mathbf{T}_j)\|_2}, \forall \left(i,j\right) \in \mathcal{E} \).
Relative translation direction in global reference frame: \( \mathbf{v}_{ij}=-\mathbf{R}_j^{-1} \mathbf{t}_{ij}= \frac{\mathbf{T}_j-\mathbf{T}_i}{\|\mathbf{T}_j-\mathbf{T}_i\|_2}, \forall \left(i,j\right) \in \mathcal{E} \).
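As a quick numerical check of the notation above, the sketch below builds two cameras, computes \( \mathbf{t}_{ij} \) in the local frame, and confirms that \( -\mathbf{R}_j^{-1} \mathbf{t}_{ij} \) equals the unit vector from \( \mathbf{T}_i \) to \( \mathbf{T}_j \). The helper names and the example poses are hypothetical and chosen only for illustration.

```python
import numpy as np

def rotation_z(angle_rad):
    """Rotation about the z-axis (used only to build example rotations)."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rotation_x(angle_rad):
    """Rotation about the x-axis (used only to build example rotations)."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def relative_direction_local(R_j, T_i, T_j):
    """t_ij = R_j (T_i - T_j) / ||R_j (T_i - T_j)||_2."""
    d = R_j @ (T_i - T_j)
    return d / np.linalg.norm(d)

def relative_direction_global(R_j, t_ij):
    """v_ij = -R_j^{-1} t_ij, using R^{-1} = R^T for rotation matrices."""
    return -R_j.T @ t_ij

# Example global poses (arbitrary values).
R_i = rotation_z(0.3)
R_j = rotation_x(-0.7) @ rotation_z(1.1)
T_i = np.array([1.0, 2.0, 0.5])
T_j = np.array([-0.5, 0.0, 3.0])

R_ij = R_j @ R_i.T                      # relative rotation R_j R_i^{-1}
t_ij = relative_direction_local(R_j, T_i, T_j)
v_ij = relative_direction_global(R_j, t_ij)

expected = (T_j - T_i) / np.linalg.norm(T_j - T_i)
print(np.allclose(v_ij, expected))
```

The identity holds because a rotation preserves the Euclidean norm, so the normalizing factor is the baseline \( \|\mathbf{T}_j-\mathbf{T}_i\|_2 \) in both frames.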

Cost Functions for Translation Averaging

Comparing displacement vectors: \begin{align*} \min_{\mathbf{T}_{i, i \in \mathcal{V}}} & \sum_{(i,j) \in \mathcal{E}} \rho \left( \| \mathbf{T}_j - \mathbf{T}_i - \|\mathbf{T}_j - \mathbf{T}_i\|_2 \mathbf{v}_{ij} \|_2 \right) \\ \text{s.t. } & \sum_{i \in \mathcal{V}} \mathbf{T}_i =\mathbf{0}, \quad \sum_{(i,j) \in \mathcal{E}} \left\langle \mathbf{T}_j - \mathbf{T}_i, \mathbf{v}_{ij} \right\rangle =1 \end{align*}
Comparing direction vectors: \begin{align*} \min_{\mathbf{T}_{i, i \in \mathcal{V}}} & \sum_{(i,j) \in \mathcal{E}} \rho \left(\left\| \frac{\mathbf{T}_j - \mathbf{T}_i}{\| \mathbf{T}_j - \mathbf{T}_i \|_2} - \mathbf{v}_{ij} \right\|_2 \right) \\ \text{s.t. } & \sum_{i \in \mathcal{V}} \mathbf{T}_i =\mathbf{0}, \quad \sum_{(i,j) \in \mathcal{E}} \left\langle \mathbf{T}_j - \mathbf{T}_i, \mathbf{v}_{ij} \right\rangle =1 \end{align*}

These cost functions are relaxed by introducing non-negative slack variables, \(\lambda_{ij}\) and \(\gamma_{ij}\), which ideally equal the baseline and the inverse baseline of the edge \((i,j)\), respectively.

Comparing relaxed displacement vectors: \begin{align*} \min_{\mathbf{T}_{i, i \in \mathcal{V}}, \lambda_{ij, (i,j) \in \mathcal{E}} } & \sum_{(i,j) \in \mathcal{E}} \rho \left( \| \mathbf{T}_j - \mathbf{T}_i - \lambda_{ij} \mathbf{v}_{ij} \|_2 \right) \\ \text{s.t. } & \sum_{i \in \mathcal{V}} \mathbf{T}_i =\mathbf{0}, \quad \sum_{(i,j) \in \mathcal{E}} \left\langle \mathbf{T}_j - \mathbf{T}_i, \mathbf{v}_{ij} \right\rangle =1, \quad \lambda_{ij} \ge 0, \text{ } \forall (i,j) \in \mathcal{E} \end{align*}
Comparing relaxed direction vectors: \begin{align*} \min_{\mathbf{T}_{i, i \in \mathcal{V}}, \gamma_{ij, (i,j) \in \mathcal{E}} } & \sum_{(i,j) \in \mathcal{E}} \rho \left( \| \left(\mathbf{T}_j - \mathbf{T}_i\right) \gamma_{ij} - \mathbf{v}_{ij} \|_2 \right) \\ \text{s.t. } & \sum_{i \in \mathcal{V}} \mathbf{T}_i =\mathbf{0}, \quad \sum_{(i,j) \in \mathcal{E}} \left\langle \mathbf{T}_j - \mathbf{T}_i, \mathbf{v}_{ij} \right\rangle =1, \quad \gamma_{ij} \ge 0, \text{ } \forall (i,j) \in \mathcal{E} \end{align*}
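The sketch below illustrates, for a single edge with the translations held fixed, how the two relaxed residuals behave. For a unit vector \( \mathbf{v}_{ij} \), the slack values that minimize each residual have simple closed forms (a one-dimensional least-squares fit clipped at zero). With an angular error \( \theta < 90^\circ \) in the input direction, the relaxed displacement residual comes out as baseline times \( \sin\theta \), whereas the relaxed direction residual is \( \sin\theta \) regardless of baseline. The function names and toy values are assumptions made for this example, and the robust loss \( \rho \) is omitted.

```python
import numpy as np

def best_lambda(d, v):
    """Slack minimizing ||d - lam * v|| over lam >= 0 (v is a unit vector)."""
    return max(0.0, float(d @ v))

def best_gamma(d, v):
    """Slack minimizing ||gam * d - v|| over gam >= 0."""
    return max(0.0, float(d @ v) / float(d @ d))

def displacement_residual(d, v, lam):
    return float(np.linalg.norm(d - lam * v))

def direction_residual(d, v, gam):
    return float(np.linalg.norm(gam * d - v))

# Noise-free edge with baseline 5: slacks equal baseline and inverse baseline.
T_i = np.array([0.0, 0.0, 0.0])
T_j = np.array([3.0, 4.0, 0.0])
d = T_j - T_i
v_clean = d / np.linalg.norm(d)
lam = best_lambda(d, v_clean)
gam = best_gamma(d, v_clean)

# Perturb the input direction by 10 degrees and vary the baseline.
theta = np.deg2rad(10.0)
c, s = np.cos(theta), np.sin(theta)
Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
v_noisy = Rz @ v_clean

residuals = {}
for baseline in (1.0, 10.0):
    d_scaled = baseline * v_clean
    r_disp = displacement_residual(d_scaled, v_noisy,
                                   best_lambda(d_scaled, v_noisy))
    r_dir = direction_residual(d_scaled, v_noisy,
                               best_gamma(d_scaled, v_noisy))
    residuals[baseline] = (r_disp, r_dir)
    print(f"baseline={baseline}: displacement={r_disp:.4f}, direction={r_dir:.4f}")
```

In this toy setting the displacement residual weights long-baseline edges more heavily, while the direction residual treats all edges alike, which is one way to see why the two relaxations respond differently to the spread of baselines.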

The behaviour of the relaxed cost functions varies with the noise level and the spread of baselines, as seen in the top figure. In real data, the noise and baseline distributions differ across scenes, as seen in the figures below.


Figure 1. Histograms of normalized ground truth baselines. Panels (left to right): Montreal Notre Dame, Yorkminster.


Figure 2. Histograms of input direction errors (in degrees). Panels (left to right): Ellis Island, Madrid Metropolis.

Results

Synthetic Data


Figure 3. Histograms of mean errors for synthetic datasets on noisy data with different baselines. Panels (left to right): disparate baselines with input noise \( \sigma = 5\); similar baselines with input noise \( \sigma = 5\). The leftward shift indicates the superior performance of our method.

Real Data

Dataset	1DSfM	LUD	ShapeFit	BATA	Fused-TA (Ours)
Alamo	2.4	3.0	2.6	2.4	2.4
Ellis Island	32.3	26.0	15.6	25.3	17.5
Montreal Notre Dame	1.9	2.4	2.3	1.9	1.8
NYC Library	3.1	3.0	3.6	2.5	2.3
Piazza del Popolo	7.2	5.5	15.8	5.0	4.5
Piccadilly	2.2	2.6	2.4	2.3	2.1
Roman Forum	4.3	8.6	8.6	4.4	5.3
Tower of London	8.0	10.7	8.6	6.9	6.2
Trafalgar	11.7	7.6	8.3	6.9	6.5
Union Square	9.2	8.5	14.3	8.1	7.9
Vienna Cathedral	12.8	6.5	7.1	6.5	6.1

Table 1. Median camera translation errors (in meters) on 1DSfM datasets.
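As a reading aid for Table 1, the short script below tallies on how many datasets Fused-TA attains the lowest (or tied-lowest) median error. The numbers are copied directly from the table; the variable names are arbitrary.

```python
# Columns follow Table 1: 1DSfM, LUD, ShapeFit, BATA, Fused-TA (Ours).
methods = ["1DSfM", "LUD", "ShapeFit", "BATA", "Fused-TA"]
errors = {
    "Alamo":               [2.4, 3.0, 2.6, 2.4, 2.4],
    "Ellis Island":        [32.3, 26.0, 15.6, 25.3, 17.5],
    "Montreal Notre Dame": [1.9, 2.4, 2.3, 1.9, 1.8],
    "NYC Library":         [3.1, 3.0, 3.6, 2.5, 2.3],
    "Piazza del Popolo":   [7.2, 5.5, 15.8, 5.0, 4.5],
    "Piccadilly":          [2.2, 2.6, 2.4, 2.3, 2.1],
    "Roman Forum":         [4.3, 8.6, 8.6, 4.4, 5.3],
    "Tower of London":     [8.0, 10.7, 8.6, 6.9, 6.2],
    "Trafalgar":           [11.7, 7.6, 8.3, 6.9, 6.5],
    "Union Square":        [9.2, 8.5, 14.3, 8.1, 7.9],
    "Vienna Cathedral":    [12.8, 6.5, 7.1, 6.5, 6.1],
}

ours = methods.index("Fused-TA")
best_or_tied = [name for name, row in errors.items() if row[ours] == min(row)]
not_best = [name for name in errors if name not in best_or_tied]

print(f"Fused-TA is best or tied on {len(best_or_tied)} of {len(errors)} datasets")
print("Exceptions:", not_best)
```

Fused-TA is best or tied on 9 of the 11 datasets; the exceptions are Ellis Island (ShapeFit is lowest) and Roman Forum (1DSfM is lowest).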

Publication

Fusing Directions and Displacements in Translation Averaging (Lalit Manam and Venu Madhav Govindu), International Conference on 3D Vision, 2024 (Oral Presentation). [bibtex]