End2Reg: Learning Task-Specific Segmentation for Markerless Registration in Spine Surgery

Anonymized Affiliations

Registration result on one ex-vivo human specimen from the SpineDepth dataset: interactive view.


If the interactive viewer does not appear, please use Safari or Chrome. Use your mouse to rotate and zoom in. The visualization shows the intraoperative point cloud and the preoperative vertebral mesh registered with the network prediction, with colors indicating the point-to-point distance to the ground truth.

Abstract

Intraoperative navigation in spine surgery demands millimeter-level accuracy, which is currently achieved through radiation-intensive intraoperative imaging and bone-anchored markers that are invasive and disrupt surgical workflow. Markerless RGB-D registration methods offer a promising alternative. However, existing approaches rely on weak segmentation labels to isolate relevant anatomical structures, potentially propagating errors through the registration process. We present End2Reg, an end-to-end deep learning framework that jointly optimizes segmentation and registration, eliminating the need for segmentation labels and manual steps. The network learns task-specific segmentation masks optimized for registration, guided solely by the registration objective without explicit segmentation supervision. End2Reg achieves state-of-the-art performance on ex- and in-vivo benchmarks, reducing median Target Registration Error by 32% and mean Root Mean Square Error by 61%, while maintaining robust performance under partial occlusions. Ablation results confirm that end-to-end optimization significantly improves registration accuracy. Overall, End2Reg advances towards fully automatic, markerless intraoperative navigation.
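The two reported metrics can be sketched as follows. This is a minimal illustration, not the paper's evaluation code: the function names and the toy landmarks are hypothetical, and it assumes Target Registration Error is measured as the distance between landmarks mapped by the predicted and ground-truth rigid transforms (4x4 homogeneous matrices).

```python
import numpy as np

def target_registration_error(landmarks, T_pred, T_gt):
    """Per-landmark TRE: distance between each point mapped by the
    predicted vs. the ground-truth rigid transform (4x4 matrices)."""
    pts_h = np.c_[landmarks, np.ones(len(landmarks))]     # (N, 4) homogeneous
    diff = (pts_h @ T_pred.T - pts_h @ T_gt.T)[:, :3]     # displacement per point
    return np.linalg.norm(diff, axis=1)                   # (N,) errors

def rmse(errors):
    """Root Mean Square Error over a set of per-point errors."""
    return float(np.sqrt(np.mean(np.square(errors))))

# Toy example: the predicted transform is off by a 1 mm translation along x,
# so every landmark is displaced by exactly 1 mm.
T_gt = np.eye(4)
T_pred = np.eye(4)
T_pred[0, 3] = 1.0
landmarks = np.random.default_rng(0).normal(size=(5, 3))
tre = target_registration_error(landmarks, T_pred, T_gt)
```

The paper reports the median of the per-landmark TRE and the mean RMSE across cases; both reduce to aggregations over arrays like `tre` above.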

Model architecture.


The framework consists of a segmentation module and a registration module, which take as input the intraoperative RGB-D point cloud and the preoperative point cloud. The network is jointly optimized: the registration loss is backpropagated through both modules, with a Straight-Through Gumbel–Softmax estimator enabling gradients to pass through the discrete segmentation step. The network outputs a rigid transformation T that aligns the preoperative model to the intraoperative scene.
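The straight-through trick described above can be sketched in a few lines of PyTorch. This is a simplified illustration, not the paper's implementation: the per-point logits, the binary keep/discard decision, and the downstream loss are all hypothetical stand-ins for the actual segmentation and registration modules.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_points = 6

# Hypothetical per-point segmentation logits: [discard, keep] scores.
logits = torch.randn(num_points, 2, requires_grad=True)

# Straight-Through Gumbel-Softmax: the forward pass emits a hard one-hot
# sample, while the backward pass uses the soft probabilities, so gradients
# can flow through the discrete segmentation decision.
one_hot = F.gumbel_softmax(logits, tau=1.0, hard=True)    # (num_points, 2)
mask = one_hot[:, 1]                                      # 1.0 where a point is kept

# Stand-in for the registration objective: any loss computed on the masked
# points still backpropagates to the segmentation logits.
points = torch.randn(num_points, 3)
kept_centroid = (mask.unsqueeze(1) * points).sum(0) / mask.sum().clamp(min=1.0)
loss = kept_centroid.pow(2).sum()
loss.backward()
```

After `loss.backward()`, `logits.grad` is populated even though the mask applied in the forward pass was strictly binary; this is what lets the registration loss supervise the segmentation without any segmentation labels.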

Results on the SpineDepth dataset (ex-vivo)

Before and after: original point cloud and unsupervised segmentation output.


Interactive slider: visualization of three different specimens from the SpineDepth dataset.

Registration results


Results on the SpineAlign dataset (in-vivo)

Before and after: original point cloud and unsupervised segmentation output.


Interactive slider: visualization of two different specimens from the SpineAlign dataset.

Registration results


BibTeX

@article{end2reg,
      title  = {End2Reg: Learning Task-Specific Segmentation for Markerless Registration in Spine Surgery},
      author = {Anonymized}
}