See the past: Time-Reversed Scene Reconstruction from Thermal Traces Using Visual Language Models

Abstract

This work proposes a time-reversed reconstruction framework that uses paired RGB and thermal images to recover the state of a scene from a few seconds earlier. The approach couples Visual Language Models (VLMs) with a constrained diffusion process: one VLM generates scene descriptions and another guides image reconstruction, so that the output stays semantically and structurally consistent. Experiments demonstrate the feasibility of reconstructing plausible past frames up to 120 seconds earlier in controlled scenarios.

Publication
In arXiv preprint arXiv:2510.05408

A detailed description and code references are available in the arXiv preprint: https://arxiv.org/pdf/2510.05408