Echo-Reconstruction: Audio-Augmented 3D Scene Reconstruction

by Justin Wilson, et al.

Reflective and textureless surfaces such as windows, mirrors, and walls are challenging for object and scene reconstruction. These surfaces are often poorly reconstructed and filled with depth discontinuities and holes, making it difficult to cohesively reconstruct scenes that contain these planar discontinuities. We propose Echo-Reconstruction, an audio-visual method that uses the reflections of sound to aid in geometry and audio reconstruction for virtual conferencing, teleimmersion, and other AR/VR experiences. The mobile phone prototype emits pulsed audio while recording video for RGB-based 3D reconstruction and audio-visual classification. Reflected sound and images from the video are input into our audio (EchoCNN-A) and audio-visual (EchoCNN-AV) convolutional neural networks for surface and sound source detection, depth estimation, and material classification. The inferences from these classifications enhance 3D reconstructions of scenes containing open spaces and reflective surfaces through depth filtering, inpainting, and placement of unmixed sound sources in the scene. Our prototype, VR demo, and experimental results from real-world and virtual scenes with challenging surfaces and sound indicate high success rates on material classification, depth estimation, and closed/open surface detection, leading to considerable visual and audio improvement in 3D scenes (see Figure 1).
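
As a rough illustration of why reflected sound carries depth information, the sketch below estimates the distance to a single reflector from the delay of a pulsed echo. It is not the method described above, which relies on the EchoCNN-A and EchoCNN-AV networks; it is a plain time-of-flight calculation under simplifying assumptions: one dominant reflection, no direct-path sound in the recording, and a speed of sound of about 343 m/s. The function names, pulse shape, and sample values are hypothetical and chosen only for the example.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: dry air at roughly 20 degrees C


def make_pulse(num_samples, freq_hz, sample_rate_hz):
    """Hann-windowed tone burst (hypothetical stand-in for the emitted pulse)."""
    return [
        0.5 * (1.0 - math.cos(2.0 * math.pi * n / (num_samples - 1)))
        * math.sin(2.0 * math.pi * freq_hz * n / sample_rate_hz)
        for n in range(num_samples)
    ]


def estimate_echo_lag(emitted, recorded):
    """Return the non-negative lag (in samples) maximizing |cross-correlation|."""
    best_lag, best_score = 0, -1.0
    for lag in range(len(recorded) - len(emitted) + 1):
        score = abs(sum(e * recorded[lag + n] for n, e in enumerate(emitted)))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def lag_to_distance_m(lag_samples, sample_rate_hz):
    """Convert a round-trip delay in samples to a one-way distance in meters."""
    return SPEED_OF_SOUND_M_S * (lag_samples / sample_rate_hz) / 2.0


# Synthetic recording: silence plus one attenuated copy of the pulse.
sample_rate_hz = 48_000
pulse = make_pulse(64, 4_000.0, sample_rate_hz)
true_lag = 280  # assumed echo delay in samples
recording = [0.0] * 1024
for n, p in enumerate(pulse):
    recording[true_lag + n] += 0.3 * p

lag = estimate_echo_lag(pulse, recording)
distance_m = lag_to_distance_m(lag, sample_rate_hz)
print(f"estimated lag: {lag} samples, distance: {distance_m:.3f} m")
```

Real recordings contain the direct sound, multiple reflections, and noise, so a single correlation peak is rarely this clean. That is one reason a learned model over the reflected audio, as the abstract describes, can be more practical than a closed-form delay estimate.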




