Earballs: Neural Transmodal Translation

05/27/2020
by   Andrew Port, et al.

As the adage "a picture is worth a thousand words" suggests, conveying visual information briefly through spoken language can be a challenge. This work describes a novel technique for leveraging machine-learned feature embeddings to translate visual (and other types of) information into a perceptual audio domain, allowing users to perceive this information using only their sense of hearing. The system uses a pretrained image embedding network to extract visual features and embed them in a compact subset of Euclidean space. This converts the images into feature vectors whose L^2 distances can be used as a meaningful measure of similarity. A generative adversarial network (GAN) is then used to find a distance-preserving map from this metric space of feature vectors into the metric space defined by a target audio dataset equipped with either the Euclidean metric or a mel-frequency cepstrum-based psychoacoustic distance metric. We demonstrate this technique by translating images of faces into human speech-like audio. For both target audio metrics, the GAN successfully found a metric-preserving mapping, and in human subject tests, users were able to accurately classify audio translations of faces.
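
To make the notion of a distance-preserving map concrete, the following is a minimal sketch, not the authors' implementation or loss function. It assumes feature vectors and generated audio are plain NumPy arrays, uses the Euclidean metric on both sides, and measures how far a candidate mapping is from preserving pairwise distances. The names `pairwise_l2` and `distance_distortion` are hypothetical, and the random orthogonal matrix merely stands in for a trained generator.

```python
import numpy as np

def pairwise_l2(x):
    """Return the (n, n) matrix of Euclidean distances between the rows of x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def distance_distortion(features, audio):
    """Mean squared gap between pairwise distances in the two spaces.

    A value near zero means the mapping features -> audio is (nearly)
    distance preserving on this batch. Hypothetical helper for illustration.
    """
    d_src = pairwise_l2(features)
    # Flatten each audio sample so any waveform shape is treated as a vector.
    d_tgt = pairwise_l2(audio.reshape(len(audio), -1))
    return float(np.mean((d_src - d_tgt) ** 2))

rng = np.random.default_rng(0)
features = rng.normal(size=(8, 16))        # stand-in for image embeddings

# Stand-in "generator": a random orthogonal matrix, which is an exact isometry.
q, _ = np.linalg.qr(rng.normal(size=(16, 16)))
isometric_audio = features @ q

# Doubling every output doubles every distance, so this map is not an isometry.
scaled_audio = 2.0 * isometric_audio

print(distance_distortion(features, isometric_audio))
print(distance_distortion(features, scaled_audio))
```

In this sketch the orthogonal map yields a distortion that is zero up to floating-point error, while the scaled map yields a clearly positive value. A psychoacoustic target metric, such as the mel-frequency cepstrum-based one mentioned above, would replace the second call to `pairwise_l2` with a different distance function.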

research
06/24/2020

Face-to-Music Translation Using a Distance-Preserving Generative Adversarial Network with an Auxiliary Discriminator

Learning a mapping between two unrelated domains, such as image and audio...
research
12/31/2015

Autoencoding beyond pixels using a learned similarity metric

We present an autoencoder that leverages learned representations to bett...
research
10/22/2020

NU-GAN: High resolution neural upsampling with GAN

In this paper, we propose NU-GAN, a new method for resampling audio from...
research
10/01/2020

Helicality: An Isomap-based Measure of Octave Equivalence in Audio Data

Octave equivalence serves as domain-knowledge in MIR systems, including ...
research
10/24/2021

Quality Map Fusion for Adversarial Learning

Generative adversarial models that capture salient low-level features wh...
research
12/22/2020

AudioViewer: Learning to Visualize Sound

Sensory substitution can help persons with perceptual deficits. In this ...
research
11/19/2015

Fast Metric Learning For Deep Neural Networks

Similarity metrics are a core component of many information retrieval an...
