Zero-Shot Learning via Latent Space Encoding
Zero-Shot Learning (ZSL) is typically achieved by resorting to a class semantic embedding space to transfer the knowledge from the seen classes to unseen ones. Capturing the common semantic characteristics between the visual modality and the class semantic modality (e.g., attributes or word vector) is a key to the success of ZSL. In this paper, we present a novel approach called Latent Space Encoding (LSE) for ZSL based on an encoder-decoder framework, which learns a highly effective latent space to well reconstruct both the visual space and the semantic embedding space. For each modality, the encoderdecoder framework jointly maximizes the recoverability of the original space from the latent space and the predictability of the latent space from the original space, thus making the latent space feature-aware. To relate the visual and class semantic modalities together, their features referring to the same concept are enforced to share the same latent codings. In this way, the semantic relations of different modalities are generalized with the latent representations. We also show that the proposed encoder-decoder framework is easily extended to more modalities. Extensive experimental results on four benchmark datasets (AwA, CUB, aPY, and ImageNet) clearly demonstrate the superiority of the proposed approach on several ZSL tasks, including traditional ZSL, generalized ZSL, and zero-shot retrieval (ZSR).
READ FULL TEXT