Video-text retrieval is a class of cross-modal representation learning
p...
Mining the shared features of same identity in different scene, and the
...
We present a simple approach based on pixel-wise nearest neighbors to
un...
Despite the fact that many 3D human activity benchmarks being proposed, ...