AJ Piergiovanni

research

∙ 06/06/2023

Diversifying Joint Vision-Language Tokenization Learning

Building joint representations across images and text is an essential st...

0 Vardaan Pahuja, et al. ∙

research

∙ 05/31/2023

Joint Adaptive Representations for Image-Language Learning

Image-language learning has made unprecedented progress in visual unders...

0 AJ Piergiovanni, et al. ∙

research

∙ 03/29/2023

MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks

The development of language models has moved from encoder-decoder to de...

0 Weicheng Kuo, et al. ∙

research

∙ 12/06/2022

Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning

We present a simple approach which can turn a ViT encoder into an effici...

0 AJ Piergiovanni, et al. ∙

research

∙ 12/02/2022

Compound Tokens: Channel Fusion for Vision-Language Representation Learning

We present an effective method for fusing visual-and-language representa...

0 Maxwell Mbabilla Aladago, et al. ∙

research

∙ 09/30/2022

F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models

We present F-VLM, a simple open-vocabulary object detection method built...

0 Weicheng Kuo, et al. ∙

research

∙ 09/14/2022

PaLI: A Jointly-Scaled Multilingual Language-Image Model

Effective scaling and a flexible task interface enable large language mo...

6 Xi Chen, et al. ∙

research

∙ 09/09/2022

Pre-training image-language transformers for open-vocabulary tasks

We present a pre-training approach for vision and language transformer m...

0 AJ Piergiovanni, et al. ∙

research

∙ 08/01/2022

Video Question Answering with Iterative Video-Text Co-Tokenization

Video question answering is a challenging task that requires understandi...

0 AJ Piergiovanni, et al. ∙

research

∙ 05/02/2022

Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering

We present Answer-Me, a task-aware multi-task framework which unifies a ...

0 AJ Piergiovanni, et al. ∙

research

∙ 03/31/2022

FindIt: Generalized Localization with Natural Language Queries

We propose FindIt, a simple and versatile framework that unifies a varie...

0 Weicheng Kuo, et al. ∙

research

∙ 09/02/2021

4D-Net for Learned Multi-Modal Alignment

We present 4D-Net, a 3D object detection approach, which utilizes 3D Poi...

7 AJ Piergiovanni, et al. ∙

research

∙ 06/28/2021

Unsupervised Discovery of Actions in Instructional Videos

In this paper we address the problem of automatically discovering atomic...

10 AJ Piergiovanni, et al. ∙

research

∙ 06/21/2021

TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?

In this paper, we introduce a novel visual representation learning which...

0 Michael S. Ryoo, et al. ∙

research

∙ 06/07/2021

Unsupervised Action Segmentation for Instructional Videos

In this paper we address the problem of automatically discovering atomic...

0 AJ Piergiovanni, et al. ∙

research

∙ 04/14/2021

Adaptive Intermediate Representations for Video Understanding

A common strategy for video understanding is to incorporate spatial and m...

0 Juhana Kangaspunta, et al. ∙

research

∙ 03/30/2021

Recognizing Actions in Videos from Unseen Viewpoints

Standard methods for video recognition use large CNNs designed to captur...

0 AJ Piergiovanni, et al. ∙

research

∙ 08/18/2020

AssembleNet++: Assembling Modality Representations via Attention Connections

We create a family of powerful video models which are able to: (i) learn...

0 Michael S. Ryoo, et al. ∙

research

∙ 08/11/2020

Adversarial Generative Grammars for Human Activity Prediction

In this paper we propose an adversarial generative grammar model for fut...

15 AJ Piergiovanni, et al. ∙

research

∙ 07/23/2020

AttentionNAS: Spatiotemporal Attention Cell Search for Video Classification

Convolutional operations have two limitations: (1) do not explicitly mod...

10 Xiaofang Wang, et al. ∙

research

∙ 07/10/2020

AViD Dataset: Anonymized Videos from Diverse Countries

We introduce a new public video dataset for action recognition: Anonymiz...

0 AJ Piergiovanni, et al. ∙

research

∙ 02/26/2020

Evolving Losses for Unsupervised Video Representation Learning

We present a new method to learn video representations from large-scale ...

0 AJ Piergiovanni, et al. ∙

research

∙ 10/15/2019

Tiny Video Networks

Video understanding is a challenging problem with great impact on the ab...

0 AJ Piergiovanni, et al. ∙

research

∙ 10/08/2019

Model-based Behavioral Cloning with Future Image Similarity Learning

We present a visual imitation learning framework that enables learning o...

9 Alan Wu, et al. ∙

research

∙ 06/07/2019

Evolving Losses for Unlabeled Video Representation Learning

We present a new method to learn video representations from unlabeled da...

0 AJ Piergiovanni, et al. ∙

research

∙ 05/30/2019

AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures

Learning to represent videos is a very challenging task both algorithmic...

0 Michael S. Ryoo, et al. ∙

research

∙ 04/18/2019

Early Detection of Injuries in MLB Pitchers from Video

Injuries are a major cost in sports. Teams spend millions of dollars eve...

0 AJ Piergiovanni, et al. ∙

research

∙ 02/01/2019

Learning Differentiable Grammars for Continuous Data

This paper proposes a novel algorithm which learns a formal regular gram...

0 AJ Piergiovanni, et al. ∙

research

∙ 11/26/2018

Evolving Space-Time Neural Architectures for Videos

In this paper, we present a new method for evolving video CNN models to ...

2 AJ Piergiovanni, et al. ∙

research

∙ 10/02/2018

Representation Flow for Action Recognition

In this paper, we propose a convolutional layer inspired by optical flow...

0 AJ Piergiovanni, et al. ∙

research

∙ 06/21/2018

Learning Shared Multimodal Embeddings with Unpaired Data

In this paper, we propose a method to learn a joint multimodal embedding...

0 AJ Piergiovanni, et al. ∙

research

∙ 05/20/2018

Learning Real-World Robot Policies by Dreaming

Learning to control robots directly based on images is a primary challen...

0 AJ Piergiovanni, et al. ∙

research

∙ 04/09/2018

Fine-grained Activity Recognition in Baseball Videos

In this paper, we introduce a challenging new dataset, MLB-YouTube, desi...

0 AJ Piergiovanni, et al. ∙

research

∙ 03/16/2018

Activity Detection with Latent Sub-event Hierarchy Learning

In this paper, we introduce a new convolutional layer named the Temporal...

0 AJ Piergiovanni, et al. ∙

research

∙ 12/05/2017

Learning Latent Super-Events to Detect Multiple Activities in Videos

In this paper, we introduce the concept of learning latent super-events ...

0 AJ Piergiovanni, et al. ∙

research

∙ 05/26/2016

Learning Latent Sub-events in Activity Videos Using Temporal Attention Filters

In this paper, we newly introduce the concept of temporal attention filt...

0 AJ Piergiovanni, et al. ∙
