research
          
      
      ∙
      06/08/2023
    Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models
Conversation agents fueled by Large Language Models (LLMs) are providing...
          
            research
          
      
      ∙
      12/06/2022
    Fine-tuned CLIP Models are Efficient Video Learners
Large-scale multi-modal training with image-text pairs imparts strong ge...
          
            research
          
      
      ∙
      10/06/2022
    MaPLe: Multi-modal Prompt Learning
Pre-trained vision-language (V-L) models such as CLIP have shown excelle...
          
            research
          
      
      ∙
      07/07/2022