InfoVisDial: An Informative Visual Dialogue Dataset by Bridging Large Multimodal and Language Models

Dec 1, 2023

Bingbing Wen

Zhengyuan Yang

Jianfeng Wang

Zhe Gan

Bill Howe

Lijuan Wang


Abstract

We present InfoVisDial, a comprehensive visual dialogue dataset created by bridging large multimodal and language models to enable informative conversations about visual content.

Type

Publication

Internship at Microsoft Azure AI

This work was conducted during an internship at Microsoft Azure AI.

Last updated on Dec 1, 2023

Visual Dialogue, Multimodal Models, Dataset, Microsoft Research

Authors

Bingbing Wen

PhD Student
