MuseChat: A Conversational Music Recommendation System for Videos

Tuesday, April 30, 11:15-11:35 am
Room 236
Presenter: Zhikang Dong
Modality: Traditional Talk


Music recommendation for videos is attracting growing interest in multi-modal research. However, existing systems focus primarily on content compatibility and often ignore users' preferences. Because they cannot interact with users to refine results or explain their suggestions, they deliver a less satisfying experience. We address these issues with MuseChat, a first-of-its-kind dialogue-based recommendation system that personalizes music suggestions for videos. Our system provides two key functionalities, each with an associated module: recommendation and reasoning. The recommendation module takes a video, along with optional information including previously suggested music and user preferences, and retrieves music that matches the context. The reasoning module, built on a large language model (Vicuna-7B) extended to multi-modal inputs, provides a reasonable explanation for the recommended music. To evaluate the effectiveness of MuseChat, we built a large-scale dataset for conversational music recommendation for videos that simulates a two-turn interaction between a user and a recommender based on accurate music track information. Experimental results show that MuseChat achieves significant improvements over existing video-based music retrieval methods and offers strong interpretability and interactability.
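
To make the recommendation step concrete, here is a minimal sketch of how a two-turn, embedding-based retrieval could look. It is not MuseChat's implementation: it assumes the video, the previously suggested track, and the user preference have already been mapped into one shared embedding space by hypothetical encoders, and it fuses them with a plain weighted sum. The function names, the weights, and the three-track toy catalog are all illustrative assumptions.

```python
import math


def normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def build_query(video_emb, prev_music_emb=None, preference_emb=None,
                video_w=1.0, prev_w=0.25, pref_w=2.0):
    """Fuse the video embedding with the optional inputs.

    The weighted sum and the default weights are arbitrary choices made
    for this sketch, not values taken from the MuseChat paper.
    """
    query = [video_w * x for x in normalize(video_emb)]
    for weight, emb in ((prev_w, prev_music_emb), (pref_w, preference_emb)):
        if emb is not None:
            query = [q + weight * x for q, x in zip(query, normalize(emb))]
    return normalize(query)


def recommend(query, catalog, top_k=1):
    """Return the names of the top_k tracks most similar to the query."""
    ranked = sorted(catalog, key=lambda name: cosine(query, catalog[name]),
                    reverse=True)
    return ranked[:top_k]


# Toy catalog: made-up 3-dimensional embeddings for illustration only.
catalog = {
    "upbeat_pop": [0.9, 0.1, 0.0],
    "calm_piano": [0.1, 0.9, 0.0],
    "dark_ambient": [0.0, 0.1, 0.9],
}
video = [0.8, 0.3, 0.1]

# Turn 1: only the video is available.
first = recommend(build_query(video), catalog)

# Turn 2: the user asks for something calmer; the earlier suggestion and the
# stated preference are added to the query.
preference = [0.0, 1.0, 0.0]
second = recommend(
    build_query(video, prev_music_emb=catalog[first[0]],
                preference_emb=preference),
    catalog,
)

print(first, second)
```

In this toy setup the first turn is driven by the video alone, while the second turn lets the stated preference shift the ranking. A real system would learn the fusion rather than rely on hand-set weights.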


Zhikang Dong

Zhikang Dong is a Ph.D. student at Stony Brook University. His research topics include multimodal learning, large language models, and AI for science. He received his master's degree from Columbia University in New York City. He has published papers at prestigious conferences such as CVPR, NeurIPS, WACV, and IEEE conferences. In his spare time, he enjoys basketball, Japanese anime, and video games.

