Problem Formulation

We systematically study the tasks that make up multimodal music understanding, evaluate the available datasets, and present a concrete taxonomy for distinguishing between them.

Categorization of Different Caption Intentions

SOTA Dataset Statistics

Language-Audio Music Dataset Statistics

Comparison of Unique Vocabulary in Human-Annotated Datasets

Released under the MIT License.