Experiment

Overview

Baseline Models

Music-to-Text Evaluations

Text-to-Music Evaluations

Results

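| Dataset | Model | FAD VMC | FAD VFMA | FAD MFMA | FAD EFMA | KLD | Vendi | CS |
|---|---|---|---|---|---|---|---|---|
| MusicCaps | MusicLM | 5.70 | 21.57 | 87.39 | 249.72 | 1.79 | 1.55 | 0.28 |
| MusicCaps | StableAudio | 6.97 | 15.60 | 82.21 | 377.02 | 1.90 | 1.31 | 0.31 |
| MusicCaps | MusicGen | 7.03 | 16.29 | 73.22 | 354.07 | 0.90 | 1.57 | 0.29 |
| MusicCaps | AudioLDM2 | 3.29 | 19.31 | 60.02 | 202.11 | 0.61 | 1.57 | 0.36 |
| MusicCaps | Mustango | 1.27 | 22.96 | 55.84 | 161.47 | 1.51 | 1.48 | 0.27 |
| MusicCaps | Mureka | 9.45 | | | | | | |
| SongDescriber | MusicLM | 7.20 | 20.59 | 87.12 | 241.95 | 0.89 | 1.49 | 0.28 |
| SongDescriber | StableAudio | 4.42 | 14.90 | 79.16 | 341.92 | 1.07 | 1.29 | 0.31 |
| SongDescriber | MusicGen | 2.64 | 14.60 | 65.74 | 354.07 | 0.66 | 1.50 | 0.35 |
| SongDescriber | AudioLDM2 | 2.74 | 17.19 | 57.88 | 184.03 | 0.62 | 1.48 | 0.34 |
| SongDescriber | Mustango | 2.58 | 18.50 | 56.69 | 170.27 | 1.48 | 1.46 | 0.29 |
| SongDescriber | Mureka | 2.42 | 9.85 | 35.58 | 47.84 | 1.38 | 1.38 | 0.23 |
| MusicSem (Ours) | MusicLM | 7.25 | 22.57 | 86.97 | 248.42 | 1.00 | 1.46 | 0.27 |
| MusicSem (Ours) | StableAudio | 5.50 | 14.96 | 79.35 | 342.53 | 1.15 | 1.28 | 0.31 |
| MusicSem (Ours) | MusicGen | 3.75 | 14.67 | 68.11 | 229.29 | 1.74 | 1.50 | 0.30 |
| MusicSem (Ours) | AudioLDM2 | 3.47 | 17.66 | 57.71 | 181.11 | 0.55 | 1.46 | 0.28 |
| MusicSem (Ours) | Mustango | 5.06 | 19.15 | 55.11 | 157.32 | 1.46 | 1.41 | 0.20 |
| MusicSem (Ours) | Mureka | 2.70 | 9.69 | 34.75 | 44.75 | 1.40 | 1.33 | 0.18 |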
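
For readers who want to compare models on a single column, the following is a minimal sketch that ranks the MusicSem (Ours) rows by one metric. The numbers are transcribed from the table above; the column keys (`fad_vmc`, `cs`, and so on), the `rank_models` helper, and the reading of the fourth model name as AudioLDM2 are assumptions made for this illustration and are not part of the benchmark's own tooling. Whether lower or higher is better is left as a parameter, since the page does not state the direction for each column; the default assumes a distance-style metric where lower is better.

```python
# Values transcribed from the MusicSem (Ours) rows of the results table.
# Column keys are hypothetical names chosen for this sketch.
COLUMNS = ("fad_vmc", "fad_vfma", "fad_mfma", "fad_efma", "kld", "vendi", "cs")

MUSICSEM_ROWS = {
    "MusicLM":     (7.25, 22.57, 86.97, 248.42, 1.00, 1.46, 0.27),
    "StableAudio": (5.50, 14.96, 79.35, 342.53, 1.15, 1.28, 0.31),
    "MusicGen":    (3.75, 14.67, 68.11, 229.29, 1.74, 1.50, 0.30),
    "AudioLDM2":   (3.47, 17.66, 57.71, 181.11, 0.55, 1.46, 0.28),
    "Mustango":    (5.06, 19.15, 55.11, 157.32, 1.46, 1.41, 0.20),
    "Mureka":      (2.70, 9.69, 34.75, 44.75, 1.40, 1.33, 0.18),
}


def rank_models(rows, column, lower_is_better=True):
    """Return (model, value) pairs sorted from best to worst on one column."""
    idx = COLUMNS.index(column)
    pairs = [(model, values[idx]) for model, values in rows.items()]
    # Sort ascending for distance-style metrics, descending for score-style ones.
    return sorted(pairs, key=lambda pair: pair[1], reverse=not lower_is_better)


if __name__ == "__main__":
    for model, value in rank_models(MUSICSEM_ROWS, "fad_vmc"):
        print(f"{model:12s} {value:6.2f}")
```

Calling `rank_models(MUSICSEM_ROWS, "cs", lower_is_better=False)` would instead order the models from the highest to the lowest value in the CS column.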

Inference Latency For T2M Models

Retrieval Eval

CLAP Score Sensitivity Tests

| Category | Metric | Score |
|---|---|---|
| Descriptive | Cd | 0.55 |
| Atmospheric | Ca | 0.36 |
| Situational | Cs | 0.32 |
| Contextual | Cc | 0.29 |
| Metadata | Cm | 0.36 |

Released under the MIT License.