Millions of tracks are freely available in datasets, even if they’re not supposed to be.
By Terrence O'Brien
Atlantic reporter Alex Reisner recently uncovered four datasets of music being used to train AI models and made them fully searchable for the public. Two of the sets are absolutely enormous at 12 million and 9 million tracks. The other two are much smaller, but still represent a significant amount of training data at over 100,000 songs each.
According to Reisner, the sets have been downloaded thousands of times. While it’s impossible to know exactly who has used them, Google and Stability have both confirmed in research papers that they have. Some of the sources, like the Free Music Archive dataset, are free to stream for personal use but require licensing for commercial applications.
While the datasets are freely available on the internet in theory, using them as training data is not as simple as downloading a ZIP file and feeding it to an AI model. As Reisner explains:
Three of the datasets I found are distributed as a list of links to songs on YouTube or Spotify. AI developers download the actual audio using tools that automate the job, some of which allow developers to bypass logins, advertisements, and mechanisms that might earn money or subscribers for creators. Such tools violate the terms of service of these platforms.
Names that pop up in the datasets range from pop stars like Lady Gaga and Fred Again.. to Radiohead, Aphex Twin, Wu-Tang Clan, Bruce Springsteen, and experimental composer Hainbach. You can hop over to the Atlantic’s AI Watchdog site and search through the songs, books, and other media being used to train the world’s AI models yourself.
© 2026 Vox Media, LLC. All Rights Reserved