Stability AI has released Stable Audio 2.0, an updated version of its music generation platform. The system lets users create up to three minutes of audio from a text prompt. That is about the length of an actual song, so it will also generate an intro, a full chord progression, and an outro.
First, the good news. Three minutes is huge: earlier versions of the software maxed out at 90 seconds. Imagine the fake birthday song you could make in the style of a Rob Thomas/Santana collaboration. Another boon? The tool is free and publicly available through the company’s website, so have at it.
Introducing Stable Audio 2.0 – a new model capable of producing high-quality, complete tracks with a coherent musical structure in 44.1 kHz stereo for up to three minutes from a single prompt.
Explore models and start creating for free: https://t.co/E9ZIGagmPf
Stability AI (@StabilityAI), April 3, 2024
The tool primarily works from text prompts, but there is also an option to upload audio clips. The system will analyze the clip and produce something similar. All uploaded audio must be copyright-free, so this is not meant for imitating pre-existing material. Instead, it could work for humming a drum part or extending a 20-second clip into something longer.
Now, the bad news. This is still AI-generated music. It’s cool as a novelty and as a sign of a possible future that is great for tinkerers but terrible for musicians, but that’s about it. The songs sound really good at first, until the seams start to show. Then things get a little creepy.
For example, the system likes to add vocals, but not in any known human language. I guess it’s the same language that makes up the text in AI-generated images. The voices sound vaguely like real people, and at times like Gregorian chant filtered through outer space. It sits right in the middle of the uncanny valley. The Verge called them “soulless and eerie,” comparing them to the sounds of whales. That tracks.
Stable Audio 2.0 makes the same odd little mistakes that all of these systems make, regardless of output type. Parts may disappear without a trace, replaced by something else. Sometimes melodic elements are suddenly doubled, like an audio version of the extra fingers in AI-generated images.
And, well, it’s all just boring. This is music in name only. Without human connection, what’s the point? I listen to music to understand what another person or group of people is thinking. Despite constant claims that artificial general intelligence (AGI) is just months away, there is no one in there.
So for anyone who makes silly birthday videos or on-hold music for banks, this technology is an absolute gift. For everyone else? Shrug. From personal experience I can say one thing: it’s pretty fast. In a minute or so, the system concocted an absolutely terrible big band song about my cat.