OpenAI just announced a new tool called Voice Engine. This is a voice cloning technology that can imitate any speaker by analyzing a 15-second audio sample. The company says it can produce “natural-sounding speech” with “emotive and realistic voices.”
The technology is based on the company’s existing text-to-speech API and has been in development since 2022. OpenAI is already using a version of this toolset to power the preset voices available in the current text-to-speech API and the Read Aloud feature. The company’s official blog has a bunch of samples that sound pretty close to the real thing. I encourage you to listen to them and imagine the possibilities, both good and bad.
OpenAI says it sees the technology being useful for reading assistance, language translation, and helping those with sudden or degenerative speech conditions. The company says it has already helped a patient with a speech impairment by creating a Voice Engine clone from audio recorded for a school project.
Despite the potential benefits, bad actors will certainly misuse this technology to engage in some serious deepfake tomfoolery. With that in mind, Voice Engine isn’t quite ready for prime time, as there are serious privacy concerns that must be addressed before a full rollout.
OpenAI acknowledged that the technology carries “serious risks, which are of particular concern during an election year.” The company said it incorporated feedback from “U.S. and international partners in government, media, entertainment, education, civil society and more” to ensure the product launches with minimal risk. All preview testers have agreed to OpenAI’s usage policy, which prohibits impersonating another person without consent or a legal right to do so.
Additionally, anyone using the technology must disclose to listeners that the voices are generated by artificial intelligence. OpenAI has implemented safety measures such as watermarking to trace the origin of any audio and “active monitoring” of how the system is used. When the product officially launches, there will be a “banned voice list” that will detect and block AI-generated voices that too closely resemble celebrities.
OpenAI remains tight-lipped as to when it will launch, though TechCrunch has reported on the likely pricing. Voice Engine will cost $15 per 1 million characters, which equates to approximately 162,500 words. That’s about the length of Stephen King’s novel The Shining. This certainly sounds like a budget-friendly way to get your audiobook done. Marketing materials also mention that an “HD” version will cost twice as much, but the company hasn’t yet detailed how that will work.
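To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above ($15 per 1 million characters, roughly 162,500 words per million characters, and an “HD” tier at double the price). The function name and the 80,000-word manuscript are made up for illustration, and the pricing itself is reported rather than confirmed, so treat the output as an estimate at best.

```python
# Figures as quoted in the article; reported pricing, not confirmed by OpenAI.
PRICE_PER_MILLION_CHARS_USD = 15.00
WORDS_PER_MILLION_CHARS = 162_500   # the article's approximation
HD_MULTIPLIER = 2                   # "HD" is said to cost twice as much


def estimate_cost_usd(word_count: int, hd: bool = False) -> float:
    """Hypothetical helper: estimate the cost of narrating `word_count` words."""
    # Convert words to characters using the article's words-per-million-characters ratio.
    chars = word_count * 1_000_000 / WORDS_PER_MILLION_CHARS
    cost = chars / 1_000_000 * PRICE_PER_MILLION_CHARS_USD
    if hd:
        cost *= HD_MULTIPLIER
    return round(cost, 2)


if __name__ == "__main__":
    # A novel of about 162,500 words, the figure the article compares to The Shining.
    print(estimate_cost_usd(162_500))           # standard tier
    print(estimate_cost_usd(162_500, hd=True))  # "HD" tier
    # An illustrative 80,000-word manuscript.
    print(estimate_cost_usd(80_000))
```

By this math, a 162,500-word book comes to $15 at the standard rate and $30 at the “HD” rate, assuming the reported prices hold.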
OpenAI is making big moves this week. It just announced another partnership with its good friend Microsoft to build an AI-based supercomputer called “Stargate.” The project will reportedly cost $100 billion.