
AI Undergraduates Release Podcast Voice Model


Two university students with limited artificial intelligence backgrounds have introduced a new AI model capable of producing podcast-style audio clips similar to those generated by advanced commercial platforms. The creators, based in Korea, set out to offer more flexibility in voice control and creative scripting after exploring what the current market made possible.

The voice synthesis industry continues to expand quickly, attracting significant investment as demand for realistic artificial voices grows. Voice AI startups such as ElevenLabs and PlayAI have secured hundreds of millions of dollars in venture funding over the past year alone.

Inside Nari Labs’ Dia Model

The pair behind the project, operating as Nari Labs, relied on Google’s TPU Research Cloud to power the training process for their model, called Dia. They spent three months on research and development, ultimately producing a model with 1.6 billion parameters that is notably capable for its size.

Dia’s primary strength is generating dialogue from user-written scripts, letting users adjust tone and embed natural pauses, laughter, or even coughing for greater realism. Users can supply style prompts or attempt to clone a specific person’s voice, and the system is designed to run on modern computers with as little as 10GB of VRAM.

Dia is freely available to AI developer communities on Hugging Face and GitHub, and public demos show it reliably creating two-person conversations on virtually any topic. Its voice replication and generation are on par with larger competitors, and the voice-cloning process stands out for its straightforward usability.
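For readers curious how a script-driven generation like this looks in practice, the minimal sketch below loads the published checkpoint from Hugging Face and renders a short two-speaker exchange. The package and function names (dia.model.Dia, from_pretrained, generate) follow Nari Labs’ public repository but may differ between releases, so treat this as an illustrative sketch rather than exact API documentation.

```python
# Minimal sketch of driving Dia from a script.
# Assumes the `dia` Python package from the Nari Labs repo and the
# 1.6B checkpoint hosted on Hugging Face; names may change between releases.
import soundfile as sf
from dia.model import Dia

# Load the 1.6B-parameter model weights from Hugging Face.
model = Dia.from_pretrained("nari-labs/Dia-1.6B")

# Scripts mark speakers with [S1]/[S2] tags and can embed non-verbal
# cues such as (laughs) or (coughs) for more natural delivery.
script = (
    "[S1] Welcome back to the show. Today we're talking about open voice models. "
    "[S2] Glad to be here. (laughs) It's been a busy year for speech AI."
)

# Generate the dialogue audio and write it to disk at 44.1 kHz.
audio = model.generate(script)
sf.write("episode_clip.wav", audio, 44100)
```

Running a sketch like this on a GPU with roughly 10GB of VRAM should produce a short podcast-style clip with two distinct voices.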

Still, the open nature of Dia introduces challenges, especially around responsible use and ethical concerns. While Nari warns users not to misuse the model for deception or fraud, it disclaims responsibility for any resulting abuse.

Another unresolved issue is the dataset used to train Dia, which remains undisclosed. Observers speculate that copyrighted materials may have been included, a common yet controversial method in the development of major AI models.

The debate continues over the legality of training AI systems on copyrighted data, with some asserting fair use protects such actions and others arguing that existing laws prohibit it. Regardless, Nari Labs plans to expand Dia’s language capabilities, enhance its platform with interactive features, and eventually publish a technical deep dive explaining the model’s inner workings.
