The Voicebox neural network model offers capabilities for working with spoken audio: generating speech, editing it, or stylizing it to match a sample. The authors of the project described it as a breakthrough in speech AI models.

We are one step closer to the immortal celebrity future we have long been promised. Meta has unveiled Voicebox, its generative text-to-speech model that promises to do for the spoken word what ChatGPT and Dall-E, respectively, did for text and image generation.

Essentially, it’s a text-to-output generator just like GPT or Dall-E, except that instead of creating prose or pretty pictures, it spits out audio clips. Meta defines the system as “a non-autoregressive flow-matching model trained to infill speech, given audio context and text.”

It’s been trained on more than 50,000 hours of unfiltered audio. Specifically, Meta used recorded speech and transcripts from a bunch of public domain audiobooks written in English, French, Spanish, German, Polish, and Portuguese.

That diverse data set allows the system to generate more conversational-sounding speech, regardless of the languages spoken by each party, according to the researchers.

“Our results show that speech recognition models trained on Voicebox-generated synthetic speech perform almost as well as models trained on real speech.”

What’s more, the computer-generated speech performed with just a 1 percent error rate degradation, compared to the 45 to 70 percent drop-off seen with existing TTS models.

The system was first taught to predict speech segments based on the segments around them as well as the passage’s transcript.

Having learned to infill speech from context, the model can then apply this across speech generation tasks, including generating portions in the middle of an audio recording without having to recreate the entire input.
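To make the infilling idea concrete, here is a minimal sketch of how a single training example of that kind might be assembled: a span of audio frames is hidden, and the model would be asked to reconstruct it from the surrounding frames plus the transcript. The function name `make_infill_example`, the dictionary layout, and the use of plain numbers as "frames" are all assumptions made for illustration; this is not Meta's code or data format.

```python
def make_infill_example(frames, transcript, mask_start, mask_len, mask_value=None):
    """Build one hypothetical infilling example.

    frames      -- sequence of per-frame audio features (toy numbers here)
    transcript  -- text of the whole passage, including the hidden span
    mask_start  -- index of the first hidden frame
    mask_len    -- number of hidden frames
    """
    if mask_start < 0 or mask_len <= 0 or mask_start + mask_len > len(frames):
        raise ValueError("mask span must lie inside the frame sequence")

    mask_end = mask_start + mask_len
    context = list(frames)                  # copy so the input is not mutated
    target = context[mask_start:mask_end]   # what the model should predict
    context[mask_start:mask_end] = [mask_value] * mask_len  # hide the span

    return {
        "context": context,        # frames around the gap, gap blanked out
        "transcript": transcript,  # text conditioning for the whole passage
        "target": target,          # ground-truth frames for the gap
        "span": (mask_start, mask_end),
    }


# Toy usage: six "frames", the middle three hidden.
frames = [0.1, 0.4, 0.3, 0.9, 0.7, 0.2]
example = make_infill_example(frames, "hello world", mask_start=2, mask_len=3)
print(example["context"])
print(example["target"])
```

Because only the hidden span is the prediction target, the same setup covers editing the middle of a recording: the untouched frames stay as context and only the gap is regenerated.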

Voicebox is also reportedly capable of actively editing audio clips, eliminating noise from the speech and even replacing misspoken words.

Meta’s AI reportedly outperformed the current state of the art both in intelligibility (a 1.9 percent word error rate vs 5.9 percent) and “audio similarity” (a composite score of 0.681 to the state of the art’s 0.580), all while operating as much as 20 times faster than today’s best TTS systems.
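For readers unfamiliar with the intelligibility metric, word error rate is conventionally the number of word substitutions, deletions, and insertions needed to turn the recognized text into the reference text, divided by the number of reference words. The sketch below computes it with a standard edit-distance table; the function name `word_error_rate` is a label chosen here, and whether Meta's evaluation normalizes text the same way is not stated in the article.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")

# Relative reduction implied by the reported figures (1.9 vs 5.9 percent).
relative_reduction = (5.9 - 1.9) / 5.9
print(round(wer, 3), round(relative_reduction, 3))
```

Taken at face value, the reported figures amount to roughly a two-thirds reduction in word errors relative to the previous best system.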

When more than 8,000 subreddits went dark for 48 hours earlier this week to protest Reddit’s forthcoming API changes, there were signs the action had an immediate effect on the platform.

On the morning of the first day of the protest, Reddit suffered a “major outage” affecting its desktop and mobile websites, as well as mobile apps.

Days later, company CEO Steve Huffman went on a media offensive where he attempted to cast aggrieved users and moderators, many of whom give countless hours of their free time to make Reddit the vibrant platform it is today, as unreasonable.

“These people who are mad, they’re mad because they used to get something for free, and now it’s going to be not free,” he said in an interview with The Verge.

But beyond those signs, it was hard to tell how much of a practical effect the protest had on the website’s traffic. Now we have a better idea. According to data provided to Engadget by internet analytics firm Similarweb, the impact was small but noticeable.

On June 11th, the day before the blackout began, Similarweb logged more than 57 million daily visits to Reddit across desktop and mobile web clients. By the end of the first day of the protest, daily visits were below 55 million.

Then, at the end of June 13th, Similarweb recorded fewer than 53 million daily visits to Reddit. Compared to the website’s average daily volume over the past month, the 52,121,649 visits Reddit saw on June 13th represented a 6.6 percent drop.
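The article does not give the monthly average itself, but it can be backed out from the two numbers it does give. The short calculation below is a sketch of that arithmetic; the variable names are illustrative, and the result is only as precise as the rounded 6.6 percent figure.

```python
visits_june_13 = 52_121_649   # figure quoted from Similarweb
reported_drop = 0.066         # 6.6 percent below the monthly average


def percent_drop(baseline, observed):
    """Percentage by which observed falls short of baseline."""
    return (baseline - observed) / baseline * 100


# If observed = average * (1 - drop), then average = observed / (1 - drop).
implied_average = visits_june_13 / (1 - reported_drop)
print(f"Implied monthly average: about {implied_average:,.0f} daily visits")

# Sanity check: plugging the implied average back in recovers the 6.6 percent.
print(f"Recovered drop: {percent_drop(implied_average, visits_june_13):.1f}%")
```

That puts the typical day at a little under 56 million visits, so the pre-blackout figure of more than 57 million was itself slightly above average.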

Reddit has been an integral part of the internet for quite a while now, with third-party apps offering a diverse experience for millions of users.

However, the entire ecosystem of third-party Reddit clients has come under threat after the company decided to enforce revised API pricing, which led to third-party Reddit clients like Apollo, RIF, and others deciding to shut down operations altogether.

To nobody’s surprise, thousands of subreddits joined forces for a multi-day blackout earlier this week to protest the changes. But these actions may have fallen on deaf ears, especially based on Reddit CEO Steve Huffman’s comments over the past few days.

One of the multiple complaints developers and moderators had with Reddit’s recent decision was about the lack of flexibility in enforcing the new API pricing change deadline, which is scheduled to go into effect on July 1, 2023.

