MLCommons presents free datasets for speech recognition

Dmitry Spodarets

· Dec 17, 2021

MLCommons, an organization that aims to build free, open-source AI development tools and resources, has just released the People’s Speech Dataset and the Multilingual Spoken Words Corpus (MSWC). The release is significant, given the amount of work the team put into assembling both datasets.

According to the MLCommons team, the People’s Speech Dataset is one of the world’s most complex English speech datasets licensed for both academic and commercial use, and it contains a very large number of hours of recorded audio. The Multilingual Spoken Words Corpus, meanwhile, ranks among the largest and best audio speech datasets, featuring keywords in more than 50 languages.

With the release of the People’s Speech Dataset and the MSWC, developers have new tools to add to their toolbox, enabling them to design and build their own speech recognition systems on a smaller budget and with fewer technical challenges than before.

Is the future of AI/ML open-source, then? Let us know in the comments!
