MLCommons presents free datasets for speech recognition

MLCommons, an organization that aims to build free, open-source AI development tools and resources, has just released the People’s Speech Dataset and the Multilingual Spoken Words Corpus. This is a significant release, given the amount of work the team had to do to put the two datasets together.

According to the MLCommons team, the People’s Speech Dataset is one of the world’s most complex English speech datasets licensed for academic and commercial usage, with a huge number of hours of recorded material. Meanwhile, the Multilingual Spoken Words Corpus (MSWC) ranks among the biggest and best audio speech datasets, featuring keywords in 50+ languages.

With the release of the People’s Speech Dataset and the MSWC, developers have new tools to add to their toolbox, enabling them to design and build their own speech recognition systems on a smaller budget and with fewer technical challenges than ever before.

Is the future of AI/ML open-source, then? Let us know in the comments!