This Week in AI: June 17–23

DeepMind's Generative Media team has developed a video-to-audio (V2A) technology; ElevenLabs created a demo that allows users to generate sound effects for any video; Anthropic released Claude 3.5 Sonnet; SoftBank is offering customers a one-year Perplexity Pro trial; and more.

by Ellie Ramirez-Camara

Updated June 24, 2024

This week, the excitement spurred by the release of Kling and Luma's Dream Machine continues. Runway started the week by offering an early glimpse of the upcoming Gen 3 Alpha video generation model. Gen 3 Alpha was trained in-house by a multidisciplinary team of experts to ensure the model understands cinematic and style terminology. Moreover, its dataset contains painstakingly annotated content for the complete videos and the scenes within them, resulting in a highly performant model that can handle everything from dramatic transitions to photorealistic humans.

The next logical step in the generative AI industry is to address the lack of sound in most AI-generated videos. Perhaps in an attempt to beat everyone else at that game, Google DeepMind and ElevenLabs scurried to preview their research on the topic. Google DeepMind's Generative Media team has developed a video-to-audio (V2A) technology that can generate synchronized soundtracks for video input, with the ability to create multiple audio options guided by optional text prompts. ElevenLabs, on the other hand, showcased the power of its Text to Sound Effects API by creating a demo that allows users to generate sound effects for any video input.

This week was also a reminder that the recent focus on video-generation models has built on, not replaced, other aspects of generative AI's potential, including image generation and proficiency in computer vision tasks. As evidence, Meta released and previewed a selection of its early research artifacts, including multimodal generation, text-to-music conversion models, and tools to increase geographic representation in text-to-image models. Moreover, Microsoft's Azure AI team has released Florence-2, a versatile vision foundation model. Florence-2 uses a unified prompt-based approach to excel at various vision and vision-language tasks, outperforming larger task-specific models in several benchmarks.

The cherry on top of this week's headlines on generative AI models was Anthropic's release of Claude 3.5 Sonnet, a mid-tier model surpassing even Claude 3 Opus, Anthropic's top-tier model, on several benchmarks and computer vision tasks such as chart and graph interpretation and transcription from image inputs. Additionally, Anthropic announced it would upgrade the web experience at claude.ai with Artifacts, a dedicated window to organize the content users have asked Claude to generate, such as code snippets, text documents, or website designs. With the introduction of Artifacts, Anthropic looks to transform the Claude web experience into a dynamic workspace empowering users to edit, refine, and build on Claude's generations with ease.

Finally, the launch of DeepSeek Coder V2, an open-source Mixture-of-Experts model for coding tasks, shows that even the LLM space still holds room for competition.

Other notable headlines include:

Waabi secured $200M to develop fully autonomous driverless trucks: Waabi, a pioneering generative AI company for autonomous vehicles, has raised $200 million in Series B funding to launch fully driverless trucks in 2025, leveraging its innovative end-to-end AI system and advanced simulator to achieve unprecedented progress in the autonomous trucking industry.

SoftBank is offering its mobile customers a free one-year subscription to Perplexity Pro: The offer is part of a strategic partnership between SoftBank and Perplexity AI. Perplexity will also soon display answers to simple queries in visually appealing cards to keep users within its platform.

Autify raised $13M and launched Zenes, an AI-powered software quality assurance agent: Autify, an AI-powered test automation platform, recently raised $13 million in Series B funding to expand into new markets like Korea and further develop Zenes, its generative AI-powered quality engineering product that automatically generates test case code by analyzing product requirements.

TikTok launched Symphony Avatars, AI Dubbing, and the Symphony Collective: TikTok is launching more generative AI tools including realistic digital avatars and language dubbing capabilities to help creators and brands produce more localized content at scale for TikTok's global audience.

FINBOURNE Technology announced a £55M Series B round: FINBOURNE Technology has secured £55M in Series B funding to expand its AI-ready investment management solutions globally, offering financial firms a cloud-native platform that streamlines operations and prepares them for the future of AI-driven efficiencies in the investment sector.

CuspAI emerged out of stealth after securing $30M in seed funding: CuspAI, a company developing an AI-powered platform to accelerate the design of novel materials for sustainability and clean energy, came out of stealth after raising a $30 million seed funding round and securing a partnership with Meta.

Apolitical received a $5M grant from Google.org to expand its Government AI Campus program: Apolitical's Government AI Campus, a knowledge hub for civil servants, has received a $5 million grant from Google.org to expand its AI education offerings. The platform aims to reach one million public sector workers within two years.

An AI-assisted blood test may enable early detection of Parkinson's: Researchers have developed an AI-powered blood test that could diagnose Parkinson's disease long before symptoms appear, potentially transforming how this neurodegenerative condition is detected and treated.

Developers now have the option to use Amazon SageMaker's fully managed MLflow: Amazon has made MLflow, an open-source MLOps platform, generally available as a fully managed capability on Amazon SageMaker, offering streamlined ML lifecycle management with comprehensive experiment tracking, unified model governance, and efficient server management across most AWS Regions.

OpenAI co-founder Ilya Sutskever launched Safe Superintelligence Inc.: OpenAI co-founder Ilya Sutskever has launched Safe Superintelligence Inc. (SSI), a new for-profit AI company focused solely on developing safe superintelligent AI systems using an engineering-based approach instead of the traditional guardrails enforced on most AI systems.

Poolside is the latest Paris-based company to negotiate a nine-figure funding round: Paris-based AI startup Poolside.ai reportedly seeks to raise $400 million at a $2 billion valuation, potentially joining other French AI companies in securing substantial funding.

AI-powered English tutor Speak has raised a $20M Series B extension: AI language learning startup Speak secured $20M in Series B-3 funding, reaching a $500M valuation, to further develop its voice-to-voice interface that has helped over 10 million users across 40+ countries learn English through pattern recognition and repetition rather than memorization.

OpenAI acquired Rockset to bolster its products' retrieval infrastructure: OpenAI acquired Rockset, a real-time search and analytics database company, to enhance its data retrieval infrastructure for AI applications. As a result of the acquisition, some Rockset team members will join OpenAI, and Rockset will help its customers transition off its platform.
