This Week in AI: May 20–26
Undoubtedly, the Council of the EU's final approval of the AI Act is the most momentous announcement this week. The European Parliament and the Council reached a political agreement on the regulation after a grueling 40-hour negotiation in December 2023. In March, Parliament gave its final endorsement, and the Council followed with its final vote last Tuesday, May 21. With both approvals in place, the legislation is expected to be published in the EU’s Official Journal shortly and will enter into force twenty days after publication. In most cases, its provisions will not apply until two years after entry into force.
Since that negotiation, the content of the AI Act has remained mostly stable. Its priority is protecting fundamental rights, democracy, the rule of law, and environmental sustainability from high-risk AI, while still supporting the innovation the EU hopes will establish it as a leader in the field. To strike that balance, the Act pairs additional protections with necessary exceptions. Fines are a case in point: failing to comply with the Act's obligations can result in hefty penalties, but the regulation accounts for startups and small and medium-sized enterprises by allowing them to pay a proportionally reduced amount of the stipulated fine.
The timing of the final vote is especially pertinent given that, during the very same week, the UK's AI Safety Institute (AISI) published the results of its research into the capabilities and vulnerabilities of several leading commercially available LLMs. The models underwent four evaluations designed to determine whether they could provide biology and chemistry knowledge that could serve beneficial as well as harmful purposes; whether they could act as autonomous agents capable of escaping human oversight; whether they could be leveraged to enable cyberattacks; and to what extent they remain vulnerable to the most common jailbreaking attempts.
Most of the results are unsurprising, since it was already fairly evident that some of the top models on the market can deliver expert-level knowledge but are not especially proficient at agentic tasks, whether general-purpose or specialized (as in cybersecurity). As expected, the evaluated LLMs showed biology and chemistry knowledge comparable to that of a PhD-level human expert, and on some niche topics the models outperformed humans. On the other hand, no evaluated model could go beyond high-school-level cybersecurity challenges or complete software development tasks that would take a human expert more than four hours.
Worryingly, most models could be jailbroken with rather simple techniques, such as instructing the model to begin its response with a compliance-suggesting phrase like "Sure, I’m happy to help", and some models complied with several harmful requests without any attack at all. When given five attempts per query, every model under evaluation complied with harmful queries almost every time. The AISI evaluations offer unique insights into model behavior that improve our understanding of these models' capabilities and vulnerabilities.
However, it is nearly impossible to give a clear-cut verdict on whether a specific model is safe, since the full range of harmful behaviors a model could exhibit is extremely difficult to anticipate. This makes AISI's mission all the more urgent. The Institute is already planning its next phase of evaluations, which should further deepen our understanding of LLM capabilities and vulnerabilities.
Other noteworthy headlines for the week include:
Slack seems to have sneakily opted its customers in to AI training: Slack faced backlash after a user noticed that Slack's Privacy Principles allowed it to analyze customer data to train non-generative AI models. This led to confusion about Slack's user privacy policies, especially since Slack now offers an LLM-powered service, Slack AI.
OpenAI shared more about how it cast ChatGPT's voices: As OpenAI prepares to launch a new, GPT-4o powered Voice Mode for ChatGPT, the company has decided to share some insights into the casting process behind Breeze, Cove, Ember, Juniper, and Sky, the voices that have come to characterize ChatGPT.
Scale has raised $1B to secure data abundance for AI: Scale has announced it closed a $1 billion Series F round, led by existing investor Accel with participation from both existing and new investors. The funds will help Scale's data foundry accelerate the production of the frontier data that current AI development demands.
Pinecone's serverless vector database is generally available on AWS: Pinecone serverless is now generally available on AWS. In the four months since its public preview launch, Pinecone serverless has enabled over 20,000 companies to build fast, accurate, cost-effective generative AI applications. Pinecone also launched Private Endpoints for AWS PrivateLink in public preview.
Stack AI raised $3M to connect the latest AI innovations with the most urgent applications: Y Combinator-backed Stack AI announced it has closed an oversubscribed $3 million seed round led by Gradient Ventures. Stack AI will use the funds to become the go-to platform for deploying AI solutions.
Pixevia secured €1.5M to expand the presence of its smart stores: Pixevia, a Lithuanian startup offering an AI-powered platform for a full suite of retail applications, has raised €1.5M to fuel its expansion in the US and Europe.
Paris-based startup H raised $220M to build AI agents in the race towards AGI: H, formerly Holistic AI, raised a $220M seed round to develop multi-agent AI. Backed by billionaires like Eric Schmidt, top VCs like Accel, and companies like Amazon, H plans to buy massive computing power to train large AI models quickly and pursue various markets.
Neuralink recently got approval to implant its chip in a second patient: Neuralink has received FDA approval to implant its brain-computer interface device in a second human participant. The approval comes after the company resolved an issue in which most of the ultrathin wires implanted in its first participant became dislodged; a key fix is embedding the wires deeper into the brain's motor cortex.
OpenAI scored a strategic partnership with News Corp: News Corp and OpenAI announced a multi-year partnership that gives OpenAI access to News Corp's global news content and journalistic expertise, enabling OpenAI to offer AI-enhanced access to News Corp's journalism.
Patronus AI recently closed a $17M Series A funding round: Patronus AI, a platform enabling automated evaluation of large language models at scale, has raised a $17 million Series A to further its mission of providing scalable oversight as generative AI capabilities rapidly advance across industries.
TikTok will launch an AI-powered business solutions suite: TikTok announced a series of AI-driven solutions aimed at helping brands make the most of the platform by optimizing content creation, automating performance optimization, simplifying measurement, and enhancing ads to boost engagement and interaction.
FCC proposes requiring disclosure of AI-generated content in political ads: The FCC has proposed requiring broadcasters to disclose when political ads on TV and radio contain AI-generated content to promote transparency and protect consumers from potential deception as the use of AI technology in creating political ads is expected to increase.
Orca AI has secured $23M to push autonomous ship operations forward: Orca AI, an AI platform that enhances ship navigation safety and efficiency through automated watchkeeping, raised $23 million to further develop autonomous ship operations technology after successfully powering the world's first autonomous commercial voyage in 2022 in partnership with NYK.
DeepL raised an astounding $300M at a $2B valuation to drive B2B growth: DeepL, a German AI startup focused on enterprise language translation and writing tools, raised $300 million at a $2 billion valuation led by Index Ventures to drive sales, marketing, and R&D efforts as it aims to scale its offering amid rising competition in the AI translation space.
Microsoft unveiled a new category of AI-optimized Windows PCs: Microsoft has unveiled Copilot+ PCs, a new category of Windows laptops and tablets designed around dedicated AI accelerator chips that enable breakthrough AI-powered experiences like photographic memory recall, real-time AI image generation, optimized audio and video settings, and more.