Google dominated the news this week with a slew of announcements centered on the upcoming Gemini 2.0 model family and the company's current research on agentic AI, which has long been touted as the natural next step toward artificial general intelligence (AGI). Even setting AGI aside, since there is still no clear agreement about what it is or what it should look like, the expectation has always been that systems already proficient at generating text that passes as human-written will be able to take part in increasingly complex workflows with progressively less human supervision.
At the core of Google's new launches and previews lies Gemini 2.0 Flash, the company's first next-gen model capable of processing multimodal inputs (text, image, video, audio) and natively supporting text, image, and text-to-speech outputs. Gemini 2.0 Flash also supports tool use, which enables it to call Google services such as Search, execute code, and invoke third-party functions through function calling. The experimental version is text-output only and is available in the Gemini API and as a chat-optimized model in the Gemini apps. Google expects the production version to be ready in early 2025.
The company also announced a Gemini feature called 'Deep Research' that allows Gemini 1.5 Pro to generate comprehensive research reports on any topic users may have questions about. When prompted with a question, Gemini designs a research plan that users can review and modify as they see fit. Once the plan is approved, Gemini executes it and presents its findings neatly organized as a research report including sources that can be exported to Google Docs. The reports can be iterated on, refined, and expanded using the chat window.
To showcase Gemini 2.0 Flash's capabilities, Google previewed a series of agents with different purposes. The most notable are Project Mariner, an agent that can browse the web and perform some tasks on behalf of the user, and Jules, a coding agent that integrates into a GitHub workflow and can take over relatively simple tasks like fixing Python and JavaScript bugs. Limited access to Project Mariner and Jules is available via trusted tester waitlists. Other agents not in the testing stage include a gaming companion and agents applied to robotics.
Gemini 2.0 Flash is now powering two AI products on which Google has placed heavy bets: NotebookLM and AI Overviews. As part of its upgrade, NotebookLM now features a redesigned look, allows users to interact with the AI hosts of newly created Audio Overviews, and has a paid tier for organizations called NotebookLM Plus. As for AI Overviews, Google says it is testing new capabilities that let the feature answer more complex questions, perform multi-step workflows, and even process multimodal inputs. This means that an AI Overviews feature capable of solving complex math and coding questions may be generally available soon.
Other noteworthy headlines this week:
Grok just got a new image generator and a free usage tier: xAI launched Aurora, a new model with image generation and understanding capabilities. According to xAI, Aurora was designed to deliver photorealistic renderings and remarkable instruction-following. X users can now try Aurora's capabilities, regardless of their subscription status.
European AI hyperscaler Nscale raises $155M in Series A funding to keep up with demand: AI hyperscaler Nscale has announced that it has raised $155 million in its Series A funding round led by Sandton Capital Partners to fuel its ambitious growth plans across Europe and North America.
Reddit began testing Answers, an AI-powered conversational search interface: Reddit has started testing Answers, a new AI-powered search feature that enables users to ask questions in plain English about any topic they care about to surface relevant conversation summaries, links to related communities, inline answers written by real redditors, and follow-up questions.
OpenAI-backed Speak has reached a $1B valuation with its recent $78M Series C round: Speak, an AI-powered English learning platform that specializes in helping its users achieve spoken English fluency, recently secured $78M in Series C funding at a valuation of $1 billion.
Microsoft unveiled Phi-4, a new small language model proficient in math and complex reasoning: Phi-4 is Microsoft's latest small language model in the Phi family. The model was trained to focus on tasks requiring complex reasoning skills, such as competition math problems. The model is available for research purposes at the Azure AI Foundry.
Liquid AI has raised $250M to advance its liquid foundation models: Liquid AI, a startup focused on the development of liquid foundation models (LFM) that are efficient and flexible alternatives to the now-standard generative pre-trained Transformer models, has raised a $250M Series A funding round led by AMD that gave the company a $2B valuation.
Cartesia's $27M seed round will enable it to build better alternatives to transformer-based models: Cartesia, which has raised $27 million in seed funding, is developing AI architectures based on its State Space Model (SSM) technology that aim to enable more efficient, long-memory, multimodal intelligence, as exemplified by its hyper-realistic Sonic voice generation model.
Claude 3.5 Haiku is now generally available for Claude users: Anthropic has confirmed the new version of its smallest model, Claude 3.5 Haiku, is now available in the Claude web experience and mobile apps. First announced in October, Claude 3.5 Haiku is intended to deliver Claude 3 Opus' performance at Claude 3 Haiku's speed.
Poland-based Vivid Mind has raised $200K to detect dementia using a voice-based test: Vivid Mind, a startup that has developed an AI-assisted voice test capable of detecting early-stage dementia with 90% accuracy, has raised $200,000 in pre-seed funding to continue testing and obtain regulatory approvals for its innovative diagnostic tool.
Meta showcased Video Seal, a model that embeds edit-resistant watermarking into videos: Meta has released an open-source neural watermarking model called Video Seal that embeds a robust, imperceptible watermark and optional message in videos. The model aims to address the limitations of existing video watermarking solutions and support responsible AI development.