Weekly AI Highlights Review: September 24–October 1
News from Meta Connect 2024; the AI Pact has over 100 signatures; Molmo brings open-source SOTA multimodality; OpenAI's Advanced Voice Mode has arrived; Agentic AI may be the next big thing in generative AI; Cloudflare launched more tools to help users fight scrapers; and more.
This week, we have seen open models catching up on multimodality. The trend may have started two weeks ago, when Pixtral 12B emerged as the first open-source multimodal model from a major AI vendor, starting a wave that the Allen Institute for AI and Meta have carried on this week:
AI2's Molmo is bridging the gap between closed and open-source vision-capable models: The Allen Institute for AI has introduced Molmo, a family of open-source multimodal AI models that rival or surpass proprietary systems in performance, featuring novel data collection methods and unique pointing capabilities, with plans for full release of weights, datasets, and code.
Meta Connect 2024: Llama 3.2 brings open multimodal 11B and 90B to the race: Meta announced Llama 3.2, introducing multimodal capabilities to its 11B and 90B models for image analysis tasks. Meta also launched 1B and 3B models for edge devices. The 90B and 11B models will not be available in Europe, following EU regulators' concerns about Meta's data collection practices.
The Llama 3.2 11B and 90B release means that Meta AI can now help users with various tasks involving images as input, such as changing, removing, or adding elements to an image. It can even automate the creation of content that users may be interested in posting to their Instagram and Facebook feeds, perhaps accelerating the recent flood of AI slop that has become prevalent on social media platforms. The company is also giving the AI-powered assistant a voice, although one not nearly as sophisticated as OpenAI's Advanced Voice Mode. AVM has started shipping to Plus and Team users, but it still lacks the vision and screen-sharing capabilities showcased during its initial demo.
Meta Connect 2024: Meta AI's voice mode, visual search, and automated content generation: Meta announced significant upgrades to Meta AI, including voice chat optionally powered by celebrity voice clones, multimodal capabilities enabling image processing and editing, live translation for Reels, expanded 'Imagine me' features, and AI-generated content for social media feeds.
OpenAI's Advanced Voice Mode is finally available for ChatGPT Plus and Team users: OpenAI is expanding its Advanced Voice Mode for ChatGPT to more paying customers, introducing new voices, improved functionality, and customization features, while addressing previous concerns and limitations.
As expected, Meta confirmed that the multimodal models will not be available in the European Union, given the recent controversy that forced the company to pause training models on public data from European users. Additionally, Meta confirmed it will not sign on to the AI Pact soon, citing the same concerns about the 'unpredictable' nature of the European regulatory landscape. Meanwhile, the European Commission celebrated that the AI Pact has surpassed its first hundred signatures.
Over 100 organizations have signed the voluntary commitments in the EU's AI Pact: The European Commission released a list of over 100 early adopters of the AI Pact's voluntary commitments, including major tech companies and representatives from various sectors. Still, notable absences like Apple and Meta highlight ongoing challenges in AI regulation compliance across the EU.
Another trend that appears to be growing is that of making language-based assistants more useful by giving them memory, limited autonomy, and more tools to work with. We've known for some time that this was the natural next step, but it seems that phrases like "AI agent" and "agentic AI" are well on their way toward becoming the season's buzzwords.
Letta, a UC Berkeley spin-off, emerged from stealth with $10M in seed funding: Letta, a startup emerging from stealth with $10 million in funding, aims to revolutionize AI development by offering an open-source, model-agnostic platform that enables the creation of stateful AI agents and APIs.
Convergence secured $12M to develop personal AI assistants able to learn new skills: Convergence launched Proxy, a customizable AI assistant powered by Large Meta Learning Models, and raised $12 million in pre-seed funding to develop AI agents capable of learning and remembering tasks through continual interaction with users.
Other notable headlines from this week:
Cloudflare's tools enable websites to monitor and control how AI model providers access their content: Cloudflare has launched AI Audit, a new tool that empowers website owners to monitor, control, and potentially monetize how AI models interact with their content, aiming to rebalance the relationship between content creators and AI companies.
PyTorch Conference 2024 recap: The 2024 PyTorch Conference, held in San Francisco on September 18-19, featured keynotes, talks, and a new Startup Showcase, bringing together AI and ML professionals to discuss trends, network, and advance the PyTorch framework.
Researchers are leveraging AI to identify geoglyphs near Nazca Lines in Peru: Researchers using AI and drone technology have discovered 303 new geoglyphs near Peru's Nazca Lines in just six months, revolutionizing archaeological research and providing new insights into ancient Peruvian cultures.
Snap now leverages Gemini on Vertex AI to power generative AI experiences on My AI: Snap Inc. has partnered with Google Cloud to integrate Gemini's multimodal AI capabilities into Snapchat's My AI chatbot to enable more innovative features for its 850 million monthly active users.
Meta Connect 2024: Orion, Quest 3S, and new features for the Ray-Ban Meta smart glasses: At Meta Connect 2024, Meta unveiled various AR/VR devices, including the Orion AR glasses concept, an affordable Quest 3S headset, and AI-enhanced Ray-Ban Meta smart glasses, showcasing the company's commitment to advancing consumer AR technology.
Google keeps building on NotebookLM's features with YouTube and audio files support: Google expanded NotebookLM's capabilities by adding support for various input sources, including YouTube videos and audio files, and making it easier to share the recently introduced Audio Overviews.
Runway launched a $5M 'Hundred Film Fund': Runway launched a $5M fund to support up to 100 film projects using AI-generated content. Grants range from $5K to $1M+, with up to $2M in Runway credits also available. Creators retain full IP rights over their projects but must grant Runway permission to showcase and distribute the finished product.
The Competition and Markets Authority ended its investigation of Amazon and Anthropic: The UK's Competition and Markets Authority (CMA) closed its investigation into Amazon's $4 billion investment in Anthropic, concluding that despite potential material influence, the partnership does not qualify as a 'relevant merger situation' due to Anthropic's insufficient UK market presence.