Data Phoenix Digest - ISSUE 6.2024
AI Index Report, Llama 3, Idefics 2, Qdrant Hybrid Cloud, Deploying LLMs Into Production Using TensorRT LLM, Neural Network Diffusion, Intro to DSPy, LLM Evaluation at Scale with the NeurIPS Large Language Model Efficiency Challenge, T-RAG, Direct-a-Video, EMO, DistriFusion and more.
Welcome to Data Phoenix Digest! Dive into our curated selection of the latest news, groundbreaking research papers, and insightful articles in the Data and AI landscape. Also, explore the upcoming events organized by our team and partners, designed to keep you on top of trends and help you learn cutting-edge technologies and approaches.
Be active in our community and join our Slack to discuss the latest news, community events, research papers, articles, jobs, and more.
Data Phoenix's upcoming webinar:
In this webinar, we’ll zoom in on prototyping LLM applications and provide mental models for when to use RAG and when to use fine-tuning. We’ll dive into RAG and how fine-tuned models, including LLMs and embedding models, are typically leveraged within RAG applications.
Specifically, we will break down Retrieval Augmented Generation into dense vector retrieval plus in-context learning. With this in mind, we’ll articulate in detail the primary forms of fine-tuning you need to know: task training, constraining the input-output schema, and language training.
Finally, we’ll walk through an end-to-end domain-adapted RAG application that solves a concrete use case. All code will be demoed live, including everything necessary to build our RAG application with LangChain v0.1 and to fine-tune an open-source embedding model from Hugging Face!
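To make the dense-vector-retrieval-plus-in-context-learning framing concrete, here is a minimal sketch of that pattern in LangChain v0.1's LCEL style. This is not the webinar's code: the documents, the embedding model (a small Hugging Face sentence-transformer), and the OpenAI chat model are all placeholder choices.

```python
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

# Dense vector retrieval: embed a few toy documents into a FAISS index.
docs = [
    "Data Phoenix runs weekly webinars on AI topics.",
    "RAG combines a retriever with an LLM's in-context learning.",
]
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
retriever = FAISS.from_texts(docs, embedding=embeddings).as_retriever(
    search_kwargs={"k": 2}
)

# In-context learning: stuff the retrieved chunks into the prompt.
prompt = ChatPromptTemplate.from_template(
    "Answer using only this context:\n{context}\n\nQuestion: {question}"
)

def format_docs(documents):
    return "\n\n".join(d.page_content for d in documents)

chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | ChatOpenAI(model="gpt-3.5-turbo")
    | StrOutputParser()
)

print(chain.invoke("What does RAG combine?"))
```

A domain-adapted embedding model, as the webinar covers, slots in by pointing `model_name` at the fine-tuned checkpoint; the rest of the chain is unchanged.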
AI Highlights of the Past Week
Our headline selection for this week reveals that the growing concerns about AI safety and evaluation run deeper than the need for government-level regulation. Now that government policies and lawmaking have started falling into place, attention has finally turned to another gap: while the leading developers benchmark their models on a reasonably standardized set of task benchmarks, their use of responsible AI benchmarks is far less uniform, frustrating any attempt to compare features such as safety and trustworthiness across models.
The lack of standardized evaluation frameworks is one of the key takeaways from this year's edition of the AI Index Report, and it is mirrored by the launch of the Linux Foundation AI & Data's Open Platform for Enterprise AI (OPEA). Relatedly, there is no guideline for calling a model release "open-source," as demonstrated by models released under opaque, non-traditional licenses, or those with only the most basic components released as open source, an issue the Linux Foundation AI & Data wants to address with its Model Openness Framework (MOF). Finally, Hugging Face is pioneering something that should have been obvious for quite some time now: specialized LLMs used in sensitive sectors, such as healthcare, should undergo more stringent benchmarking designed specifically to test competence in key areas of the relevant field. Thus, this week saw the launch of the Open Medical-LLM Leaderboard: a standardized framework for evaluating and comparing the performance of various LLMs on medicine-related tasks and datasets.
Another substantial trend in the AI Index Report, one that has been fairly evident for some time now, is that multimodality is the next frontier to conquer. One of the more established derivatives of multimodality is text-to-image generation, as witnessed by the commonplace availability of image-generation models and systems. This week saw the release of the first two Meta Llama 3 models, which power the newest iteration of the Meta AI assistant and its completely revamped image generator. Stability AI announced the availability of Stable Diffusion 3 and Stable Diffusion 3 Turbo on the Stability AI Developer Platform API and via the Fireworks AI distributed inference service.
Even Adobe is joining the party with the announcement of AI-powered video editing tools coming to Premiere Pro, powered by a selection of industry-leading models alongside Adobe's proprietary Firefly Video Model. Moreover, in an interesting crossover between AI-powered media generation and safety, Snapchat announced updated transparency and safety measures, including improved labeling and watermarking strategies. Finally, on the less-traveled road of multimedia-to-text generation, Reka officially announced Core, one of its two offerings capable of accepting images, video, and audio as input. Relatedly, Hugging Face came through for the open-source community and officially launched Idefics 2, a multimodal model capable of answering questions about images, describing visual content, and generating narratives grounded in image inputs.
Some other noteworthy stories this week include:
Boston Dynamics introduced the next-generation, fully electric Atlas humanoid robot: After recently announcing the official retirement of its hydraulic humanoid robot Atlas, Boston Dynamics has announced a fully electric, new-generation Atlas. Initial real-world Atlas testing will happen in partnership with Hyundai before the robot is available to a select group of early customers.
OpenAI celebrates its new Japan office with a Japanese-optimized GPT-4 custom model: OpenAI is expanding its global presence by setting up a new Tokyo-based office. To kickstart this new stage, the company is granting local Japanese businesses access to a custom Japanese-optimized GPT-4 version.
GovDash secured $10 million in a Series A funding round: GovDash recently completed a successful $10 million Series A funding round led by Northzone, with the participation of existing investor Y Combinator. The company plans to invest the funding into expanding its system-of-record platform for government contractors.
SERIES AI is Microsoft and Seedrs' new AI-focused startup accelerator: Seedrs and Microsoft will hold a four-week AI accelerator program for startups at the seed to Series A funding levels. The program offers specialized webinars, insights, and resources. Six finalists will be selected to pitch their businesses to leading VCs in the UK at Microsoft’s Reactor Space.
Qdrant Hybrid Cloud: a vector database that can be deployed anywhere: Qdrant Hybrid Cloud is a managed vector database deployable anywhere: on-premises, in the cloud, or even on edge devices. Qdrant Hybrid Cloud's versatility is due to its Kubernetes-native architecture and multiple infrastructure, development, and framework launch partners.
HealthSage AI secured €3 million to scale its open generative AI platform: HealthSage AI announced it secured €3 million in an oversubscribed seed funding round led by Peak, with the participation of additional investors. The funds will enable the company to grow its team, commercialize its offerings, and scale the adoption of its platform.
Langdock secured $3 million to help organizations avoid LLM vendor lock-in: Langdock recently completed a successful $3 million seed funding round led by General Catalyst and European partner La Famiglia, alongside a selection of German founders and angel investors. The company develops a chat interface as an intermediary between organizations and LLM providers.
Qtum deployed 10,000 GPUs to power a blockchain AI ecosystem: Qtum set up 10,000 NVIDIA GPUs to develop, power, and deliver over 10 AI-related experiences on a decentralized model inference computing infrastructure. In parallel, the platform released Solstice, a conversational chatbot, and Qurator, a text-based image generator.
Articles
Deploying LLMs Into Production Using TensorRT LLM
TensorRT-LLM is an open-source framework by NVIDIA that helps boost the performance of LLMs in production. Companies such as Anthropic, OpenAI, and Anyscale are already using this framework to serve LLMs to millions of users. It’s time we learned more about it!
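For a taste of what the article covers, here is a minimal sketch using TensorRT-LLM's high-level LLM API from recent releases; the article itself may follow the lower-level checkpoint-conversion and engine-build workflow, and the model name here is just an example.

```python
from tensorrt_llm import LLM, SamplingParams

# Builds a TensorRT engine for the model on first use, then serves requests.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

for output in llm.generate(["What is TensorRT-LLM?"], params):
    print(output.outputs[0].text)
```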
LLM Evaluation at Scale with the NeurIPS Large Language Model Efficiency Challenge
How should we evaluate LLMs to figure out whether they are good enough for the challenges we expect them to solve? This article dives deep into the problem, sharing learnings from the "NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1 GPU + 1 Day".
Intro to DSPy: Goodbye Prompting, Hello Programming!
DSPy is a framework designed to solve the fragility problem in language model (LM)-based applications by prioritizing programming over prompting. It allows users to recompile the entire pipeline to optimize it for their specific tasks. Learn more about it!
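For a flavor of the programming-over-prompting idea, here is a tiny sketch in the DSPy style of early 2024 (the `dspy.OpenAI` client and model choice are assumptions; newer releases configure backends via `dspy.LM`):

```python
import dspy

# Point DSPy at an LM backend (assumes an OpenAI API key is configured).
dspy.settings.configure(lm=dspy.OpenAI(model="gpt-3.5-turbo"))

class BasicQA(dspy.Signature):
    """Answer questions with short factoid answers."""
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")

# Declare *what* the step should do; DSPy owns *how* it is prompted.
qa = dspy.ChainOfThought(BasicQA)
print(qa(question="What framework replaces prompting with programming?").answer)
```

The payoff is the recompilation the article highlights: optimizers such as `BootstrapFewShot` can rebuild the prompts of an entire pipeline against a metric and a handful of training examples.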
Introducing Gemma Models in Keras
Gemma is a family of lightweight, state-of-the-art open models built on the same research and technology that Google used to create the Gemini models. Gemma is now available in the KerasNLP collection. Check out its new features, like the new LoRA API!
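Loading Gemma in KerasNLP and switching on the LoRA API takes only a few lines. A minimal sketch (the preset name follows the official guide; downloading the weights requires Kaggle access):

```python
import keras_nlp

# Load the 2B Gemma checkpoint from a KerasNLP preset.
gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma_2b_en")

# The LoRA API: freeze base weights and train low-rank adapters instead.
gemma_lm.backbone.enable_lora(rank=4)
gemma_lm.summary()  # the trainable parameter count drops dramatically

print(gemma_lm.generate("What is Keras?", max_length=64))
```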
T-RAG = RAG + Fine-Tuning + Entity Detection
Retrieval-Augmented Generation (RAG) is a framework for building LM-based question-answering applications over private enterprise documents. This article explores Tree-RAG (T-RAG), a system that incorporates entity hierarchies for improved performance.
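The entity-hierarchy idea is easy to picture with a toy sketch: when a query mentions an entity from the organization's tree, statements about its place in the hierarchy are appended to the retrieved context. Everything below (the tree, the helper names) is hypothetical, not the paper's code.

```python
# Hypothetical organizational tree: child -> parent.
ENTITY_TREE = {
    "Payroll Team": "Finance Department",
    "Finance Department": "Corporate Services",
}

def entity_context(query: str) -> list[str]:
    """Turn entities mentioned in the query into hierarchy statements."""
    return [
        f"{entity} is part of {parent}."
        for entity, parent in ENTITY_TREE.items()
        if entity.lower() in query.lower()
    ]

def build_prompt(query: str, retrieved_chunks: list[str]) -> str:
    """Combine vector-retrieved chunks with entity-tree facts (the T-RAG idea)."""
    context = "\n".join(retrieved_chunks + entity_context(query))
    return f"Context:\n{context}\n\nQuestion: {query}"

print(build_prompt("Who does the Payroll Team report to?",
                   ["The Payroll Team processes monthly salaries."]))
```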
Papers & Projects
Neural Network Diffusion
Diffusion models have achieved remarkable success in image and video generation. In this paper, the authors demonstrate that diffusion models can also generate high-performing neural network parameters, using an autoencoder and a standard latent diffusion model. The benefits of the new approach: improved performance, better cost-efficiency, and speed.
Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion
Direct-a-Video is a system that allows users to independently specify motions for one or multiple objects and/or camera movements, as if directing a video. The authors propose a simple yet effective strategy for the decoupled control of object motion and camera movement. Dive in to learn more about their findings!
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
EMO is a framework that uses a direct audio-to-video synthesis approach, bypassing the need for intermediate 3D models or facial landmarks. It enhances the realism and expressiveness of talking head video generation by focusing on the dynamic and nuanced relationship between audio cues and facial movements. Check out the results!
DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
Diffusion models excel at synthesizing high-quality images, yet their enormous computational costs result in prohibitive latency for interactive applications. In this paper, the authors propose DistriFusion, which tackles this problem by leveraging parallelism across multiple GPUs. Explore their learnings!
Magic-Me: Identity-Specific Video Customized Diffusion
Video Custom Diffusion (VCD) is a simple yet effective framework for subject-identity-controllable video generation. VCD strengthens identity information extraction and injects frame-wise correlation at the initialization stage, producing stable video outputs that largely preserve the subject's identity. Learn more about the authors’ approach!