Welcome to this week's edition of Data Phoenix Digest! This newsletter keeps you up to date on news from our community and summarizes the top research papers and articles, helping you keep track of trends in the Data & AI world!
Be active in our community: join our Slack to discuss the latest news, top research papers, articles, events, jobs, and more...
Click here for details.
Data Phoenix's upcoming webinar:
A Whirlwind Tour of ML Model Serving Strategies (Including LLMs)
There are many recipes for serving machine learning models to end users today, and even though new ones keep popping up, some questions remain: How do we pick the appropriate serving recipe from the menu available, and how can we execute it as fast and efficiently as possible? In this talk, we'll take a whirlwind tour of the machine learning deployment strategies available today for both traditional ML systems and Large Language Models, and we'll touch on a few do's and don'ts while we're at it. This session will be jargon-free, but not buzzword- or meme-free.
Summary of the top articles and papers
Evaluating RAG Applications with RAGAs
Building a proof-of-concept (PoC) RAG application is easy, but getting its performance production-ready is hard. This article offers a comprehensive framework, combining metrics and LLM-generated data, to help you evaluate the performance of your RAG pipeline.
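To give a feel for what RAGAs-style metrics measure, here is a minimal, self-contained sketch. Note the real RAGAs library uses an LLM judge to score faithfulness and relevancy; simple token overlap stands in for that judgment here, and the metric names and sample data are illustrative only.

```python
import re

def _tokens(text):
    """Lowercase word tokens, punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def answer_relevancy(question, answer):
    """Toy proxy: fraction of question tokens echoed in the answer."""
    q, a = _tokens(question), _tokens(answer)
    return len(q & a) / len(q) if q else 0.0

def faithfulness(answer, contexts):
    """Toy proxy: fraction of answer tokens grounded in retrieved contexts."""
    a = _tokens(answer)
    ctx = set().union(*(_tokens(c) for c in contexts)) if contexts else set()
    return len(a & ctx) / len(a) if a else 0.0

# One evaluation sample: question, generated answer, retrieved contexts.
sample = {
    "question": "What does RAG stand for?",
    "answer": "RAG stands for retrieval augmented generation.",
    "contexts": ["RAG means retrieval augmented generation of answers."],
}
scores = {
    "answer_relevancy": answer_relevancy(sample["question"], sample["answer"]),
    "faithfulness": faithfulness(sample["answer"], sample["contexts"]),
}
print(scores)
```

In a real pipeline you would compute such scores over an evaluation dataset (often LLM-generated, as the article describes) and track them as your retrieval and prompting choices change.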
Deploy a Custom ML Model as a SageMaker Endpoint
Developing a machine learning (ML) model involves key steps, from data collection to model deployment. This practical guide covers the basic steps required to deploy a custom ML model as an Amazon SageMaker endpoint. Check it out!
Improve Your Stable Diffusion Prompts with Retrieval Augmented Generation
Have you ever wondered how to craft prompts for high-quality images? This article explains how to use RAG to enhance the prompts sent to Stable Diffusion. Create your own AI assistant for prompt generation with LLMs.
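The core idea can be sketched in a few lines. This toy version is not the article's pipeline: it retrieves the most similar prompt from a small hypothetical library by token overlap and appends its style fragments, standing in for embedding-based retrieval plus an LLM rewriter.

```python
import re

# A tiny stand-in for a curated library of high-quality SD prompts.
PROMPT_LIBRARY = [
    "portrait of a knight, dramatic lighting, intricate armor, artstation",
    "watercolor landscape, soft pastel colors, misty mountains",
    "cyberpunk street at night, neon lights, rain, cinematic, 8k",
]

def _tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, library, k=1):
    """Rank stored prompts by Jaccard overlap with the query."""
    q = _tokens(query)
    scored = sorted(
        library,
        key=lambda p: len(q & _tokens(p)) / len(q | _tokens(p)),
        reverse=True,
    )
    return scored[:k]

def augment(query, library):
    """Append the style fragments of the best-matching stored prompt."""
    best = retrieve(query, library, k=1)[0]
    extras = ", ".join(best.split(", ")[1:])
    return f"{query}, {extras}"

print(augment("a knight on horseback", PROMPT_LIBRARY))
```

A production version would swap token overlap for embedding similarity over a vector store and let an LLM merge the retrieved fragments into the final prompt, as the article discusses.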
Training XGBoost with MLflow Experiments and HyperOpt Tuning
Whether you like it or not, MLOps is a critical part of building efficient, scalable, and resilient machine learning systems. Even once you recognize its importance, knowing where to start is a challenge. In this article, the author shares his perspective on the matter.
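The tune-and-track loop at the heart of the article can be sketched as follows. This is a hedged stand-in: random search over a made-up space and a quadratic "loss" replace HyperOpt, MLflow, and XGBoost, which the article uses for real.

```python
import random

# Hypothetical search space; the real article tunes XGBoost parameters.
SEARCH_SPACE = {
    "max_depth": (2, 10),         # integer range
    "learning_rate": (0.01, 0.3), # float range
}

def objective(params):
    """Stand-in for training XGBoost and returning a validation loss."""
    return (params["max_depth"] - 6) ** 2 + (params["learning_rate"] - 0.1) ** 2

def tune(n_trials=50, seed=0):
    rng = random.Random(seed)
    runs = []  # each run would be logged as an MLflow experiment for real
    for _ in range(n_trials):
        params = {
            "max_depth": rng.randint(*SEARCH_SPACE["max_depth"]),
            "learning_rate": rng.uniform(*SEARCH_SPACE["learning_rate"]),
        }
        runs.append((objective(params), params))
    return min(runs, key=lambda r: r[0])  # best (loss, params) pair

best_loss, best_params = tune()
print(best_loss, best_params)
```

HyperOpt replaces the random sampling with a guided search (TPE), and MLflow persists every trial's parameters and metrics so runs are comparable and reproducible.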
Mastering Customer Segmentation with LLM
What's your strategy for your next customer segmentation project? Will you rely on K-Means or K-Prototypes, or will you try LLMs? Which approach is the most efficient? Read this guide to learn why the model built with the help of LLMs stands out.
How to Finetune Mistral AI 7B LLM with Hugging Face AutoTrain
As LLM research advances globally, models like Mistral 7B are becoming more accessible. This open-source model outperforms Llama 2 13B across all benchmarks, thanks in part to its sliding window attention mechanism, and it is easy to deploy. Check it out!
Papers & projects
ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation
Score distillation sampling excels in text-to-3D generation, but it suffers from over-saturation, over-smoothing, and low-diversity problems. ProlificDreamer is a new approach to resolving these issues through variational score distillation (VSD) and other methods.
PromptBench: A Unified Library for Evaluation of Large Language Models
PromptBench is a unified library to evaluate LLMs. It consists of several key components: prompt construction, prompt engineering, dataset and model loading, adversarial prompt attack, dynamic evaluation protocols, and analysis tools.
DiffusionLight: Light Probes for Free by Painting a Chrome Ball
DiffusionLight is a technique for estimating lighting from a single input image. It applies diffusion models to render a chrome ball into the input image, and the authors uncover a surprising relationship between the appearance of chrome balls and the initial diffusion noise map.
EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM
EdgeSAM is an accelerated variant of the Segment Anything Model (SAM), optimized for efficient execution on edge devices with minimal compromise in performance. EdgeSAM achieves a 40-fold speed increase compared to the original SAM. Check it out!
Generative Powers of Ten
Generative Powers of Ten is a new method that uses a text-to-image model to generate consistent content across multiple image scales, enabling extreme semantic zooms into a scene. It enables deeper levels of zoom than traditional super-resolution methods.
PoseGPT: Chatting about 3D Human Pose
PoseGPT is a multimodal LLM designed for chatting about human pose that produces 3D human poses (SMPL pose parameters) upon user request. PoseGPT features a specialized SMPL projection layer trained to convert language embeddings into 3D human pose parameters. Learn more about this work and stay tuned for more!
Deep-learning-based Acceleration of MRI for Radiotherapy Planning of Pediatric Patients with Brain Tumors
DeepMRIRec is a deep learning-based method for MRI reconstruction from undersampled data acquired with RT-specific receiver coil arrangements. DeepMRIRec reduces scanning time by a factor of four while producing a structural similarity score that surpasses the evaluated state-of-the-art method (0.960 vs. 0.896). Give it a run!
Material Palette: Extraction of Materials from a Single Image
Material Palette is a novel method for extracting Physically-Based-Rendering (PBR) materials from a single real-world image. It builds on existing synthetic material libraries with SVBRDF ground truth, but also exploits a diffusion-generated RGB texture dataset to allow generalization to new samples using unsupervised domain adaptation (UDA).
Data Phoenix is free today. Do you enjoy our digests and webinars? Value our AI coverage? Your support as a paid subscriber helps us continue our mission of delivering top-notch AI insights. Join us as a paid subscriber in shaping the future of AI with the Data Phoenix community.