Data Phoenix Digest - ISSUE 10.2023

Hey folks,

Welcome to this week's edition of Data Phoenix Digest! This newsletter keeps you up to date on news from our community and summarizes the top research papers, articles, and news to help you keep track of trends in the Data & AI world!

Be active in our community and join our Slack to discuss the latest community news, top research papers, articles, events, jobs, and more.

📣
Want to promote your company, conference, job, or event to the Data Phoenix community of Data & AI researchers and engineers? Click here for details.

Data Phoenix community news

AI Events Calendar

We are happy to announce that our new AI events calendar has launched with a weekly newsletter. The calendar is already filled with exciting and valuable events, and the first issue of our newsletter, featuring a selection of upcoming events for the week, will kick off this weekend. If you're organizing webinars, workshops, meetups, conferences, or hackathons, add them to our calendar, and we'll gladly help spread the word to our community.

Upcoming webinars:

Connecting Large Language Models with embeddings and semantic search on your own data has become widely popular. But how does this work in other languages and across languages? Join me for this talk on why multilingual semantic search is amazing, how the respective models are trained, and the new use cases this unlocks.

Rise in the use of synthetic data for regulated industries

Synthetic data is evolving and becoming extremely important for organizations. This session will uncover facts about synthetic data. It will also cover some of its most impactful use cases, along with the challenges that companies face while harnessing its power.


How to use LLMs to Interface with Multiple Data Sources

Following emerging Large Language Model Operations (LLM Ops) best practices in the industry, you’ll learn about the key technologies that enable Generative AI practitioners like you to build complex LLM applications. Specifically, we’ll deep dive on “data frameworks” like LlamaIndex, and we’ll demonstrate how to create state-of-the-art hierarchical indexes from different data sources. During the event, we will also show you how another commonly known LLM Ops framework (LangChain) underlies much of the functionality of LlamaIndex. All demo code will be provided via GitHub links during and after the event!

Video recordings of past events:

📣
Don't miss out! Subscribe to our YouTube channel now and be the first to receive notifications about video recordings of past events and other valuable content to help you stay ahead!

Summary of the top articles and papers

Articles

Time-Series Forecasting: Deep Learning vs Statistics — Who Wins?
This article provides a comprehensive and unbiased view on the application of Deep Learning in the field of Natural Language Processing (NLP) and time-series forecasting, particularly focusing on the use of pre-trained transformers. Check it out!

Accelerating Stable Diffusion Inference on Intel CPUs
The Hugging Face team recently covered the latest generation of Intel Xeon CPUs (code name Sapphire Rapids). In this article, they demonstrate different techniques to accelerate Stable Diffusion models on Sapphire Rapids CPUs.

WALTS: Walmart AutoML Libraries, Tools and Services
WALTS is an enterprise-scale AutoML framework designed to meet the rising demand for ML in business. In this article, the authors elaborate on how they explore models from a pool of candidates and test the selected one with a business use case.

Introduction to mypy
The article explores how mypy, a static type checker that verifies type annotations, can help discover bugs before the code runs, thereby enhancing the efficiency of Python projects. It takes readers from beginner level to a solid understanding of mypy through a variety of examples.
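
For a quick taste of what this looks like in practice, here is a minimal sketch that is not taken from the article. The function names and data are hypothetical, chosen only for illustration. The idea is that a lookup may return None, and a static checker such as mypy can flag code that forgets to handle that case.

```python
from typing import Dict, Optional


def find_user_name(user_id: int, users: Dict[int, str]) -> Optional[str]:
    # Hypothetical helper: returns None when the id is unknown.
    return users.get(user_id)


def greeting(user_id: int, users: Dict[int, str]) -> str:
    name = find_user_name(user_id, users)
    # If this None check were removed, a static type checker would be
    # expected to complain that `name` may be None before `.upper()` is called.
    if name is None:
        return "Hello, guest!"
    return "Hello, " + name.upper() + "!"


if __name__ == "__main__":
    users = {1: "ada"}
    print(greeting(1, users))
    print(greeting(2, users))
```

Assuming mypy is installed, running `mypy` on the file containing this code checks the annotations without executing anything.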

Papers & projects

Llama 2: Open Foundation and Fine-Tuned Chat Models
This paper introduces Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging from 7 billion to 70 billion parameters. The Llama 2-Chat models, optimized for dialogue, outperform open-source chat models on multiple benchmarks. The authors comprehensively describe their fine-tuning approach, safety enhancements, and human evaluations, aiming to facilitate community engagement and responsible development of LLMs.

MIS-FM: 3D Medical Image Segmentation using Foundation Models Pretrained on a Large-Scale Unannotated Dataset
This work introduces Volume Fusion (VF), a novel self-supervised learning strategy for 3D segmentation model pretraining using unannotated medical images. VF fuses random patches from foreground and background sub-volumes, leveraging fusion coefficients as self-supervised segmentation targets. The proposed model, pretrained on 110k unannotated 3D CT volumes, demonstrates superior performance compared to training from scratch and state-of-the-art self-supervised methods on various downstream segmentation tasks involving head and neck organs, as well as thoracic/abdominal organs.

A Survey of Large Language Models
In this survey, the authors review recent advances in LLMs. In particular, they focus on four major aspects of LLMs: pre-training, adaptation tuning, utilization, and capacity evaluation. They also summarize the available resources for developing LLMs and discuss the remaining issues for future directions.

LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
LLaMA-Adapter is a lightweight adaptation method to efficiently fine-tune LLaMA into an instruction-following model. Using 52K self-instruct demonstrations, LLaMA-Adapter introduces only 1.2M learnable parameters on top of the frozen LLaMA 7B model, and takes less than one hour to fine-tune on 8 A100 GPUs.

Animated Drawings
Animated Drawings is a system that automatically animates children's drawings of the human figure, is robust to the variance inherent in these depictions, and is simple enough for anyone to use. Here you can find the Animated Drawings Demo, a freely available public website that has been used by millions of people around the world.

🤗
If you enjoy our work, we would greatly appreciate your support by sharing our digest with your friends on Twitter, LinkedIn, or Facebook using the hashtag #dataphoenix. Your help in reaching a wider audience is invaluable to us!