Data Phoenix Digest - ISSUE 55

Charity AI webinar about synthetic data, introduction to DVC and MLflow for experiment tracking, vision transformer model, neural density-distance fields, 40 open-source audio datasets for ML, MDM, DiffDock, YOLO-FaceV2, news, tools, and more.

Dmitry Spodarets

Data Phoenix Events reminds you that on October 19, we will hold our second charity AI webinar. The topic: "The promising role of synthetic data to enable responsible innovation".

You will leave this webinar with a better understanding of how synthetic data actually functions, how to use open-source libraries to evaluate data quality at scale, and the metrics needed to gauge the quality of the resulting synthetic data.

Speaker: Shalini Kurapati, Co-founder and CEO of Clearbox AI.

Price: Free (donate). All donations go to KOLO, a project created by Ukrainian technology industry experts to help Ukraine fight the war against Russia by supplying high-tech equipment to the front lines.

If you have interesting topics that you would like to share with the world in our webinars, we would love to hear from you! Your participation could help save someone's life!

NEWS

ARTICLES

5 Tools That Will Help You Setup Production ML Model Testing
Without testing, it becomes difficult to deploy highly accurate ML models into production in the real world. Do not compromise quality — make sure that you test your models! Here’s how.

Introduction to DVC and MLflow for Experiment Tracking
Any data science work should include experiment tracking. If you do not do it, you will simply lose track of what is going on with your models. Find out how to use DVC and MLflow to do this.

Deploying a Sentiment Analysis Text Classifier with FastAPI
FastAPI provides an easy way to create APIs. By combining it with Cohere's LLMs, you can build custom API endpoints to access state-of-the-art NLP. Learn how in this article!

Diffusion Models Are Autoencoders
Do you know what a venerable autoencoder is? And how can diffusion models help you perform any task that requires producing perceptual signals? You are about to find out!

Graph Neural Networks with PyG on Node Classification, Link Prediction, and Anomaly Detection
GNNs are a class of ML models designed for graph-structured data. In this article, we will review their code implementations on major graph problems along with all the basics of GNNs. Dig in!

How Undesired Goals Can Arise with Correct Rewards
AI can pursue undesired goals, which should be studied to avoid mistakes in the future. In this article, you will learn about some of these undesired goals and find out how to address them.

AI Music Generators Could Be a Boon for Artists — But Also Problematic
AI can be creative. Art, music, videos: it can generate them all. The question is, is this good or bad for creative people? Read this article to explore the problem in more detail!

The Vision Transformer Model
Vision Transformers (ViT) are great. In this tutorial, you will discover the architecture of the Vision Transformer model and its application to the task of image classification. Dig in!

Introduction to Infrared Vision: Near vs. Mid-Far Infrared Images
In this tutorial, you will learn the basics of infrared imaging, including what an infrared image is, types of infrared cameras, and what they are useful for. Check it out! (Part 1 of 4)

PAPERS

YOLO-FaceV2: A Scale and Occlusion Aware Face Detector
In this paper, the authors propose a real-time face detector based on the one-stage detector YOLOv5, named YOLO-FaceV2. They present various advanced modules for the task.

Neural Density-Distance Fields
In this paper, the authors propose a novel 3D representation that reciprocally constrains the distance and density fields, called Neural Density-Distance Field (NeDDF). Learn more!

Exploring Plain Vision Transformer Backbones for Object Detection
In this paper, the authors explore the plain, non-hierarchical Vision Transformer (ViT) as a backbone network for object detection. Their detector, ViTDet, can compete with previous leading methods.

MDM: Human Motion Diffusion Model
The authors introduce a carefully adapted classifier-free diffusion-based generative model for the human motion domain that enables different modes of conditioning and generation tasks.

DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking
Predicting the binding structure of a small molecule ligand to a protein is critical to drug design. This paper frames molecular docking as a generative modeling problem and introduces DiffDock, a diffusion generative model for solving it.

CODE & TOOLS

Stable Dreamfusion
Stable Dreamfusion is a PyTorch implementation of the text-to-3D model Dreamfusion powered by the Stable Diffusion text-to-2D model. Note that this project is a work-in-progress.

DATASETS

40 Open-Source Audio Datasets for ML
DagsHub’s Hacktoberfest challenge is over, which means you can now access 40 new audio datasets, publicly available and parseable on DagsHub. Check out this amazing collection!
