Data Phoenix Digest - ISSUE 52

DALL-E is now available in beta; plus the DALL-E 2 prompt book, an introduction to diffusion models for ML, distributing your PyTorch model in less than 20 lines of code, the k-means mask transformer, explaining chest X-ray pathologies in natural language, FIGS, NU-Wave, VQAD, datasets, tools, courses, and more.

Dmitry Spodarets


By answering a few questions about your experience, skills, and toolset, you will help us assess the state of the industry in 2022 and prepare the report.


Track your ML experiments end to end with Data Version Control and Amazon SageMaker Experiments
This post walks you through an example of how to track experiments across code, data, artifacts, and metrics by using Amazon SageMaker Experiments and Data Version Control (DVC).

Text Embeddings Visually Explained
Text embeddings let you turn unstructured text data into a structured numeric form. In this post, you’ll learn about their basics, use cases, customization, and fine-tuning. Check it out!
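As a toy illustration of the core idea (mapping free text to a fixed-length numeric vector you can compare), here is a minimal hashing-trick sketch in plain Python. Real embeddings come from trained neural models; the function names here are illustrative only, not from the post.

```python
import hashlib

def embed(text: str, dim: int = 8) -> list[float]:
    """Toy 'embedding': hash each token into a fixed-length vector.

    This only illustrates mapping text to a fixed-size numeric vector;
    production embeddings are produced by trained neural networks.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        h = int(hashlib.md5(token.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0          # bucket chosen by the token's hash
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]   # L2-normalize for cosine similarity

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two L2-normalized vectors is their dot product."""
    return sum(x * y for x, y in zip(a, b))
```

Identical texts get similarity 1.0, and texts sharing tokens score higher than unrelated ones, which is the property real embeddings deliver far more robustly.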

Introduction to Diffusion Models for Machine Learning
Diffusion models are a conceptually simple and elegant approach to the problem of generating data. In this guide, you’ll learn everything you need to know about them.

Distribute Your PyTorch Model in Less Than 20 Lines of Code
In this guide, you’ll find out how to distribute a minimal training pipeline on more than one GPU. A simple, practical guide with only 15 lines of code to distribute your pipeline.
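The guide uses PyTorch's distributed tooling; as a library-free sketch of what data parallelism does under the hood (each replica computes the gradient on its own data shard, the gradients are averaged in an all-reduce step, and every replica applies the identical update), consider the following. All names are illustrative, not from the guide.

```python
def grad(w, shard):
    """Gradient of mean squared error for the model y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def data_parallel_step(w, shards, lr=0.1):
    """One synchronous data-parallel SGD step across several replicas."""
    grads = [grad(w, s) for s in shards]   # each replica works on its shard
    g = sum(grads) / len(grads)            # "all-reduce": average gradients
    return w - lr * g                      # identical update on every replica
```

With equal-sized shards, the averaged gradient equals the full-batch gradient, so all replicas stay in sync; this is the invariant PyTorch's DistributedDataParallel maintains across GPUs.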

Inside NLLB-200, Meta AI’s New Super Model that Achieved New Milestones in Machine Translation Across 200 Languages
Meta AI’s NLLB-200 is one of the most impressive attempts to make AI more inclusive. It outperformed state-of-the-art models in both 100- and 200-language settings.

Accelerating and Scaling Temporal Graph Networks on the Graphcore IPU
In this comprehensive post, the authors explore the application of TGNs to dynamic graphs of different sizes and study the computational complexities of this class of models.

Why do Policy Gradient Methods work so well in Cooperative MARL? Evidence from Policy Representation
Recent studies demonstrate that, with proper input representation and hyper-parameter tuning, multi-agent PG can achieve strong performance. Find out why!

FIGS: Attaining XGBoost-level performance with the interpretability and speed of CART
FIGS is a new method for fitting an interpretable model that takes the form of a sum of trees. Real-world experiments show that FIGS can effectively adapt to a wide range of structure in data.
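FIGS itself grows several trees simultaneously, fitting each split against the residuals of the other trees; the minimal sketch below captures only the sum-of-trees prediction form, using a simpler greedy residual-fitting loop (closer to boosted stumps than to the actual FIGS algorithm), with 1-D inputs for brevity.

```python
def fit_stump(x, residual):
    """Fit a depth-1 tree (one threshold, two leaf means) to residuals."""
    best = None
    for t in sorted(set(x)):
        left  = [r for xi, r in zip(x, residual) if xi <= t]
        right = [r for xi, r in zip(x, residual) if xi > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm

def fit_sum_of_stumps(x, y, n_trees=3):
    """Greedily add stumps, each fit to the current residual; the final
    prediction is the sum of the trees, as in FIGS's model form."""
    trees, residual = [], list(y)
    for _ in range(n_trees):
        stump = fit_stump(x, residual)
        trees.append(stump)
        residual = [r - stump(xi) for xi, r in zip(x, residual)]
    return lambda xi: sum(t(xi) for t in trees)
```

The sum-of-trees form is what makes the model inspectable: each tree is a small, readable rule, and the prediction is just their total.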


k-means Mask Transformer
In this paper, the authors present the k-means Mask Transformer (kMaX-DeepLab) for segmentation tasks, which improves on the state of the art with a simple and elegant design. Learn more!

Masked Autoencoders that Listen
The authors study an extension of image-based Masked Autoencoders. Audio-MAE encodes audio spectrogram patches with a high masking ratio, feeding only the unmasked tokens through encoder layers.

HelixFold: An Efficient Implementation of AlphaFold2 using PaddlePaddle
The authors present HelixFold, an efficient implementation of AlphaFold2 using PaddlePaddle that improves training and inference speed and reduces memory consumption.

Zero-shot Cross-lingual Transfer is Under-specified Optimization
Pretrained multilingual encoders often produce unreliable models that exhibit high performance variance on the target language. The authors provide a way to solve this issue.

Synergistic Self-supervised and Quantization Learning
The authors propose synergistic self-supervised and quantization learning (SSQL), a method for pretraining quantization-friendly self-supervised models, facilitating downstream deployment.

Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors
Recent text-to-image generation methods have improved the generated image fidelity and text relevancy, but several gaps remain. The authors propose a novel method that addresses them.

Re2G: Retrieve, Rerank, Generate
Re2G incorporates both neural initial retrieval and reranking into BART-based sequence-to-sequence generation. Learn more about this novel approach to retrieve, rerank, generate!

Explaining Chest X-ray Pathologies in Natural Language
The authors introduce natural language explanations (NLEs) for predictions made on medical images. NLEs are human-friendly and enable the training of intrinsically explainable models.


NU-Wave: A Diffusion Model for Neural Audio Upsampling
NU-Wave is the first neural audio upsampling model to produce waveforms of sampling rate 48kHz from coarse 16kHz or 24kHz inputs. Check out the samples for yourself!
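For contrast with the neural approach, the classical baseline is plain interpolation, which can only smooth between known samples and cannot reconstruct the missing high-frequency band that a generative model like NU-Wave synthesizes. A minimal sketch (illustrative, not from the paper):

```python
def linear_upsample(samples, factor=3):
    """Classical linear-interpolation upsampling (e.g. 16 kHz -> 48 kHz
    with factor=3).

    This baseline only interpolates between known samples, so it cannot
    recover high-frequency content; NU-Wave's diffusion model instead
    generates the missing band. Output length is (n - 1) * factor + 1.
    """
    out = []
    for a, b in zip(samples, samples[1:]):
        for k in range(factor):
            out.append(a + (b - a) * k / factor)  # evenly spaced points
    out.append(samples[-1])                        # keep the final sample
    return out
```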

Improving GAN Equilibrium by Raising Spatial Awareness
EqGAN-SA improves GAN equilibrium by raising the generator’s spatial awareness, and this spatial awareness in turn enables interactive spatial editing of the output synthesis.

Variable Bitrate Neural Fields
VQAD is a dictionary method for compressing feature grids, reducing their memory consumption by 100x and permitting a multiresolution representation which is useful for out-of-core streaming.


DeepMind Educational Resources
A repository containing a large collection of educational tutorials for teaching the basics of machine learning to various audiences. The tutorials are presented as notebooks.


MLEM
MLEM is a tool that allows anyone to package, deploy, and serve ML models. It supports real-time serving and batch processing, and allows creating a Model Registry out of any Git repository!


Diabetic Macular Edema VQA Dataset
A medical VQA dataset built from the IDRiD and eOphta datasets, containing both healthy and unhealthy fundus images. It can be used for general VQA purposes.

Computer Vision Datasets
A comprehensive collection of computer vision datasets, including images of human faces, human bodies, gestures, vehicles, street views, OCR data, and more.