Data Phoenix Digest - ISSUE 48

MLOps with SageMaker, introduction to the YOLO family, multi GPU model training, neural 3D scene reconstruction with the manhattan-world assumption, learning to answer questions from millions of narrated videos, ConvMAE, ARTEMIS, EasyNLP, and more.

Dmitry Spodarets


MLOps with SageMaker
In the first of a series of articles on MLOps, you’ll learn how to train models using popular frameworks (scikit-learn, PyTorch, and Hugging Face Transformers) with pre-configured containers in SageMaker.
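To give a flavor of what training with a pre-built container looks like, here is a minimal sketch using the SageMaker Python SDK’s scikit-learn estimator. The script name, IAM role, and S3 path are placeholders, not values from the article:

```python
from sagemaker.sklearn.estimator import SKLearn

# Hypothetical values: the entry-point script, role ARN, and S3 URI
# below are placeholders for illustration only.
estimator = SKLearn(
    entry_point="train.py",          # your training script
    framework_version="1.0-1",       # version of the pre-built sklearn container
    instance_type="ml.m5.large",
    role="arn:aws:iam::123456789012:role/SageMakerRole",
)

# Launch the managed training job: SageMaker provisions the container,
# downloads the channel data, and runs the entry point inside it.
estimator.fit({"train": "s3://my-bucket/train-data/"})
```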

Using Elyra to create Machine Learning pipelines on Kubeflow
Elyra is a framework that makes it easy to create pipelines and run them on existing pipeline platforms (Kubeflow Pipelines and Apache Airflow). Find out more about it!

Using Kaggle in Machine Learning Projects
In this detailed post, you’ll learn the basics of Kaggle, including how to use it as part of your ML pipeline and how to work with its command-line interface (CLI). Enjoy!
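If you haven’t tried the CLI before, the basic workflow it covers looks roughly like this (the competition and dataset slugs below are examples, not from the article):

```shell
# Install the CLI and authenticate (it expects an API token at ~/.kaggle/kaggle.json)
pip install kaggle

# Browse and download competition data
kaggle competitions list
kaggle competitions download -c titanic

# Download a public dataset by owner/slug
kaggle datasets download -d zillow/zecon

# Submit predictions to a competition
kaggle competitions submit -c titanic -f submission.csv -m "first attempt"
```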

Introduction to the YOLO Family
YOLO (You Only Look Once) is a family of single-stage object detectors offering a strong speed-accuracy trade-off across a variety of use cases. Check out the overview to choose the right version for you!

An Illustrated Tour of Applying BERT to Speech Data
In this article, you’ll learn about wav2vec 2.0 and HuBERT, two models that efficiently apply a BERT-like approach to speech and other acoustic data.

Testing Container Images Against Multiple Platforms with Container Canary
Container Canary is a new open-source tool that captures requirements for container images and automatically tests images against them, letting you verify that an image will work within your software environment.

Multi GPU Model Training: Monitoring and Optimizing
In this article, you’ll find out about multi-GPU training with PyTorch Lightning, best practices for optimizing the training process, and ways of monitoring GPU usage.
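For context, scaling to several GPUs in PyTorch Lightning is largely a matter of Trainer flags. A minimal sketch, assuming Lightning 1.6+ and a `LightningModule` named `MyModel` with a `train_loader` defined elsewhere (both hypothetical names):

```python
import pytorch_lightning as pl
from pytorch_lightning.callbacks import DeviceStatsMonitor

# MyModel and train_loader are assumed to be defined elsewhere.
trainer = pl.Trainer(
    accelerator="gpu",
    devices=4,                          # number of GPUs on this node
    strategy="ddp",                     # distributed data parallel
    callbacks=[DeviceStatsMonitor()],   # logs device utilization/memory stats
    max_epochs=10,
)
trainer.fit(MyModel(), train_loader)
```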


ConvMAE: Masked Convolution Meets Masked Autoencoders
The researchers demonstrate how the ConvMAE framework enables a multi-scale hybrid convolution-transformer to learn more discriminative representations via a masked auto-encoding scheme.

Neural 3D Scene Reconstruction with the Manhattan-world Assumption
This paper addresses the reconstruction of 3D indoor scenes from multi-view images. The authors integrate planar constraints into implicit neural representation-based reconstruction methods.

From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective
The researchers use SPLADE, a sparse expansion-based retriever, to show how sparse models benefit from dense-model training techniques, studying the effects of distillation, hard-negative mining, and pre-trained language model initialization.


Artemis: Articulated Neural Pets with Appearance and Motion Synthesis
ARTEMIS is a novel neural modeling and rendering pipeline for generating ARTiculated neural pets with appEarance and Motion synthesIS that enables interactive motion control, real-time animation, and photo-realistic rendering of furry animals.

Just Ask: Learning to Answer Questions from Millions of Narrated Videos
In this work, the authors demonstrate how to leverage a question generation transformer trained on text data to generate question-answer pairs from transcribed video narrations.


EasyNLP
EasyNLP is an easy-to-use NLP development and application toolkit in PyTorch that uses scalable distributed training strategies to support a suite of algorithms for various NLP applications.