Data Phoenix Digest - 12.08.2021

Webinar "The A-Z of Data: Introduction to MLOps", new AI system that translates natural language to code, AI for drug delivery and dementia diagnosis, data monetization 101, introducing Triton, sketch your own GAN, StyleGAN-NADA, Droidlet, AutoTinyBERT, courses, tools, jobs, and more...

Dmitry Spodarets

The Data Phoenix team invites you all on August 17 to the first of our series of webinars entitled "The A-Z of Data". During the pilot webinar — "The A-Z of Data: Introduction to MLOps" — we will explore what MLOps is, MLOps principles and best practices, major tools for MLOps implementation, and several architecture implementations. We will start with a basic ML lifecycle and move forward to best practices of building complicated, fully automated MLOps pipelines.

Speaker: Dmitry Spodarets, founder and chief editor of Data Phoenix, head of R&D and ML competency at VITech, and an active participant in the Open Data Science community.


What's new this week?

New AI system that translates natural language to code. AI spending is through the roof. Also: AI for Alzheimer's, advancements in medical data, drug delivery, and dementia diagnosis.

  • OpenAI releases OpenAI Codex, an advanced AI system that translates natural language to code. The release is available through their API in private beta.
  • Companies could spend nearly $342 billion on AI software, hardware, and services in 2021. Spending is expected to rise to $500 billion by 2024.
  • C. Light Technologies is to design and build an AI solution that spots changes in eye motion to detect the earliest stage of Alzheimer’s disease.
  • Stanford’s AIMI center is expanding its free repository of datasets for researchers around the world. The valuable medical datasets are offered at no cost.
  • MIT researchers employ machine learning to find powerful peptides that could improve a gene therapy drug for Duchenne muscular dystrophy.
  • Scientists at Addenbrooke's Hospital are testing an AI system that may be capable of diagnosing dementia after processing a single brain scan.


Building Architectures that Can Handle the World’s Data
Perceiver is a general-purpose architecture that can process data including images, point clouds, audio, video, and their combinations. Learn more about this universal architecture!

Make a Rock-Solid ML Model Using Sklearn Pipeline
Most data is useless until you perform a decent amount of transformation and preprocessing. In this article, you'll learn how to use Sklearn pipelines to design and build robust ML models.
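
To give a feel for the idea, here is a minimal sketch of a scikit-learn Pipeline that chains preprocessing and a model into a single estimator. It is not taken from the linked article: the toy data, the step names, and the choice of median imputation, standard scaling, and logistic regression are all assumptions made for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data (illustrative assumption): two numeric features with a few
# missing values, and a binary label.
X = np.array([
    [1.0, 200.0],
    [2.0, np.nan],
    [3.0, 180.0],
    [8.0, 20.0],
    [9.0, np.nan],
    [10.0, 10.0],
])
y = np.array([0, 0, 0, 1, 1, 1])

# Each step is a (name, estimator) pair; the step names are arbitrary.
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill missing values
    ("scale", StandardScaler()),                   # zero mean, unit variance
    ("model", LogisticRegression()),               # final estimator
])

# fit() runs every preprocessing step and then trains the model;
# predict() replays the same fitted preprocessing before predicting.
pipe.fit(X, y)
preds = pipe.predict(X)
```

Because the preprocessing lives inside the estimator, the same object can be passed to cross-validation or a grid search, and the transformations are re-fitted on the training folds only.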

Continuous Integration and Continuous Deployment (CI/CD) Tools for Machine Learning
Continuous integration and continuous deployment are standard software development practices. Naturally, they're used in ML as well. Check out this article to learn how.

How to Detect Seasonality, Outliers, and Changepoints in Your Time Series
Detecting patterns in time series can be challenging. In this article, the author demonstrates how you can use Kats to detect seasonality, changepoints, and outliers in data more efficiently.

Open MLOps: Open Source Production Machine Learning
Open MLOps is an open-source platform for building machine learning solutions, delivered as a set of Terraform scripts and user guides for setting up a complete MLOps platform in a Kubernetes cluster.

Creating a Modern, Open Source MLOps Stack at Home
In this article (part 1), you'll look at developing an MLOps framework. The author will show how you can plug different tools into the proposed framework to achieve better ML results.

Data Monetization 101
In this article, John Farrall, co-founder of 90 West Data, explains how he has been monetizing a unique panel of US Consumer Transaction Data. Several insights for data scientists inside!

Introducing Triton: Open-Source GPU Programming for Neural Networks
Triton is an open-source Python-like programming language which enables researchers with no CUDA experience to write highly efficient GPU code. Dig in to learn the details!

Pinot Real-Time Ingestion with Cloud Segment Storage
Uber explains how they added a deep store to Pinot's real-time ingestion protocol to solve operational pains. Pinot’s Real-Time Ingestion is a distributed data sync protocol at its core.


Elastic Graph Neural Networks
In this paper, the authors introduce a family of GNNs (Elastic GNNs) based on ℓ1- and ℓ2-based graph smoothing and propose a novel and general message passing scheme for GNNs. Experiments demonstrate that Elastic GNNs obtain better adaptivity on benchmark datasets.

Alias-Free Generative Adversarial Networks
The synthesis process of typical generative adversarial networks depends on absolute pixel coordinates in an unhealthy manner. The authors trace its root cause and derive architectural changes that guarantee that unwanted information cannot leak into hierarchical synthesis.

Image Super-Resolution via Iterative Refinement
Chitwan Saharia et al. present SR3, an approach to image Super-Resolution via Repeated Refinement. SR3 adapts denoising diffusion probabilistic models to conditional image generation and performs super-resolution through a stochastic denoising process.

Sketch Your Own GAN
In this paper, Sheng-Yu Wang et al. present GAN Sketching, a method for rewriting GANs with one or more sketches that makes GAN training easier for novice users. It molds a GAN's output to match the shapes and poses specified by the sketches while maintaining realism and diversity.

MixLacune: Segmentation of Lacunes of Presumed Vascular Origin
The authors present a two-stage approach to segment lacunes of presumed vascular origin: (1) detection with Mask R-CNN followed by (2) segmentation with a U-Net CNN. Data originates from Task 3 of the "Where is VALDO?" challenge and consists of 40 training subjects.

StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators
In this paper, you'll find out how StyleGAN-NADA adapts a pre-trained generator to new domains using only a textual prompt and no training data, by leveraging the semantic power of large-scale Contrastive-Language-Image-Pre-training (CLIP) models.

NeX: Real-Time View Synthesis with Neural Basis Expansion
NeX is a new approach to novel view synthesis based on enhancements of multiplane image (MPI) that can reproduce NeXt-level view-dependent effects in real time. The authors propose a hybrid implicit-explicit modeling strategy that produces state-of-the-art results.

AutoTinyBERT: Automatic Hyper-Parameter Optimization for Efficient Pre-Trained Language Models
In this paper, Yichun Yin et al. adopt one-shot Neural Architecture Search (NAS) to automatically search architecture hyper-parameters. They carefully design the one-shot learning techniques and the search space to provide an adaptive way of developing tiny PLMs.

Droidlet: Modular, Heterogenous, Multi-Modal Agents
Droidlet is a modular, heterogeneous agent architecture and platform. It lets you exploit both large-scale static datasets in perception and language and the sophisticated heuristics often used in robotics, and it provides tools for interactive annotation.

Contextual Transformer Networks for Visual Recognition
The authors present a novel Transformer-style module, the Contextual Transformer (CoT) block, for visual recognition. It fully capitalizes on the contextual information among input keys to guide the learning of a dynamic attention matrix and strengthens the capacity of visual representation.

Double-Robust Two-Way-Fixed-Effects Regression for Panel Data
In this paper, Dmitry Arkhangelsky et al. propose a new estimator for the average causal effects of a binary treatment with panel data in settings with general treatment patterns. The approach augments the two-way-fixed-effects specification with unit-specific weights.


Designing, Visualizing and Understanding Deep Neural Networks
A collection of lectures on Deep Learning delivered by Sergey Levine at UC Berkeley in 2020/21. In total, the course features 66 lectures, from the ML basics to policy gradients and meta-learning.


ZPY [by Zumo Labs]
ZPY is a tool that makes it easy to generate synthetic data. It simplifies the simulation creation process and provides an easy way to generate synthetic data at scale. Check out the tool now!


Looking to feature your open positions in the digest? Kindly reach out to us at [email protected] for details. We'll be proud to help your business thrive!