Data Phoenix Digest - 09.09.2021

AI preventing security threats, a deep dive on ARIMA models, a gentle introduction to GNN, how to detect, evaluate, and visualize historical drifts in the data, multiplying matrices without multiplying, books, videos, jobs, and more...

Dmitry Spodarets

The Data Phoenix Events team is happy to remind you that in September, we will organize the following events:

Register for all of the events or only for the ones that interest you; the choice is yours! We would love to see you at any of them!

NOTE: If you missed any of our previous webinars, they are available on our YouTube channel. Please watch them and share your feedback in the comments section.


What's new this week?

AI preventing security threats. AI-led developments in additive manufacturing. Neural networks predicting uncertainty in molecular energies. Robotic dogs and predictions of Alzheimer’s disease.

Funding news:

  • Databricks Raises $1.6 Billion Series H Investment at $38 Billion Valuation
  • Explosion Raises a $6 Million Series A on a $120 Million Valuation
  • Mobius Labs Raises €5.2 Million in Series A led by Ventech Europe


Anomaly Detection with TensorFlow Probability and Vertex AI
In this article, you'll learn how Google's AI team uses an ML solution for anomaly detection on Vertex AI to automate the laborious process of building time series models.

Use a SageMaker Pipeline Lambda Step for Lightweight Model Deployments
In this article, you'll explore the Lambda step and how you can use it to add custom functionality to your ML pipelines. It also covers the specifics of using the Lambda step for lightweight model deployments.

How to Detect, Evaluate, and Visualize Historical Drifts in the Data
Analyzing historical drift is a useful way to understand how your data changes over time and to choose monitoring thresholds. Check out this tutorial for details.

Complete Guide to A/B Testing Design, Implementation and Pitfalls
In this guide, the author covers a wide range of topics on end-to-end A/B testing for your Data Science experiments, with examples and Python implementation.

A Deep Dive on ARIMA Models
In this post, you'll take a deep dive into the ARIMA family of time series forecasting models, from foundational theory of forecasting models to training a SARIMAX model in Python.

A Gentle Introduction to Graph Neural Networks
In this article, the authors explore the components needed for building a graph neural network and motivate the design choices behind them. Check out the reference section too.


SummerTime: Text Summarization Toolkit for Non-Experts
In this paper, the authors present SummerTime, a toolkit for text summarization, including various models, datasets and evaluation metrics, for a full spectrum of summarization-related tasks.

Materials Fingerprinting Classification
In this paper, the authors propose a machine learning algorithm coupled with topological data analysis that provides an easy way to extract structural information from atom probe tomography (APT) datasets.

Accelerating Materials Discovery with Bayesian Optimization and Graph Deep Learning
The authors show that Bayesian optimization with symmetry constraints using a graph deep learning energy model can be used to perform "DFT-free" relaxations of crystal structures.

SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
Simple Visual Language Model (SimVLM) is a minimalist pretraining framework that reduces the training complexity and is trained end-to-end with a single prefix language modeling objective.

Multiplying Matrices Without Multiplying
The authors introduce a learning-based algorithm for approximating matrix multiplies that, in their experiments, often runs 100× faster than exact matrix products and 10× faster than current approximate methods.

Deep Learning for Distinguishing Normal Versus Abnormal Chest Radiographs and Generalization to Two Unseen Diseases Tuberculosis and COVID-19
The paper presents an AI system, trained on a large dataset containing a diverse array of CXR abnormalities, that is capable of generalizing to new patient populations and unseen diseases.


Analyzing US Census Data: Methods, Maps, and Models in R
This book illustrates the utility of R for working with Census data across numerous research and applied fields, allowing Census data users to manage their projects in a single computing environment.


Defining Success: Metrics and KPIs
Adam Sroka is Head of Machine Learning Engineering at Origami Energy. Listen to his talk about metrics and KPIs for data/ML/learning at DataTalks.Club.


Vision Transformer - PyTorch
A simple tool to implement Vision Transformer, enabling engineers to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch.


Make Your First ML Chatbot
A conversation with Rachael Tatman, a developer advocate at Rasa, about chatbots and about designing and developing ML chatbots for various tasks. Check out the GitHub link in the description.

Intro to Graph Neural Networks (ML Tech Talks)
Petar Veličković, Senior Research Scientist at DeepMind, gives an introductory presentation and Colab exercise on graph neural networks (GNNs).


Common Objects in 3D
CO3D is a large-scale data set by Facebook AI. It comprises a total of 1.5 million frames from nearly 19,000 real-world videos capturing objects from 50 categories.


Looking to feature your open positions in the digest? Kindly reach out to us at [email protected] for details. We'll be proud to help your business thrive!