Data Phoenix Digest

Data Phoenix Digest - 09.09.2021

AI preventing security threats, a deep dive on ARIMA models, a gentle introduction to GNN, how to detect, evaluate, and visualize historical drifts in the data, multiplying matrices without multiplying, books, videos, jobs, and more...

by Dmitry Spodarets

Updated September 09, 2021

The Data Phoenix Events team is happy to remind you that in September, we will organize the following events:

September 15 - Open Data Science Odessa Meetup #4
September 16 - Webinar "Re-usable pipelines for ML projects with DVC"
September 22 - Webinar "From research to the product with Hydrosphere"

Register for all of the events or only the ones that interest you; the choice is yours. We would love to see you at any of them!

NOTE: If you missed any of our previous webinars, they are available on our YouTube channel. Please watch and share your feedback in the comments section.

NEWS

What's new this week?

AI preventing security threats. AI-led developments in additive manufacturing. Neural networks predicting uncertainty in molecular energies. Robotic dogs and predictions of Alzheimer’s disease.

Researchers at Carnegie Mellon and the KAIST Cybersecurity Research Center have developed a new technique employing unsupervised learning to detect adversarial attacks.
Hyundai and a Singapore university have agreed to run pilots on AI and additive manufacturing to explore the use of 3D printers for tailoring components of electric vehicles.
MIT researchers have identified a new way to quantify the uncertainty in molecular energies predicted by neural networks by using “adversarial attacks.”
Utility company SA Power Networks is using Boston Dynamics' robotic dog to walk the streets of suburban Adelaide and monitor power lines.
Researchers in Lithuania have developed a new, DL-based method that can predict the possible onset of Alzheimer’s disease with over 99% accuracy.

Funding news:

Databricks Raises $1.6 Billion Series H Investment at $38 Billion Valuation
Explosion Raises a $6 Million Series A on a $120 Million Valuation
Mobius Labs Raises €5.2 Million in Series A led by Ventech Europe

ARTICLES

Anomaly Detection with TensorFlow Probability and Vertex AI
In this article, you'll learn how Google's AI team uses an ML solution for anomaly detection on Vertex AI to automate the laborious process of building time series models.

Use a SageMaker Pipeline Lambda Step for Lightweight Model Deployments
In this article, you'll explore the Lambda step, how you can use it to add custom functionality to your ML pipelines, and the specifics of using it for lightweight model deployments.

How to Detect, Evaluate, and Visualize Historical Drifts in the Data
Analyzing historical drift in data is a useful way to understand how your data changes and to choose monitoring thresholds. Check out this tutorial for details.

Complete Guide to A/B Testing Design, Implementation and Pitfalls
In this guide, the author covers a wide range of topics on end-to-end A/B testing for your Data Science experiments, with examples and Python implementation.

A Deep Dive on ARIMA Models
In this post, you'll take a deep dive into the ARIMA family of time series forecasting models, from foundational theory of forecasting models to training a SARIMAX model in Python.

A Gentle Introduction to Graph Neural Networks
In this article, the authors explore the components needed for building a graph neural network and motivate the design choices behind them. Check out the reference section too.

PAPERS

SummerTime: Text Summarization Toolkit for Non-Experts
In this paper, the authors present SummerTime, a toolkit for text summarization, including various models, datasets and evaluation metrics, for a full spectrum of summarization-related tasks.

Materials Fingerprinting Classification
In this paper, the authors propose a machine learning algorithm coupled with topological data analysis that provides an easy way to extract structural information from atom probe tomography (APT) datasets.

Accelerating Materials Discovery with Bayesian Optimization and Graph Deep Learning
The authors show that Bayesian optimization with symmetry constraints using a graph deep learning energy model can be used to perform "DFT-free" relaxations of crystal structures.

SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
Simple Visual Language Model (SimVLM) is a minimalist pretraining framework that reduces the training complexity and is trained end-to-end with a single prefix language modeling objective.

Multiplying Matrices Without Multiplying
The authors introduce a learning-based algorithm for approximating matrix multiplies that often runs 100× faster than exact matrix products and 10× faster than current approximate methods.

Deep Learning for Distinguishing Normal Versus Abnormal Chest Radiographs and Generalization to Two Unseen Diseases Tuberculosis and COVID-19
The paper presents an AI system, trained on a large dataset containing a diverse array of CXR abnormalities, that can generalize to new patient populations and unseen diseases.

BOOKS

Analyzing US Census Data: Methods, Maps, and Models in R
This book illustrates the utility of R for working with Census data across numerous research and applied fields, allowing Census data users to manage their projects in a single computing environment.

PODCASTS & INTERVIEWS

Defining Success: Metrics and KPIs
Adam Sroka is Head of Machine Learning Engineering at Origami Energy. Listen to his talk about metrics and KPIs for data and machine learning at DataTalks.Club.

CODE & TOOLS

Vision Transformer - PyTorch
A simple tool to implement Vision Transformer, enabling engineers to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch.

VIDEOS

Make Your First ML Chatbot
A conversation with Rachael Tatman, a developer advocate at Rasa, about chatbots and about designing and developing ML chatbots for various tasks. The GitHub link is in the video description.

Intro to Graph Neural Networks (ML Tech Talks)
Petar Veličković, Senior Research Scientist at DeepMind, gives an introductory presentation and Colab exercise on graph neural networks (GNNs).

DATASETS

Common Objects in 3D
CO3D is a large-scale data set by Facebook AI. It comprises a total of 1.5 million frames from nearly 19,000 real-world videos capturing objects from 50 categories.

JOBS

Lead Computer Vision Engineer, SoftServe, Lviv, Kyiv, Remote ...
Senior Data Scientist / ML Engineer, Xenoss, Kyiv, Kharkiv, Odesa, Remote ...
ML Engineer, Scalarr, Kyiv, Kharkiv, Ukraine (Remote)
Machine Learning Engineer, Competera, Kyiv, Remote
Deep Learning Engineer, SQUAD, Kyiv, Lviv, Remote
Data Engineering Manager, Atlassian, Remote, United States
Data Engineer, Wikimedia Foundation, Remote
Senior Data Engineer, FreshBooks, Canada - Remote
Machine Learning Engineer, SEDNA, Remote - London, England, UK
Machine Learning Intern, Netflix, Los Gatos, California

Looking to feature your open positions in the digest? Kindly reach out to us at [email protected] for details. We'll be proud to help your business thrive!
