Data Phoenix Digest

Data Phoenix Digest - 08.07.2021

ODS.ai Odessa meetup, what is MLOps, advancing AI to make shopping easier for everyone, ChaCha for Online AutoML, AutoFormer, DivergentNets, MultiBERTs, papers from CVPR 2021 and more...

by Dmitry Spodarets

Updated July 08, 2021

Are you tired of lockdowns? We for sure are!

We at Data Phoenix, together with Autodoc and VITech, are excited to invite you to an offline meetup of Odesa’s Open Data Science community that’s going to take place July 14, 6:30 PM — 9 PM. We’ll cover such topics as data management, object detection, and more. Most importantly, though, we’re going to network for real — that’s what we’ve been missing all these long quarantine months, right?

Because the number of seats is limited, we’re going to have an online session as well. The talks will be in Russian.

The event is free, but registration is required. So kindly register right away!

NEWS

What’s new this week?
Facebook releases AI that forecasts COVID-19 spread. Another batch of AI advancements in healthcare and physics. China’s smart cities and the dangers of AI-driven surveillance.

COVID-19 remains a burning issue across the globe (despite the rise in vaccinations). To help tackle the problem, the team at Facebook AI has released a new, state-of-the-art neural relational autoregression network for highly accurate COVID-19 forecasting.

The pandemic has highlighted the importance of AI for healthcare. New AI solutions for smart healthcare are released almost every day. For instance, AI-driven insights are used to create digital twins and cure cancer. The other side of the coin, though, is unethical decisions made by AI systems.

Using AI in agriculture can also offer practical solutions to the challenges threatening global food security. So-called “precision agriculture”, as estimated by researchers at the UK’s University of Birmingham, can eliminate hunger for millions of people globally by ensuring safe and sustainable agriculture.

AI keeps helping humanity learn more about the universe. Japanese astronomers have developed a new AI technique to remove noise in astronomical data due to random variations in galaxy shapes. The method is consistent with the currently accepted models of the Universe. This is a powerful new tool for analyzing big data from current and planned astronomy surveys.

Teams from China took first and second place in all five categories at the international AI City Challenge, a competition designed to develop AI for real-world scenarios like counting cars traveling through intersections or spotting accidents on freeways. China’s success demonstrates the country’s commanding position in building smart cities, but it also raises concerns about mass surveillance.

ARTICLES

Comparing Random Forest and Gradient Boosting
In this article, you’ll find an overview of the similarities and differences between the Random Forest and Gradient Boosting Machine algorithms. Bear in mind that the summary in this post is generic; make sure you look into the specifics of your own implementations.

Advancing AI to Make Shopping Easier for Everyone
GrokNet is Facebook AI’s breakthrough product recognition system that is used as part of the world’s largest shoppable social media platform, where billions of items can be bought and sold in one place. Learn how the AI and ML behind this product work.

A Bayesian Analysis of Lego Prices in Python with PyMC3
In this article, we’ll analyze Lego pricing data scraped from brickset.com and build several more formal models of the price of Lego sets based on their size. This is the second post of the series; the first one is available here.

HuBERT: Self-Supervised Representation Learning for Speech Recognition, Generation, and Compression
HuBERT is Facebook AI’s new approach for learning self-supervised speech representations in audio. It matches or surpasses the SOTA approaches for speech representation learning for speech recognition, generation, and compression.

Harnessing the Power of Machine Learning to Fuel the Growth of Halodoc
In this article, you’ll learn about Halodoc, a secure health-tech platform with a mission to simplify access to healthcare, and how its Data Science team leverages machine learning to build data products for digital outpatients, insurance, and pharmacy.

What Is MLOps? — Everything You Must Know to Get Started
MLOps is a buzzword right now. Everyone talks about it; everybody wants to implement it and drive MLOps transformations. If you’re also curious about what MLOps is, this article gives an overview of the ML systems development lifecycle and explains why you need MLOps.

To Retrain, or Not to Retrain? Let's Get Analytical About ML Model Updates
In this ML 101 article, you’ll find answers to questions like, “How often should I retrain a model?”, “Should I retrain the model now?”, and “Should I retrain, or should I update the model?”. Dig in for an easy but important piece to read!

An Introduction to Object Detection with Deep Learning
In this article, you’ll learn the basics of using object-detection deep learning networks. It features a review of CNNs, object detection datasets, R-CNN models, and YOLO. If you’re looking for a review of deep learning architectures for object detection, this one's for you.

Continuously Improving Recommender Systems for Competitive Advantage Using NVIDIA Merlin and MLOps
In this article, you’ll explore how to use NVIDIA Merlin, an application framework that accelerates all phases of recommender system development on NVIDIA GPUs, to implement a complete MLOps pipeline.

A Discourse on Reinforcement Learning [Part 1]
This is the first article in the three-part series “A Discourse on Reinforcement Learning”. It kicks off with a holistic overview of reinforcement learning in an expansive setting. Bookmark it so you don’t miss parts 2 and 3, which cover more advanced RL topics.

Habitat 2.0: Training Home Assistant Robots with Faster Simulation and New Benchmarks
In this exploratory article by Facebook AI, you’ll learn about Habitat 2.0, a next-generation simulation platform that lets AI researchers teach machines not only to navigate through photo-realistic 3D virtual environments but also to interact with objects.

Vectorization Techniques in NLP [Guide]
This comprehensive guide by the Neptune team will help you dig deep into Natural Language Processing and explore all the main branches of word embeddings, from naive count-based methods to sub-word-level contextual embeddings.

PAPERS

AutoFormer: Searching Transformers for Visual Recognition
AutoFormer is a new one-shot architecture search framework dedicated to vision transformers. It entangles the weights of different blocks in the same layers during supernet training. The trained supernet allows thousands of subnets to be well trained; with weights inherited from the supernet, their performance is comparable to that of subnets retrained from scratch.
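
To make the idea of weight entanglement more concrete, here is a minimal sketch in plain Python. It assumes that each candidate block in a layer simply reuses the top-left slice of one shared weight matrix; the function names (make_shared_weight, subnet_weight, linear) are hypothetical and are not taken from the AutoFormer code base.

```python
import random


def make_shared_weight(max_out, max_in, seed=0):
    """Create the weight matrix of the largest candidate block (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(max_in)] for _ in range(max_out)]


def subnet_weight(shared, out_dim, in_dim):
    """Return the weights of a smaller block as the top-left slice of the shared matrix.

    Assumption: smaller candidates do not own separate parameters; they are
    sliced on demand from the largest block, so training one updates all.
    """
    return [row[:in_dim] for row in shared[:out_dim]]


def linear(weight, x):
    """Apply a bias-free linear layer: one dot product per output row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


shared = make_shared_weight(max_out=4, max_in=4)
small = subnet_weight(shared, out_dim=2, in_dim=2)   # a narrow candidate block
large = subnet_weight(shared, out_dim=4, in_dim=4)   # the widest candidate block

small_out = linear(small, [1.0, 2.0])
large_out = linear(large, [1.0, 2.0, 0.0, 0.0])
```

Because the small block is a slice of the large one, the two candidates agree on the overlapping outputs when the extra inputs are zero. This is only an illustration of the sharing scheme, not a training loop.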

DivergentNets: Medical Image Segmentation by Network Ensemble
In this paper, a team of researchers explores new methods for detecting colon polyps with machine learning. They propose DivergentNets, an ensemble of well-known segmentation models such as UNet++, FPN, DeepLabv3, and DeepLabv3+, to produce more generalizable medical image segmentation masks.
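
As a rough illustration of how a segmentation ensemble can combine its members, the sketch below averages per-pixel probabilities from several models and thresholds the result. Simple averaging and the 0.5 threshold are assumptions made for this example and may differ from the paper's exact procedure; ensemble_mask is a hypothetical helper, and the probability maps are made-up toy values.

```python
def ensemble_mask(prob_maps, threshold=0.5):
    """Average per-pixel probabilities across models, then binarize.

    prob_maps: list of 2D lists (one per model), all with the same shape.
    Returns a 2D list of 0/1 values.
    """
    n_models = len(prob_maps)
    height = len(prob_maps[0])
    width = len(prob_maps[0][0])
    mask = []
    for r in range(height):
        row = []
        for c in range(width):
            mean_prob = sum(m[r][c] for m in prob_maps) / n_models
            row.append(1 if mean_prob >= threshold else 0)
        mask.append(row)
    return mask


# Toy 2x2 outputs from three hypothetical segmentation models.
model_a = [[0.9, 0.2], [0.6, 0.1]]
model_b = [[0.8, 0.4], [0.7, 0.2]]
model_c = [[0.7, 0.8], [0.3, 0.0]]

mask = ensemble_mask([model_a, model_b, model_c])
```

In this toy case, the top-right pixel is rejected even though one model is confident about it, which shows how averaging dampens disagreement between individual models.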

PlanSys2: A Planning System Framework for ROS2
In this paper, the researchers reveal the ROS2 Planning System (PlanSys2), a framework for symbolic planning that incorporates novel approaches for execution on robots working in demanding environments. PlanSys2 aims to be the reference task planning framework in ROS2, the latest version of the de facto standard in robotics software development.

The MultiBERTs: BERT Reproductions for Robustness Analysis
In this paper, an international team of researchers introduces MultiBERTs: a set of 25 BERT-base checkpoints, trained with hyperparameters similar to those of the original BERT model but differing in random initialization and data shuffling. The aim is to enable researchers to draw robust and statistically justified conclusions about pre-training procedures.

ChaCha for Online AutoML
ChaCha is an algorithm for choosing hyperparameters on the fly in online learning settings. It handles the process of determining a champion and scheduling a set of live challengers over time based on sample complexity bounds. ChaCha provides good performance across a wide array of datasets when optimizing over featurization and hyperparameter decisions.

Pysentimiento: A Python Toolkit for Sentiment Analysis and SocialNLP Tasks
In this paper, Juan Manuel Perez et al. present pysentimiento, a multilingual Python toolkit for Sentiment Analysis and other Social NLP tasks. This open-source library brings state-of-the-art models for Spanish and English in a black-box fashion, allowing researchers to easily access these techniques.

BOOKS

Introduction to Modern Statistics
This is the first edition of “Introduction to Modern Statistics” by Mine Cetinkaya-Rundel and Johanna Hardin. You’ll learn all the basics you need, from exploratory data analysis to regression modelling and inference. The book is available for free, but you can also purchase a print version if you want to support the project.

EVENT MATERIALS

Selection of Free Papers from CVPR 2021
Here you’ll find the collection of CVPR 2021 papers, provided by the Computer Vision Foundation. They are identical to the accepted versions; the final published version of the proceedings is available on IEEE Xplore.

PROJECTS

Kats by Facebook Research
Kats is a toolkit released by Facebook's Infrastructure Data Science team, designed to help engineers analyze time series data in Python. It provides a lightweight, easy-to-use, and generalizable framework to perform time series analysis tasks, such as detection, forecasting, feature extraction and embedding, and multivariate analysis.
