Data Phoenix Digest - 05.08.2021

Webinar "The A-Z of Data: Introduction to MLOps", chip design with ML, KNN algorithm, model health assurance at LinkedIn, YOLOX, CoBERL, SynLiDAR, PaddleSeg, Real-ESRGAN, courses, competitions, jobs, and more...

Dmitry Spodarets

Webinar "The A-Z of Data: Introduction to MLOps"

The Data Phoenix team invites you all on August 17 to the first of our series of webinars entitled "The A-Z of Data". During the pilot webinar — "The A-Z of Data: Introduction to MLOps" — we will explore what MLOps is, MLOps principles and best practices, major tools for MLOps implementation, and several architecture implementations. We will start with a basic ML lifecycle and move forward to best practices of building complicated, fully automated MLOps pipelines.

Speaker: Dmitry Spodarets, founder and chief editor of Data Phoenix, head of R&D and ML competency at VITech, and an active participant in the Open Data Science community.


What's new this week?

AI-driven HR. Chip design with Machine Learning. AI monopolies to thrive in the US. Adopting AI with Andrew Ng. Robots playing basketball. Breakthroughs in AI legal status and physics.

  • HR is rife with complex, time-consuming processes. AI-driven HR may change this by automating and streamlining various HR tasks, from hiring and onboarding to scheduling and benefits management, and all the way to termination and access control. Learn about AI in HR.
  • Cadence, one of the chief tool builders for chip designers, uses machine learning to revolutionize chip design. The company employs reinforcement learning to find the perfect balance between power, performance, and area in chips. Push to explore AI-driven chip design.
  • AI is getting more and more monopolized. A handful of US tech companies, including Amazon, Alphabet, Facebook, and Netflix, along with China's Baidu and Alibaba, are responsible for $2 of every $3 spent globally on AI. Discover a four-step antitrust strategy to tackle the problem.
  • Small and medium-sized businesses (and even some corporations) in industries such as manufacturing, agriculture, and healthcare still need to find ways to make AI work for them. Learn from one of the AI industry leaders, Andrew Ng, how to adopt AI in your organization.
  • The Tokyo Olympics are going on without spectators... and without many of the robots that Japan's tech giants like Toyota and Panasonic were going to present. One exception is CUE, an incredible AI robot created by Toyota that can play basketball. Learn more about AI robot players.
  • An Australian court has decided that artificial intelligence can be recognized as an inventor in a patent submission. An IP lawyer says the decision is bad because the last thing we need is robot patent trolls. What do you think? Is patenting AI good or bad?
  • Researchers at the Department of Energy's SLAC National Accelerator Laboratory use machine learning to optimize the performance of particle accelerators by teaching algorithms the basic physics principles behind accelerator operations. Learn more.


Customer Support Automation Platform at Uber
In this comprehensive article, you'll learn how Uber manages customer support using its AI/ML-powered automation platform. Dig in to explore the technological belly of their procedures, architecture, and infrastructure for democratizing customer support policies.

Model Health Assurance at LinkedIn
Learn about Pro-ML, LinkedIn's centralized ML platform that hosts hundreds of AI models running in production, helping ensure a world-class product experience to its customers and members. Health assurance (HA) is a key component of the platform.

In-depth Guide to ML Model Debugging and Tools You Need to Know
ML systems are trickier to test than traditional software. In this guide, you'll learn some debugging strategies for ML models and the tools to implement them. Model interpretability will also be discussed, showing how to trace the path of errors from the input to the output.

The KNN Algorithm – Explanation, Opportunities, Limitations
K-Nearest Neighbors (KNN) is a simple, easy-to-understand, and versatile machine learning algorithm. In this article, you'll find all the details and basics you need to start using it.
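
To give a feel for how simple the core idea is, here is a minimal sketch of KNN classification in plain Python. It is not taken from the linked article: the function name `knn_predict`, the toy data, and the choice of Euclidean distance with a majority vote are all assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbours' labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data (made up for illustration): two small clusters in 2D.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.1, 1.0)))  # → a
print(knn_predict(points, labels, (5.1, 5.0)))  # → b
```

The sketch scans every training point for each query, which is fine for small datasets but hints at one of the limitations the article's title mentions: prediction cost grows with the size of the training set.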

MLOps with MLflow and Amazon SageMaker Pipelines
In this step-by-step guide, you'll learn how to automate an end-to-end ML lifecycle using MLflow and Amazon SageMaker Pipelines. A Random Forest model is used as an example.

Scaling Deep Learning Workloads with PyTorch / XLA and Cloud TPU VM
Google's team talks about the challenges of scaling DL jobs to distributed training settings, using the Cloud TPU VM interface, and streaming training data from Google Cloud Storage (GCS) to PyTorch / XLA models.

100 Days D3 Dataviz
In this article, you can follow Sandra Becker on her 100-day challenge to use D3.js/Observable to visualize data from her teaching work and presentations.


YOLOX: Exceeding YOLO Series in 2021
This paper presents improvements to the YOLO series that result in a new high-performance detector called YOLOX. It includes the results of testing the detector.

You Do Not Need a Bigger Boat: Recommendations at Reasonable Scale in a (Mostly) Serverless and Open Stack
The authors argue that immature data pipelines are preventing practitioners from leveraging the latest research on recommender systems. Check out what they propose with Serverless.

Fast Batch Nuclear-norm Maximization and Minimization for Robust Domain Adaptation
The authors investigate the prediction discriminability and diversity by studying the structure of the classification output matrix of a randomly selected data batch. They argue that their method can boost the adaptation accuracy and robustness under three typical domain adaptation scenarios.

CoBERL: Contrastive BERT for Reinforcement Learning
Contrastive BERT for RL (CoBERL) is an agent that combines a new contrastive loss and a hybrid LSTM-transformer architecture to tackle the challenge of data efficiency. It improves performance across the full Atari suite, a set of control tasks and a challenging 3D environment.

SynLiDAR: Learning From Synthetic LiDAR Sequential Point Cloud for Semantic Segmentation
SynLiDAR is a large-scale synthetic LiDAR dataset of point-wise annotated point clouds with accurate geometric shapes and comprehensive semantic classes. The authors also design PCT-Net to narrow the gap with real-world point cloud data.

PaddleSeg: A High-Efficient Development Toolkit for Image Segmentation
PaddleSeg is a high-efficient development toolkit for image segmentation that aims to help both developers and researchers in the whole process of designing segmentation models, training models, optimizing performance and inference speed, and deploying models.

Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data
In this paper, the authors extend the powerful ESRGAN to a practical restoration application (Real-ESRGAN), which is trained with pure synthetic data. Specifically, a high-order degradation modeling process is introduced to better simulate complex real-world degradations.


Natural Language Processing [Huggingface Course]
During this course, you'll learn the basics of NLP using libraries from the Hugging Face ecosystem — Transformers, Datasets, Tokenizers, and Accelerate — as well as the Hugging Face Hub.


Shifts Challenge: Robustness and Uncertainty under Real-World Distributional Shift
NeurIPS 2021 Shifts Challenge raises awareness of distributional shifts. The goal is to develop models robust to distributional shifts and to detect such shifts via uncertainty in predictions.
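
As a rough illustration of detecting shift via predictive uncertainty, the sketch below scores an input by the entropy of an ensemble's averaged class probabilities. This is not the challenge's official baseline or metric: the ensemble setup, the function names `predictive_entropy` and `flag_shifted`, the threshold of 0.5, and the probability values are all assumptions chosen for the example.

```python
import math


def predictive_entropy(ensemble_probs):
    """Entropy (in nats) of the ensemble-averaged class distribution."""
    n_models = len(ensemble_probs)
    n_classes = len(ensemble_probs[0])
    # Average the per-model class probabilities.
    mean_probs = [
        sum(model[c] for model in ensemble_probs) / n_models
        for c in range(n_classes)
    ]
    # Shannon entropy; zero-probability terms contribute nothing.
    return -sum(p * math.log(p) for p in mean_probs if p > 0)


def flag_shifted(ensemble_probs, threshold=0.5):
    """Flag an input as possibly out-of-distribution (hypothetical threshold)."""
    return predictive_entropy(ensemble_probs) > threshold


# Made-up predictions from three models for two inputs.
confident = [[0.97, 0.02, 0.01], [0.95, 0.03, 0.02], [0.96, 0.02, 0.02]]
disagreeing = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]

print(flag_shifted(confident))    # → False
print(flag_shifted(disagreeing))  # → True
```

When the models agree, the averaged distribution is peaked and the entropy is low. When they disagree, the average flattens toward uniform and the entropy approaches its maximum of ln(3) for three classes, which is the kind of signal an uncertainty-based shift detector can use.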


Looking to feature your open positions in the digest? Kindly reach out to us at [email protected] for details. We'll be proud to help your business thrive!