Data Phoenix Digest - ISSUE 32

NVIDIA's Omniverse and BMW, industrial data revolution, AI trends for 2022, K-Means clustering explained, AutoML, EditGAN, DScribe, CFPNet, StyleCLIPDraw, jobs, and more ...

Dmitry Spodarets


What's new this week?

A winning streak for MIT Lincoln Laboratory. NVIDIA's Omniverse and BMW. The industrial data revolution. The dangers of pervasive monitoring. AI trends for 2022, and more.

  • MIT Lincoln Laboratory has won nine R&D 100 Awards for 2021, including a life-detecting radar, a microscale motor, a quantum network architecture, and six others.
  • BMW has unveiled plans to advance smart manufacturing by using NVIDIA's Omniverse to simulate every aspect of its manufacturing operations.
  • Did the so-called data revolution actually happen in manufacturing? About ten years ago the answer seemed obvious, but what do experts think now? Does more data really mean more value?
  • According to the New Frontier: Artificial Intelligence at Work report, monitoring of workers and setting performance targets through algorithms is damaging employees’ mental health.
  • A fresh batch of AI trends for 2022 has been released: data marketplaces, the metaverse, personalization, home robots, augmented creativity, and more. Check it out for yourself!
  • Advances in materials are accelerating AI. An AI semiconductor based on a new memtransistor structure can greatly reduce circuit density and driving energy.

Funding News

  • Grammarly, an AI-powered writing assistant developer, raises $200M in a round led by Baillie Gifford, valuing the company at $13 billion post-money.
  • Mantium closes a $12.75M seed round co-led by venture funds Drive Capital and Top Harvest. The startup also launches a cloud-based AI platform for building with large language models.
  • The creator of the first semantic 3D digital twin of the entire Earth announces a $20M funding round led by Microsoft’s venture fund M12 and Point72 Ventures.


Solving Math Word Problems
Learn about a system trained by OpenAI that solves grade-school math problems with nearly twice the accuracy of a fine-tuned GPT-3 model. It solves about 90% as many problems as real kids.
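OpenAI's write-up describes training verifiers that score model-written solutions: at test time the system samples many candidate solutions and keeps the one the verifier ranks highest. The sketch below shows only that selection step, assuming a hypothetical `generate` function (standing in for the fine-tuned language model) and a hypothetical `verify` function (standing in for the trained verifier). The toy stand-ins are illustrative only, not OpenAI's models.

```python
import random
from typing import Callable, List

# Hypothetical signatures: a generator samples one solution text for a problem,
# and a verifier returns a score (higher = more likely to be correct).
Generator = Callable[[str, random.Random], str]
Verifier = Callable[[str, str], float]


def sample_candidates(problem: str, generate: Generator, n: int, seed: int = 0) -> List[str]:
    """Draw n independent candidate solutions from the generator."""
    rng = random.Random(seed)
    return [generate(problem, rng) for _ in range(n)]


def select_best(problem: str, candidates: List[str], verify: Verifier) -> str:
    """Return the candidate the verifier ranks highest (the first one wins ties)."""
    return max(candidates, key=lambda sol: verify(problem, sol))


if __name__ == "__main__":
    # Toy stand-ins: a "generator" that guesses an integer, and a "verifier"
    # that prefers guesses close to 7 (the answer to the toy problem "3 + 4").
    def toy_generate(problem: str, rng: random.Random) -> str:
        return str(rng.randint(0, 20))

    def toy_verify(problem: str, solution: str) -> float:
        return -abs(int(solution) - 7)

    cands = sample_candidates("3 + 4", toy_generate, n=50)
    print(select_best("3 + 4", cands, toy_verify))
```

A real verifier does not know the answer; it is a learned model that predicts whether a solution is correct, so the quality of the final pick depends on how well that model is trained.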

Hugging Face Transformer Inference Under 1 Millisecond Latency
Hugging Face has released “Infinity”, a server product for enterprise-scale inference. It can perform Transformer inference with 1 millisecond latency on GPU.

Design Patterns for Machine Learning Pipelines
ML pipeline design keeps evolving. In this article, you'll learn how these design patterns have changed, which stages they went through, and where they are heading.

ORDAINED: The Python Project Template
Creating Python packages can be tedious. Learn about a boilerplate project template for Python packages that you can use instead of copying a directory tree and doing find-and-replace.

K-Means Clustering Explained
In this article, you'll explore one of the most popular clustering algorithms: k-means. Find out how it works and how to implement it both from scratch and with sklearn.
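For reference, here is a minimal from-scratch sketch of k-means using Lloyd's algorithm: alternate an assignment step (each point joins its nearest centroid) and an update step (each centroid moves to the mean of its points) until the centroids stop moving. The function name `kmeans`, its parameters, and the random initialization from data points are illustrative assumptions, not the article's code.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm; returns (centroids, label of each point)."""
    rng = random.Random(seed)
    # Initialization: pick k distinct data points as the starting centroids
    centroids: List[Point] = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        labels = [min(range(k), key=lambda j: _sq_dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its assigned points
        new_centroids: List[Point] = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                dim = len(members[0])
                new_centroids.append(tuple(sum(m[d] for m in members) / len(members)
                                           for d in range(dim)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, labels


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    print(kmeans(data, k=2))
```

With scikit-learn, the equivalent call is `KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)`, whose `cluster_centers_` and `labels_` attributes hold the same two outputs. Running several initializations (`n_init`) matters because Lloyd's algorithm only finds a local optimum.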

AutoML: An Introduction Using Auto-Sklearn and Auto-PyTorch
AutoML is a valuable addition to any ML or data science practitioner’s toolbox, whether they use Auto-Sklearn/Auto-PyTorch, Auto-WEKA, some other package, or even roll their own solutions.
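Tools such as Auto-WEKA and Auto-Sklearn search jointly over learning algorithms and their hyperparameters. The sketch below shows the simplest version of that loop for anyone rolling their own: plain random search, scored on a validation split. Auto-Sklearn itself uses Bayesian optimization, meta-learning, and ensembling rather than random search. The toy models, the `SPACE` dictionary, and `random_search` are hypothetical names for illustration.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Example = Tuple[Tuple[float, ...], int]  # (features, label)
Model = Callable[[Sequence[float]], int]


def fit_majority(train: List[Example]) -> Model:
    """Baseline: always predict the most common training label."""
    top = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: top


def fit_knn(train: List[Example], k: int = 3, p: int = 2) -> Model:
    """k-nearest neighbours with a Minkowski-style distance of order p."""
    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        # The 1/p root is omitted: it does not change which neighbours are nearest
        return sum(abs(u - v) ** p for u, v in zip(a, b))

    def predict(x: Sequence[float]) -> int:
        nearest = sorted(train, key=lambda ex: dist(ex[0], x))[:k]
        return Counter(y for _, y in nearest).most_common(1)[0][0]

    return predict


# Search space: each algorithm with the hyperparameter values to sample from
SPACE: Dict[str, Tuple[Callable[..., Model], Dict[str, list]]] = {
    "majority": (fit_majority, {}),
    "knn": (fit_knn, {"k": [1, 3, 5, 7], "p": [1, 2]}),
}


def random_search(train: List[Example], valid: List[Example],
                  n_trials: int = 20, seed: int = 0) -> Tuple[float, str, dict]:
    """Sample (algorithm, hyperparameters) pairs; keep the best validation accuracy."""
    rng = random.Random(seed)
    best: Tuple[float, str, dict] = (-1.0, "", {})
    for _ in range(n_trials):
        name = rng.choice(sorted(SPACE))
        fit, grid = SPACE[name]
        params = {h: rng.choice(values) for h, values in grid.items()}
        model = fit(train, **params)
        acc = sum(model(x) == y for x, y in valid) / len(valid)
        if acc > best[0]:
            best = (acc, name, params)
    return best
```

Calling `random_search(train, valid)` returns the best `(accuracy, algorithm, hyperparameters)` triple found. Swapping the random sampler for a model-based optimizer, or keeping several good models and ensembling them, is where dedicated AutoML packages add most of their value.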

Deploy Fast and Scalable AI with NVIDIA Triton Inference Server in Amazon SageMaker
Learn how to use NVIDIA Triton Inference Server with Amazon SageMaker, what Triton containers offer, and how to deploy ML models with the two together.


EditGAN: High-Precision Semantic Image Editing
EditGAN is a novel method for high-quality, high-precision semantic image editing that lets users edit images by modifying their highly detailed part segmentation masks.

On the Frequency Bias of Generative Models
In this paper, the authors analyze measures against high-frequency artifacts in generated images and what makes them effective, focusing on the frequency bias of generative models.
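One simple way to look at spectral discrepancies is to measure how much of a signal's energy sits at high frequencies. The sketch below is a 1D stand-in for the 2D image spectra the paper studies: it computes that share with a direct DFT and shows how an alternating, checkerboard-like component (a typical upsampling artifact) raises it. The function names and the 0.25 cutoff are illustrative assumptions, not the paper's metric.

```python
import cmath
import math
from typing import List, Sequence


def power_spectrum(signal: Sequence[float]) -> List[float]:
    """|DFT|^2 computed from the O(n^2) definition (fine for a short signal)."""
    n = len(signal)
    return [abs(sum(signal[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))) ** 2
            for f in range(n)]


def high_freq_fraction(signal: Sequence[float], cutoff: float = 0.25) -> float:
    """Share of spectral energy above `cutoff` cycles/sample (the maximum is 0.5)."""
    n = len(signal)
    spec = power_spectrum(signal)
    # Bin f corresponds to min(f, n - f) / n cycles per sample for a real signal
    high = sum(p for f, p in enumerate(spec) if min(f, n - f) / n > cutoff)
    return high / sum(spec)


if __name__ == "__main__":
    smooth = [math.sin(2 * math.pi * 2 * t / 64) for t in range(64)]
    # Add a small alternating +0.3/-0.3 pattern, like a checkerboard artifact
    artifact = [s + 0.3 * (-1) ** t for t, s in enumerate(smooth)]
    print(high_freq_fraction(smooth), high_freq_fraction(artifact))
```

The smooth sine keeps essentially all its energy at low frequencies, while the small alternating pattern puts about 15% of the total energy at the Nyquist frequency.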

DScribe: Library of Descriptors for Machine Learning in Materials Science
DScribe is a software package for ML that provides "descriptors" for atomistic materials simulations, to accelerate and simplify the application of ML for atomistic property prediction.
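To make "descriptor" concrete, the sketch below computes the Coulomb matrix, one of the descriptors DScribe offers, from nuclear charges and atomic positions. It is a plain-Python illustration of the standard formula, not DScribe's API, and the geometry in the usage example is made up.

```python
import math
from typing import List, Sequence, Tuple


def coulomb_matrix(charges: Sequence[int],
                   positions: Sequence[Tuple[float, float, float]]) -> List[List[float]]:
    """0.5 * Z_i**2.4 on the diagonal, Z_i * Z_j / |R_i - R_j| off the diagonal."""
    n = len(charges)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                m[i][j] = 0.5 * charges[i] ** 2.4  # self-interaction term
            else:
                m[i][j] = charges[i] * charges[j] / math.dist(positions[i], positions[j])
    return m


if __name__ == "__main__":
    # Made-up three-atom geometry: one oxygen (Z=8) and two hydrogens (Z=1)
    for row in coulomb_matrix([8, 1, 1], [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]):
        print([round(v, 3) for v in row])
```

The raw matrix depends on the order in which atoms are listed, which is why descriptor libraries, DScribe included, offer sorted or eigenvalue-based variants that make it permutation-invariant.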

CFPNet: Channel-wise Feature Pyramid for Real-Time Semantic Segmentation
Ange Lou and Murray Loew propose a Channel-wise Feature Pyramid (CFP) to balance performance, model size, and inference speed for real-time semantic segmentation.

StyleCLIPDraw: Coupling Content and Style in Text-to-Drawing Synthesis
StyleCLIPDraw adds a style loss to the CLIPDraw text-to-drawing synthesis model to allow artistic control of the synthesized drawings in addition to control of the content via text.
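A style loss compares statistics of feature maps rather than their spatial layout. The sketch below uses the classic Gram-matrix formulation from neural style transfer as a stand-in; StyleCLIPDraw's exact style loss and the network that produces its feature maps may differ, so treat `gram` and `style_loss` as hypothetical helpers operating on already-extracted features.

```python
from typing import List

FeatureMap = List[List[float]]  # [channel][spatial position], e.g. a flattened CNN activation


def gram(features: FeatureMap) -> List[List[float]]:
    """Channel-by-channel inner products, normalized by the number of positions."""
    m = len(features[0])
    return [[sum(a * b for a, b in zip(fi, fj)) / m for fj in features] for fi in features]


def style_loss(generated: FeatureMap, style: FeatureMap) -> float:
    """Mean squared difference between the two Gram matrices."""
    g1, g2 = gram(generated), gram(style)
    c = len(g1)
    return sum((g1[i][j] - g2[i][j]) ** 2 for i in range(c) for j in range(c)) / (c * c)


if __name__ == "__main__":
    style = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
    # Same activations at shuffled positions: the Gram matrix ignores layout
    shuffled = [[3.0, 1.0, 2.0], [0.0, 0.0, 1.0]]
    print(style_loss(shuffled, style))  # 0.0
```

Because the Gram matrix discards where activations occur, this term constrains texture and color statistics while the text prompt, through the CLIP-based content loss, controls what is drawn. In a pipeline like CLIPDraw's, a weighted style term of this kind would be added to the content loss before updating the stroke parameters.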

From Global to Local MDI Variable Importances for Random Forests and When They Are Shapley Values
The authors show that the global MDI variable importance scores correspond to Shapley values under some conditions and derive a local MDI importance measure of variable relevance.
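Mean Decrease Impurity (MDI) credits each split to the feature it uses: the node's impurity minus the weighted impurity of its children, scaled by the fraction of samples reaching the node, summed per feature and averaged over the trees of the forest. The sketch below computes this global importance with Gini impurity for hand-built trees. The tree format and function names are assumptions, and unlike a real random forest each tree is evaluated on the full dataset rather than on its bootstrap sample; the paper's local MDI measure is not shown.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence

# A node is a leaf (None) or a split: {"feature": j, "threshold": t, "left": node, "right": node}
Node = Optional[Dict]


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a set of class labels (0 for an empty set)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def mdi(tree: Node, X: List[List[float]], y: List[int], n_features: int) -> List[float]:
    """Per feature, the sum of p(t) * impurity decrease over the nodes t that split on it."""
    total = len(y)
    importances = [0.0] * n_features

    def visit(node: Node, idx: List[int]) -> None:
        if node is None or not idx:
            return
        j, thr = node["feature"], node["threshold"]
        left = [i for i in idx if X[i][j] <= thr]
        right = [i for i in idx if X[i][j] > thr]
        lab = lambda ids: [y[i] for i in ids]
        decrease = (gini(lab(idx))
                    - len(left) / len(idx) * gini(lab(left))
                    - len(right) / len(idx) * gini(lab(right)))
        importances[j] += len(idx) / total * decrease  # weight by the share of samples at t
        visit(node["left"], left)
        visit(node["right"], right)

    visit(tree, list(range(total)))
    return importances


def forest_mdi(trees: List[Node], X: List[List[float]], y: List[int],
               n_features: int) -> List[float]:
    """Average the per-tree MDI importances over the forest."""
    per_tree = [mdi(t, X, y, n_features) for t in trees]
    return [sum(col) / len(trees) for col in zip(*per_tree)]
```

When the leaves are pure, a tree's importances sum to the impurity at its root. scikit-learn's `feature_importances_` for tree ensembles is this impurity-based importance, normalized to sum to 1.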

Efficiently Modeling Long Sequences with Structured State Spaces
The proposed Structured State Space sequence model (S4) is based on a new parameterization of the state space model (SSM). It can be computed more efficiently than prior approaches while preserving their theoretical strengths.
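An SSM maps an input sequence to an output through a latent state, x' = Ax + Bu, y = Cx. After discretization it can be run either as a recurrence or as a single long convolution, and S4's efficiency comes from computing that convolution kernel quickly for its structured A. The sketch below shows the two equivalent views with the bilinear discretization, using a diagonal A and a naive O(L^2) kernel computation instead of S4's structured parameterization; all names are illustrative.

```python
from typing import List, Sequence, Tuple


def discretize(a: Sequence[float], b: Sequence[float],
               dt: float) -> Tuple[List[float], List[float]]:
    """Bilinear discretization of x' = a*x + b*u, one scalar per diagonal state entry."""
    a_bar = [(1 + dt * ai / 2) / (1 - dt * ai / 2) for ai in a]
    b_bar = [dt * bi / (1 - dt * ai / 2) for ai, bi in zip(a, b)]
    return a_bar, b_bar


def run_recurrent(a_bar: Sequence[float], b_bar: Sequence[float],
                  c: Sequence[float], u: Sequence[float]) -> List[float]:
    """RNN view: x_k = a_bar * x_{k-1} + b_bar * u_k, y_k = c . x_k."""
    x = [0.0] * len(a_bar)
    ys = []
    for uk in u:
        x = [ab * xi + bb * uk for ab, xi, bb in zip(a_bar, x, b_bar)]
        ys.append(sum(ci * xi for ci, xi in zip(c, x)))
    return ys


def run_convolutional(a_bar: Sequence[float], b_bar: Sequence[float],
                      c: Sequence[float], u: Sequence[float]) -> List[float]:
    """CNN view: y = K * u (causal) with kernel K_j = sum_n c_n * a_bar_n**j * b_bar_n."""
    L = len(u)
    kernel = [sum(cn * an ** j * bn for an, bn, cn in zip(a_bar, b_bar, c)) for j in range(L)]
    return [sum(kernel[j] * u[k - j] for j in range(k + 1)) for k in range(L)]


if __name__ == "__main__":
    a_bar, b_bar = discretize([-1.0, -0.5], [1.0, 1.0], dt=0.1)
    u = [1.0, 0.0, 2.0, -1.0, 0.5]
    print(run_recurrent(a_bar, b_bar, [0.5, -0.3], u))
    print(run_convolutional(a_bar, b_bar, [0.5, -0.3], u))
```

Both functions print the same outputs up to rounding. The recurrent form suits step-by-step generation, while the convolutional form lets training process a whole sequence in parallel once the kernel is known.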


Looking to feature your open positions in the digest? Kindly reach out to us at [email protected] for details. We'll be proud to help your business thrive!