Data Phoenix Digest - ISSUE 31

NVIDIA GTC, Isomorphic Labs, and AI-powered drug discovery, algorithms predicting stress at the atomic scale, training a DCGAN in PyTorch, deploying your first ML API, alias-free Generative Adversarial Networks, causal ImageNet, SchNet, CaraNet, obs, and more ...

Dmitry Spodarets


What's new this week?

NVIDIA GTC. AI-driven meldR platform. Isomorphic Labs and AI-powered drug discovery. AI skin cancer diagnosis. EU regulations for AI. Algorithms predicting stress at the atomic scale.

  • GTC Wrap-Up: Jensen Huang outlines NVIDIA's vision for accelerated computing, data center architecture, AI, robotics, omniverse avatars, and digital twins. Find more news here.
  • Data Society launches meldR, a learning experience and communication platform, to enable healthcare and life sciences companies to deliver AI/ML data science learning pathways.
  • Demis Hassabis, Founder and CEO of DeepMind, announces the creation of Isomorphic Labs, to reimagine drug discovery from first principles with an AI-first approach.
  • AI systems being developed to diagnose skin cancer run the risk of being less accurate for people with dark skin, according to Oxford research.
  • The EU faces a major challenge with AI regulation. The technology promises many benefits, yet overregulation could hold back its development.
  • Aerospace engineers Yue Cui and Huck Beng Chew at the University of Illinois Urbana-Champaign use machine learning to predict stress in copper at the atomic scale.

Funding News

  • Landing AI, led by artificial intelligence visionary Andrew Ng, raises $57M in Series A funding led by McRock Capital.
  • raises $100M in Series E funding, a round of funding that values at $1.7 billion post-money ($1.6 billion pre-money).
  • Sama, the first end-to-end AI platform that enables teams to manage the complete AI lifecycle, raises $70M in Series B funding led by Caisse de dépôt et placement du Québec (CDPQ).
  • WhyLabs, the provider of observability for AI and data applications, raises $10M in Series A funding co-led by Defy Partners and Andrew Ng’s AI Fund.
  • Kodiak Robotics, a startup developing self-driving truck technologies, raises $125M in an oversubscribed Series B round for a total of $165M to date.


MLOps and DevOps: Why Data Makes It Different
In this article, the O'Reilly team digs into the fundamentals of machine learning as an engineering discipline to answer key questions about MLOps, DevOps, and their evolution through data.

Detect Industrial Defects at Low Latency with Computer Vision at the Edge with Amazon SageMaker Edge
Learn how to create a cloud-to-edge solution with Amazon SageMaker that detects defective parts in a real-time stream of images sent to an edge device. A demo is included.

Training a DCGAN in PyTorch
In this tutorial, you'll learn how to train your first DCGAN model using PyTorch to generate images. Check out Part 2 and Part 3 of the series on Advanced PyTorch Techniques.

Serving ML Models in Production: Common Patterns
This article explores common patterns for serving ML models in production, including pipelines, ensembles, business logic, and online learning, and shows how to use Ray Serve to implement them.

Practical Differentially Private Clustering
In this article, you'll learn about Google's new differentially private clustering algorithm, which is based on privately generating new representative data points.

Deploying Your First Machine Learning API
In this article, you'll find a project that explains step by step how to develop and deploy an ML API using FastAPI and Deta. The code is pretty simple.

Principal Component Analysis for Visualization
In this tutorial, you'll learn how to visualize high-dimensional data with PCA, how to use explained variance, and how to observe the explained variance visually in the PCA result.


The Cocktail Fork Problem: Three-Stem Audio Separation for Real-World Soundtracks
The cocktail party problem aims at isolating any source of interest within a complex acoustic scene, and has long inspired audio source separation research. Learn about the solution!

Alias-Free Generative Adversarial Networks
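The researchers observe that the synthesis process of typical GANs depends on absolute pixel coordinates in an unhealthy manner. They trace the root cause to careless signal processing that causes aliasing in the generator network and derive architectural changes that prevent it.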

SSAST: Self-Supervised Audio Spectrogram Transformer
In this paper, the authors aim to alleviate the data requirement issues with the AST by leveraging self-supervised learning using unlabeled data for audio and speech classification.

Causal ImageNet: How to discover spurious features in Deep Learning
The authors propose a new method of identifying spurious or causal neural features (penultimate layer neurons of a robust model) via limited human supervision.

NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework
Pretrained language models are very expensive to train. To democratize NLP, the authors propose TLM, a simple and efficient learning framework that does not rely on large-scale pretraining.

SchNet: A Deep Learning Architecture for Molecules and Materials
In this paper, the researchers present the deep learning architecture SchNet that is designed to model atomistic systems by making use of continuous-filter convolutional layers.

CaraNet: Context Axial Reverse Attention Network for Segmentation of Small Medical Objects
Segmenting medical images is important for disease diagnosis and treatment. This paper proposes CaraNet to improve segmentation performance on small objects.


Looking to feature your open positions in the digest? Kindly reach out to us at [email protected] for details. We'll be proud to help your business thrive!