Data Phoenix Digest - 19.08.2021

The Data Phoenix Events team invites you all on August 25 to the second webinar in our "The A-Z of Data" series. The topic: Monitoring ML Models in Production.

Speaker: Emeli Dral, Co-founder and CTO at Evidently AI, a startup developing open-source tools to analyze and monitor the performance of machine learning models.

Register


NEWS

What's new this week?

Digital twins for AI optimization and healthcare. SIA Awards. AI for food security, road safety, and agriculture efficiency. AI investment news.

Funding news: Lucid Lane — $16 million in Series A funding; Motivo — $12 million in Series A funding; Monte Carlo — $60 million in Series C funding

ARTICLES

Containerizing Apache Hadoop Infrastructure at Uber
This article summarizes the problems the Uber team faced when re-architecting the Hadoop deployment stack around Docker containers, and how they solved them along the way.

Learning from Evolution: Using AI Language Models to Design Functional Artificial Proteins
AI can generate highly realistic natural language sentences. ProGen can learn the language of proteins to generate artificial protein sequences across multiple families.

Align Before Fuse (ALBEF): Advancing Vision-Language Understanding with Contrastive Learning
ALBEF is a simple, end-to-end, and powerful framework for vision-language representation learning. Find the pre-trained model and code to spur more research in this important topic.

Learning to Extrapolate with Generative AI Models
GENhance is an AI model that can generate sequences across natural language and proteins with attributes that go beyond the training distribution. Learn more about its applications!

Introducing Droidlet, a One-Stop Shop for Modularly Building Intelligent Agents
Droidlet is an agent architecture and a platform for building embodied agents that simplifies integrating a wide range of ML algorithms in embodied systems and robotics.

Generally Capable Agents Emerge from Open-Ended Play
"Open-Ended Learning Leads to Generally Capable Agents" is a preprint describing how DeepMind trained an agent capable of playing many different games without needing human interaction data.

Data Movement in Netflix Studio via Data Mesh
Learn about Netflix's journey to more efficient data movement using Data Mesh, which improves the pace of productions and the efficiency of global business operations by providing the most up-to-date information.

The Only 3 ML Tools You Need
The author claims that in this piece you'll learn about the only three ML tools you need to make your team successful in applying ML in your product. Let's check what she means!

GitHub Copilot Open Source Alternatives
GitHub Copilot, a preview of GitHub's "AI pair programmer," is a code-completion-style tool designed to provide line or function suggestions in your IDE. Learn more about this GitHub tool and its open-source alternatives.

All Probability Distributions Explained in Six Minutes
Probability distributions are the basics of any data science work. In this article, the author makes an effort to provide easy and intuitive explanations of the most important probability distributions.

Make Money Using NFT + AI | GAN Image Generation
In this article, you'll see how to create new images using a GAN, with a focus on generating art using StyleGAN2-ADA. The goal is to create contemporary art as NFTs and sell it on OpenSea.

PAPERS

Perceiver IO: A General Architecture for Structured Inputs & Outputs
Perceiver IO overcomes the limitations of Perceiver without sacrificing its properties by learning to flexibly query the model's latent space to produce outputs of arbitrary size and semantics. Perceiver IO decouples model depth from data size and still scales linearly with data size.

Observation of Time-Crystalline Eigenstate Order on a Quantum Processor
The authors demonstrate the characteristic spatiotemporal response of a discrete time crystal (DTC) for generic initial states. A time-reversal protocol discriminates external decoherence from intrinsic thermalization and uses quantum typicality to circumvent the cost of densely sampling the eigenspectrum.

SPEAR : Semi-supervised Data Programming in Python
SPEAR is an open-source Python library for data programming with semi-supervision. It implements several recent data programming approaches to facilitate weak supervision in the form of heuristics (or rules) and the association of noisy labels with the training dataset.

Internal Video Inpainting by Implicit Long-range Propagation
In this paper, the authors propose a novel framework for video inpainting by adopting an internal learning strategy. It allows cross-frame context propagation to inpaint unknown regions by fitting a convolutional neural network to the known region.

SDEdit: Image Synthesis and Editing with Stochastic Differential Equations
Stochastic Differential Editing (SDEdit) is a new image editing and synthesis framework based on a recent generative model using stochastic differential equations (SDEs). The proposed approach achieves strong performance on a wide range of applications.

StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators
The authors present a text-driven method that allows shifting a generative model to new domains, without having to collect even a single image from those domains, by leveraging the semantic power of large-scale Contrastive Language-Image Pre-training (CLIP) models.

JOBS

Looking to feature your open positions in the digest? Kindly reach out to us at editor@dataphoenix.info for details. We'll be proud to help your business thrive!