Data Phoenix Digest - 19.08.2021
Webinar "The A-Z of Data: Monitoring ML Models in Production"; AI for food security, road safety, and agriculture efficiency; introducing Droidlet; all probability distributions explained in six minutes; make money using NFT + AI; Perceiver IO, SPEAR, SDEdit, StyleGAN-NADA, jobs, and more...
The Data Phoenix Events team invites you on August 25 to the second webinar in our "The A-Z of Data" series. The topic: Monitoring ML Models in Production.
Speaker: Emeli Dral, Co-founder and CTO at Evidently AI, a startup developing open-source tools to analyze and monitor the performance of machine learning models.
NEWS
What's new this week?
Digital twins for AI optimization and healthcare. SIA Awards. AI for food security, road safety, and agriculture efficiency. AI investment news.
- IBM creates an app store for the digitization of the physical world that brings together enterprises, services, and tools providers, to drive its AI optimization efforts.
- Japanese telecom giant NTT launches a major initiative to improve digital health through precision medicine using AI and digital twin technology.
- SIA announced that Jensen Huang, NVIDIA CEO, is the 2021 recipient of the Robert N. Noyce Award, recognizing his outstanding contributions to the semiconductor industry.
- AI-driven technologies are already providing solutions to global food insecurity. Learn how Dimitra, a company advocating for data-driven farming, takes part in that.
- AI can reduce the danger and improve safety for truck drivers. KeepTruckin is launching a new AI dashcam to increase safety and efficiency for drivers and prevent accidents.
- Robotics and AI can be used to identify grape plants infected with a devastating fungus. Learn about a collaboration between a biologist and an engineer to protect grape crops.
Funding news: Lucid Lane — $16 million in Series A funding; Motivo — $12 million in Series A funding; Monte Carlo — $60 million in Series C funding
ARTICLES
Containerizing Apache Hadoop Infrastructure at Uber
This article summarizes the problems the Uber team faced when re-architecting its Hadoop deployment stack around Docker containers, and how they solved them along the way.
Learning from Evolution: Using AI Language Models to Design Functional Artificial Proteins
AI can generate highly realistic natural language sentences. ProGen can learn the language of proteins to generate artificial protein sequences across multiple families.
Align Before Fuse (ALBEF): Advancing Vision-Language Understanding with Contrastive Learning
ALBEF is a simple, end-to-end, and powerful framework for vision-language representation learning. Find the pre-trained model and code, released to spur more research on this important topic.
Learning to Extrapolate with Generative AI Models
GENhance is an AI model that can generate sequences across natural language and proteins with attributes that go beyond the training distribution. Learn more about its applications!
Introducing Droidlet, a One-Stop Shop for Modularly Building Intelligent Agents
Droidlet is an agent architecture and a platform for building embodied agents that simplifies integrating a wide range of ML algorithms in embodied systems and robotics.
Generally Capable Agents Emerge from Open-Ended Play
"Open-Ended Learning Leads to Generally Capable Agents" is a preprint describing how DeepMind trained an agent capable of playing many different games without needing human interaction data.
Data Movement in Netflix Studio via Data Mesh
Learn about Netflix's journey to more efficient data movement using Data Mesh, which improves the pace of productions and the efficiency of global business operations by using the most up-to-date information.
The Only 3 ML Tools You Need
The author claims that in this piece you'll learn about the only three ML tools you need to make your team successful in applying ML in your product. Let's check what she means!
GitHub Copilot Open Source Alternatives
GitHub Copilot, a preview of GitHub's "AI pair programmer," is a code-completion tool designed to provide line or function suggestions in your IDE. Learn more about this GitHub tool.
All Probability Distributions Explained in Six Minutes
Probability distributions are fundamental to any data science work. In this article, the author aims to provide easy and intuitive explanations of the most important probability distributions.
Make Money Using NFT + AI | GAN Image Generation
In this article, you'll see how to create new images using GANs, with a focus on generating art using StyleGAN2-ADA. The goal is to create contemporary art as NFTs and sell it on OpenSea.
PAPERS
Perceiver IO: A General Architecture for Structured Inputs & Outputs
Perceiver IO overcomes the limitations of Perceiver without sacrificing its properties by learning to flexibly query the model's latent space to produce outputs of arbitrary size and semantics. Perceiver IO decouples model depth from data size and still scales linearly with data size.
Observation of Time-Crystalline Eigenstate Order on a Quantum Processor
The authors demonstrate the characteristic spatiotemporal response of a discrete time crystal (DTC) for generic initial states. A time-reversal protocol discriminates external decoherence from intrinsic thermalization and uses quantum typicality to circumvent the cost of densely sampling the eigenspectrum.
SPEAR: Semi-supervised Data Programming in Python
SPEAR is an open-source Python library for data programming with semi-supervision. It implements several recent data programming approaches to facilitate weak supervision in the form of heuristics (or rules) and the association of noisy labels with the training dataset.
Internal Video Inpainting by Implicit Long-range Propagation
In this paper, the authors propose a novel framework for video inpainting by adopting an internal learning strategy. It allows cross-frame context propagation to inpaint unknown regions by fitting a convolutional neural network to the known region.
SDEdit: Image Synthesis and Editing with Stochastic Differential Equations
Stochastic Differential Editing (SDEdit) is a new image editing and synthesis framework based on a recent generative model using stochastic differential equations (SDEs). The proposed approach achieves strong performance on a wide range of applications.
StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators
The authors present a text-driven method that allows shifting a generative model to new domains, without having to collect even a single image from those domains, by leveraging the semantic power of large-scale Contrastive Language-Image Pre-training (CLIP) models.
JOBS
- Data Engineer, Appian
- Applied Scientist II - ML/NLP, Amazon
- Data Scientist - Analytics, Host Quality, Airbnb
- Principal Data Scientist, Atlassian
- Machine Learning Scientist, Amazon
- Data Science Intern (Summer 2022), Dropbox
- Marketing Data Engineer, Stripe
- Data Engineer, Mozilla
- Sr. Data Scientist - Security, Snowflake
- Senior Data Scientist - Fraud, Udemy
Looking to feature your open positions in the digest? Kindly reach out to us at [email protected] for details. We'll be proud to help your business thrive!