Data Phoenix Digest - ISSUE 34
The launch of PyTorch Live, training a DCGAN in PyTorch, text classification with BERT in PyTorch, transformers from scratch, Machine-in-the-Loop rewriting for creative image captioning, TorchGeo, LILA, OpenPrompt, WaveFake, books, jobs, and more ...
What's new this week?
The launch of PyTorch Live. A new addition to the AWS AI/ML stack: Amazon SageMaker Canvas. AI advances in biology, art, chemistry, and more.
- AWS releases Amazon SageMaker Canvas, a new visual, no-code capability that lets users build ML models and generate predictions without requiring ML expertise.
- Biology is ripe for an AI/ML revolution. We are getting closer to being able to ‘program biology’ for diagnostic and treatment purposes. Challenges are numerous, however.
- A new machine-learning model developed by MIT researchers has the potential to enable robots to understand interactions in the world in the way humans do.
- Botto, an AI-powered program that creates art, has been on the market for five weeks and has raked in more than €1 million from its first four NFT artworks at auction.
- A Columbia Engineering team has developed a new computational technique that can accurately predict the reduction temperature of metal oxides to their base metals.
- FJDynamics, a robotics startup, closes a Series B round of $70M as it advances its goal to empower workers in the harshest environments with robotic technologies.
- CloudTrucks, an enabler of cloud technology in trucking, raises $115M to keep building the tools for trucking entrepreneurs to succeed and thrive in the industry.
- Simpro, a field service management software company, raises $350M from K1 Investment Management with participation from existing investor Level Equity.
Parameter Exploration at Lyft
In this article, you'll learn about parameter exploration practices at Lyft, including the ups and downs of the methods they agreed on, to drive data-driven decision making at scale.
Root Causing Data Failures
Handling data is not an easy task. In this post, you'll find out how Anomalo, a data quality platform, can help you find the root cause of data quality issues automatically.
Training an Object Detector from Scratch in PyTorch
In this tutorial, you'll learn how to train a custom object detector from scratch using PyTorch. Note that this lesson is part 2 of a 3-part series on advanced PyTorch techniques.
Orchestrate a Data Science Project in Python With Prefect
This step-by-step guide shows how to use Prefect to optimize your data science workflow in a few lines of Python code and increase efficiency in the long run.
Text Classification with BERT in PyTorch
Text classification can be challenging. Fortunately, you can now use a pre-trained BERT model from Hugging Face to classify the text of news articles. Learn how in this article!
Training a DCGAN in PyTorch
In this tutorial, you'll learn how to train your first DCGAN model using PyTorch to generate images. Note that this lesson is part 1 of a 3-part series on advanced PyTorch techniques.
SynapseML: A Simple, Multilingual, and Massively Parallel Machine Learning Library
SynapseML (previously MMLSpark) is an open-source library that simplifies the creation of massively scalable machine learning (ML) pipelines. Learn more about how it can help you.
Transformers from Scratch
In this massive tutorial, you'll take a deep dive into transformers. Follow along with the author to build a speech-to-text converter for an imaginary voice-controlled computer.
TorchGeo: Deep Learning with Geospatial Data
TorchGeo is a Python library that integrates geospatial data into the PyTorch deep learning ecosystem, enabling deep learning for remote sensing applications.
LILA: Language-Informed Latent Actions
Language-Informed Latent Actions (LILA) is a framework for learning natural language interfaces in the context of human-robot collaboration under the shared autonomy paradigm.
Machine-in-the-Loop Rewriting for Creative Image Captioning
In this paper, the authors propose a rewriting model that modifies specified spans of text within the user's original draft to introduce descriptive and figurative elements locally in the text.
OpenPrompt: An Open-source Framework for Prompt-learning
Prompt-learning has become a new paradigm in modern NLP. OpenPrompt is a framework that lets you combine different PLMs, task formats, and prompting modules in a unified paradigm.
WaveFake: A Data Set to Facilitate Audio Deepfake Detection
The authors introduce signal processing techniques for analyzing audio signals and present a novel data set with nine sample sets from five network architectures, along with two baseline models.
Scientific Visualization: Python + Matplotlib
In this extensive book by Nicolas P. Rougier, you'll explore the Python scientific visualization landscape, from simple rules to 3D figures, optimization, and animation.
- Data Engineer at Wikimedia Foundation (Remote)
- Staff Data Scientist at Instacart (San Francisco, CA - Remote)
- Sr. Data Engineer at HashiCorp (United States - Remote)
- Product Data Scientist at Mozilla (Remote US, Remote Canada)
- Senior Data Engineer at Twitch (United States - Remote)
Looking to feature your open positions in the digest? Kindly reach out to us at [email protected] for details. We'll be proud to help your business thrive!