Data Phoenix Digest - ISSUE 26
New insights into ML, AI, and Data (MAD) landscape, OECD report on investment in AI, China's new take on regulating AI, the FP Growth algorithm, parallelizing Python code, intro to GANs, instance-conditioned GAN, SwinIR, RSDet++, tools, jobs, and more ...
What's new this week?
New insights into ML, AI, and Data (MAD) landscape. OECD report on investment in AI. China's new take on regulating AI. Google uses AI to redesign search, and more.
- The 2021 report on the landscape of Machine Learning, Artificial Intelligence, and Data (MAD) has been released, courtesy of Matt Turck, VP at FirstMark.
- Investment in AI is growing at an accelerated pace, according to a new report from the OECD: it rose from $3 billion in 2012 to nearly $75 billion in 2020.
- The Cyberspace Administration of China (CAC) has released draft guidelines that regulate the design and use of algorithmic recommender systems that curate content.
- Google will be applying AI advancements to Google Search, to better connect web searchers to the content they’re looking for, while also making web search feel more natural and intuitive.
- The European Union and the United States have agreed to cooperate on boosting microchip supplies and to promote trustworthy AI, to achieve greater self-reliance in the 21st century.
- Researchers at GlaxoSmithKline and the Cambridge Crystallographic Data Centre combined datasets to train ML models that predict stable polymorphs for use in new drug candidates.
- Domino raises $100M in funding led by Great Hill Partners, Coatue Management, Highland Capital Partners, Sequoia Capital, and NVIDIA.
- Autify, a no-code AI-powered software testing automation platform, raises $10M in Series A funding led by World Innovation Lab (WiL).
- Tasq.ai raises $4M in seed funding. Data annotation is a hot area of investment because it remains a challenge for so many companies.
- Arize raises $19M in Series A funding in a round led by Battery Ventures, to help ML practitioners obtain a deeper understanding of model performance.
What’s An OLAP Cube?
An OLAP cube is a multi-dimensional array of data. Online analytical processing (OLAP) is a computer-based technique for analyzing data to look for insights.
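To make the "multi-dimensional array" idea concrete, here is a minimal sketch in plain Python. The fact rows, the dimension names (region, product, quarter), and the helper functions are all hypothetical and chosen for illustration; real OLAP engines store and index cubes very differently.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, sales).
FACTS = [
    ("EU", "laptop", "Q1", 120),
    ("EU", "laptop", "Q2", 150),
    ("EU", "phone", "Q1", 200),
    ("US", "laptop", "Q1", 300),
    ("US", "phone", "Q2", 250),
]

# Assumed dimension order; each cube cell is keyed by one value per dimension.
DIMENSIONS = ("region", "product", "quarter")


def build_cube(facts):
    """Aggregate the sales measure into cells keyed by (region, product, quarter)."""
    cube = defaultdict(int)
    for region, product, quarter, sales in facts:
        cube[(region, product, quarter)] += sales
    return dict(cube)


def roll_up(cube, keep):
    """Sum the measure over every dimension that is not listed in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for key, value in cube.items():
        totals[tuple(key[i] for i in positions)] += value
    return dict(totals)


def slice_cube(cube, dimension, value):
    """Keep only the cells where `dimension` equals `value`."""
    position = DIMENSIONS.index(dimension)
    return {key: v for key, v in cube.items() if key[position] == value}


cube = build_cube(FACTS)
sales_by_region = roll_up(cube, ["region"])
q1_sales_by_product = roll_up(slice_cube(cube, "quarter", "Q1"), ["product"])
print(sales_by_region)
print(q1_sales_by_product)
```

Rolling up answers "total sales per region", while slicing on one quarter and then rolling up answers "Q1 sales per product"; these are the two basic operations analysts run against a cube.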
The FP Growth algorithm
In this exploratory post, you'll be guided through a series of steps to apply the FP Growth algorithm in Python, in order to do frequent itemset mining for basket analysis.
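For a feel of what frequent itemset mining produces, here is a small sketch on a hypothetical set of shopping baskets. Note that this is a brute-force enumeration, not FP Growth itself: FP Growth arrives at the same frequent itemsets far more efficiently by compressing the transactions into a prefix tree. The basket contents and the `frequent_itemsets` helper are assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical shopping baskets, one set of items per transaction.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for itemsets found in at least `min_support` baskets.

    Brute force: every candidate of each size is counted against all baskets.
    """
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            count = sum(1 for basket in transactions if set(candidate) <= basket)
            if count >= min_support:
                result[candidate] = count
                found_any = True
        if not found_any:
            # If no itemset of this size is frequent, no larger one can be.
            break
    return result


itemsets = frequent_itemsets(TRANSACTIONS, min_support=3)
for itemset, count in sorted(itemsets.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(itemset, count)
```

With a minimum support of 3 baskets, pairs such as beer and diapers qualify, while no three-item combination does.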
Apache Spark Monitoring: How To Use Spark API & Open-Source Libraries To Get Better Data Observability Of Your Application
In this guide, you’ll learn how to ensure data observability in Spark using Spark’s internal systems like Listener APIs and Query Execution Listeners, and libraries to track data quality metrics.
Introducing TensorFlow Similarity
In this article, you'll learn about the first version of TensorFlow Similarity, a Python package designed to make it easy and fast to train similarity models using TensorFlow.
Building AI that Can Generate Images of Things It Has Never Seen Before
GANs are a well-established AI method for creating many types of images. Facebook AI shows how Instance-Conditioned GAN (IC-GAN) can generate realistic, unforeseen image combinations.
Parallelizing Python Code
In this article, you'll learn how to parallelize Python code by using process-based parallelism, specialized libraries, IPython parallel, and Ray.
Intro to Generative Adversarial Networks (GANs)
In this 101 guide, you'll look into the ins and outs of GANs, including the intuition behind GANs at a high level, the various GAN variants, and applications for solving real-world problems.
Inferring Concept Drift Without Labeled Data
In this comprehensive report, Cloudera presents four ways to infer concept drift in an unsupervised manner, to reduce false positive drift detections. Check out this gem for sure!
Robust High-Resolution Video Matting with Temporal Guidance
In this paper, you'll find a robust, real-time, high-resolution human video matting method that achieves new state-of-the-art performance and is much lighter than previous approaches.
Instance-Conditioned GAN
GANs can generate photo-realistic images. In this paper, the authors use kernel density estimation techniques to introduce a non-parametric approach to modeling distributions of complex datasets.
RSDet++: Point-based Modulated Loss for More Accurate Rotated Object Detection
The authors propose a rotation sensitivity detection network (RSDet), which consists of an eight-parameter single-stage rotated object detector and a modulated rotation loss.
SwinIR: Image Restoration Using Swin Transformer
SwinIR is a strong baseline model for image restoration based on the Swin Transformer. It consists of three parts: shallow feature extraction, deep feature extraction, and high-quality (HQ) image reconstruction.
Fake It Till You Make It
In this research, the AI/ML team at Microsoft demonstrates that it is possible to perform face-related computer vision in the wild using synthetic data alone.
Stochastic Training is Not Necessary for Generalization
The authors show that non-stochastic full-batch training can achieve strong performance on CIFAR-10 that is on par with SGD, using modern architectures in settings with and without data augmentation.
A Comprehensive Survey and Performance Analysis of Activation Functions in Deep Learning
The paper presents a comprehensive survey of activation functions (AFs) in neural networks for deep learning. It covers Logistic Sigmoid and Tanh based, ReLU based, ELU based, and learning-based AFs.
CODE & TOOLS
Merlion: A Machine Learning Library for Time Series
Merlion is a Python library for time series intelligence that provides an end-to-end ML framework for loading and transforming data, building and training models, post-processing model outputs, and evaluating model performance.
Ploomber is an open-source tool that helps engineers to easily handle .ipynb files, allowing them to develop collaborative, production-ready pipelines using JupyterLab or any text editor.
- Computational Materials Scientist AI / ML - Exabyte.io, San Francisco, Remote
- Senior Python Developer - VITECH, Lviv, Ivano-Frankivsk, Remote
- Postdoctoral Scholar in HPC and AI Performance - Lawrence Berkeley National Lab, Bay Area, California
- Senior Data Scientist - Reddit, New York
- Data Scientist - Patreon, New York
- Data Scientist, Algorithms - Lyft, San Francisco
- Machine Learning Intern (NLP+GPT-3) - bunq, Amsterdam, Netherlands
- Data Science Intern - Faire Wholesale, San Francisco
- Data Science - Intern (2022) - Cloudflare, Austin
- Machine Learning Intern (Summer 2022) - Dropbox, Flexible
Looking to feature your open positions in the digest? Kindly reach out to us at [email protected] for details. We'll be proud to help your business thrive!