Data Phoenix Digest - ISSUE 26

New insights into the ML, AI, and Data (MAD) landscape, OECD report on investment in AI, China's new take on regulating AI, the FP Growth algorithm, parallelizing Python code, intro to GANs, instance-conditioned GAN, SwinIR, RSDet++, tools, jobs, and more ...

Dmitry Spodarets


What's new this week?

New insights into the ML, AI, and Data (MAD) landscape. OECD report on investment in AI. China's new take on regulating AI. Google uses AI to redesign search, and more.

Funding News

  • Domino raises $100M in funding led by Great Hill Partners, Coatue Management, Highland Capital Partners, Sequoia Capital, and NVIDIA.
  • Autify, a no-code AI-powered software testing automation platform, raises $10M in Series A funding led by World Innovation Lab (WiL).
  • raises $4M in seed funding. Data annotation is a hot area of investment because it remains a challenge for so many companies.
  • Arize raises $19M in a Series A round led by Battery Ventures to help ML practitioners gain a deeper understanding of model performance.


What’s An OLAP Cube?
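Online analytical processing (OLAP) is a computer-based technique for analyzing data to look for insights. An OLAP cube is a multi-dimensional array of data: each dimension (such as time, region, or product) indexes one axis, and each cell holds an aggregated measure.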
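To make the idea concrete, here is a minimal Python sketch that stores a tiny, hypothetical sales cube as a dictionary keyed by (region, product, quarter) and implements two classic OLAP operations, roll-up and slice. Real OLAP engines use far more efficient storage; the data and function names below are illustrative only.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, sales)
FACTS = [
    ("EU", "laptop", "Q1", 120.0),
    ("EU", "phone", "Q1", 80.0),
    ("US", "laptop", "Q1", 200.0),
    ("US", "laptop", "Q2", 150.0),
    ("US", "phone", "Q2", 90.0),
]
DIMENSIONS = ("region", "product", "quarter")


def build_cube(facts):
    """Map each (region, product, quarter) cell to its summed sales."""
    cube = defaultdict(float)
    for *coords, value in facts:
        cube[tuple(coords)] += value
    return dict(cube)


def roll_up(cube, keep):
    """Aggregate away every dimension not named in `keep` (a roll-up)."""
    idx = [DIMENSIONS.index(d) for d in keep]
    out = defaultdict(float)
    for coords, value in cube.items():
        out[tuple(coords[i] for i in idx)] += value
    return dict(out)


def slice_cube(cube, dim, member):
    """Fix one dimension to a single member (a slice)."""
    i = DIMENSIONS.index(dim)
    return {c: v for c, v in cube.items() if c[i] == member}
```

For example, `roll_up(build_cube(FACTS), ["region"])` returns `{("EU",): 200.0, ("US",): 440.0}`, collapsing the product and quarter axes into regional totals.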

The FP Growth algorithm
In this exploratory post, you'll walk through the steps of applying the FP Growth algorithm in Python to mine frequent itemsets for market basket analysis.
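The core of FP Growth fits in a few dozen lines of Python. This is a minimal from-scratch sketch, not the post's code: it compresses transactions into a prefix tree (the FP-tree) with items ordered by descending support, then recursively mines each item's conditional pattern base instead of enumerating candidate itemsets. Here `min_support` is an absolute count, and the basket data is a toy example.

```python
from collections import defaultdict


class Node:
    """One FP-tree node: an item, its path count, and parent/child links."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs.

    Returns a header table (item -> tree nodes holding that item) and the
    support of every frequent item.
    """
    support = defaultdict(int)
    for items, count in transactions:
        for item in set(items):
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_support}
    root = Node(None, None)
    header = defaultdict(list)
    for items, count in transactions:
        # Keep frequent items only, most frequent first (ties broken by name)
        path = sorted((i for i in set(items) if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, frequent


def fp_growth(transactions, min_support, suffix=()):
    """Yield (itemset, support) pairs by recursively mining conditional trees."""
    header, frequent = build_tree(transactions, min_support)
    for item in frequent:
        itemset = suffix + (item,)
        yield frozenset(itemset), frequent[item]
        # Conditional pattern base: the prefix path above every node of `item`
        conditional = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        yield from fp_growth(conditional, min_support, itemset)


def frequent_itemsets(baskets, min_support):
    """Map each frequent itemset to its absolute support count."""
    return dict(fp_growth([(b, 1) for b in baskets], min_support))


BASKETS = [
    ["bread", "milk"],
    ["bread", "diaper", "beer", "eggs"],
    ["milk", "diaper", "beer", "cola"],
    ["bread", "milk", "diaper", "beer"],
    ["bread", "milk", "diaper", "cola"],
]
```

With `min_support=3`, `frequent_itemsets(BASKETS, 3)` finds eight itemsets, including `{diaper, beer}` with support 3; `{bread, milk, diaper}` appears in only two baskets and is pruned.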

Apache Spark Monitoring: How To Use Spark API & Open-Source Libraries To Get Better Data Observability Of Your Application
In this guide, you’ll learn how to ensure data observability in Spark using Spark’s internal systems like Listener APIs and Query Execution Listeners, and libraries to track data quality metrics.

Introducing TensorFlow Similarity
In this article, you'll learn about the first version of TensorFlow Similarity, a Python package designed to make it easy and fast to train similarity models using TensorFlow.
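Similarity models learn an embedding in which related items sit close together. One classic objective for this is the triplet loss, sketched below in plain Python. This does not use the TensorFlow Similarity API; the function names and the default `margin` are illustrative assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss: want d(anchor, positive) + margin <= d(anchor, negative)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)
```

The loss is zero once the negative is at least `margin` farther from the anchor than the positive, so gradients come only from "hard" triplets that still violate that gap.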

Building AI that Can Generate Images of Things It Has Never Seen Before
GANs are a well-established AI method for creating many kinds of images. Facebook AI shows how Instance-Conditioned GAN (IC-GAN) can generate realistic image combinations that were not seen during training.

Parallelizing Python Code
In this article, you'll learn how to parallelize Python code by using process-based parallelism, specialized libraries, IPython parallel, and Ray.
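Process-based parallelism, the first approach listed, can be sketched with the standard library's `concurrent.futures.ProcessPoolExecutor`, which runs tasks in separate interpreter processes so CPU-bound work is not serialized by the GIL. The prime-counting workload and the default worker count are illustrative choices, not taken from the article.

```python
import math
from concurrent.futures import ProcessPoolExecutor


def is_prime(n: int) -> bool:
    """CPU-bound work: trial-division primality test."""
    if n < 2:
        return False
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return False
    return True


def count_primes_parallel(numbers, workers=4, chunksize=1000):
    """Spread is_prime across worker processes and sum the True results."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(is_prime, numbers, chunksize=chunksize))


if __name__ == "__main__":
    print(count_primes_parallel(range(100_000)))  # prints 9592
```

The `if __name__ == "__main__":` guard matters on platforms that start workers with the `spawn` method, where each child re-imports the main module. `chunksize` batches many small tasks into each inter-process message to cut overhead.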

Intro to Generative Adversarial Networks (GANs)
In this 101 guide, you'll look into the ins and outs of GANs, including the high-level intuition behind them, the main GAN variants, and their applications to real-world problems.
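For reference, the original GAN formulation (Goodfellow et al., 2014) trains a generator G and a discriminator D as a two-player minimax game over the value function below; the guide's own notation may differ.

```latex
\min_{G} \max_{D} V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

D tries to assign high probability to real samples x and low probability to generated samples G(z), while G tries to fool D. For a fixed G, the optimal discriminator is D*(x) = p_data(x) / (p_data(x) + p_g(x)), where p_g is the distribution of generated samples.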

Inferring Concept Drift Without Labeled Data
In this comprehensive report, Cloudera presents four ways to infer concept drift in an unsupervised manner, to reduce false-positive drift detections. Check out this gem for sure!
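One simple way to watch for drift without labels is to compare the distribution of model confidence scores in a reference window against a recent window using the two-sample Kolmogorov-Smirnov (KS) statistic. This generic sketch is not necessarily one of the report's four methods, and `threshold` is a hypothetical cutoff.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    d = 0.0
    # Both ECDFs are step functions that only jump at sample points,
    # so checking every observed value finds the maximum gap.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        d = max(d, abs(cdf_ref - cdf_cur))
    return d


def drift_alarm(reference_scores, current_scores, threshold=0.2):
    """Flag drift when the score distributions diverge (no labels needed).

    `threshold` is a hypothetical cutoff; in practice it would be tuned,
    or replaced by a p-value from a proper KS test.
    """
    return ks_statistic(reference_scores, current_scores) > threshold
```

Monitoring the model's output scores rather than every input feature keeps the check cheap, at the cost of missing input shifts that leave predictions unchanged.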


Robust High-Resolution Video Matting with Temporal Guidance
In this paper, you'll find a robust, real-time, high-resolution human video matting method that achieves new state-of-the-art performance and is much lighter than previous approaches.

Instance-Conditioned GAN
GANs can generate photo-realistic images. In this paper, the authors use kernel density estimation techniques to introduce a non-parametric approach to modeling distributions of complex datasets.
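The kernel density estimation idea behind this approach can be illustrated in one dimension: the estimated density is an average of kernels, here Gaussians of width `bandwidth`, each centered on one training sample. This plain-Python sketch shows classic KDE only; IC-GAN applies the mixture-of-neighborhoods idea with a conditional GAN and is far more involved.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a 1-D density estimate: one Gaussian bump per sample, averaged."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density
```

Smaller bandwidths hug the training points closely, while larger ones smooth them into a broader distribution; this is the same trade-off between local detail and coverage that a non-parametric model of a complex dataset has to manage.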

RSDet++: Point-based Modulated Loss for More Accurate Rotated Object Detection
The authors propose a rotation sensitivity detection network (RSDet), which consists of an eight-parameter single-stage rotated object detector and the modulated rotation loss.

SwinIR: Image Restoration Using Swin Transformer
SwinIR is a strong baseline model for image restoration based on the Swin Transformer. It consists of three parts: shallow feature extraction, deep feature extraction, and high-quality (HQ) image reconstruction.

Fake It Till You Make It
In this research, the AI/ML team at Microsoft demonstrates that it is possible to perform face-related computer vision in the wild using synthetic data alone.

Stochastic Training is Not Necessary for Generalization
The authors show that non-stochastic full-batch training can achieve strong performance on CIFAR-10, on par with SGD, using modern architectures in settings with and without data augmentation.

A Comprehensive Survey and Performance Analysis of Activation Functions in Deep Learning
The paper presents a comprehensive overview and survey of activation functions (AFs) in deep learning (DL) neural networks, covering Logistic Sigmoid- and Tanh-based, ReLU-based, ELU-based, and learning-based AFs.
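For quick reference, scalar definitions of several of these functions look like this in plain Python. This is a minimal sketch; deep learning frameworks provide vectorized versions. The sigmoid is written in a numerically stable form so `math.exp` cannot overflow for large negative inputs.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, squashing inputs into (0, 1); stable for any sign of x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so this cannot overflow
    return z / (1.0 + z)


def tanh(x: float) -> float:
    """Hyperbolic tangent, squashing inputs into (-1, 1)."""
    return math.tanh(x)


def relu(x: float) -> float:
    """Rectified linear unit: zero for negatives, identity otherwise."""
    return max(0.0, x)


def leaky_relu(x: float, alpha: float = 0.01) -> float:
    """ReLU variant with a small slope `alpha` for negative inputs."""
    return x if x > 0 else alpha * x


def elu(x: float, alpha: float = 1.0) -> float:
    """Exponential linear unit: smooth saturation toward -alpha for negatives."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)
```

PReLU keeps the `leaky_relu` form but treats `alpha` as a parameter learned during training rather than a fixed constant.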


Merlion: A Machine Learning Library for Time Series
Merlion is a Python library for time series intelligence that provides an end-to-end ML framework for loading and transforming data, building and training models, post-processing model outputs, and evaluating model performance.

Ploomber
Ploomber is an open-source tool that helps engineers easily work with .ipynb files, allowing them to develop collaborative, production-ready pipelines using JupyterLab or any text editor.


Looking to feature your open positions in the digest? Kindly reach out to us at [email protected] for details. We'll be proud to help your business thrive!