Data Phoenix Digest - ISSUE 39

Data Phoenix team looking for speakers, introduction to clustering in Python with PyCaret, OCR passports with OpenCV and Tesseract, visualizing decision trees with Pybaobabdt, Pix2Pix, PP-ShiTu, FuseDream, PartImageNet, jobs, and more ...

Dmitry Spodarets

The Data Phoenix team will resume "The A-Z of Data" webinars at the end of January, and we're looking for speakers to collaborate with. If you want a platform with a relevant audience and have experience and knowledge to share, we'll be glad to have you among the speakers. If that sounds like you, or you know someone who might be interested, let us know by email at [email protected].

ARTICLES

Introduction to Clustering in Python with PyCaret
PyCaret is an open-source, low-code ML library in Python that automates ML workflows. Learn how to set up and run unsupervised clustering tasks in Python with it.

Visualizing Decision Trees with Pybaobabdt
In this article, you’ll learn how to visualize decision trees and use the visualizations for model interpretation with Pybaobabdt, a free package built for these tasks.

Pix2pix: Key Model Architecture Decisions
Pix2Pix is a conditional GAN that uses images and labels to generate images. In this article, you'll learn about its architecture and explore Pix2Pix examples.

Announcing PyCaret’s New Time Series Module
PyCaret’s new time series module is now available in beta. It is consistent with the existing API and comes with a lot of functionality. To give it a try, check out the official quick start notebook.

Most Common Coding Mistakes on Data Science Interviews
Job interviews can be stressful. Here are some of the most common mistakes candidates make when answering SQL questions during their DS interviews.

OCR Passports with OpenCV and Tesseract
In this tutorial, you'll develop a CV system that can automatically locate the machine-readable zones (MRZs) in a scan of a passport. It is part 4 of a 4-part series on OCR 120.

PAPERS

PP-ShiTu: A Practical Lightweight Image Recognition System
PP-ShiTu is a practical lightweight image recognition system that uses metric learning, deep hash, knowledge distillation, and model quantization to improve accuracy and inference speed.

AI and the Everything in the Whole Wide World Benchmark
In this paper, the authors explore the limits of AI benchmarks and reveal the construct validity issues in framing them as functionally "general", broad measures of progress.

PartImageNet: A Large, High-Quality Dataset of Parts
PartImageNet is a large, high-quality dataset with part segmentation annotations. It consists of 158 classes from ImageNet with approximately 24,000 images with non-rigid, articulated objects.

FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization
FuseDream is a pipeline that can generate high-quality images with varying objects, backgrounds, artistic styles, and novel counterfactual concepts that don't appear in the training data of the GAN.

JOBS

Looking to feature your open positions in the digest? Kindly reach out to us at [email protected] for details. We'll be proud to help your business thrive!
