Data Phoenix Digest - 01.07.2021

Data Phoenix Rises

We at Data Science Digest have always strived to ignite the fire of knowledge in the AI community. We’re proud to have helped thousands of people learn something new and to have given them the tools to push ahead. And we’ve not been standing still, either.

Please meet Data Phoenix: Data Science Digest, rebranded and risen anew from its own flame. Our mission is to help everyone interested in Data Science and AI/ML expand the frontiers of knowledge. More news, more updates, and webinars(!) are coming. Stay tuned!

NEWS

AI that helps write code. AI-generated artwork. EU’s ban on biometric surveillance. Spain’s push for leadership in smart technologies. And fusion experiments forecasted by AI.

Can AI help you write code? If you’re unsure about the answer, check out GitHub Copilot, a new AI tool for programmers. GitHub Copilot draws context from the code you are working on and suggests whole lines or entire functions to help you complete your work faster. On the other side of the fence, AI empowers artists: NVIDIA Canvas, a new application, uses AI to help artists quickly paint beautiful, realistic artwork.

Meanwhile, in Europe, the EU is pushing to significantly limit the use of AI and related technologies like Computer Vision to monitor the public. The AI Regulation is just one of many digital proposals unveiled by EU lawmakers in recent months. Negotiations between the different EU institutions continue as the bloc works toward adopting new digital rules. While regulators tighten the rules, countries like Spain have big plans for using AI to tackle workforce shortages.

AI & ML advance critical research in physics. Dan Boyer of the US Department of Energy's (DOE) Princeton Plasma Physics Laboratory (PPPL) has used machine learning to develop fast and accurate predictions for advancing control of experiments in the National Spherical Torus Experiment-Upgrade (NSTX-U).

ARTICLES

How to Use XGBoost for Time Series Forecasting
In this tutorial, you’ll learn how to fit, evaluate, and make predictions with an XGBoost model for time series forecasting. You’ll also look into the basics of XGBoost ensemble and time series data preparation.
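
The data preparation step mentioned above usually means reframing the series as a supervised learning problem with a sliding window. Here is a minimal sketch in plain Python, not the tutorial's own code: the helper names `series_to_supervised` and `train_test_split_ordered`, the lag count, and the toy series are all assumptions made for illustration. The resulting rows could then be passed to any regressor, XGBoost included.

```python
def series_to_supervised(series, n_lags=3):
    """Turn a 1-D sequence into (inputs, target) rows using a sliding window.

    Each row pairs the previous `n_lags` observations with the value
    that immediately follows them. (Hypothetical helper for illustration.)
    """
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    rows = []
    for i in range(n_lags, len(series)):
        rows.append((list(series[i - n_lags:i]), series[i]))
    return rows


def train_test_split_ordered(rows, n_test):
    """Hold out the last `n_test` rows without shuffling.

    Keeping the original order means the model is never evaluated on
    observations that come before its training data.
    """
    if not 0 < n_test < len(rows):
        raise ValueError("n_test must be between 1 and len(rows) - 1")
    return rows[:-n_test], rows[-n_test:]


# Toy series, assumed for the example only.
series = [10, 20, 30, 40, 50, 60]
rows = series_to_supervised(series, n_lags=2)
train, test = train_test_split_ordered(rows, n_test=1)
```

The split is by position rather than random because shuffling would leak future values into the training set.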

Genetic Algorithms for Natural Language Processing
In this article, you’ll find a technical overview of genetic algorithms and dive deep into how they relate to NLP. As a result, you’ll learn why genetic algorithms are effective for developing a vocabulary of tokenized grams.

Differential Evolution from Scratch in Python
In this tutorial, you’ll explore the ins and outs of differential evolution, learn how to implement the algorithm in Python, and apply it to a real-valued 2D objective function.
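
To give a feel for the algorithm, here is a minimal sketch of one common variant (often written DE/rand/1/bin) using only the standard library. It is not the tutorial's code: the objective `sphere`, the bounds, and the parameter values (`pop_size`, `n_iter`, `f`, `cr`, `seed`) are assumptions picked for illustration.

```python
import random


def sphere(x):
    """Assumed 2D test objective: minimum of 0 at the origin."""
    return sum(v * v for v in x)


def differential_evolution(objective, bounds, pop_size=20, n_iter=200,
                           f=0.5, cr=0.7, seed=1):
    """Minimize `objective` within `bounds` using DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial population inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(ind) for ind in pop]

    for _ in range(n_iter):
        for j in range(pop_size):
            # Pick three distinct candidates other than the current one.
            others = [k for k in range(pop_size) if k != j]
            a, b, c = (pop[k] for k in rng.sample(others, 3))
            # Mutation: base vector plus a scaled difference vector.
            mutant = [a[d] + f * (b[d] - c[d]) for d in range(dim)]
            # Clip the mutant back into the search bounds.
            mutant = [min(max(m, lo), hi) for m, (lo, hi) in zip(mutant, bounds)]
            # Binomial crossover; one dimension always comes from the mutant.
            forced = rng.randrange(dim)
            trial = [mutant[d] if (rng.random() < cr or d == forced) else pop[j][d]
                     for d in range(dim)]
            # Greedy selection: keep the trial only if it is no worse.
            trial_score = objective(trial)
            if trial_score <= scores[j]:
                pop[j], scores[j] = trial, trial_score

    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


bounds = [(-5.0, 5.0), (-5.0, 5.0)]
best_x, best_score = differential_evolution(sphere, bounds)
```

Because selection is greedy, each candidate's score never gets worse from one generation to the next. The scale factor `f` and crossover rate `cr` trade exploration against convergence speed.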

Style Your Pandas DataFrame and Make It Stunning
In this article, you’ll learn about the built-in methods for styling a DataFrame in Pandas. You’ll also practice creating custom styling functions, customizing the DataFrame at the HTML and CSS level, and saving the styled DataFrame to Excel files.

The FLORES-101 Data Set: Helping Build Better Translation Systems Around the World
Building on the success of machine translation systems like M2M-100, Facebook AI has open-sourced FLORES-101, a many-to-many evaluation data set covering 101 languages from all over the world, to enable researchers to rapidly test and improve upon multilingual translation models like M2M-100. In this article, you’ll delve into its basics.

PAPERS

Alias-Free GAN
In this paper, a group of researchers from NVIDIA and Aalto University explores the synthesis process of typical generative adversarial networks and the challenges in how they process images. They present more advanced methods and networks to pave the way for generative models better suited for video and animation.

You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection
In this paper, Yuxin Fang et al. present You Only Look at One Sequence (YOLOS), a series of object detection models based on the naïve Vision Transformer with the fewest possible modifications and inductive biases. They also discuss the limitations of current pre-train schemes and model scaling strategies for Transformer in vision through object detection.

CoAtNet: Marrying Convolution and Attention for All Data Sizes
In this paper, Zihang Dai et al. demonstrate that while Transformers tend to have larger model capacity, their generalization can be worse than convolutional networks due to the lack of the right inductive bias. They present CoAtNets, a new family of hybrid models, to tackle the problem.

Consistent Instance False Positive Improves Fairness in Face Recognition
In this paper, Xingkun Xu et al. propose a false positive rate penalty loss, a novel method to mitigate face recognition bias by increasing the consistency of instance False Positive Rate (FPR). The method requires no demographic annotations, which makes it possible to mitigate bias among demographic groups divided by various attributes.

Multivariate Probabilistic Regression with Natural Gradient Boosting
In this paper, the researchers propose a Natural Gradient Boosting (NGBoost) approach based on nonparametrically modeling the conditional parameters of the multivariate predictive distribution. The method is robust, works out-of-the-box without extensive tuning, is modular with respect to the assumed target distribution, and performs competitively in comparison to existing approaches.

DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification
In this research, Yongming Rao et al. propose a dynamic token sparsification framework to prune redundant tokens progressively and dynamically based on the input. A lightweight prediction module can estimate the importance score of each token given the current features. The module is added to different layers to prune redundant tokens hierarchically.

PROJECTS

MLOps Toys
The platform is a collection of MLOps projects by category, including data versioning, training orchestration, feature store, experiment tracking, model serving, model monitoring, and explainability.

VIDEOS

Data Governance
In this video, Jessi Ashdown, Uri Gilad, and Alexey Grigorev discuss data governance: why do it in the first place, how to implement specific data policies, how to maintain data quality, and how to use data catalogs.

Ingestion and Historization in the Data Lake
In this video, Alexey Grigorev, the founder of DataTalks.Club, hosts Illia Todor, Data Engineer, to talk about ingestion and historization of data in the data lake.