Good-quality FAIR data is fundamental to data reuse. When we discuss data quality in the FAIR context, we often focus on metadata-level quality attributes, such as accessibility and reuse conditions, rather than semantic ones, such as imbalances, outliers, and duplicates. In practice, ensuring data quality at both the metadata and semantic levels is crucial but challenging. One solution to this challenge is synthetic data. MIT Technology Review named synthetic data one of its 10 Breakthrough Technologies of 2022, citing it as a solution for training AI models when the available data is low-quality, incomplete, or biased. Synthetic data can improve data quality and help accelerate AI projects, enabling responsible innovation. This talk shows how it works in practice, drawing on the experience of the co-founder of a synthetic data company. It covers how to check data quality at scale using open-source libraries and which metrics are needed to measure the quality of the resulting synthetic data.
Speaker
Shalini Kurapati - Co-founder and CEO, Clearbox AI. Shalini leads strategy, operations, and business development at Clearbox AI, drawing on multidisciplinary expertise at the intersection of technology, policy, and management. She holds a PhD from Delft University of Technology and has a strong R&D background and practical expertise in data management, data privacy, and data stewardship. She specialises in transparency, privacy, and fairness issues across data life cycles and algorithms. Shalini has wide-ranging international professional experience in the Netherlands, Sweden, India, the United States, and most recently Italy. She is also a Certified Information Privacy Professional/Europe (CIPP/E) with demonstrable knowledge of GDPR and European e-Privacy laws.