PUBLICATIONS
Books and Book Chapters
- Data Cleaning (Book)
Ihab F. Ilyas, Xu Chu
ACM Book Series 2019 [Amazon Link]
- Data Cleaning (Book Chapter)
Xu Chu
In Encyclopedia of Big Data Technologies 2018 [PDF]
- Trends in Cleaning Relational Data: Consistency and Deduplication (Book)
Ihab F. Ilyas, Xu Chu
In Foundations and Trends® in Databases, Volume 5, Issue 4, 2015 [PDF]
Conference Publications
- Learning to be a Statistician: Learned Estimator for Number of Distinct Values
Renzhi Wu, Bolin Ding, Xu Chu, Zhewei Wei, Xiening Dai, Tao Guan, Jingren Zhou
VLDB 2022 [PDF]
- A Model-Agnostic Approach for Learning with Noisy Labels of Arbitrary Distributions
Shuang Hao, Peng Li, Renzhi Wu, Xu Chu
ICDE 2022 [PDF]
- OmniFair: A Declarative System for Model-Agnostic Group Fairness in Machine Learning
Hantian Zhang, Xu Chu, Abolfazl Asudeh, Sham Navathe
SIGMOD 2021 [PDF]
- Auto-FuzzyJoin: Auto-Program Fuzzy Similarity Joins Without Labeled Examples
Peng Li, Xiang Cheng, Xu Chu, Yeye He, Surajit Chaudhuri
SIGMOD 2021 [PDF]
- Nearest Neighbor Classifiers over Incomplete Information: From Certain Answers to Certain Predictions
Bojan Karlaš*, Peng Li*, Renzhi Wu, Nezihe Merve Gürel, Xu Chu, Wentao Wu, Ce Zhang (* denotes equal contribution)
VLDB 2021 [PDF]
- CleanML: A Benchmark for Evaluating the Impact of Data Cleaning on ML Classification Tasks
Peng Li, Xi Rao, Jennifer Blase, Yue Zhang, Xu Chu, Ce Zhang
ICDE 2021 [PDF]
- ZeroER: Entity Resolution using Zero Labeled Examples
Renzhi Wu, Sanya Chaba, Saurabh Sawlani, Xu Chu, Saravanan Thirumuruganathan
SIGMOD 2020 [PDF]
- GOGGLES: Automatic Training Data Generation with Affinity Coding
Nilaksh Das, Sanya Chaba, Sakshi Gandhi, Duen Horng Chau, Xu Chu
SIGMOD 2020 [PDF]
- Transform-Data-by-Example (TDE): An Extensible Search Engine for Data Transformation
Yeye He, Xu Chu, Kris Ganjam, Yudian Zheng, Vivek Narasayya, Surajit Chaudhuri
VLDB 2018 [PDF]
- HoloClean: Holistic Data Repairs with Probabilistic Inference
Theodoros Rekatsinas, Xu Chu, Ihab F. Ilyas, Christopher Ré
VLDB 2017 [PDF]
- Detecting Data Errors: Where are we and what needs to be done?
Ziawasch Abedjan, Xu Chu, Dong Deng, Raul Castro Fernandez, Ihab F. Ilyas, Mourad Ouzzani, Paolo Papotti, Michael Stonebraker, Nan Tang
VLDB 2016 [PDF]
- Distributed Data Deduplication
Xu Chu, Ihab F. Ilyas, Paraschos Koutris
VLDB 2016 [PDF]
- SEMA-JOIN: Joining Semantically-Related Tables Using Big Table Corpora
Yeye He, Kris Ganjam, Xu Chu
VLDB 2015 [PDF]
- KATARA: A Data Cleaning System Powered by Knowledge Bases and Crowdsourcing
Xu Chu, John Morcos, Ihab F. Ilyas, Mourad Ouzzani, Paolo Papotti, Nan Tang, Yin Ye
SIGMOD 2015 [PDF]
- TEGRA: Table Extraction by Global Record Alignment
Xu Chu, Yeye He, Kaushik Chakrabarti, Kris Ganjam
SIGMOD 2015 [PDF]
- Discovering Denial Constraints
Xu Chu, Ihab F. Ilyas, Paolo Papotti
VLDB 2014 [PDF]
- Holistic Data Cleaning: Putting Violations into Context
Xu Chu, Ihab F. Ilyas, Paolo Papotti
ICDE 2013 [PDF]
Tutorials
- Qualitative Data Cleaning (Tutorial)
Xu Chu, Ihab F. Ilyas
VLDB 2016 [PDF] [Slides]
- Data Cleaning: Overview and Emerging Challenges (Tutorial)
Xu Chu, Ihab F. Ilyas, Sanjay Krishnan, Jiannan Wang
SIGMOD 2016 [PDF] [Slides]
Demos
- Demonstration of Panda: A Weakly Supervised Entity Matching System
Renzhi Wu, Prem Sakala, Peng Li, Xu Chu, Yeye He
VLDB 2021 [PDF]
- PIClean: a Probabilistic and Interactive Data Cleaning System
Zhuoran Yu, Xu Chu
SIGMOD 2019 [PDF]
- Transform-Data-by-Example (TDE): Extensible Data Transformation in Excel
Yeye He, Kris Ganjam, Yue Wang, Vivek Narasayya, Surajit Chaudhuri, Xu Chu, Yudian Zheng
SIGMOD 2018 [PDF]
- CLAMS: Bringing Quality to Data Lakes
Mina Farid, Alexandra Roatis, Ihab F. Ilyas, Hella-Franziska Hoffmann, Xu Chu
SIGMOD 2016 [PDF]
- KATARA: Reliable Data Cleaning with Knowledge Bases and Crowdsourcing
Xu Chu, John Morcos, Ihab F. Ilyas, Mourad Ouzzani, Paolo Papotti, Nan Tang, Yin Ye
VLDB 2015 [PDF]
- RuleMiner: Data Quality Rules Discovery (Demo)
Xu Chu, Ihab F. Ilyas, Paolo Papotti, Yin Ye
ICDE 2014 [PDF]