Chu Data Lab

Publications

  • Books and Book Chapters
  • Conference Publications
  • Tutorials
  • Demos

Books and Book Chapters

  • Data Cleaning (Book)
    Ihab F. Ilyas, Xu Chu
    ACM Book Series 2019 [Amazon Link]
  • Data Cleaning (Book Chapter)
    Xu Chu
    In Encyclopedia of Big Data Technologies 2018 [PDF]
  • Trends in Cleaning Relational Data: Consistency and Deduplication (Book)
    Ihab F. Ilyas, Xu Chu
    In Foundations and Trends® in Databases, Volume 5, Issue 4, 2015 [PDF]

Conference Publications

  • Learning to be a Statistician: Learned Estimator for Number of Distinct Values
    Renzhi Wu, Bolin Ding, Xu Chu, Zhewei Wei, Xiening Dai, Tao Guan, Jingren Zhou
    VLDB 2022 [PDF]
  • A Model-Agnostic Approach for Learning with Noisy Labels of Arbitrary Distributions
    Shuang Hao, Peng Li, Renzhi Wu, Xu Chu
    ICDE 2022 [PDF]
  • OmniFair: A Declarative System for Model-Agnostic Group Fairness in Machine Learning
    Hantian Zhang, Xu Chu, Abolfazl Asudeh, Sham Navathe
    SIGMOD 2021 [PDF]
  • Auto-FuzzyJoin: Auto-Program Fuzzy Similarity Joins Without Labeled Examples
    Peng Li, Xiang Cheng, Xu Chu, Yeye He, Surajit Chaudhuri
    SIGMOD 2021 [PDF]
  • Nearest Neighbor Classifiers over Incomplete Information: From Certain Answers to Certain Predictions
    Bojan Karlaš*, Peng Li*, Renzhi Wu, Nezihe Merve Gürel, Xu Chu, Wentao Wu, Ce Zhang (* denotes equal contribution)
    VLDB 2021 [PDF]
  • CleanML: A Benchmark for Evaluating the Impact of Data Cleaning on ML Classification Tasks
    Peng Li, Xi Rao, Jennifer Blase, Yue Zhang, Xu Chu, Ce Zhang
    ICDE 2021 [PDF]
  • ZeroER: Entity Resolution using Zero Labeled Examples
    Renzhi Wu, Sanya Chaba, Saurabh Sawlani, Xu Chu, Saravanan Thirumuruganathan
    SIGMOD 2020 [PDF]
  • GOGGLES: Automatic Training Data Generation with Affinity Coding
    Nilaksh Das, Sanya Chaba, Sakshi Gandhi, Duen Horng Chau, Xu Chu
    SIGMOD 2020 [PDF]
  • Transform-Data-by-Example (TDE): An Extensible Search Engine for Data Transformation
    Yeye He, Xu Chu, Kris Ganjam, Yudian Zheng, Vivek Narasayya, Surajit Chaudhuri
    VLDB 2018 [PDF]
  • HoloClean: Holistic Data Repairs with Probabilistic Inference
    Theodoros Rekatsinas, Xu Chu, Ihab F. Ilyas, Christopher Ré
    VLDB 2017 [PDF]
  • Detecting Data Errors: Where are we and what needs to be done?
    Ziawasch Abedjan, Xu Chu, Dong Deng, Raul Castro Fernandez, Ihab F. Ilyas, Mourad Ouzzani, Paolo Papotti, Michael Stonebraker, Nan Tang
    VLDB 2016 [PDF]
  • Distributed Data Deduplication
    Xu Chu, Ihab F. Ilyas, Paraschos Koutris
    VLDB 2016 [PDF]
  • SEMA-JOIN: Joining Semantically-Related Tables Using Big Table Corpora
    Yeye He, Kris Ganjam, Xu Chu
    VLDB 2015 [PDF]
  • KATARA: A Data Cleaning System Powered by Knowledge Bases and Crowdsourcing
    Xu Chu, John Morcos, Ihab F. Ilyas, Mourad Ouzzani, Paolo Papotti, Nan Tang, Yin Ye
    SIGMOD 2015 [PDF]
  • TEGRA: Table Extraction by Global Record Alignment
    Xu Chu, Yeye He, Kaushik Chakrabarti, Kris Ganjam
    SIGMOD 2015 [PDF]
  • Discovering Denial Constraints
    Xu Chu, Ihab F. Ilyas, Paolo Papotti
    VLDB 2014 [PDF]
  • Holistic Data Cleaning: Putting Violations into Context
    Xu Chu, Ihab F. Ilyas, Paolo Papotti
    ICDE 2013 [PDF]

Tutorials

  • Qualitative Data Cleaning (Tutorial)
    Xu Chu, Ihab F. Ilyas
    VLDB 2016 [PDF] [Slides]
  • Data Cleaning: Overview and Emerging Challenges (Tutorial)
    Xu Chu, Ihab F. Ilyas, Sanjay Krishnan, Jiannan Wang
    SIGMOD 2016 [PDF] [Slides]

Demos

  • Demonstration of Panda: A Weakly Supervised Entity Matching System
    Renzhi Wu, Prem Sakala, Peng Li, Xu Chu, Yeye He
    VLDB 2021 [PDF]
  • PIClean: A Probabilistic and Interactive Data Cleaning System
    Zhuoran Yu, Xu Chu
    SIGMOD 2019 [PDF]
  • Transform-Data-by-Example (TDE): Extensible Data Transformation in Excel
    Yeye He, Kris Ganjam, Yue Wang, Vivek Narasayya, Surajit Chaudhuri, Xu Chu, Yudian Zheng
    SIGMOD 2018 [PDF]
  • CLAMS: Bringing Quality to Data Lakes
    Mina Farid, Alexandra Roatis, Ihab F. Ilyas, Hella-Franziska Hoffmann, Xu Chu
    SIGMOD 2016 [PDF]
  • KATARA: Reliable Data Cleaning with Knowledge Bases and Crowdsourcing
    Xu Chu, John Morcos, Ihab F. Ilyas, Mourad Ouzzani, Paolo Papotti, Nan Tang, Yin Ye
    VLDB 2015 [PDF]
  • RuleMiner: Data Quality Rules Discovery (Demo)
    Xu Chu, Ihab F. Ilyas, Paolo Papotti, Yin Ye
    ICDE 2014 [PDF]

Copyright © 2025

Contact:
Email: xu.chu@cc.gatech.edu
Phone: 404-894-3160
Fax: 404-385-2295