DataPrep: Data Preparation in Python

Data scientists often spend over 80% of their time on data preparation (data collection --> data understanding --> data cleaning --> data integration --> feature engineering). We believe the main reason data preparation takes so much human time is the lack of a good data preparation tool. Our vision is to build DataPrep, a fast and easy-to-use Python library for data preparation, to fill this gap. You can think of DataPrep as "scikit-learn" for data preparation.


