Data sampling and stratification in data science