The Data Science Pipeline: Introduction

Reading time: ~5 min

In this mini-course, we will introduce a collection of skills commonly applied to solve data problems in industry and science. These skills correspond to stages of a typical data science project: we acquire data, wrangle it into a form conducive to further analysis, visualize the data to better understand it, model the data to gain further insight and make predictions about the process that generated the data, and communicate our results to stakeholders.

We will use the Python data science ecosystem to develop these pipeline skills: Pandas for data wrangling, Plotly for data visualization, and Scikit-Learn for modeling. These packages are popular enough to be a good investment of your time even if you eventually settle into some other toolchain, because the experience will help you in interviews and when collaborating with the Python users you will inevitably encounter.