Continuous Delivery for Machine Learning (CD4ML)
When working with data, we rarely know beforehand the right way to derive insights and extract value. Often, it is impossible to know exactly what can be done with data until we start exploring it, which makes it difficult to set expectations and goals for a data project. On top of that, real-world data adds complexity and toil, and deploying data-driven products remains difficult.
In this full-day workshop, we explore a representative data problem, with all the messiness that one can expect from real-world, non-curated data. By introducing the concepts of Hypothesis-Driven Development and Continuous Delivery for Machine Learning (CD4ML), we’ll teach participants how to develop data science and analysis workflows that enable rapid development and deployment of data science projects, with the goal of incremental and continuous improvement. Participants can expect to spend much of the day with their hands on the keyboard. We’ve created an example data science application that’s ready for participants to jump into. It uses real data and an industry-standard toolset, and it addresses a typical machine learning task.
We’ll start the morning by explaining Hypothesis-Driven Development, including what it is and when to use it (and when not to), and then get our hands dirty with data exploration, feature engineering, and data visualization. The methods and techniques will be suitable for participants at an entry level in data science.
After the lunch break, we’ll introduce the concepts of Continuous Delivery, especially as it applies to scientific computing, and work towards developing our machine learning model in a live production system. These concepts will be accessible to those with limited or no experience in Continuous Delivery environments, and we’ll explore the approaches to, and benefits of, practices such as unit testing, continuous integration, trunk-based development, and more.