This research, led by William Herlands, a PhD student in the Machine Learning and Public Policy program at Carnegie Mellon University, used Project Tycho weekly measles case counts for US states from 1935 to 2003 to develop a scalable, multidimensional Gaussian process model that learns a complex change surface from data. The model detected change points in the measles time series that correspond to the years of vaccine introduction. The work shows how real-world historical data curated by Project Tycho can inform cutting-edge machine learning methods.
We present a scalable Gaussian process model for identifying and characterizing smooth multidimensional changepoints, and automatically learning changes in expressive covariance structure. We use Random Kitchen Sink features to flexibly define a change surface in combination with expressive spectral mixture kernels to capture the complex statistical structure. Finally, through the use of novel methods for additive non-separable kernels, we can scale the model to large datasets. We demonstrate the model on numerical and real-world data, including a large spatio-temporal disease dataset where we identify previously unknown heterogeneous changes in space and time.
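To make the idea in the abstract more concrete, here is a minimal sketch of how a change surface built from Random Kitchen Sink (random Fourier) features might blend two spectral mixture kernels into one covariance matrix. It is an illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the inputs are one-dimensional, every parameter is drawn at random or set by hand rather than learned, and the scalable inference for additive non-separable kernels is not attempted.

```python
import numpy as np

def rks_features(x, freqs, phases):
    """Random Kitchen Sink features cos(x @ freqs + phases); x has shape (n, d)."""
    m = freqs.shape[1]
    return np.sqrt(2.0 / m) * np.cos(x @ freqs + phases)

def change_weights(x, freqs, phases, coef):
    """Smooth weight in (0, 1) at each input: a sigmoid of a linear map of the features."""
    z = rks_features(x, freqs, phases) @ coef
    return 1.0 / (1.0 + np.exp(-z))

def spectral_mixture_kernel(x1, x2, weights, means, variances):
    """1-D spectral mixture kernel: a sum of Gaussian-windowed cosines of tau = x1 - x2."""
    tau = x1[:, None] - x2[None, :]
    k = np.zeros_like(tau)
    for w, mu, v in zip(weights, means, variances):
        k += w * np.exp(-2.0 * np.pi**2 * tau**2 * v) * np.cos(2.0 * np.pi * tau * mu)
    return k

def change_surface_kernel(s, k_before, k_after):
    """Blend two regime kernels: (1-s)(1-s)^T * K_before + s s^T * K_after (elementwise)."""
    return np.outer(1.0 - s, 1.0 - s) * k_before + np.outer(s, s) * k_after

# Illustrative setup: all values below are assumptions chosen for the sketch.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 60)                 # e.g. time, arbitrary units
m = 20                                         # number of random features
freqs = rng.normal(scale=0.5, size=(1, m))     # random frequencies (not learned here)
phases = rng.uniform(0.0, 2.0 * np.pi, size=m)
coef = rng.normal(size=m)                      # would be learned in a real model

s = change_weights(x[:, None], freqs, phases, coef)
k_before = spectral_mixture_kernel(x, x, weights=[1.0], means=[1.0], variances=[0.01])
k_after = spectral_mixture_kernel(x, x, weights=[1.0, 0.5], means=[0.2, 2.0], variances=[0.05, 0.01])
K = change_surface_kernel(s, k_before, k_after)
```

Where `s` is near 0 the covariance follows the first kernel, and where it is near 1 it follows the second, so the transition between regimes can be gradual rather than a hard changepoint. In a real model the feature coefficients and kernel hyperparameters would be fitted to data, for instance by maximizing the Gaussian process marginal likelihood.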