A team of researchers led by Fan Yang of the University of Pittsburgh School of Computing and Information developed an automatic approach for reconstructing historical counts from possibly overlapping or incomplete aggregated reports, and evaluated the method using data from Project Tycho.
Related Project Tycho Datasets
United States of America - Acute nonparalytic poliomyelitis
United States of America - Acute paralytic poliomyelitis
United States of America - Acute poliomyelitis
United States of America - Acute type A viral hepatitis
United States of America - Congenital rubella syndrome
United States of America - Measles
United States of America - Mumps
United States of America - Pertussis
United States of America - Rubella
United States of America - Smallpox
United States of America - Smallpox without rash
United States of America - Viral hepatitis, type A
We address the challenge of reconstructing historical counts from aggregated, possibly overlapping historical reports. For example, given the monthly and weekly sums, how can we find the daily counts of people infected with flu? We propose an approach, called ARES (Automatic REStoration), that performs automatic data reconstruction in two phases: (1) first, it estimates the sequence of historical counts utilizing domain knowledge, such as smoothness and periodicity of historical events; (2) then, it uses the estimated sequence to learn notable patterns in the target sequence to refine the reconstructed time series. In order to derive such patterns, ARES uses an annihilating filter technique. The idea is to learn a linear shift-invariant operator whose response to the desired sequence is (approximately) zero, yielding a set of null-space equations that the desired signal should satisfy, without the need for the accompanying data. The reconstruction accuracy can be further improved by applying the second phase iteratively. We evaluate ARES on the real epidemiological data from the Tycho project and demonstrate that ARES recovers historical data from aggregated reports with high accuracy. In particular, it considerably outperforms top competitors, including least squares approximation and the more advanced H-FUSE method (42% and 34% improvement based on average RMSE, respectively).
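The two phases described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' ARES implementation. It is a toy version under stated assumptions: phase 1 is approximated by least squares with a first-difference smoothness penalty, and phase 2 learns an annihilating filter as the smallest right singular vector of a matrix of sliding windows, then re-solves with the filter's null-space equations in place of the smoothness term. All function names, the weight `lam`, the filter `order`, and the synthetic series are hypothetical choices made for illustration.

```python
import numpy as np

def aggregation_matrix(n, windows):
    """Row i sums x[start:stop] for the i-th report; windows may overlap."""
    A = np.zeros((len(windows), n))
    for row, (start, stop) in enumerate(windows):
        A[row, start:stop] = 1.0
    return A

def smooth_reconstruct(A, y, lam=1.0):
    """Phase-1-style estimate: minimize ||A x - y||^2 + lam * ||D x||^2,
    where D takes first differences (an assumed smoothness prior)."""
    n = A.shape[1]
    D = np.diff(np.eye(n), axis=0)  # (n-1) x n, row i is e[i+1] - e[i]
    stacked = np.vstack([A, np.sqrt(lam) * D])
    rhs = np.concatenate([y, np.zeros(n - 1)])
    x_hat, *_ = np.linalg.lstsq(stacked, rhs, rcond=None)
    return x_hat

def learn_annihilating_filter(x, order):
    """Unit-norm h of length order+1 that makes sum_k h[k] * x[t+k]
    as close to zero as possible over all positions t."""
    H = np.array([x[t:t + order + 1] for t in range(len(x) - order)])
    _, _, vt = np.linalg.svd(H)
    return vt[-1]  # right singular vector of the smallest singular value

def filter_matrix(h, n):
    """Banded matrix F with (F x)[t] = sum_k h[k] * x[t+k]."""
    order = len(h) - 1
    F = np.zeros((n - order, n))
    for t in range(n - order):
        F[t, t:t + order + 1] = h
    return F

def refine(A, y, x_est, order=3, lam=1.0):
    """Phase-2-style step: learn a filter from the current estimate, then
    minimize ||A x - y||^2 + lam * ||F x||^2 with that filter."""
    F = filter_matrix(learn_annihilating_filter(x_est, order), A.shape[1])
    stacked = np.vstack([A, np.sqrt(lam) * F])
    rhs = np.concatenate([y, np.zeros(F.shape[0])])
    x_hat, *_ = np.linalg.lstsq(stacked, rhs, rcond=None)
    return x_hat

# Synthetic "daily" series seen only through overlapping 7-day and 10-day sums.
n = 56
t = np.arange(n)
truth = 10 + 5 * np.cos(2 * np.pi * t / 14)
windows = [(s, s + 7) for s in range(0, n, 7)]
windows += [(s, s + 10) for s in range(3, n - 9, 10)]
A = aggregation_matrix(n, windows)
y = A @ truth

x1 = smooth_reconstruct(A, y)   # phase 1: smoothness only
x2 = refine(A, y, x1)           # phase 2: one refinement pass
for name, est in [("smooth", x1), ("refined", x2)]:
    print(name, "RMSE:", np.sqrt(np.mean((est - truth) ** 2)))
```

Calling `refine` again with `x2` as the estimate mirrors the iterative use of the second phase mentioned in the abstract. Whether each pass helps on a given series depends on the filter order, the weight, and how well the first estimate captures the underlying pattern, so the printed errors are best treated as a diagnostic for this toy setup rather than evidence about ARES itself.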