Details

  • Journal: Proceedings of the 2017 SIAM International Conference on Data Mining
  • Date: Jan. 1, 2017
  • DOI: 10.1137/1.9781611974973.88
  • Category: Scientific Research

Description

Zongge Liu, Hyun Ah Song, Vladimir Zadorozhny, Christos Faloutsos, and Nicholas Sidiropoulos developed H-FUSE, a method for recovering a time sequence of counts from aggregated historical records. In tests against real measles and smallpox data from Project Tycho, H-FUSE used domain knowledge to reconstruct historical counts more accurately than the least squares method.

Authors

Zongge Liu

Hyun Ah Song

Vladimir Zadorozhny

Christos Faloutsos

Nicholas Sidiropoulos

Related Project Tycho Datasets

United States of America - Measles

United States of America - Smallpox

Abstract

In this paper, we address the challenge of recovering a time sequence of counts from aggregated historical data. For example, given a mixture of the monthly and weekly sums, how can we find the daily counts of people infected with flu? In general, what is the best way to recover historical counts from aggregated, possibly overlapping historical reports, in the presence of missing values? Equally importantly, how much should we trust this reconstruction?

We propose H-FUSE, a novel method that solves the above problems by allowing the injection of domain knowledge in a principled way, and by turning the task into a well-defined optimization problem. H-FUSE has the following desirable properties: (a) Effectiveness, recovering historical data from aggregated reports with high accuracy; (b) Self-awareness, providing an assessment of when the recovery is not reliable; (c) Scalability, with computation time linear in the size of the input data.

Experiments on real data (epidemiology counts from the Tycho project [13]) demonstrate that H-FUSE reconstructs the original data 30-81% better than the least squares method.
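To make the problem setting concrete, the sketch below recovers daily counts from overlapping aggregated reports in two ways: plain (minimum-norm) least squares, and least squares with an added smoothness penalty as one possible form of domain knowledge. The abstract does not give the H-FUSE formulation, so this is only an illustration under stated assumptions, not the authors' method: the function names `aggregation_matrix` and `reconstruct`, the first-difference penalty, the penalty weight, and the synthetic data are all hypothetical choices.

```python
import numpy as np

def aggregation_matrix(n_days, windows):
    """Build a 0/1 matrix whose rows sum daily counts over [start, stop)."""
    A = np.zeros((len(windows), n_days))
    for row, (start, stop) in enumerate(windows):
        A[row, start:stop] = 1.0
    return A

def reconstruct(A, reports, smoothness=0.0):
    """Minimize ||A x - reports||^2 + smoothness * ||D x||^2.

    D is the first-difference operator, so the penalty discourages
    large day-to-day jumps (an assumed form of domain knowledge).
    With smoothness == 0 this is minimum-norm least squares.
    """
    n_days = A.shape[1]
    if smoothness == 0.0:
        return np.linalg.lstsq(A, reports, rcond=None)[0]
    D = np.diff(np.eye(n_days), axis=0)  # row i is e[i+1] - e[i]
    stacked_A = np.vstack([A, np.sqrt(smoothness) * D])
    stacked_b = np.concatenate([reports, np.zeros(n_days - 1)])
    return np.linalg.lstsq(stacked_A, stacked_b, rcond=None)[0]

# Synthetic (not real) daily counts over four weeks.
n_days = 28
days = np.arange(n_days)
truth = 50.0 + 30.0 * np.sin(2.0 * np.pi * days / n_days)

# Four weekly reports plus two overlapping 14-day reports.
windows = [(0, 7), (7, 14), (14, 21), (21, 28), (3, 17), (10, 24)]
A = aggregation_matrix(n_days, windows)
reports = A @ truth

x_plain = reconstruct(A, reports)
x_smooth = reconstruct(A, reports, smoothness=0.1)

def rmse(estimate):
    """Root-mean-square error against the synthetic ground truth."""
    return float(np.sqrt(np.mean((estimate - truth) ** 2)))

print("plain least squares RMSE:", rmse(x_plain))
print("smoothness-penalized RMSE:", rmse(x_smooth))
```

With six reports and 28 unknown days the system is underdetermined, so plain least squares matches every report exactly yet is free to pick an implausible daily profile. The penalty term is where an assumption about the data enters; how much it helps depends on how well that assumption fits the actual series.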
