
Details

Description

Led by Faisal M. Almutairi of the University of Minnesota, researchers created a novel data disaggregation method, called HomeRun, that exploits an alternative representation of a sequence to reconstruct a higher-resolution event sequence from a mixture of lower-resolution samples. The method was tested using Project Tycho data.

Authors

Faisal M. Almutairi
Fan Yang
Hyun Ah Song
Christos Faloutsos
Nicholas Sidiropoulos
Vladimir Zadorozhny

Related Project Tycho Datasets

United States of America - Acute nonparalytic poliomyelitis
United States of America - Acute paralytic poliomyelitis
United States of America - Acute poliomyelitis
United States of America - Acute type A viral hepatitis
United States of America - Congenital rubella syndrome
United States of America - Measles
United States of America - Mumps
United States of America - Pertussis
United States of America - Rubella
United States of America - Smallpox
United States of America - Smallpox without rash
United States of America - Viral hepatitis, type A

Abstract

Recovering a time sequence of events from multiple aggregated and possibly overlapping reports is a major challenge in historical data fusion. The goal is to reconstruct a higher resolution event sequence from a mixture of lower resolution samples as accurately as possible. For example, we may aim to disaggregate overlapping monthly counts of people infected with measles into weekly counts. In this paper, we propose a novel data disaggregation method, called HomeRun, that exploits an alternative representation of the sequence and finds the spectrum of the target sequence. More specifically, we formulate the problem as so-called basis pursuit using the Discrete Cosine Transform (DCT) as a sparsifying dictionary and impose non-negativity and smoothness constraints. HomeRun utilizes the energy compaction feature of the DCT by finding the sparsest spectral representation of the target sequence that contains the largest (most important) coefficients. We leverage the Alternating Direction Method of Multipliers to solve the resulting optimization problem with scalable and memory efficient steps. Experiments using real epidemiological data show that our method considerably outperforms the state-of-the-art techniques, especially when the DCT of the sequence has a high degree of energy compaction.
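
The ingredients named in the abstract (the DCT as a sparsifying dictionary, non-negativity and smoothness constraints, and ADMM as the solver) can be illustrated with the minimal sketch below. It is not the authors' HomeRun implementation. Several choices are assumptions made here for illustration: the exact basis-pursuit equality constraint is relaxed into a penalized least-squares fit, smoothness is a squared second-difference penalty, the weights `lam`, `mu`, `rho` are arbitrary, and the helper names and the overlapping 4-bin/2-bin report layout are hypothetical.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix D: D @ x is the spectrum of x, and D.T inverts it."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    t = np.arange(n)[None, :]  # time index (columns)
    D = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    D[0, :] /= np.sqrt(2.0)    # rescale the DC row so that D @ D.T == I
    return D


def second_difference(n: int) -> np.ndarray:
    """(n-2) x n matrix L with (L @ x)[i] = x[i] - 2*x[i+1] + x[i+2]."""
    L = np.zeros((n - 2, n))
    for i in range(n - 2):
        L[i, i:i + 3] = [1.0, -2.0, 1.0]
    return L


def overlapping_window_matrix(n: int, width: int, step: int) -> np.ndarray:
    """Hypothetical report layout: each row sums `width` consecutive fine-grained bins."""
    starts = range(0, n - width + 1, step)
    A = np.zeros((len(starts), n))
    for row, s in enumerate(starts):
        A[row, s:s + width] = 1.0
    return A


def soft_threshold(v: np.ndarray, tau: float) -> np.ndarray:
    """Proximal operator of tau * ||.||_1 (shrinks every entry toward zero by tau)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def disaggregate_dct_admm(A, b, lam=1.0, mu=1.0, rho=1.0, iters=1000):
    """Assumed relaxation, solved with scaled-form ADMM:

        min_x 0.5*||A x - b||^2 + 0.5*mu*||L x||^2 + lam*||D x||_1   s.t.  x >= 0

    Splitting: z = D x (sparse DCT spectrum), w = x (non-negative copy).
    """
    n = A.shape[1]
    D, L = dct_matrix(n), second_difference(n)
    # x-update normal equations; since D.T @ D == I, the z-coupling adds rho * I.
    M = A.T @ A + mu * (L.T @ L) + 2.0 * rho * np.eye(n)
    Atb = A.T @ b
    z, w, u, v = (np.zeros(n) for _ in range(4))  # u, v are scaled dual variables
    for _ in range(iters):
        x = np.linalg.solve(M, Atb + rho * D.T @ (z - u) + rho * (w - v))
        Dx = D @ x
        z = soft_threshold(Dx + u, lam / rho)  # sparsity in the DCT domain
        w = np.maximum(x + v, 0.0)             # projection onto x >= 0
        u += Dx - z
        v += x - w
    return w  # the non-negative copy, which matches x once ADMM has converged
```

For example, with `A = overlapping_window_matrix(52, width=4, step=2)` and `b = A @ x_fine` for some weekly series `x_fine`, calling `disaggregate_dct_admm(A, b)` returns a non-negative weekly estimate whose window sums approximately reproduce `b`. Splitting off `z` and `w` keeps every ADMM step cheap: one fixed linear solve, one elementwise shrinkage, and one elementwise clip.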
