Imputation of Missing Data in Multidimensional Retail Data Sets (20110121, Dr. Arindam Banerjee)

Technology No. 20110121
IP Status: Issued US Patent; Application #: 13/204,237

Multiple Predictions of missing data

A new system provides multiple imputation of missing data elements in retail data sets used for modeling and decision support applications. The method is based on the multi-dimensional tensor structure of the data sets and offers a fast, scalable scheme suitable for large data sets. The computer-implemented algorithm first identifies a retail data set: a measured quantity recorded for each product “p”, each retail chain or store “s”, and each time period “t”. It then encodes dummy variables corresponding to the missing data for the relevant (p,s,t) combinations and generates multiple imputations of the missing values, yielding a plurality of “complete” data sets for demand modeling.
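The workflow described above can be illustrated with a minimal sketch. It is not the patented algorithm: the listing does not give the model, so this example assumes a simple additive product + store + period effect model and draws each imputation by resampling observed residuals. The function name multiple_impute, the NaN convention for missing cells, and the boolean indicator tensor standing in for the dummy variables are all assumptions made for illustration.

```python
import numpy as np

def multiple_impute(X, n_imputations=5, seed=0):
    """Return (observed, completed) for a (product, store, time) array X.

    NaN marks a missing (p, s, t) cell. Assumes every product, store,
    and period has at least one recorded value.
    """
    rng = np.random.default_rng(seed)
    # Indicator tensor: True where a value was recorded, False where missing.
    observed = ~np.isnan(X)
    grand = np.nanmean(X)
    # Effect of each product, store, and period relative to the grand mean.
    prod = np.nanmean(X, axis=(1, 2), keepdims=True) - grand
    store = np.nanmean(X, axis=(0, 2), keepdims=True) - grand
    time = np.nanmean(X, axis=(0, 1), keepdims=True) - grand
    fitted = grand + prod + store + time  # broadcasts to X.shape
    residuals = (X - fitted)[observed]
    completed = []
    for _ in range(n_imputations):
        # Each imputation adds a fresh residual draw, so the copies differ.
        draw = fitted + rng.choice(residuals, size=X.shape)
        completed.append(np.where(observed, X, draw))
    return observed, completed

# Hypothetical data: 4 products, 3 stores, 6 periods, about 20% missing.
rng = np.random.default_rng(1)
X = rng.poisson(20, size=(4, 3, 6)).astype(float)
X[rng.random(X.shape) < 0.2] = np.nan
observed, completed = multiple_impute(X, n_imputations=3)
```

Each array in completed keeps the recorded values unchanged and fills only the missing cells, so a demand model can be fitted to every copy separately. A tensor factorization model, as in the technical report listed under Publications, would replace the additive fit in this sketch.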

Based on multi-dimensional, tensor structure of data sets

Current methods for imputing missing data elements in retail data sets face two limitations. First, each missing data element is typically replaced by a single point estimate. The resulting data set therefore does not capture the natural variability it would have had if the missing data had actually been recorded rather than imputed, and this omission leads to statistical bias in subsequent analyses. Second, current procedures typically ignore correlations along the various dimensions of the data set, or consider them along only a single dimension. Simultaneously considering multi-dimensional dependencies and correlations in the retail data set can yield much greater accuracy and statistical reliability. This new approach applies the multi-dimensional tensor structure of the data sets to provide multiple imputation of missing data elements in retail data sets.
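To show why several imputed data sets carry more information than one point estimate, the sketch below pools an estimate computed on each completed data set using Rubin's rules, the standard combining formulas for multiple imputation. The listing does not say how the completed data sets are combined downstream, so the use of Rubin's rules here is an assumption, and the function name pool_rubin and the sample numbers are hypothetical.

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Combine per-data-set estimates and their variances (Rubin's rules)."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    pooled = estimates.mean()        # pooled point estimate
    within = variances.mean()        # average within-imputation variance
    between = estimates.var(ddof=1)  # variance across the imputations
    total = within + (1 + 1 / m) * between
    return pooled, total

# Hypothetical estimates from three completed data sets.
pooled, total = pool_rubin([10.0, 12.0, 11.0], [1.0, 1.0, 1.0])
```

The between-imputation term is the part a single point-estimate imputation cannot supply: with only one completed data set it is unavailable, and the reported variance is too small.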

Phase of Development

  • Prototype developed.

Benefits

  • Fast, scalable scheme suitable for large data sets

Features

  • Multiple imputation of missing data elements in retail data sets
  • Based on the multi-dimensional, tensor representation of data sets
  • Obtains a plurality of “complete” data sets for demand modeling

Applications

  • Predicting missing data elements or values in retail data sets
  • Multiple imputation of missing data elements

Researchers
Arindam Banerjee, PhD
Professor, Computer Science and Engineering
External Link (experts.umn.edu)

Publications
Probabilistic Tensor Factorization for Tensor Completion
Technical Report, TR 11-026

Interested in Licensing?
The University relies on industry partners to integrate software for commercial purposes. A license is available for this technology and would cover the integration, sale, manufacture or use of products claimed by the issued patent. Please contact us to share your business needs and technical interest in this technology, or if you are interested in licensing it for further research and development.