Office for Technology Commercialization
http://www.research.umn.edu/techcomm
612-624-0550

Imputation of Missing Data in Multidimensional Retail Data Sets

Technology #20110121

Researchers
Arindam Banerjee
Professor, Computer Science and Engineering
External Link (www-users.cs.umn.edu)
Managed By
Andrew Morrow
Technology Licensing Officer
Patent Protection
US Patent US8818919B2
Publications
Probabilistic Tensor Factorization for Tensor Completion
Technical Report, TR 11-026


Multiple predictions of missing data

A new system provides multiple imputation of missing data elements in retail data sets used for modeling and decision support applications. The method exploits the multi-dimensional tensor structure of these data sets and offers a fast, scalable scheme suitable for large data sets. The computer-implemented algorithm first identifies a retail data set: a measured quantity for products “p”, recorded over multiple time periods “t” and across multiple retail chains and stores “s”. It then encodes dummy variables corresponding to the missing data for the relevant (p, s, t) combinations and generates multiple imputations of the missing values, yielding a plurality of “complete” data sets for demand modeling.
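The steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented method: the function names `build_tensor` and `multiple_imputations` are hypothetical, the records are made up, and the sampler is a deliberately simple stand-in (a normal distribution fitted per product) where the actual technology relies on the tensor structure of the data.

```python
import numpy as np

def build_tensor(records, n_products, n_stores, n_periods):
    """Arrange (p, s, t, quantity) records into a 3-way tensor plus a missing-data mask."""
    tensor = np.full((n_products, n_stores, n_periods), np.nan)
    for p, s, t, qty in records:
        tensor[p, s, t] = qty
    # One indicator ("dummy") variable per (p, s, t) cell: True where no value was recorded.
    missing = np.isnan(tensor)
    return tensor, missing

def multiple_imputations(tensor, missing, n_imputations=5, seed=0):
    """Return several completed copies of the tensor.

    Stand-in sampler (an assumption for illustration): each missing cell is
    drawn from a normal distribution fitted to the observed values of the
    same product, clipped at zero so quantities stay non-negative.
    """
    rng = np.random.default_rng(seed)
    completed = []
    for _ in range(n_imputations):
        filled = tensor.copy()
        for p in range(tensor.shape[0]):
            observed = tensor[p][~missing[p]]
            mean = observed.mean() if observed.size else 0.0
            std = observed.std() if observed.size > 1 else 0.0
            n_missing = int(missing[p].sum())
            draws = rng.normal(mean, std, size=n_missing)
            filled[p][missing[p]] = np.clip(draws, 0.0, None)
        completed.append(filled)
    return completed

# Hypothetical records: (product, store, period, measured quantity).
records = [(0, 0, 0, 10.0), (0, 0, 1, 12.0), (0, 1, 0, 11.0), (1, 0, 0, 5.0)]
tensor, missing = build_tensor(records, n_products=2, n_stores=2, n_periods=2)
completed = multiple_imputations(tensor, missing, n_imputations=3)
```

Each element of `completed` is a full tensor in which the recorded values are untouched and only the flagged cells differ from one imputation to the next.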

Based on the multi-dimensional tensor structure of data sets

Current methods for imputing missing data elements in retail data sets have two main limitations. First, each missing element is typically replaced by a single point estimate. The resulting data set therefore does not capture the natural variability it would have had if the missing data had actually been recorded rather than imputed, and this omission leads to statistical bias in subsequent analyses. Second, current procedures typically ignore correlations along the various dimensions of the data set, or consider them along only a single dimension. By simultaneously considering multi-dimensional dependencies and correlations in the retail data set, much greater accuracy and statistical reliability can be obtained. The new approach uses the multi-dimensional tensor structure of the data to provide multiple imputation of missing data elements in retail data sets.
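The first limitation, lost variability under point-estimate imputation, can be seen in a small numerical sketch. Everything here is assumed for illustration: the data are synthetic, the function names are hypothetical, and the random draw is a generic stand-in rather than the tensor-based scheme described in this listing.

```python
import numpy as np

def fill_with_point_estimate(values, missing):
    """Single imputation: every missing cell gets the observed mean."""
    filled = values.copy()
    filled[missing] = values[~missing].mean()
    return filled

def fill_with_random_draw(values, missing, rng):
    """One stochastic imputation: missing cells are drawn around the observed mean."""
    observed = values[~missing]
    filled = values.copy()
    filled[missing] = rng.normal(observed.mean(), observed.std(), size=int(missing.sum()))
    return filled

rng = np.random.default_rng(0)
sales = rng.normal(loc=100.0, scale=15.0, size=200)  # synthetic measured quantities
missing = rng.random(200) < 0.3                       # roughly 30% treated as unrecorded

observed_var = sales[~missing].var()
point_var = fill_with_point_estimate(sales, missing).var()
draw_var = fill_with_random_draw(sales, missing, rng).var()

print(f"observed: {observed_var:.1f}  point estimate: {point_var:.1f}  random draw: {draw_var:.1f}")
```

Filling every gap with the same mean adds cells with zero deviation, so the variance of the completed data is necessarily smaller than the variance of the observed values. Drawing the missing values instead keeps the spread closer to what was observed, which is the motivation for producing several imputed data sets.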

Phase of Development

  • Prototype developed.

Benefits

  • Fast, scalable scheme suitable for large data sets

Features

  • Multiple imputation of missing data elements in retail data sets
  • Based on the multi-dimensional tensor representation of data sets
  • Obtains a plurality of “complete” data sets for demand modeling
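Once several “complete” data sets exist, a downstream demand model is typically fitted to each one and the results are combined. A minimal sketch of that combination step, using the standard pooling rules for multiple imputation (Rubin's rules), is given below. The function name `pool_estimates` and the numbers are hypothetical, and the listing does not state which pooling procedure the technology uses.

```python
import numpy as np

def pool_estimates(estimates, variances):
    """Combine per-data-set estimates and their variances with Rubin's rules.

    estimates: one point estimate per completed data set
    variances: the squared standard error of each of those estimates
    Returns the pooled estimate and its total variance.
    """
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = estimates.size
    pooled = estimates.mean()
    within = variances.mean()                            # average sampling variance
    between = estimates.var(ddof=1) if m > 1 else 0.0    # spread across imputations
    total_variance = within + (1.0 + 1.0 / m) * between
    return pooled, total_variance

# Hypothetical demand-model coefficients from three completed data sets.
pooled, total_variance = pool_estimates([10.0, 12.0, 14.0], [1.0, 1.0, 1.0])
```

The between-imputation term is what a single point-estimate imputation cannot supply: it carries the uncertainty due to the missing data into the final analysis.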

Applications

  • Predicting missing data elements or values in retail data sets
  • Multiple imputation of missing data elements


Interested in Licensing?
The University relies on industry partners to integrate software for commercial purposes. A license is available for this technology and would cover the integration, sale, manufacture or use of products claimed by the issued patent. If you are interested in licensing the technology for further research and development, please contact Andrew Morrow to share your business needs and technical interest.