Imputation of Missing Data in Multidimensional Retail Data Sets (20110121, Dr. Arindam Banerjee)
Multiple Predictions of Missing Data
A new system provides multiple imputation of missing data elements in retail data sets used for modeling and decision support applications. The method exploits the multi-dimensional tensor structure of the data and offers a fast, scalable scheme suitable for large data sets. The computer-implemented algorithm first identifies a retail data set: a measured quantity for products “p” over multiple time periods “t” and multiple retail chains and stores “s”. It then encodes dummy variables that mark the missing data for the relevant (p,s,t) combinations and generates multiple imputations of the missing data, yielding a plurality of “complete” data sets for demand modeling.
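The steps above can be illustrated with a minimal sketch. It is not the patented algorithm, whose imputation model is not described here. It assumes a simple additive model (grand mean plus product, store, and time effects) with Gaussian residual noise, and it assumes every product, store, and period has at least one observed cell. The function name `multiple_impute` and the synthetic data are hypothetical.

```python
import numpy as np

def multiple_impute(x, n_imputations=5, seed=0):
    """Return (missing_mask, list of completed copies of x).

    x is a 3-D array indexed by (product, store, time), with NaN
    marking missing cells. Illustrative additive model only.
    """
    rng = np.random.default_rng(seed)
    missing = np.isnan(x)  # dummy indicator for each (p, s, t) cell
    grand = np.nanmean(x)
    # Main effects along each tensor dimension, from observed cells only.
    p_eff = np.nanmean(x, axis=(1, 2), keepdims=True) - grand
    s_eff = np.nanmean(x, axis=(0, 2), keepdims=True) - grand
    t_eff = np.nanmean(x, axis=(0, 1), keepdims=True) - grand
    pred = grand + p_eff + s_eff + t_eff  # broadcasts to x.shape
    resid_sd = np.nanstd(x - pred)        # spread of observed residuals
    completed = []
    for _ in range(n_imputations):
        # Each imputation adds a fresh random draw to the prediction.
        draw = pred + rng.normal(0.0, resid_sd, size=x.shape)
        completed.append(np.where(missing, draw, x))
    return missing, completed

# Synthetic example: 4 products, 3 stores, 6 periods, two missing cells.
data_rng = np.random.default_rng(42)
x = data_rng.poisson(20, size=(4, 3, 6)).astype(float)
x[0, 1, 2] = np.nan
x[3, 0, 5] = np.nan
missing, completed = multiple_impute(x)
```

Each array in `completed` is one “complete” data set. A demand model would be fitted to each of them separately, and the spread of the fitted results across the copies reflects the uncertainty caused by the missing cells.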
Based on multi-dimensional, tensor structure of data sets
Current methods for imputing missing data elements in retail data sets face two limitations. First, each missing data element is typically replaced by a single point estimate. The resulting data set therefore does not capture the natural variability it would have had if the missing data had actually been recorded rather than imputed, and this omission leads to statistical bias in subsequent analyses. Second, current procedures typically ignore correlations along the various dimensions of the data set, or consider them along a single dimension only. By simultaneously considering the multi-dimensional dependencies and correlations in the retail data set, much greater accuracy and statistical reliability can be obtained. This new approach applies the multi-dimensional tensor structure of data sets to provide multiple imputation of missing data elements in retail data sets.
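The first limitation can be seen in a small numerical sketch. It uses synthetic one-dimensional data with values missing completely at random, which is an assumption made for illustration and not a property of real retail data. It compares filling every gap with the observed mean against filling each gap with a random draw around that mean.

```python
import numpy as np

rng = np.random.default_rng(0)
sales = rng.normal(100.0, 15.0, size=1000)  # synthetic "true" values
missing = rng.random(1000) < 0.3            # about 30% of cells missing
observed = sales[~missing]

# Point imputation: every gap receives the same estimate.
point = sales.copy()
point[missing] = observed.mean()

# Stochastic imputation: each gap receives a draw that keeps the spread.
stochastic = sales.copy()
stochastic[missing] = rng.normal(
    observed.mean(), observed.std(ddof=1), size=int(missing.sum())
)

var_observed = observed.var(ddof=1)
var_point = point.var(ddof=1)            # shrinks: imputed cells add no spread
var_stochastic = stochastic.var(ddof=1)  # stays closer to var_observed
```

Because the point-imputed cells all sit exactly at the mean, `var_point` is always smaller than `var_observed`, so any analysis that relies on the variance is overconfident. Repeating the stochastic fill several times gives the multiple “complete” data sets described above.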
Phase of Development
- Prototype developed.
- Fast, scalable scheme suitable for large data sets
- Multiple imputation of missing data elements in retail data sets
- Based on the multi-dimensional, tensor representation of data sets
- Obtains a plurality of “complete” data sets for demand modeling
- Predicting missing data elements or values in retail data sets
- Multiple imputation of missing data elements
Interested in Licensing?
The University relies on industry partners to integrate software for commercial purposes. A license is available for this technology and would cover the integration, sale, manufacture, or use of products claimed by the issued patent. Please contact us to share your business needs and technical interest in this technology, or if you are interested in licensing it for further research and development.