Office for Technology Commercialization

Algorithm Detects Data Anomalies for Use in Data Mining, Fraud Detection and Threat Detection

Technology #z04105

Vipin Kumar, PhD, Department Head and Professor, Computer Science & Engineering, College of Science and Engineering
Dr. Kumar's current research interests include data mining, high-performance computing, and their applications in Climate/Ecosystems and Biomedical domains.
Managed By
Andrew Morrow
Technology Licensing Officer
Patent Protection
US Patent 7,668,843

Anomaly Detection Algorithm for Fraud Detection and Data Mining

Algorithms that compare data sets serving the same function and identify abnormal data sets, or ‘outliers’, are critical for detecting threats, credit card fraud, network attacks, and money laundering. The algorithm can also be used in data mining software to analyze data. Existing algorithms are time-consuming and computationally intensive; this anomaly detection algorithm runs faster and uses fewer computational resources.

MN-IP Try and Buy
  • Trial period of 6 to 12 months at $5,000 per 6 months.
  • Fee waived for MN-based companies or if sponsoring $50,000+ in research.
  • Exclusive license for a $30,000 conversion payment.
  • No patent costs.
  • Royalty rate of 2% (1% for MN company) after first $1 million in product sales.

** View the Term Sheet **
** Contact Andrew Morrow for more information. **

Discrete Data as Continuous Data

What sets the anomaly detection algorithm apart is that it takes categorical data, which is separated into discrete categories, and operates on it as though it were continuous. The algorithm determines the ‘distance’ between two data records using a similarity matrix and flags a record as an outlier if a predetermined condition is not met. It can handle multiple data categories and test each one independently, flagging the data set if the condition is not met on any one of the categories.


  • Determines the similarity or distance between two data records using a similarity matrix
  • Treats discrete categorical data as though it were continuous
  • Quicker and less computationally intense than current algorithms
  • Potential uses include data mining, fraud detection and network threat detection
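
The distance-and-flag idea described in this section can be illustrated with a minimal Python sketch. It is not the patented method, whose details are not given here. The attribute names (`protocol`, `service`), the similarity values, the choice of distance as one minus similarity, and the `max_distance` threshold are all assumptions made up for illustration.

```python
from typing import Dict, Tuple

# Hypothetical similarity matrices, one per categorical attribute.
# Values lie in [0, 1], where 1.0 means identical. All numbers are
# invented for illustration and do not come from the patent.
SimilarityMatrix = Dict[Tuple[str, str], float]

SIMILARITY: Dict[str, SimilarityMatrix] = {
    "protocol": {
        ("tcp", "tcp"): 1.0, ("udp", "udp"): 1.0, ("icmp", "icmp"): 1.0,
        ("tcp", "udp"): 0.6, ("tcp", "icmp"): 0.2, ("udp", "icmp"): 0.3,
    },
    "service": {
        ("http", "http"): 1.0, ("https", "https"): 1.0,
        ("telnet", "telnet"): 1.0,
        ("http", "https"): 0.9, ("http", "telnet"): 0.1,
        ("https", "telnet"): 0.1,
    },
}


def similarity(attribute: str, a: str, b: str) -> float:
    """Look up the similarity of two category values in either order."""
    matrix = SIMILARITY[attribute]
    if (a, b) in matrix:
        return matrix[(a, b)]
    return matrix[(b, a)]


def attribute_distances(rec_a: Dict[str, str],
                        rec_b: Dict[str, str]) -> Dict[str, float]:
    """Turn each categorical comparison into a continuous distance."""
    return {
        attr: 1.0 - similarity(attr, rec_a[attr], rec_b[attr])
        for attr in SIMILARITY
    }


def is_outlier(record: Dict[str, str],
               reference: Dict[str, str],
               max_distance: float = 0.5) -> bool:
    """Flag the record if ANY single attribute is too far from the reference."""
    distances = attribute_distances(record, reference)
    return any(d > max_distance for d in distances.values())


reference = {"protocol": "tcp", "service": "http"}
print(is_outlier({"protocol": "udp", "service": "https"}, reference))   # False
print(is_outlier({"protocol": "tcp", "service": "telnet"}, reference))  # True
```

Each attribute is tested independently, so the second record is flagged on `service` alone even though its `protocol` matches the reference exactly.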