Using frequent itemset mining to build association rules?


Some theoretical facts about association rules:

  • Association rule mining is a type of undirected data mining: it finds patterns in the data without a target being specified beforehand. Whether the patterns make sense is left to human interpretation.
  • The goal of association rules is to detect relationships or associations between specific values of categorical variables in large data sets.
  • A rule can be interpreted as, for example, "70% of the customers who buy wine and cheese also buy grapes".
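To make the "70% of customers" reading concrete, here is a minimal sketch of how the support and confidence of a single rule could be computed by hand. The transactions and the helper names `support` and `confidence` are made up for illustration; they are not part of any library.

```python
# Toy transactions (made up for illustration); each basket is a set of items.
transactions = [
    {"wine", "cheese", "grapes"},
    {"wine", "cheese", "grapes"},
    {"wine", "cheese"},
    {"wine", "bread"},
    {"cheese", "grapes"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) as support(A and C) / support(A)."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)


# Rule: {wine, cheese} -> {grapes}
rule_support = support({"wine", "cheese", "grapes"}, transactions)
rule_confidence = confidence({"wine", "cheese"}, {"grapes"}, transactions)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

On this toy data, 2 of the 3 baskets that contain wine and cheese also contain grapes, so the rule's confidence is about 67%.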

To find association rules, you can use the Apriori algorithm. Many Python implementations already exist, although most of them are not efficient enough for practical use.

Alternatively, use the Orange data mining library, which has a good module for association rules.
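For intuition about what Apriori does, here is a small level-wise sketch in plain Python 3. It is written for readability, not speed, and is exactly the kind of implementation that would be too slow on real data. The function name `apriori` and the toy baskets are assumptions for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset: support} for all itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent(candidates):
        # Count each candidate by scanning all transactions (the slow part).
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        return {c: k / n for c, k in counts.items() if k / n >= min_support}

    # Level 1: frequent single items.
    current = frequent({frozenset([i]) for t in transactions for i in t})
    result = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = frequent(candidates)
        result.update(current)
        k += 1
    return result


# Toy baskets (made up for illustration).
baskets = [["A", "B", "C", "E"], ["A", "C"], ["A", "C", "D", "E"], ["A", "C", "E"]]
itemsets = apriori(baskets, min_support=0.5)
for items, sup in sorted(itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(items), round(sup, 2))
```

Rules are then derived from each frequent itemset by splitting it into an antecedent and a consequent and keeping the splits whose confidence is high enough.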

Usage example:

    '''
    save first example as item.basket with format
    A, B, C, E
    A, C
    A, C, D, E
    A, C, E
    open ipython same directory as saved file or use os module
    >>> import os
    >>> os.chdir("c:/orange")
    '''
    import orange

    items = orange.ExampleTable("item")
    # play with support argument to filter out rules
    rules = orange.AssociationRulesSparseInducer(items, support = 0.1)
    for r in rules:
        print "%5.3f %5.3f %s" % (r.support, r.confidence, r)

To learn more about association rules and frequent itemset mining, I recommend working through a good book on the subject. There is no shortcut.


A neat way to handle this type of problem seems to be a Bayesian network, in particular by treating it as a Bayesian network structure learning problem. Once you have learned the network, you can efficiently answer queries such as p(A=1 | B=0, C=1).
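To show what such a query means, here is a sketch that estimates p(A=1 | B=0, C=1) by directly counting rows. This is not a Bayesian network and involves no structure learning; it is only the brute-force baseline that a learned network would let you answer without rescanning the data. The rows and the function name `conditional_probability` are made up for illustration.

```python
# Binary presence rows (made-up data): 1 means the item is in the basket.
rows = [
    {"A": 1, "B": 0, "C": 1},
    {"A": 1, "B": 0, "C": 1},
    {"A": 0, "B": 0, "C": 1},
    {"A": 1, "B": 1, "C": 1},
    {"A": 0, "B": 1, "C": 0},
]


def conditional_probability(rows, target, evidence):
    """Estimate P(target | evidence) by counting matching rows.

    `target` and `evidence` map variable names to required values.
    Returns None when no row matches the evidence.
    """
    matching = [r for r in rows if all(r[k] == v for k, v in evidence.items())]
    if not matching:
        return None
    hits = sum(all(r[k] == v for k, v in target.items()) for r in matching)
    return hits / len(matching)


p = conditional_probability(rows, {"A": 1}, {"B": 0, "C": 1})
print(f"p(A=1 | B=0, C=1) = {p:.3f}")
```

Counting like this breaks down when the evidence matches few or no rows, which is one reason to fit a model instead.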


If you have a quantity for each item, then you could consider "high utility itemset mining". This is the itemset mining problem adapted to the case where items can have quantities in each transaction and each item can also have a weight.

If you just use basic Apriori, you lose the information about quantities.
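As a rough illustration of the idea, the sketch below computes the utility of an itemset as quantity times per-unit weight, summed over the transactions that contain the whole itemset, and then enumerates all itemsets by brute force. The data, the weights, and the function names are made up for illustration. Real high utility itemset mining algorithms prune the search rather than enumerating every subset.

```python
from itertools import combinations

# Made-up data: each transaction maps item -> purchased quantity.
transactions = [
    {"A": 2, "B": 1},
    {"A": 1, "C": 3},
    {"A": 1, "B": 2, "C": 1},
]
# Made-up per-unit weight (for example, profit) of each item.
unit_profit = {"A": 5, "B": 2, "C": 1}


def utility(itemset, transactions, unit_profit):
    """Sum of quantity * weight over transactions containing the whole itemset."""
    total = 0
    for t in transactions:
        if all(i in t for i in itemset):
            total += sum(t[i] * unit_profit[i] for i in itemset)
    return total


def high_utility_itemsets(transactions, unit_profit, min_utility):
    """Brute force: test every non-empty subset of items (exponential cost)."""
    items = sorted(unit_profit)
    found = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            u = utility(combo, transactions, unit_profit)
            if u >= min_utility:
                found[combo] = u
    return found


result = high_utility_itemsets(transactions, unit_profit, min_utility=12)
for itemset, u in result.items():
    print(itemset, u)
```

Note that {A, B} has a higher utility than {A} alone here. Utility is not anti-monotone the way support is, so the Apriori pruning rule cannot be applied to it directly.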