This is my favor to CS grad students studying data mining :). This may well turn up as an assignment for you. Below is a brief description of Apriori with Java source code.
Apriori is a well-known algorithm for association rule mining, mainly used for market-basket-style data analysis. For example, given a list of transactions, you are to find rules such as
"90% of customers buying coffee and sugar also buy cream".
Market basket data analysis takes two steps:
first determine the frequent item sets, then form the association rules from these sets.
While forming the frequent item sets, the Apriori principle is simple: any subset of a frequent itemset must be frequent. Here an itemset is frequent when its support is at least a chosen minimum support threshold. Before giving the algorithm steps, let me give two definitions, support and confidence.
Support of an itemset X is the fraction of transactions in D that contain X.
Confidence c of an association rule X ⇒ Y in D is the fraction of the transactions in D containing X that also contain Y, that is, support(X ∪ Y) / support(X).
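The two definitions above can be sketched in a few lines of Java. This is only an illustration, not the code from Apriori.java: the class name SupportConfidence, the helper setOf, and the sample transactions are all made up for this example, and transactions are assumed to be plain sets of item names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, not part of the original Apriori.java.
final class SupportConfidence {

    // Support: fraction of transactions that contain every item of the itemset.
    static double support(List<Set<String>> transactions, Set<String> itemset) {
        if (transactions.isEmpty()) {
            return 0.0;
        }
        int hits = 0;
        for (Set<String> t : transactions) {
            if (t.containsAll(itemset)) {
                hits++;
            }
        }
        return (double) hits / transactions.size();
    }

    // Confidence of X => Y: support(X union Y) / support(X).
    static double confidence(List<Set<String>> transactions, Set<String> x, Set<String> y) {
        Set<String> union = new HashSet<>(x);
        union.addAll(y);
        double supportX = support(transactions, x);
        return supportX == 0.0 ? 0.0 : support(transactions, union) / supportX;
    }

    // Small convenience for building item sets (illustrative only).
    static Set<String> setOf(String... items) {
        return new HashSet<>(Arrays.asList(items));
    }

    public static void main(String[] args) {
        // Made-up sample data: four transactions.
        List<Set<String>> d = Arrays.asList(
                setOf("coffee", "sugar", "cream"),
                setOf("coffee", "sugar", "cream"),
                setOf("coffee", "sugar"),
                setOf("tea", "sugar"));
        System.out.println(support(d, setOf("coffee", "sugar")));
        System.out.println(confidence(d, setOf("coffee", "sugar"), setOf("cream")));
    }
}
```

With this sample data, {coffee, sugar} appears in 3 of 4 transactions, and 2 of those 3 also contain cream, so the rule {coffee, sugar} ⇒ {cream} has confidence 2/3.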
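First determine the frequency of each single item and keep the items that meet minimum support.
Then generate candidate sets: join the set of frequent k-item sets with itself to generate candidates of size k+1.
Next, prune candidates that don't match the Apriori principle stated above, that is, those with a k-item subset that is not frequent.
Count the frequencies of the remaining candidates and keep those that meet minimum support; now you have the frequent item sets of size k+1.
Repeat with k+1 until no new frequent item sets are found.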
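The level-wise steps above can be sketched roughly as follows. Treat it as a minimal sketch rather than the implementation in Apriori.java: the class AprioriSketch and its method names are hypothetical, the join is done naively by taking unions of pairs of frequent k-item sets and keeping those of size k+1, and counting rescans all transactions for every candidate.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch class, not the original Apriori.java.
final class AprioriSketch {

    // Number of transactions containing every item of the candidate.
    static int count(List<Set<String>> transactions, Set<String> candidate) {
        int n = 0;
        for (Set<String> t : transactions) {
            if (t.containsAll(candidate)) {
                n++;
            }
        }
        return n;
    }

    // Join frequent k-item sets pairwise, then prune by the Apriori principle.
    static Set<Set<String>> candidates(Set<Set<String>> frequentK, int k) {
        Set<Set<String>> result = new HashSet<>();
        for (Set<String> a : frequentK) {
            for (Set<String> b : frequentK) {
                Set<String> union = new TreeSet<>(a);
                union.addAll(b);
                if (union.size() != k + 1) {
                    continue; // join step: only unions that grow by exactly one item
                }
                boolean allSubsetsFrequent = true;
                for (String item : union) {
                    Set<String> subset = new TreeSet<>(union);
                    subset.remove(item);
                    if (!frequentK.contains(subset)) {
                        allSubsetsFrequent = false; // prune step
                        break;
                    }
                }
                if (allSubsetsFrequent) {
                    result.add(union);
                }
            }
        }
        return result;
    }

    // Returns every frequent itemset mapped to its absolute count.
    static Map<Set<String>, Integer> frequentItemsets(List<Set<String>> transactions,
                                                      double minSupport) {
        Map<Set<String>, Integer> all = new HashMap<>();
        int total = transactions.size();
        // Level 1: frequent single items.
        Set<String> items = new TreeSet<>();
        for (Set<String> t : transactions) {
            items.addAll(t);
        }
        Set<Set<String>> current = new HashSet<>();
        for (String item : items) {
            Set<String> single = new TreeSet<>(Arrays.asList(item));
            int c = count(transactions, single);
            if ((double) c / total >= minSupport) {
                current.add(single);
                all.put(single, c);
            }
        }
        // Levels k+1: generate, prune, count, until nothing new is frequent.
        for (int k = 1; !current.isEmpty(); k++) {
            Set<Set<String>> next = new HashSet<>();
            for (Set<String> cand : candidates(current, k)) {
                int c = count(transactions, cand);
                if ((double) c / total >= minSupport) {
                    next.add(cand);
                    all.put(cand, c);
                }
            }
            current = next;
        }
        return all;
    }

    public static void main(String[] args) {
        // Made-up sample data.
        List<Set<String>> d = Arrays.asList(
                new TreeSet<>(Arrays.asList("coffee", "sugar", "cream")),
                new TreeSet<>(Arrays.asList("coffee", "sugar", "cream")),
                new TreeSet<>(Arrays.asList("coffee", "sugar")),
                new TreeSet<>(Arrays.asList("tea", "sugar")));
        System.out.println(frequentItemsets(d, 0.5));
    }
}
```

Item sets are stored as TreeSet instances so that two sets with the same items compare equal and print in a stable order. A real implementation would normally count all candidates of one level in a single pass over the data instead of rescanning per candidate.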
After you have formed the frequent item sets, you need to generate the association rules.
For each frequent itemset X
For each non-empty proper subset A of X, form a rule A ⇒ (X – A)
Delete those rules that do not have minimum confidence
Computation of the confidence of a rule A ⇒ (X – A): confidence = support(X) / support(A)
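The rule generation loop can be sketched like this. It is an illustration under assumptions, not the code from Apriori.java or Kume.java: the class RuleSketch is hypothetical, subsets are enumerated with a bit mask instead of a subsets(k) helper, and the input is assumed to be a map from every frequent itemset to its count (the counts in main are made-up sample values).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch class, not the original implementation.
final class RuleSketch {

    // Builds rules A => (X - A) whose confidence count(X) / count(A) meets the minimum.
    static List<String> rules(Map<Set<String>, Integer> counts, double minConfidence) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Set<String>, Integer> e : counts.entrySet()) {
            List<String> items = new ArrayList<>(new TreeSet<>(e.getKey()));
            int n = items.size();
            // Masks 1 .. 2^n - 2 enumerate the non-empty proper subsets of X.
            for (int mask = 1; mask < (1 << n) - 1; mask++) {
                Set<String> a = new TreeSet<>();
                Set<String> rest = new TreeSet<>();
                for (int i = 0; i < n; i++) {
                    if ((mask & (1 << i)) != 0) {
                        a.add(items.get(i));
                    } else {
                        rest.add(items.get(i));
                    }
                }
                Integer countA = counts.get(a);
                if (countA == null || countA == 0) {
                    continue; // count for A missing: the input map was incomplete
                }
                double confidence = (double) e.getValue() / countA;
                if (confidence >= minConfidence) {
                    out.add(a + " => " + rest);
                }
            }
        }
        Collections.sort(out); // stable order for printing
        return out;
    }

    public static void main(String[] args) {
        // Made-up counts of frequent item sets over four transactions.
        Map<Set<String>, Integer> counts = new HashMap<>();
        counts.put(new TreeSet<>(Arrays.asList("coffee")), 3);
        counts.put(new TreeSet<>(Arrays.asList("sugar")), 4);
        counts.put(new TreeSet<>(Arrays.asList("cream")), 2);
        counts.put(new TreeSet<>(Arrays.asList("coffee", "sugar")), 3);
        counts.put(new TreeSet<>(Arrays.asList("coffee", "cream")), 2);
        counts.put(new TreeSet<>(Arrays.asList("sugar", "cream")), 2);
        counts.put(new TreeSet<>(Arrays.asList("coffee", "sugar", "cream")), 2);
        for (String rule : rules(counts, 0.9)) {
            System.out.println(rule);
        }
    }
}
```

Because both supports are fractions of the same number of transactions, dividing the raw counts gives the same confidence as dividing the supports. By the Apriori principle every subset A of a frequent X is itself frequent, so its count should already be in the map.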
That's all! For the source code:
Apriori.java -- the Apriori implementation
Kume.java -- helper class, an implementation of Set that supports subsets(k) and minus() operations