Apriori Algorithm
Introduction
Apriori is a classic algorithm for association rule learning over transactional databases. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets as long as those item sets appear sufficiently often in the database.
The Apriori algorithm aims to find the rules which satisfy both a minimum support threshold and a minimum confidence threshold.
Example
Suppose you have records of a large number of transactions at a shopping center, as follows:
| Transaction | Items bought |
|---|---|
| T1 | item1, item2, item3 |
| T2 | item1, item2 |
| T3 | item2, item5 |
| T4 | item1, item2, item5 |

For example, in the above table you can see that item1 and item2 are frequently bought together: they appear together in three of the four transactions.
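To make "frequently bought together" concrete, here is a minimal sketch of the two measures the introduction mentions, support and confidence, computed on the four transactions above. The function names `support` and `confidence` are illustrative choices, not part of any library.

```python
# Toy data copied from the table above; each transaction is a set of items.
transactions = [
    {"item1", "item2", "item3"},  # T1
    {"item1", "item2"},           # T2
    {"item2", "item5"},           # T3
    {"item1", "item2", "item5"},  # T4
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of both sides together divided by support of the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


print(support({"item1", "item2"}, transactions))       # 0.75
print(confidence({"item1"}, {"item2"}, transactions))  # 1.0
```

Here the rule "item1 implies item2" has confidence 1.0 because every transaction that contains item1 also contains item2.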
What is the use of learning association rules?
- Shopping centers use association rules to place items next to each other so that customers buy more. If you are familiar with data mining, you will know the famous beer-and-diapers Walmart story. Basically, Walmart studied their data and found that on Friday afternoons young American males who buy diapers also tend to buy beer. So Walmart placed beer next to diapers and beer sales went up. The story is famous because no one would have predicted such a result, and that’s the power of data mining. You can search for it if you are interested in further details.
- If you are familiar with Amazon, they use association mining to recommend items based on the item you are currently browsing or buying.
- Another application is Google autocomplete: after you type a word, it looks up the words that users frequently type after that particular word.
As I said, Apriori is the classic and probably the most basic algorithm for this. If you search online you can easily find the pseudocode and the mathematical equations. I would like to make it more intuitive and easy, if I can.
Let’s start with a non-simple example:
| Transaction ID | Items Bought |
|---|---|
| T1 | {Mango, Onion, Nintendo, Keychain, Eggs, Yoyo} |
| T2 | {Doll, Onion, Nintendo, Keychain, Eggs, Yoyo} |
| T3 | {Mango, Apple, Keychain, Eggs} |
| T4 | {Mango, Umbrella, Corn, Keychain, Yoyo} |
| T5 | {Corn, Onion, Onion, Keychain, Ice cream, Eggs} |

For simplicity, abbreviate each item to its first letter:
M = Mango
O = Onion
and so on.
So the original table becomes:
| Transaction ID | Items Bought |
|---|---|
| T1 | {M, O, N, K, E, Y} |
| T2 | {D, O, N, K, E, Y} |
| T3 | {M, A, K, E} |
| T4 | {M, U, C, K, Y} |
| T5 | {C, O, O, K, I, E} |

Step 1: Count the number of transactions in which each item occurs. Note that ‘O = Onion’ is bought 4 times in total, but it occurs in just 3 transactions.
| Item | Number of transactions |
|---|---|
| M | 3 |
| O | 3 |
| N | 2 |
| K | 5 |
| E | 4 |
| Y | 3 |
| D | 1 |
| A | 1 |
| U | 1 |
| C | 2 |
| I | 1 |
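Step 1 can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the name `item_counts` are assumptions, not something the original walkthrough prescribes. The key detail is converting each transaction to a set before counting, so that the duplicate Onion in T5 is counted once.

```python
from collections import Counter

# The original table, one list of single-letter items per transaction.
transactions = {
    "T1": ["M", "O", "N", "K", "E", "Y"],
    "T2": ["D", "O", "N", "K", "E", "Y"],
    "T3": ["M", "A", "K", "E"],
    "T4": ["M", "U", "C", "K", "Y"],
    "T5": ["C", "O", "O", "K", "I", "E"],  # Onion appears twice here
}

item_counts = Counter()
for items in transactions.values():
    # set() collapses duplicates, so O is counted once for T5
    item_counts.update(set(items))

print(item_counts["O"])  # 3
print(item_counts["K"])  # 5
```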

Step 2: Now we need a golden rule for what counts as frequent. Here we say an item is frequently bought if it appears in at least 3 of the 5 transactions (a minimum support of 60%). So in this step we remove every item that appears in fewer than 3 transactions from the above table, and we are left with:
| Item | Number of transactions |
|---|---|
| M | 3 |
| O | 3 |
| K | 5 |
| E | 4 |
| Y | 3 |
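The filtering in Step 2 is a one-line comprehension. In this sketch the counts are copied from the Step 1 table rather than recomputed, and `min_count` is an illustrative name for the threshold of 3 transactions.

```python
# Counts copied from the Step 1 table.
item_counts = {"M": 3, "O": 3, "N": 2, "K": 5, "E": 4, "Y": 3,
               "D": 1, "A": 1, "U": 1, "C": 2, "I": 1}

min_count = 3  # an item must appear in at least 3 transactions

# Keep only the items that meet the threshold; dict order is preserved.
frequent_items = {item: n for item, n in item_counts.items() if n >= min_count}

print(frequent_items)  # {'M': 3, 'O': 3, 'K': 5, 'E': 4, 'Y': 3}
```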

These are the single items that are bought frequently. Now let’s say we want to find pairs of items that are bought frequently. We continue from the table in Step 2.
Step 3: We start making pairs from the first item (MO, MK, ME, MY), and then move on to the second item (OK, OE, OY). We do not make OM because we already made MO when pairing with M, and buying a Mango and an Onion together is the same as buying an Onion and a Mango together. After making all the pairs we get:
Item pairs: MO, MK, ME, MY, OK, OE, OY, KE, KY, EY
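The pairing in Step 3 is exactly what `itertools.combinations` does: it produces each unordered pair once, in the order of the input. A small sketch, with the frequent items listed in the same order as the Step 2 table:

```python
from itertools import combinations

# Frequent single items, in the order used by the Step 2 table.
frequent_items = ["M", "O", "K", "E", "Y"]

# combinations() never emits both MO and OM, matching the reasoning above.
candidate_pairs = ["".join(pair) for pair in combinations(frequent_items, 2)]

print(candidate_pairs)
# ['MO', 'MK', 'ME', 'MY', 'OK', 'OE', 'OY', 'KE', 'KY', 'EY']
```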

Step 4: Now we count how many transactions contain each pair. For example, M and O are bought together only in {M, O, N, K, E, Y}, while M and K are bought together 3 times, in {M, O, N, K, E, Y}, {M, A, K, E} and {M, U, C, K, Y}. After doing that for all the pairs we get:
| Item pairs | Number of transactions |
|---|---|
| MO | 1 |
| MK | 3 |
| ME | 2 |
| MY | 2 |
| OK | 3 |
| OE | 3 |
| OY | 2 |
| KE | 4 |
| KY | 3 |
| EY | 2 |

Step 5: Golden rule to the rescue. Remove all the item pairs that occur in fewer than three transactions, and we are left with:
| Item pairs | Number of transactions |
|---|---|
| MK | 3 |
| OK | 3 |
| OE | 3 |
| KE | 4 |
| KY | 3 |
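Steps 4 and 5 can be sketched together: count the transactions that contain each candidate pair, then drop the pairs below the threshold. Writing each transaction as `set("MONKEY")` and so on is just a compact assumption of this sketch; it yields the same single-letter item sets as the original table.

```python
# Each transaction as a set of single-letter items (duplicates collapse).
transactions = [set("MONKEY"), set("DONKEY"), set("MAKE"),
                set("MUCKY"), set("COOKIE")]

candidate_pairs = ["MO", "MK", "ME", "MY", "OK", "OE", "OY", "KE", "KY", "EY"]
min_count = 3

# Step 4: a pair is counted once per transaction that contains both items.
pair_counts = {
    pair: sum(1 for t in transactions if set(pair) <= t)
    for pair in candidate_pairs
}

# Step 5: apply the same threshold as for single items.
frequent_pairs = {p: n for p, n in pair_counts.items() if n >= min_count}

print(pair_counts["MO"])  # 1
print(frequent_pairs)     # {'MK': 3, 'OK': 3, 'OE': 3, 'KE': 4, 'KY': 3}
```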

These are the pairs of items that are frequently bought together.
Now let’s say we want to find sets of three items that are bought together.
We use the table from Step 5 and make sets of 3 items.
Step 6: To make sets of three items we need one more rule, called a self-join. It simply means that, from the item pairs in the above table, we find two pairs with the same first letter, so we get:
- OK and OE, which gives OKE
- KE and KY, which gives KEY
Then we count how many transactions in the original table contain O, K and E together, do the same for K, E and Y, and we get the following table:
| Item set | Number of transactions |
|---|---|
| OKE | 3 |
| KEY | 2 |

While we are on this, suppose you have sets of 3 items, say ABC, ABD, ACD, ACE and BCD, and you want to generate item sets of 4 items. You look for two sets that share the same first two letters:
- ABC and ABD -> ABCD
- ACD and ACE -> ACDE
And so on. In general, you look for sets in which only the last letter (item) differs.
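The self-join rule can be sketched as a small function. `self_join` is a hypothetical helper name, and the sketch assumes every item set is written as a string whose letters follow one consistent order (the order used in the tables above); without a consistent order, matching prefixes would be missed.

```python
def self_join(itemsets):
    """Join k-item sets that agree on everything except their last item."""
    joined = []
    for i, a in enumerate(itemsets):
        for b in itemsets[i + 1:]:
            # Same prefix, different last item: merge into a (k+1)-item set.
            if a[:-1] == b[:-1] and a[-1] != b[-1]:
                joined.append(a + b[-1])
    return joined


print(self_join(["MK", "OK", "OE", "KE", "KY"]))         # ['OKE', 'KEY']
print(self_join(["ABC", "ABD", "ACD", "ACE", "BCD"]))    # ['ABCD', 'ACDE']
```

Descriptions of Apriori usually follow the join with a prune step that discards any candidate containing an infrequent subset before counting. That step is not shown here, since counting against the original table removes such candidates anyway.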
Step 7: We again apply the golden rule, that is, the item set must be bought together at least 3 times. This leaves us with just OKE, since K, E and Y are bought together only two times.
Thus the only set of three items that is frequently bought together is {O, K, E}.
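Putting the seven steps together, the whole walkthrough is a loop: count, filter, self-join, repeat until no item set survives. The following is a minimal sketch under the same assumptions as before (single-letter items, item sets written as strings in a fixed item order); the function name and its parameters are illustrative, and it is not meant as an efficient or complete implementation.

```python
def apriori_frequent_itemsets(transactions, item_order, min_count):
    """Level-wise search for frequent item sets, one dict per set size."""
    def count(itemset):
        # Number of transactions containing every item of the item set.
        return sum(1 for t in transactions if set(itemset) <= t)

    # Level 1: frequent single items, kept in the order used by the tables.
    current = [item for item in item_order if count(item) >= min_count]
    levels = []
    while current:
        levels.append({s: count(s) for s in current})
        # Self-join; at level 1 the shared prefix is empty, so every pair forms.
        candidates = [a + b[-1]
                      for i, a in enumerate(current)
                      for b in current[i + 1:]
                      if a[:-1] == b[:-1]]
        current = [c for c in candidates if count(c) >= min_count]
    return levels


transactions = [set("MONKEY"), set("DONKEY"), set("MAKE"),
                set("MUCKY"), set("COOKIE")]
levels = apriori_frequent_itemsets(transactions, "MONKEYDAUCI", min_count=3)

for level in levels:
    print(level)
# {'M': 3, 'O': 3, 'K': 5, 'E': 4, 'Y': 3}
# {'MK': 3, 'OK': 3, 'OE': 3, 'KE': 4, 'KY': 3}
# {'OKE': 3}
```

The three printed levels match the tables from Steps 2, 5 and 7.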