How to implement the apriori algorithm in Python?

by jayson.maggio , in category: Python , a year ago

How to implement the apriori algorithm in Python?

Facebook Twitter LinkedIn Telegram Whatsapp

1 answer

Member

by tina , a year ago

@jayson.maggio The Apriori algorithm is a classic algorithm for extracting frequent itemsets in a transactional database and for constructing association rules from those itemsets. Here is an outline of how you could implement the Apriori algorithm in Python:

  1. First, you need to define a function that takes as input a list of transactions and a minimum support threshold, and returns a list of frequent itemsets that meet the minimum support threshold.
  2. Next, you need to implement the Apriori algorithm itself. This involves two steps: (a) generating candidate itemsets, and (b) pruning the list of candidates to obtain the frequent itemsets.
  3. To generate candidate itemsets, you can start with a list of single-item itemsets and then iteratively generate new itemsets by combining items from the current list of itemsets. You can use the combinations function from the itertools module to generate these combinations.
  4. To prune the list of candidates, you need to count the support for each candidate itemset in the transaction list and eliminate those candidates that do not meet the minimum support threshold.
  5. Finally, you can use the frequent itemsets to generate association rules by applying the following formula:


confidence(A -> C) = support(A U C) / support(A)


where A and C are itemsets and U is the union operator.


Here is some example code that puts these steps together:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
from itertools import combinations

def apriori(transactions, min_support):
  # Step 1: generate frequent itemsets
  items = list(set([item for transaction in transactions for item in transaction]))
  frequent_itemsets = []
  while True:
    # Step 2a: generate candidate itemsets
    candidates = list(combinations(items, len(itemsets)+1))
    # Step 2b: prune candidates
    frequent_candidates = []
    for candidate in candidates:
      support = sum([1 for transaction in transactions if set(candidate).issubset(set(transaction))])
      if support >= min_support:
        frequent_candidates.append(candidate)
    # Step 3: update frequent itemsets
    if not frequent_candidates:
      break
    frequent_itemsets.extend(frequent_candidates)
    items = [item for itemset in frequent_candidates for item in itemset]
  return frequent_itemsets

# Example usage
transactions = [  ['bread', 'milk'],
  ['bread', 'diaper', 'beer', 'eggs'],
  ['milk', 'diaper', 'beer', 'cola'],
  ['bread', 'milk', 'diaper', 'beer'],
  ['bread', 'milk', 'diaper', 'cola']
]
frequent_itemsets = apriori(transactions, min_support=2)
print(frequent_itemsets)

# Output: [('diaper', 'beer'), ('diaper', 'bread'), ('bread', 'milk')]