mhahsler/arulespy

Python interface to the R package arules

arulespy is a Python module available from PyPI. The arules module in arulespy provides an easy-to-install Python interface, built with rpy2, to the R package arules for association rule mining.

The R arules package implements a comprehensive infrastructure for representing, manipulating and analyzing transaction data and patterns using frequent itemsets and association rules. The package also provides a wide range of interest measures and mining algorithms including the code of Christian Borgelt’s popular and efficient C implementations of the association mining algorithms Apriori and Eclat, and optimized C/C++ code for mining and manipulating association rules using sparse matrix representation.

The arulesViz module provides plot() for visualizing association rules using the R package arulesViz.

arulespy provides Python classes for

  • Transactions: Convert pandas dataframes into transaction data
  • Rules: Association rules
  • Itemsets: Itemsets
  • ItemMatrix: sparse matrix representation of sets of items.

with Python-style slicing and len().

Most arules functions are exposed as methods of these four classes, with results converted from R data structures to Python. Documentation is available in Python via help(). Detailed online documentation for the R package is available here.

Low-level arules functions can also be used directly in the form R.<arules R function>(). The result will be an rpy2 data type. Transactions, itemsets and rules can be converted manually to Python classes using the helper function a2p().

To cite the Python module ‘arulespy’ in publications use:

Michael Hahsler. ARULESPY: Exploring association rules and frequent itemsets in Python. arXiv:2305.15263 [cs.DB], May 2023. DOI: 10.48550/arXiv.2305.15263

Installation

arulespy is based on the Python package rpy2, which requires an R installation. Here are the installation steps:

  1. Install the latest version of R (>4.0) from https://www.r-project.org/

  2. Install required libraries on your OS:

    • libcurl is needed by R package curl.
      • Ubuntu: sudo apt-get install libcurl4-openssl-dev
      • macOS: brew install curl
      • Windows: no installation necessary, but read the Windows section below.
  3. Install arulespy, which will automatically install rpy2 and pandas.

    pip install arulespy
  4. Optional: Set the environment variable R_LIBS_USER to decide where R packages are stored (see libPaths() for details). If not set then R will determine a suitable location.

  5. Optional: arulespy will install the needed R packages when it is imported for the first time. This may take a while. R packages can also be preinstalled. Start R and run install.packages(c("arules", "arulesViz"))
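Step 4 can also be done from Python. The following is a minimal sketch, assuming that R, once started by rpy2, reads R_LIBS_USER from the process environment and that R ignores library directories that do not exist yet. The helper name use_r_library is hypothetical and not part of arulespy.

```python
import os
from pathlib import Path


def use_r_library(path):
    """Create `path` if needed and point R_LIBS_USER at it.

    Hypothetical helper, not part of arulespy. It must be called
    before arulespy (and therefore rpy2) is imported for the first
    time, because R reads the variable when it starts.
    """
    lib = Path(path).expanduser()
    # Assumption: R skips library directories that do not exist,
    # so the directory is created up front.
    lib.mkdir(parents=True, exist_ok=True)
    os.environ["R_LIBS_USER"] = str(lib)
    return lib
```

For example, calling use_r_library("~/r-libs-arulespy") (an illustrative path) before the first import of arulespy should make R install the required packages there.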

The most likely issue is that rpy2 cannot find R or R's shared library. In that case the Python kernel dies or exits without explanation when arulespy is imported. Run python -m rpy2.situation to check whether R and R's libraries are found. In an IPython or Jupyter notebook, you can run the following code block instead:

from rpy2 import situation

for row in situation.iter_info():
    print(row)

The output should include a line saying Loading R library from rpy2: OK.
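If even the rpy2 check is inconvenient, a rough pre-check can be done with the standard library alone. The sketch below only looks at the R_HOME environment variable and at whether an R executable is on the PATH. rpy2's own lookup is more thorough, so treat a "not found" here as a hint rather than a verdict. The function name find_r is made up for this illustration.

```python
import os
import shutil


def find_r():
    """Return rough hints about where R might be installed.

    Hypothetical helper: it does not import rpy2 and does not
    start R, so it cannot crash the Python kernel.
    """
    return {
        "R_HOME": os.environ.get("R_HOME"),  # None if the variable is unset
        "R_on_PATH": shutil.which("R"),      # None if no R executable is found
    }


info = find_r()
for key, value in info.items():
    print(f"{key}: {value or 'not found'}")
```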

Note for Windows users

rpy2 currently does not fully support Windows and the installation is somewhat tricky. I was able to use it with the following setup:

  • Windows 10
  • rpy2 version 3.5.14
  • Python version 3.10.12
  • R version 4.3.1

I use the following code to set the environment variables that Windows needs before I import from arulespy:

from rpy2 import situation
import os

r_home = situation.r_home_from_registry()
r_bin = r_home + '\\bin\\x64\\'
os.environ['R_HOME'] = r_home
os.environ['PATH'] =  r_bin + ";" + os.environ['PATH']
os.add_dll_directory(r_bin)

for row in situation.iter_info():
    print(row)

The output should include a line saying Loading R library from rpy2: OK
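When the same script or notebook is shared between Windows and other systems, the setup above can be wrapped in a guard. This is a sketch under the assumption that the R installation has the bin\x64 layout used above. The function name configure_r_on_windows is hypothetical. It also handles the case where no R installation is registered, in which r_home_from_registry() returns no path.

```python
import os
import sys


def configure_r_on_windows():
    """Set R_HOME and PATH for rpy2 on Windows; do nothing elsewhere.

    Returns True if the environment was changed, False otherwise.
    Hypothetical wrapper around the setup code shown above.
    """
    if sys.platform != "win32":
        return False

    # Imported lazily so that this function is harmless on other systems.
    from rpy2 import situation

    r_home = situation.r_home_from_registry()
    if not r_home:
        # No R installation was found in the Windows registry.
        return False

    r_bin = os.path.join(r_home, "bin", "x64")
    os.environ["R_HOME"] = r_home
    os.environ["PATH"] = r_bin + os.pathsep + os.environ["PATH"]
    os.add_dll_directory(r_bin)
    return True
```

Call configure_r_on_windows() once before importing from arulespy.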

More information on installing rpy2 can be found here.

Example

from arulespy.arules import Transactions, apriori, parameters
import pandas as pd

# define the data as a pandas dataframe
df = pd.DataFrame(
    [
        [True, True, True],
        [True, False, False],
        [True, True, True],
        [True, False, False],
        [True, True, True],
    ],
    columns=list('ABC'))

# convert dataframe to transactions
trans = Transactions.from_df(df)

# mine association rules
rules = apriori(trans,
                parameter=parameters({"supp": 0.1, "conf": 0.8}),
                control=parameters({"verbose": False}))

# display the rules as a pandas dataframe
rules.as_df()
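
Example output (the counts imply ten transactions, so these values come from a different dataset than the five-row dataframe above):

    LHS    RHS  support  confidence  coverage  lift  count
1   {}     {A}  0.8      0.8         1         1     8
2   {}     {C}  0.8      0.8         1         1     8
3   {B}    {A}  0.4      0.8         0.5       1     4
4   {B}    {C}  0.5      1           0.5       1.25  5
5   {A,B}  {C}  0.4      1           0.4       1.25  4
6   {B,C}  {A}  0.4      0.8         0.5       1     4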
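To make the quality columns concrete, the following sketch recomputes them by hand for the five-row dataframe defined in the example. It uses plain Python with no R, arulespy or pandas, and it follows the usual definitions of support, confidence, coverage and lift. The function names are made up for this illustration and are not part of arulespy.

```python
# The five transactions from the example dataframe, one dict per row.
rows = [
    {"A": True, "B": True, "C": True},
    {"A": True, "B": False, "C": False},
    {"A": True, "B": True, "C": True},
    {"A": True, "B": False, "C": False},
    {"A": True, "B": True, "C": True},
]


def count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(all(row[item] for item in itemset) for row in rows)


def rule_quality(lhs, rhs):
    """Quality measures for the rule lhs => rhs (both are sets of items)."""
    n = len(rows)
    both = count(lhs | rhs)
    confidence = both / count(lhs)
    return {
        "support": both / n,                    # share of rows with lhs and rhs
        "confidence": confidence,               # share of lhs rows that also have rhs
        "coverage": count(lhs) / n,             # share of rows with lhs
        "lift": confidence / (count(rhs) / n),  # confidence relative to rhs support
        "count": both,
    }


print(rule_quality({"B"}, {"C"}))
```

For the rule {B} => {C} on this data, B and C occur together in 3 of 5 rows, which gives a count of 3, support 0.6, coverage 0.6, confidence 1.0 and a lift of about 1.67.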

Complete examples:

References