Difference between fpgrowth and fpmax not documented #1030

Open
emilianomm opened this issue Apr 24, 2023 · 2 comments

Comments

@emilianomm

Describe the documentation issue

Hi. I'm using the library to find association rules in a dataset. To do that, I'm passing the output of the three algorithms to the association_rules() function. The documentation says these are equivalent in terms of parameters and output, but I'm getting the following error only with the output from fpmax():

KeyError: 'frozenset({120})You are likely getting this error because the DataFrame is missing  antecedent and/or consequent  information. You can try using the  `support_only=True` option'

A minimal code example of my implementation looks like this:

from mlxtend.frequent_patterns import association_rules
from mlxtend.frequent_patterns import fpgrowth
from mlxtend.frequent_patterns import fpmax

# Assume baskets_matrix is an ad hoc pandas DataFrame.

# Mining the itemsets works OK with both algorithms
freq_items_1 = fpgrowth(baskets_matrix, min_support=0.1)
freq_items_2 = fpmax(baskets_matrix, min_support=0.1)

# This also works OK
AR_1 = association_rules(freq_items_1, metric="confidence", min_threshold=0.5)

# This raises the KeyError shown above
AR_2 = association_rules(freq_items_2, metric="confidence", min_threshold=0.5)

Since all other factors are the same, I have to assume that there is a difference in the output of fpgrowth and fpmax which is not clearly documented.

I also noticed that the documentation refers to the association_rules() function as generate_rules(), which leads to further confusion.

Suggest a potential improvement or addition

Could you clarify whether the outputs of the different algorithms are indeed different, or whether there is another issue here?

Also, I think it would be useful for anyone using the library to have these remarks added to the documentation.

Thanks in advance!

@Jordenjj

Jordenjj commented May 2, 2023

As per the documentation "FP-Max is a variant of FP-Growth, which focuses on obtaining maximal itemsets. An itemset X is said to maximal if X is frequent and there exists no frequent super-pattern containing X. In other words, a frequent pattern X cannot be sub-pattern of larger frequent pattern to qualify for the definition maximal itemset."
That being said, I am getting the error too when using FP-Max.
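To make the quoted definition concrete, here is a minimal pure-Python sketch that brute-forces the frequent itemsets of a toy dataset and then keeps only the maximal ones. The transactions, the helper names `frequent_itemsets` and `maximal_only`, and the 0.5 threshold are all made up for illustration; this is not how mlxtend implements FP-Growth or FP-Max.

```python
from itertools import combinations

# Toy transactions (hypothetical data, not taken from this issue).
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]


def frequent_itemsets(transactions, min_support):
    """Brute-force support counting; only suitable for toy data."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            # Fraction of transactions that contain every item of the itemset.
            support = sum(itemset <= t for t in transactions) / n
            if support >= min_support:
                result[itemset] = support
    return result


def maximal_only(frequent):
    """Keep the itemsets that have no frequent proper superset."""
    return {
        itemset: support
        for itemset, support in frequent.items()
        if not any(itemset < other for other in frequent)
    }


frequent = frequent_itemsets(transactions, min_support=0.5)
maximal = maximal_only(frequent)

print(sorted(sorted(s) for s in frequent))
print(sorted(sorted(s) for s in maximal))
```

In this toy case the single items are frequent but not maximal, because each of them is contained in a frequent pair, so they do not appear in the maximal result at all.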

@josejub

josejub commented May 12, 2023

Same here. Mining frequent itemsets with fp-growth works fine, but with fp-max I get the same error. An example of my code:

Assume negated is a one-hot encoded DataFrame:

from mlxtend.frequent_patterns import association_rules, fpgrowth, fpmax

itemsets = fpmax(negated, min_support=0.3, use_colnames=True, max_len=5)
itemsets
rules = association_rules(itemsets, metric="confidence", min_threshold=0.85)  # Error appears here

This works well:

itemsets = fpgrowth(negated, min_support=0.3, use_colnames=True, max_len=5)
itemsets
rules = association_rules(itemsets, metric="confidence", min_threshold=0.85)
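
A plausible reading of the error reported in this thread is that computing a rule's confidence needs the support of the antecedent on its own, and a table that holds only maximal itemsets no longer contains it. The sketch below illustrates that lookup failure with plain dictionaries. The support values and the `confidence` helper are hypothetical and only mimic the idea; they are not mlxtend's internals.

```python
# Hypothetical support tables for a toy dataset (values are illustrative).
all_supports = {
    frozenset({"a"}): 0.8,
    frozenset({"b"}): 0.8,
    frozenset({"a", "b"}): 0.6,
}

# Same data, but only the maximal itemset is kept (FP-Max style output).
maximal_supports = {frozenset({"a", "b"}): 0.6}


def confidence(supports, antecedent, consequent):
    """confidence(A -> C) = support(A and C together) / support(A)."""
    antecedent = frozenset(antecedent)
    consequent = frozenset(consequent)
    return supports[antecedent | consequent] / supports[antecedent]


# Works: the full table has a support value for the antecedent {"a"}.
print(confidence(all_supports, {"a"}, {"b"}))

# Fails: the maximal-only table has no entry for the antecedent {"a"}.
try:
    confidence(maximal_supports, {"a"}, {"b"})
except KeyError as err:
    print("missing support for", err)
```

Under that reading, mining with fpgrowth when the goal is rule generation avoids the missing entries, and the error message itself points to `support_only=True` as another option to try.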
