Associative Learning Algorithms #2662
I'm not sure itemset mining is in the scope of sklearn. I only know the Apriori algorithm, but I know there are more advanced ones. I guess one could fit them into the API using sparse indicator matrices, but somehow they seem very disjoint from the rest of sklearn.
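To make the "sparse indicator matrix" idea concrete, here is a minimal sketch in plain Python. It is an illustration, not an existing scikit-learn API; `encode` and `support` are hypothetical helper names. Each transaction becomes one row and each distinct item one column, and a row records only the columns that hold a 1, which is the information a sparse matrix stores. The support of an itemset is then the fraction of rows that contain all of its columns. scikit-learn's `MultiLabelBinarizer(sparse_output=True)` produces a comparable encoding as a SciPy sparse matrix, though this sketch does not use it.

```python
from typing import Dict, FrozenSet, Iterable, List, Sequence, Tuple


def encode(transactions: Sequence[Iterable[str]]) -> Tuple[List[str], List[FrozenSet[int]]]:
    """Encode baskets as a sparse binary indicator matrix.

    Each row is stored only as the set of column indices holding a 1,
    the same information a CSR row keeps in its `indices` array.
    """
    vocabulary = sorted({item for basket in transactions for item in basket})
    column: Dict[str, int] = {item: j for j, item in enumerate(vocabulary)}
    rows = [frozenset(column[item] for item in basket) for basket in transactions]
    return vocabulary, rows


def support(itemset: Iterable[str], vocabulary: List[str], rows: List[FrozenSet[int]]) -> float:
    """Fraction of rows whose nonzero columns include every item in `itemset`."""
    wanted = frozenset(vocabulary.index(item) for item in itemset)
    return sum(1 for row in rows if wanted <= row) / len(rows)


baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "butter", "milk"}, {"milk"}]
vocab, rows = encode(baskets)
print(vocab)                                    # ['bread', 'butter', 'milk']
print(support({"bread", "milk"}, vocab, rows))  # 0.5
```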
They can be used as a precursor to the CBA algorithm, a decision tree algorithm for categorical data.
There are no decision trees (or any other algorithms) for categorical data without a one-hot transform in sklearn.
I think frequent itemset mining should be considered off-topic (OT). None of the core developers works in that area, so any submitted code is likely to become orphaned. We've been trying to reduce the scope of the library for this very reason.
Also, I believe that the kind of code patterns will be very different. Not saying that it is not interesting, just saying the tool should be a …
Hi, I have some knowledge of the Apriori and FP-Growth algorithms. I'd like to work on this issue. Is anyone else already working on it? If so, I'd like to help with that too.
Closing this issue. I think association learning should be prototyped in a separate package; if it turns out that the code and interfaces are similar enough to ours, we can consider the code for merging into scikit-learn.
Sad decision!
Very reasonable decision :)
👍 for focus
@joernhees could you explain how this formulation of unsupervised learning even fits into the scikit-learn API? If not easily, then it probably belongs in the scope of a different project that can establish its own API. I think @larsmans made that quite clear above, and it doesn't deserve a snide response.
Sorry if this came across as snide; that wasn't my intention. I originally arrived here searching for association rule learning algorithms and simply expected to find them in sklearn (it's a pretty awesome collection of machine learning algorithms and I usually find most things I need in it, so a big thank you for that). After reading this thread I was both pleased and disappointed, and wanted to voice both.
You're right that association rule mining doesn't fully fit into the current API. Conceptually I see it somewhere between dimensionality reduction techniques and hierarchical clustering. API-wise it's probably closest to hierarchical clustering. As two lines were probably too short to express that in a friendly way, please accept my apologies.
No problem. There are definitely Python implementations of Apriori.
I think this would be worthwhile; see this article: Comparing Association Rules and Decision Trees. This blog post includes Python code for Apriori; it might be interesting to have a go at implementing these algorithms sometime. Is there any work on a separate prototyping package?
None so far. Maybe you can try to gather support for this on the mailing list?
I am, for one, disappointed that these algorithms are not implemented in sklearn. My advisor is Jiawei Han, the author of FP-growth and PrefixSpan, and the number of citations for both of those papers ("Mining frequent patterns without candidate generation" and "Mining sequential patterns by pattern-growth") is proof that both of those algorithms have a place in sklearn.
Just because scikit-learn has a popularity criterion for included … Feel free to be disappointed, but I strongly doubt that ARL techniques will …
Association learning algorithms are simply too far from classification- and regression-like problems. However, we could instead consider frequent itemset/pattern mining as a feature generation algorithm, like CountVectorizer and TfidfVectorizer. Those frequent patterns could then be used as input features to any classifier, and would be much more intuitive than, and somewhat different from, information-gain-based decision tree learning.
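A minimal sketch of that idea, assuming a hypothetical `FrequentItemsetFeatures` class (not part of scikit-learn) that follows the familiar `fit`/`transform` convention: `fit` finds the itemsets whose support reaches `min_support`, and `transform` emits one 0/1 column per itemset, much as `CountVectorizer` emits one column per vocabulary term. Itemsets are found here by brute-force enumeration up to `max_size` items rather than by Apriori or FP-Growth, so it only suits small vocabularies.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Sequence


class FrequentItemsetFeatures:
    """Hypothetical transformer: one binary feature per frequent itemset.

    Mirrors the scikit-learn fit/transform convention without depending on it.
    Itemsets are found by brute-force enumeration up to `max_size` items.
    """

    def __init__(self, min_support: float = 0.5, max_size: int = 2):
        self.min_support = min_support
        self.max_size = max_size

    def fit(self, baskets: Sequence[Iterable[str]]) -> "FrequentItemsetFeatures":
        data = [frozenset(b) for b in baskets]
        vocabulary = sorted(set().union(*data))
        self.itemsets_: List[FrozenSet[str]] = []
        for size in range(1, self.max_size + 1):
            for combo in combinations(vocabulary, size):
                candidate = frozenset(combo)
                # Support = fraction of baskets containing the whole candidate.
                count = sum(1 for b in data if candidate <= b)
                if count / len(data) >= self.min_support:
                    self.itemsets_.append(candidate)
        return self

    def transform(self, baskets: Sequence[Iterable[str]]) -> List[List[int]]:
        # Row i, column j is 1 when basket i contains every item of itemset j.
        return [[int(s <= frozenset(b)) for s in self.itemsets_] for b in baskets]

    def fit_transform(self, baskets: Sequence[Iterable[str]]) -> List[List[int]]:
        return self.fit(baskets).transform(baskets)


model = FrequentItemsetFeatures(min_support=0.5, max_size=2)
features = model.fit_transform([{"a", "b"}, {"a", "b", "c"}, {"a"}, {"b", "c"}])
print(features[0])  # [1, 1, 0, 1, 0]
```

A real implementation would likely subclass `sklearn.base.BaseEstimator` and `TransformerMixin`, return a sparse matrix, and use a proper mining algorithm in `fit`; those choices are left out here to keep the sketch dependency-free.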
That's an option. Kudo and Matsumoto show how to sample a subset of the polykernel with PrefixSpan.
I could look this up in the scikit-learn documentation, but I will ask you directly: is this option (Kudo and Matsumoto) available in scikit-learn?
No. I'm just saying it could be.
+1 for the Apriori algorithm
Note that there are ML algorithms which depend upon frequent itemsets as input. For example, see Cynthia Rudin's Bayesian Rule Lists (cf. http://www.stat.washington.edu/research/reports/2012/tr609%20-%20old.pdf). Consider a dataset with a response variable to be predicted, in which all the features are binary indicators (perhaps as a result of one-hot encoding). We can consider a training-set row to be a 'basket' and the presence of a feature in that row to be an 'item' within the basket. Thus, fairly generic datasets could be operated upon by Apriori, FP-Growth, and other frequent itemset mining techniques. In the Bayesian Rule List algorithm, the frequent itemsets are evaluated and eventually an if-then-else structure is created from them. See the referenced paper for more details. The point is that having frequent itemset mining approaches available could support classifiers and regressors (already within the scope of sklearn), not just market basket analysis.
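The basket view of a one-hot-encoded row and the if-then-else structure described above can be sketched as follows. This covers only the prediction side of a rule list: `as_basket`, `predict_rule_list`, and the rules themselves are hypothetical and hand-written, whereas Bayesian Rule Lists learn which pre-mined itemsets to use and in what order, which this sketch does not attempt.

```python
from typing import FrozenSet, Iterable, List, Mapping, Tuple

# A rule pairs an antecedent itemset with the label returned when it fires.
Rule = Tuple[FrozenSet[str], str]


def as_basket(record: Mapping[str, str]) -> FrozenSet[str]:
    """View one categorical sample as a basket of 'feature=value' items,
    i.e. the set of one-hot columns that would be 1 for this row."""
    return frozenset(f"{name}={value}" for name, value in record.items())


def predict_rule_list(rules: List[Rule], default: str, basket: Iterable[str]) -> str:
    """Walk the if / else-if chain and return the label of the first matching rule."""
    items = frozenset(basket)
    for antecedent, label in rules:
        if antecedent <= items:  # every item of the antecedent is present
            return label
    return default  # the final "else" branch


# Hypothetical, hand-written rules; a real rule list would be learned.
rules: List[Rule] = [
    (frozenset({"smoker=yes", "age=old"}), "high_risk"),
    (frozenset({"smoker=yes"}), "medium_risk"),
]
print(predict_rule_list(rules, "low_risk", as_basket({"smoker": "yes", "age": "young"})))  # medium_risk
```

Rule order matters: the more specific rule is listed first, so an old smoker is caught by it before the broader `smoker=yes` rule is checked.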
That's motivation for such algorithms to be available in scipy, perhaps. Of …
I don't know how much of sklearn has changed since this conversation started, but there's an entire "cluster" package that's not regression/classification either. I think a good implementation of the latest algorithms for association rules and frequent itemsets would be welcomed by many in sklearn.
Clustering is much like classification, but unsupervised, and has long been part of scikit-learn. Association rule mining remains outside the primary tasks scikit-learn focuses on, and does not neatly fit its API, but might be relevant in the context of an association-based classifier. "Latest algorithms" isn't what scikit-learn is about. See our FAQ. It would be nice not to have to repeat myself.
@actsasgeek if you want to implement association rule mining in a scikit-learn compatible way, we'd be happy to include it into scikit-learn-contrib: https://github.com/scikit-learn-contrib/scikit-learn-contrib/blob/master/README.md
I hope my repeated question does not bother you, as I sense some opposition to adding association rule mining to such a great library as scikit-learn. I just want an update: is any frequent itemset mining implemented in scikit-learn, three years after this thread was created?
Association rule mining is outside of the scope of machine learning, and
certainly out of the scope of scikit-learn.
Classification based on association rules is the only context in which we
would consider it, and then it would still need to be a hard sell.
For those who are interested, a library called …
Yes, everybody needs it, so it would be great to have in scikit-learn.
That is pattern mining, not ML.
Hi everyone, I am a research engineer working on the implementation of a standard pattern mining library in Python, scikit-mine (https://github.com/scikit-mine/scikit-mine), which is being designed for compatibility with scikit-learn. If you allow me, I would like to give my opinion and thoughts concerning interactions between pattern mining and machine learning:
1. Pattern Mining IS NOT Machine Learning. It is a different area of research, and proper inclusion of this family of algorithms into the Python ecosystem is a topic in itself. Echoing @amueller, @larsmans, @GaelVaroquaux and @ogrisel, I also believe this is out of the scope of sklearn (hence the need for other libraries to handle it).
2. Echoing @ajaybhat, @hlin117, @jnothman and @rmenich: Apriori and FPGrowth are standard frequent itemset mining algorithms that many people know, but IMO they have been outperformed by other methods in the last decade, both in computational runtime and in the quality of the discovered patterns. SLIM (https://scikit-mine.github.io/scikit-mine/reference/itemsets.html#slim) is of this kind.
3. Echoing @Sandy4321 and @rmenich: interactions between pattern mining and other libraries in the Python ecosystem are definitely something to be considered. I am working on this :)
4. Inclusion in scikit-learn-contrib is also something to hope for, at least if people express the need for such algorithms.
NB: I know it can seem frustrating sometimes; I was frustrated myself when I wanted to use pattern mining algorithms and couldn't find any tool that suited me. Maintainers have to make strong choices, including saying NO to people in need. Hopefully the community will converge and everyone will be satisfied.
Remi:
Association rule mining is a foundational component in some interpretable ML approaches (cf. Cynthia Rudin's work on Bayesian rule lists <https://arxiv.org/abs/1602.08610> and her other decision list papers <https://users.cs.duke.edu/~cynthia/papers.html>). So I'm not sure about the statement "Pattern Mining IS NOT Machine Learning"; why the semantic distinction?
Ron Menich
@rmenich, the main differences. Again, this is only my opinion.
Concerning Cynthia Rudin's work, if I am correct, the model is trained from pre-mined rules. In other words, one can use patterns discovered by a PM algorithm as knowledge to build a machine learning model, but I would not say PM is ML... Also good to note: the algorithms mentioned in this thread deal with itemset mining, which is actually only a subpart of what the pattern mining literature offers. There exists a plethora of other types of patterns.
@remiadon (https://github.com/remiadon), it is a big mistake to say that Dr. Rudin's approach is not machine learning; try reading her papers again (cf. Cynthia Rudin's work on Bayesian rule lists).
This comment was marked as abuse.
This comment was marked as abuse.
This conversation is no longer constructive and isn't following our CoC (https://github.com/scikit-learn/scikit-learn/blob/main/CODE_OF_CONDUCT.md). I'm locking the conversation.
I noticed that there are no associative learning algorithms such as:
- Apriori algorithm
- Equivalence Class Clustering and bottom-up Lattice Traversal (Eclat)
- PrefixSpan
- FP-Growth
All of them are used to detect frequently co-occurring patterns in a dataset.
Some of them are somewhat difficult to implement; I would say about 200 lines of code?
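For reference, a compact sketch of the first algorithm in the list, Apriori, assuming the support threshold is given as an absolute transaction count (`min_count`, a name chosen here for illustration). It alternates a join step (combine frequent (k-1)-itemsets into candidate k-itemsets) with a prune step (drop any candidate that has an infrequent (k-1)-subset), then counts the survivors. Counting simply rescans every basket for every candidate, so this is far from an optimized implementation; Eclat, FP-Growth, and PrefixSpan are not shown.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Sequence


def apriori(transactions: Sequence[Iterable[str]], min_count: int) -> Dict[FrozenSet[str], int]:
    """Level-wise Apriori: return every itemset found in >= min_count transactions."""
    baskets = [frozenset(t) for t in transactions]

    def count(candidates):
        # Naive support counting: one pass over all baskets per level.
        counts = {c: 0 for c in candidates}
        for basket in baskets:
            for c in candidates:
                if c <= basket:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_count}

    # Level 1: frequent single items.
    current = count({frozenset([item]) for basket in baskets for item in basket})
    frequent = dict(current)
    k = 2
    while current:
        previous = list(current)
        # Join step: union pairs of frequent (k-1)-itemsets that yield a k-itemset.
        candidates = {a | b for a, b in combinations(previous, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = count(candidates)
        frequent.update(current)
        k += 1
    return frequent


result = apriori(
    [{"bread", "milk"}, {"bread", "butter"}, {"bread", "butter", "milk"}, {"milk"}],
    min_count=2,
)
print(result[frozenset({"bread", "milk"})])  # 2
```

Here `{bread, butter, milk}` is generated by the join step but removed by the prune step, because its subset `{butter, milk}` appears in only one transaction.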