-1 dbscan category #199

Open
foongminwong opened this issue Dec 15, 2020 · 1 comment

Comments

@foongminwong

Hi, I was trying to run dbscan on some texts and create a scatterplot.

I wonder why my dbscan_labels has a -1 category (not sure what it means):

documents['dbscan_labels'] = (
    documents['tfidf']
    .pipe(hero.dbscan)
    .astype(str)
)

hero.scatterplot(df=documents, col='pca', color='dbscan_labels', hover_data=['ID', 'Title'], title=" DBScan Clustering (Test) - Texthero library")

[Image: scatter plot of the DBSCAN labels]

I previously tried k-means, and the clusters and scatter plot looked good:

documents['tfidf'] = (
    documents['Text']
    .pipe(hero.clean)
    .pipe(hero.tfidf)
)

documents['kmeans_labels'] = (
    documents['tfidf']
    .pipe(hero.kmeans, n_clusters=13)
    .astype(str)
)

documents['pca'] = documents['tfidf'].pipe(hero.pca)

hero.scatterplot(df=documents, col='pca', color='kmeans_labels', hover_data=['ID', 'Title'], title="K-Means Clustering (Test) - Texthero library")

[Image: scatter plot of the k-means labels]

Thank you!

@foongminwong changed the title from "One of the dbscan categories is -1" to "-1 dbscan category" on Dec 15, 2020
@jbesomi
Owner

jbesomi commented Dec 15, 2020

Hi @foongminwong, thank you for reaching out!

DBSCAN classifies points into different categories, one of which is "noise points" (outliers). The label -1 indicates that the DBSCAN algorithm has classified those points as noise, meaning they do not belong to any cluster.

We will need to update the docstring of the texthero.representation.dbscan function and make it more explicit. Would you like to help us with that?
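To illustrate the -1 label described above, here is a minimal sketch that calls scikit-learn's `DBSCAN` directly. It assumes that `hero.dbscan` follows the same labelling convention; the toy points and the `eps` and `min_samples` values are made up for illustration and are not taken from the issue.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy data (illustrative only): two tight groups and one isolated point.
points = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],   # group A
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],   # group B
    [20.0, 20.0],                         # far from everything else
])

# eps and min_samples are assumed values chosen to fit the toy data.
labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(points)

# Points labelled -1 are noise: they were not assigned to any cluster.
noise_mask = labels == -1

# Count the real clusters by ignoring the noise label.
n_clusters = len(set(labels.tolist()) - {-1})

print(labels.tolist())        # the isolated point gets the label -1
print(int(noise_mask.sum()))  # number of noise points
print(n_clusters)             # number of clusters, noise excluded
```

If the labels are converted with `.astype(str)`, as in the snippet from the issue, the noise category shows up as the string "-1" in the plot legend. Depending on the data, a larger `eps` or a smaller `min_samples` may leave fewer points marked as noise.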
