Optimization of the threshold parameter in hierarchical clustering #81

Open
CharlineJnnt opened this issue Mar 9, 2023 · 1 comment


Hello @sidhomj,

I used the unsupervised part of DeepTCR to cluster TCR sequences. When I let the method determine the optimal threshold parameter with the following command, I got this error:

DTCRU_test.Cluster(clustering_method="hierarchical", linkage_method="ward", criterion="distance", write_to_sheets=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/.conda/envs/DeepTCR_env/lib/python3.7/site-packages/DeepTCR/DeepTCR.py", line 1054, in Cluster
    IDX = hierarchical_optimization(distances, features, method=linkage_method, criterion=criterion)
  File "/home/ubuntu/.conda/envs/DeepTCR_env/lib/python3.7/site-packages/DeepTCR/functions/utils_u.py", line 52, in hierarchical_optimization
    sil.append(skmetrics.silhouette_score(features[sel, :], IDX[sel]))
  File "/home/ubuntu/.conda/envs/DeepTCR_env/lib/python3.7/site-packages/sklearn/metrics/cluster/_unsupervised.py", line 118, in silhouette_score
    return np.mean(silhouette_samples(X, labels, metric=metric, **kwds))
  File "/home/ubuntu/.conda/envs/DeepTCR_env/lib/python3.7/site-packages/sklearn/metrics/cluster/_unsupervised.py", line 229, in silhouette_samples
    check_number_of_labels(len(le.classes_), n_samples)
  File "/home/ubuntu/.conda/envs/DeepTCR_env/lib/python3.7/site-packages/sklearn/metrics/cluster/_unsupervised.py", line 35, in check_number_of_labels
    % n_labels
ValueError: Number of labels is 2876. Valid values are 2 to n_samples - 1 (inclusive)
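
For context, the message suggests that one of the candidate thresholds left every sequence in its own cluster, since silhouette_score only accepts between 2 and n_samples - 1 distinct labels. A minimal sketch of that situation follows; it uses synthetic random features as a stand-in for the learned TCR features (an assumption, not data from this issue) and cuts the tree at a distance threshold of 0:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist
from sklearn import metrics as skmetrics

# Synthetic stand-in for the learned features (assumption, not DeepTCR output).
rng = np.random.default_rng(0)
features = rng.normal(size=(20, 4))
Z = linkage(pdist(features), method="ward")

# With criterion="distance" and t=0, no merge is accepted for distinct points,
# so every sample ends up as its own flat cluster.
labels_at_zero = fcluster(Z, 0, criterion="distance")
n_clusters_at_zero = len(np.unique(labels_at_zero))

# silhouette_score rejects a labelling with as many labels as samples.
try:
    skmetrics.silhouette_score(features, labels_at_zero)
    raised = False
except ValueError as err:
    raised = True
    print(err)
```

Here n_clusters_at_zero equals the number of samples, which is the same condition reported in the traceback above.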

To fix this, I modified the hierarchical_optimization function in utils_u.py (DeepTCR/functions folder, line 44) so that the list of candidate thresholds starts at 1 instead of 0:

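import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn import metrics as skmetrics

def hierarchical_optimization(distances,features,method,criterion):
    # Build the linkage tree from the pairwise distances.
    Z = linkage(squareform(distances), method=method)
    # Candidate thresholds now start at 1; this was np.arange(0, 100, 1).
    t_list = np.arange(1, 100, 1)
    sil = []
    for t in t_list:
        IDX = fcluster(Z, t, criterion=criterion)
        # A single cluster has no valid silhouette score, so record 0.0.
        if len(np.unique(IDX[IDX >= 0])) == 1:
            sil.append(0.0)
            continue
        sel = IDX >= 0
        sil.append(skmetrics.silhouette_score(features[sel, :], IDX[sel]))

    # Re-cluster at the threshold with the highest silhouette score.
    IDX = fcluster(Z, t_list[np.argmax(sil)], criterion=criterion)
    return IDX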

and it works!
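
Starting the thresholds at 1 avoids the failing case for this dataset, but a threshold of 1 could in principle still leave every sequence in its own cluster on data with larger distances. A more defensive variant, sketched below, checks the label count before scoring. The function name hierarchical_optimization_guarded and the t_list parameter are hypothetical and not part of DeepTCR; the 0.0 placeholder mirrors the snippet above, and the IDX >= 0 mask is omitted because fcluster labels start at 1.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform
from sklearn import metrics as skmetrics


def hierarchical_optimization_guarded(distances, features, method, criterion,
                                      t_list=None):
    """Hypothetical variant: score only thresholds with a valid label count."""
    if t_list is None:
        t_list = np.arange(0, 100, 1)
    Z = linkage(squareform(distances), method=method)
    n_samples = features.shape[0]
    sil = []
    for t in t_list:
        IDX = fcluster(Z, t, criterion=criterion)
        n_labels = len(np.unique(IDX))
        if 2 <= n_labels <= n_samples - 1:
            sil.append(skmetrics.silhouette_score(features, IDX))
        else:
            # One cluster, or one cluster per sample: no valid score.
            sil.append(0.0)
    return fcluster(Z, t_list[int(np.argmax(sil))], criterion=criterion)


# Usage on synthetic data: two tight, well-separated blobs (assumption).
rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal(0.0, 0.05, size=(10, 2)),
    rng.normal(5.0, 0.05, size=(10, 2)),
])
distances = squareform(pdist(features))
labels = hierarchical_optimization_guarded(distances, features, "ward", "distance")
print(len(np.unique(labels)))
```

With this guard the threshold list can keep 0 as a candidate without raising, since invalid partitions are skipped rather than scored.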

Owner

sidhomj commented Mar 9, 2023

Thank you for contributing! I'll update the code!
