
Do you have any suggestions on setting the max number of iteration in training som? #66

Open
zhenzonglei opened this issue Apr 15, 2020 · 18 comments

@zhenzonglei

Hi,
I just found an inconsistency in the documented verbose output of the train method.
Line 347 states that if verbose is true, the status of the training will be printed at each iteration, but in line 361 the status is only printed after all iterations. I guess the code between lines 361-363 should be indented.
Thanks

@zhenzonglei zhenzonglei changed the title verbose output for train methods verbose output for the train method Apr 15, 2020
@JustGlowing
Owner

JustGlowing commented Apr 15, 2020 via email

@zhenzonglei
Author

Got it! Thanks very much.
BTW, do you have any suggestions on setting the max number of iterations when training a som? For example, if I have 10,000 samples, what is a reasonable number of iterations?
Thanks again

@JustGlowing
Owner

Hi again, the number of iterations required for convergence depends on many factors. The main ones are the size of the som and the shape of the data. The only way to know whether you have reached convergence is to look at the learning curve and check if it has reached a plateau (see the Iris example).

If you have a 100-by-100 som, start with 10000 iterations so that each sample is observed at least once, and check the results. Increase the number of iterations if you think the error is still on a downward trajectory.
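
For reference, here is a minimal sketch of plotting such a learning curve, along the lines of the BasicUsage example (the map size, sigma, and the stand-in data are arbitrary choices for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt
from minisom import MiniSom

data = np.random.rand(10000, 4)   # stand-in for your real samples

som = MiniSom(10, 10, data.shape[1], sigma=1.0, learning_rate=0.5)
som.random_weights_init(data)

max_iter = 10000
q_error = []
for i in range(max_iter):
    rand_i = np.random.randint(len(data))   # one random sample per iteration
    som.update(data[rand_i], som.winner(data[rand_i]), i, max_iter)
    if i % 100 == 0:                        # evaluating every step is slow
        q_error.append(som.quantization_error(data))

plt.plot(np.arange(0, max_iter, 100), q_error)
plt.xlabel('iteration')
plt.ylabel('quantization error')
plt.show()
```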

@JustGlowing JustGlowing changed the title verbose output for the train method number of iterations Apr 16, 2020
@JustGlowing JustGlowing reopened this Apr 16, 2020
@JustGlowing JustGlowing changed the title number of iterations Do you have any suggestions on setting the max number of iteration in training som? May 27, 2020
@V-for-Vaggelis
Contributor

Let me extend this question a little further, with some emphasis on the topographic error. I have a dataset with around 360 rows and small correlations between the features. After plotting the learning curves as in the Iris example, I noticed that the quantization error indeed decreases and reaches a plateau, while the topographic error fluctuates but tends to become stable as well. The problem is that it fluctuates around 0.8, which is too large. Since the t.e. is an indication of how representative the SOM is, I believe this is an important issue.

The question is whether there is a parameter that, if properly tuned, can decrease the t.e., or if it is inevitable to get a non-representative SOM for low-correlated data?

@JustGlowing
Owner

hi @V-for-Vaggelis,

Have you tried inspecting the results visually? You want to check that the u-matrix (that you can get with the method distance_map()) is smooth.

You can obtain a smooth mapping no matter how the data is correlated.
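
A minimal sketch of that visual check, assuming matplotlib and stand-in data (map size and training length are arbitrary here):

```python
import numpy as np
import matplotlib.pyplot as plt
from minisom import MiniSom

data = np.random.rand(360, 5)        # stand-in for the real dataset
som = MiniSom(8, 8, data.shape[1], sigma=1.5, learning_rate=0.5)
som.train_random(data, 5000)

# distance_map() returns, for each neuron, the normalised mean distance
# to its neighbours; a smooth u-matrix has no abrupt jumps in this plot
plt.pcolor(som.distance_map().T, cmap='bone_r')
plt.colorbar(label='mean inter-neuron distance')
plt.show()
```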

@JustGlowing
Owner

@V-for-Vaggelis also, to really understand if the som has converged you can check the weights step by step and stop when they don't change anymore (||W_i - W_{i-1}|| < epsilon).
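
One possible sketch of that stopping rule, assuming `get_weights()` and an arbitrary tolerance; checking periodically rather than after every single-sample update gives a more meaningful measure of the change:

```python
import numpy as np
from minisom import MiniSom

data = np.random.rand(360, 5)        # stand-in data
som = MiniSom(8, 8, data.shape[1], sigma=1.5, learning_rate=0.5)
som.random_weights_init(data)

epsilon = 1e-5                       # illustrative tolerance
max_iter = 100000
w_prev = som.get_weights().copy()
for i in range(max_iter):
    rand_i = np.random.randint(len(data))
    som.update(data[rand_i], som.winner(data[rand_i]), i, max_iter)
    if i % 1000 == 999:              # test ||W_i - W_{i-1}|| < epsilon periodically
        w = som.get_weights()
        if np.linalg.norm(w - w_prev) < epsilon:
            print(f'weights stable after {i + 1} iterations')
            break
        w_prev = w.copy()
```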

@V-for-Vaggelis
Contributor

@JustGlowing A weird thing happened. I updated minisom and it would not print the t.e. anymore, so I printed it myself and got 0.09 for the same data. Could it be a bug you had fixed? I also got the distance map as you advised. In general it has a smooth behavior, but there is a small red area (large distances). I guess it means this small area of the grid can't be trusted to draw conclusions.

Also, another thing: is there a paper I can refer to in my thesis for minisom, or should I just link to the repo?

@JustGlowing
Owner

@V-for-Vaggelis there was a bug fix released in December related to the quantization error.

Can you please cite MiniSom as follows:

G. Vettigli, "MiniSom: minimalistic and NumPy-based implementation of the Self Organizing Map". Available: https://github.com/JustGlowing/minisom.

@Yifeng-J

Hi, I am using MiniSom to cluster data, and I find it very convenient, so thanks for your contributions. However, I am confused about how to properly select the initial parameters, e.g. sigma, learning rate, and max_iteration. In this issue you said "The only way to know if you reached convergence is to look at the learning curve and check if it reached a plateau", but which indicator should I use to plot the learning curve: the quantization error?

And finally, I want to know if there is any way to get the cluster number to which each datapoint in the dataset belongs. In the Clustering example you set each neuron as a cluster, but that does not work properly in my experiment.

Thanks.

@JustGlowing
Owner

hi @Yifeng-J,

Here's an example of how to plot the learning curve: https://github.com/JustGlowing/minisom/blob/master/examples/BasicUsage.ipynb

I'd recommend using the quantization error unless you're trying to optimize your own custom metric.

Regarding the cluster index, the example you pointed out shows the most convenient way to solve the issue. However, you can do more complex things, like grouping different neurons and assigning the cluster index according to that.
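
A minimal sketch of deriving a cluster index from each sample's winning neuron, along the lines of the Clustering example (map size and data are stand-ins):

```python
import numpy as np
from minisom import MiniSom

data = np.random.rand(200, 4)        # stand-in data
som = MiniSom(3, 3, data.shape[1], sigma=1.0, learning_rate=0.5)
som.train_random(data, 5000)

# each sample's best matching unit gives its (row, col) on the map;
# flattening the coordinates yields one cluster index per neuron
winner_coordinates = np.array([som.winner(x) for x in data]).T
cluster_index = np.ravel_multi_index(winner_coordinates, (3, 3))
print(cluster_index[:10])
```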

@Yifeng-J

@JustGlowing Ok, thanks for your answer. I will try some other methods to solve the cluster index problem. I hope you can give me some suggestions on how to choose the initial parameters, because I can't find any information about how to choose them properly on the Internet.

@JustGlowing
Owner

I'd suggest you to start with the default parameters and plot the results as shown in the documentation. Then you can tweak the parameters. You'll get a grasp once you try a couple of edge cases (e.g. setting sigma too high or too low). Remember that there's no optimal set of parameters, but you can find a set that is good enough for you.
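
As a small illustration of trying such edge cases, one could compare the quantization error over a few arbitrary sigma values (everything here is a stand-in):

```python
import numpy as np
from minisom import MiniSom

data = np.random.rand(300, 4)            # stand-in data
for sigma in (0.5, 1.0, 2.5):            # deliberately low / default / high
    som = MiniSom(7, 7, data.shape[1], sigma=sigma, learning_rate=0.5)
    som.train_random(data, 5000)
    print(f'sigma={sigma}: quantization error={som.quantization_error(data):.4f}')
```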

@Yifeng-J

@JustGlowing Ok, I get it. Thank you very much!

@atheeraa

Hello guys, I'm trying to use minisom for clustering 16-dimensional embeddings with 7 classes, and I'm not sure how to set the size of the map.

If, for example, I set it to 7*7 I'd get 49 clusters, and 3*3 = 9 clusters.

I read your rule of thumb, but it doesn't work for me, because I'd have to set it to 16*16 and by doing so I'd get 256 clusters!

Would appreciate the help.

@JustGlowing
Owner

Hi @atheeraa , you have to set input_len to 16 and create a map of size 3x3. This will give you 9 clusters and you can merge two of the closest clusters to get the 8 that you need.
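
A sketch of that setup; the merging step is not a MiniSom feature, just one hypothetical way to fuse the two clusters whose weight vectors are closest:

```python
import numpy as np
from minisom import MiniSom

embeddings = np.random.rand(500, 16)     # stand-in for the 16-dim embeddings
som = MiniSom(3, 3, 16, sigma=1.0, learning_rate=0.5)
som.train_random(embeddings, 10000)

coords = np.array([som.winner(e) for e in embeddings]).T
labels = np.ravel_multi_index(coords, (3, 3))      # 9 initial clusters

# merge the two neurons whose weight vectors are closest -> 8 clusters
w = som.get_weights().reshape(9, 16)
d = np.linalg.norm(w[:, None] - w[None, :], axis=-1)
np.fill_diagonal(d, np.inf)
a, b = np.unravel_index(np.argmin(d), d.shape)
labels[labels == b] = a
```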

@atheeraa

Thank you for your reply!
I have another question regarding the visualization. I have 16-dimensional embeddings; how do you suggest I plot the map? Following the clustering example you provided, I can see that I can change the x and y of the scatter function, but I don't know how to show the whole data at once.

Again, thank you for your replies, I appreciate your help.

@JustGlowing
Owner

hi again @atheeraa, you want to have a look at this example: https://github.com/JustGlowing/minisom/blob/master/examples/BasicUsage.ipynb
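
In the spirit of that notebook, one common sketch is to scatter every sample at its winning neuron on top of the u-matrix, so all 16 dimensions are shown at once through the map position (the labels, jitter, and colours below are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt
from minisom import MiniSom

embeddings = np.random.rand(500, 16)      # stand-in data
targets = np.random.randint(0, 7, 500)    # stand-in class labels
som = MiniSom(3, 3, 16, sigma=1.0, learning_rate=0.5)
som.train_random(embeddings, 10000)

plt.pcolor(som.distance_map().T, cmap='bone_r')   # u-matrix as background
for e, t in zip(embeddings, targets):
    x, y = som.winner(e)
    # add jitter so samples mapped to the same neuron remain visible
    plt.scatter(x + 0.5 + np.random.uniform(-0.3, 0.3),
                y + 0.5 + np.random.uniform(-0.3, 0.3),
                color=plt.cm.tab10(t), s=10)
plt.colorbar(label='mean inter-neuron distance')
plt.show()
```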

@Overture-Y

Thank you so much for your wonderful work. I'm trying to use minisom for a clustering task, but in the Clustering example som.winner processes the data one sample at a time, which takes a lot of time when the input is large. If the input's shape is (m, n), how can I process the whole array without a for loop? Thank you.
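
A hedged sketch of computing all the winners at once with NumPy broadcasting instead of a Python loop, using `get_weights()` (for very large m, process the data in chunks to limit the memory used by the intermediate distance array):

```python
import numpy as np
from minisom import MiniSom

data = np.random.rand(10000, 8)          # stand-in (m, n) input
som = MiniSom(10, 10, data.shape[1])
som.train_random(data, 1000)

w = som.get_weights()                    # shape (10, 10, n)
flat_w = w.reshape(-1, w.shape[-1])      # (100, n)
# distance of every sample to every neuron, computed in one shot
d = np.linalg.norm(data[:, None, :] - flat_w[None, :, :], axis=-1)
rows, cols = np.unravel_index(d.argmin(axis=1), w.shape[:2])  # all winners
```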
