How to get the Gaussian parameters (mean, cov, weights)? #17

Open
liuziyang365 opened this issue Dec 13, 2018 · 1 comment

Comments

@liuziyang365

liuziyang365 commented Dec 13, 2018

If I have data = batch1 + batch2, are there two ways to run online clustering?
Way 1: set nBatch = 2
bnpy.run(
data, 'DPMixtureModel', 'Gauss', 'memoVB',
output_path='/home/pinga/lzy/bnpy-test/tmp/3/',
nLap=100, nTask=1, nBatch=2,
sF=0.1, ECovMat='eye',
initname=dict_2['task_output_path'],
moves='birth,merge,shuffle',
m_startLap=5, b_startLap=0, b_Kfresh=4)

Way 2: initialize the batch2 run from the batch1 output (initname = batch1 output path)
gamma = 5
sF = 5
K = 1
model_1, dict_1 = bnpy.run(
batch2, 'DPMixtureModel', 'Gauss', 'memoVB',
output_path='/home/pinga/lzy/bnpy-test/tmp/2/',
nLap=100, nTask=1, nBatch=1,
    sF=sF, K=K, gamma0=gamma, ECovMat='eye',
initname='randexamples',
moves='birth,merge,shuffle',
m_startLap=5, b_startLap=0, b_Kfresh=4)
model_2, dict_2 = bnpy.run(
batch2, 'DPMixtureModel', 'Gauss', 'memoVB',
output_path='/home/pinga/lzy/bnpy-test/tmp/2/',
nLap=100, nTask=1, nBatch=1,
    sF=sF, K=K, gamma0=gamma, ECovMat='eye',
initname=dict_1['task_output_path'],
moves='birth,merge,shuffle',
m_startLap=5, b_startLap=0, b_Kfresh=4)

Are these two ways the same?
How do I get the Gaussian parameters (mean, cov, weight) after clustering each batch?
How do I get the weighted log probabilities for each sample?
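For reference on the last question: a weighted log probability for sample x under cluster k is usually defined as log w_k + log N(x | mean_k, cov_k), and normalizing these values across k gives the soft cluster assignments. The sketch below is plain Python, not bnpy code. It assumes you have already extracted each cluster's weight, mean and covariance from the fitted model, and for brevity it assumes diagonal covariances (the 'Gauss' observation model can learn full covariances, which would need a log-determinant and a Mahalanobis term instead). The function names and parameter values are hypothetical.

```python
import math

def weighted_log_probs(x, weights, means, variances):
    """Return [log w_k + log N(x | mean_k, diag(variances_k)) for each cluster k]."""
    out = []
    for w, mu, var in zip(weights, means, variances):
        # Assumed diagonal covariance: the log density is a sum of 1-D Gaussian terms
        log_pdf = sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
                      for xi, m, v in zip(x, mu, var))
        out.append(math.log(w) + log_pdf)
    return out

def responsibilities(x, weights, means, variances):
    """Normalize the weighted log probabilities into soft assignments (log-sum-exp trick)."""
    lp = weighted_log_probs(x, weights, means, variances)
    top = max(lp)
    log_norm = top + math.log(sum(math.exp(v - top) for v in lp))
    return [math.exp(v - log_norm) for v in lp]

# Hypothetical two-cluster, 2-D parameters
weights = [0.5, 0.5]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]

print(weighted_log_probs([0.0, 0.0], weights, means, variances))
print(responsibilities([0.0, 0.0], weights, means, variances))
```

The log-sum-exp of the weighted log probabilities is the log mixture density log p(x), and the index of the largest responsibility is the hard cluster assignment.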

@michaelchughes
Contributor

michaelchughes commented Dec 21, 2018

Are these two ways the same?

No. Those two approaches will deliver very different models. (I'm assuming you intended the first call to bnpy.run in your second approach to use batch1 and the second call to use batch2; as written, your second example never uses batch1.)

The first approach will fit a model that tries to jointly explain batch1 and batch2. It will run for 100 laps, or epochs (100 passes through both batch1 and batch2).

The second approach is different. It will first fit a good mixture model to batch1 only, using 100 laps/epochs. Then it will use this model as an initialization (but only an initialization) when fitting batch2, again running 100 laps/epochs of batch2-specific training.

If batch2 looks very different from batch1, the results will be very different.

  • The first approach will try to find clusters that fit both batches.
  • The second approach, because it has birth/merge moves, will only fit batch2 in the end. Any preliminary clusters that were useful for batch1 would probably be discarded or edited by merge moves or standard update steps unless they also benefit batch2.
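The contrast in these two bullets can be illustrated with a toy sketch. It swaps bnpy's memoVB (with a DP prior and birth/merge moves) for plain EM on a fixed two-component 1-D Gaussian mixture, so it is an analogy rather than what bnpy actually runs. The data, the `em` helper and all parameter values are hypothetical. Approach 1 runs EM on both batches together; approach 2 runs EM on batch1 and then uses that result only as the starting point for EM on batch2.

```python
import math

def log_normal(x, mean, var):
    """Log density of a 1-D Gaussian N(x | mean, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def em(data, weights, means, variances, n_iter=50, min_var=0.1):
    """Plain EM for a fixed-K 1-D Gaussian mixture, started from the given parameters."""
    weights, means, variances = list(weights), list(means), list(variances)
    K = len(weights)
    for _ in range(n_iter):
        # E-step: responsibilities = softmax over k of log w_k + log N(x | mean_k, var_k)
        resp = []
        for x in data:
            lp = [(math.log(weights[k]) if weights[k] > 0 else -math.inf)
                  + log_normal(x, means[k], variances[k]) for k in range(K)]
            top = max(lp)
            p = [math.exp(v - top) for v in lp]
            z = sum(p)
            resp.append([pk / z for pk in p])
        # M-step: every component is re-estimated from *this* data set only
        for k in range(K):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            if nk == 0.0:
                continue  # no support at all: keep the old mean and variance
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, min_var)  # floor keeps variances from collapsing
    return weights, means, variances

# Hypothetical toy data: two well-separated batches
batch1 = [-5.2, -4.9, -5.1, -4.8, -5.0]
batch2 = [4.9, 5.1, 5.0, 5.2, 4.8]
init = ([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])  # (weights, means, variances)

# Approach 1 (sketch): one fit that sees both batches
joint = em(batch1 + batch2, *init)

# Approach 2 (sketch): fit batch1, then use the result only to initialize a fit to batch2
after_batch1 = em(batch1, *init)
sequential = em(batch2, *after_batch1)

print("joint means:     ", joint[1])
print("sequential means:", sequential[1])
```

In the joint fit, one component settles near each batch. In the sequential fit, every M-step re-estimates each mean as a weighted average of batch2 points only, so both means end up inside batch2's range: the batch1 solution survives only as an initialization, even without birth or merge moves.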
