This repository has been archived by the owner on Feb 22, 2021. It is now read-only.

Help with determining window size and min count #22

Open
vondiplo opened this issue Jul 1, 2016 · 6 comments

@vondiplo

vondiplo commented Jul 1, 2016

I'm trying to train a model that includes certain topics.
With the default parameters, the topic I'm interested in ends up missing from the model.
I was thinking of changing the window size to 5 and the min count to 5 to get more granular results. However, I don't actually know what effect changing these parameters would have. Could someone please shed some light on their impact?
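
For readers unfamiliar with the two parameters, here is a minimal pure-Python sketch of what they control in word2vec-style training in general. It is not this project's code: `build_vocab`, `context_pairs` and the toy corpus are hypothetical names and data invented for illustration. The min count is a frequency threshold below which a token is dropped from the vocabulary, and the window is the maximum number of neighbouring tokens on each side that are paired with the centre token.

```python
from collections import Counter

def build_vocab(sentences, min_count):
    """Keep only tokens that occur at least min_count times in the corpus."""
    counts = Counter(tok for sent in sentences for tok in sent)
    return {tok for tok, n in counts.items() if n >= min_count}

def context_pairs(sentence, vocab, window):
    """Yield (center, context) pairs using at most `window` tokens per side."""
    # Tokens outside the vocabulary are removed before pairs are formed.
    kept = [tok for tok in sentence if tok in vocab]
    for i, center in enumerate(kept):
        lo = max(0, i - window)
        hi = min(len(kept), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, kept[j]

# Toy corpus (hypothetical data): "visual_cortex" occurs only once.
corpus = [
    ["the", "visual_cortex", "processes", "the", "image"],
    ["the", "image", "reaches", "the", "retina"],
]

strict_vocab = build_vocab(corpus, min_count=2)  # rare topic is dropped
loose_vocab = build_vocab(corpus, min_count=1)   # rare topic is kept

narrow = list(context_pairs(corpus[0], loose_vocab, window=1))
wide = list(context_pairs(corpus[0], loose_vocab, window=2))
```

Under these assumptions, a topic that appears fewer times than the min count never gets a vector at all, so raising the min count can only remove topics, not add them. A wider window produces more (and more distant) training pairs per token, which changes what the vectors capture but not which tokens are in the vocabulary. Real implementations add further details that this sketch leaves out.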

@dav009
Contributor

dav009 commented Jul 1, 2016

  • Have you tried stemming the vocabulary as Phillip suggested?
  • Are you sure the given topic has enough links on Wikipedia? You can check the DBpedia NLP datasets to see how many times it appears

@vondiplo
Author

vondiplo commented Jul 2, 2016

  • I did try stemming the vocabulary, but there are times when automatically getting other descriptive concepts isn't straightforward
  • I'm not sure how popular the topic is on Wikipedia, but I do know it was returned by dbpedia-spotlight (https://github.com/dbpedia-spotlight/dbpedia-spotlight/). If they could recognise the topic using their model, why wouldn't my trained model be able to do so?

@dav009
Contributor

dav009 commented Jul 6, 2016

Can you add some info on how many annotations the topics you are looking for have?
I think the annotation stats are available here: http://spotlight.sztaki.hu/downloads/latest_data/

Presumably, if Spotlight can return it, it has enough counts.

@vondiplo
Author

@dav009 - It doesn't appear there at all (I downloaded the English tar, then ran grep on the unzipped folder for the concept I was searching for, 'visual_cortex'). On the other hand, neither does Barak_Obama, yet it surely appears in both Spotlight's annotations and the wiki2vec vectors.

@vondiplo
Author

vondiplo commented Aug 7, 2016

Hi @dav009, is there any update regarding this?

@dav009
Contributor

dav009 commented Aug 7, 2016

Sorry for the late reply,

  • Barack_Obama (note your typo) is definitely there.
  • If you extract the URI counts from that 1G tar.gz, I will take a look. I'm currently on a poor internet connection.
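
One way to look for a concept in the downloaded archive without unpacking it first is sketched below, using only Python's standard `tarfile` module. The function name `find_in_tar`, the `member_suffix` filter and the example path are hypothetical, and nothing here assumes a particular file layout inside the archive. The match is case-insensitive because DBpedia resource names usually start with a capital letter, so a case-sensitive search for `visual_cortex` could miss an entry written as `Visual_cortex`. A misspelling such as `Barak_Obama` would still return nothing.

```python
import tarfile

def find_in_tar(tar_path, needle, member_suffix=""):
    """Stream a .tar.gz and return (member_name, line) hits containing needle.

    The comparison ignores case. member_suffix optionally restricts the
    search to archive members whose names end with that string.
    """
    needle = needle.lower()
    hits = []
    with tarfile.open(tar_path, "r:gz") as tar:
        for member in tar:
            if not member.isfile() or not member.name.endswith(member_suffix):
                continue
            fh = tar.extractfile(member)
            for raw in fh:
                line = raw.decode("utf-8", errors="replace").rstrip("\n")
                if needle in line.lower():
                    hits.append((member.name, line))
    return hits

# Example call (hypothetical path to the downloaded archive):
# for name, line in find_in_tar("en.tar.gz", "visual_cortex"):
#     print(name, line)
```

Any matching lines could then be pasted into the thread instead of sharing the whole archive.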

@mal mal removed the fandango label Jan 10, 2018

3 participants