ami search: some files empty for customized dictionary #80

Open
AmbrineH opened this issue Aug 12, 2020 · 14 comments

Comments

@AmbrineH
Collaborator

AmbrineH commented Aug 12, 2020

With my latest dictionary, ami search produces an empty histogram.csv and some empty XML files, although other HTML files such as full.dataTables.html are generated fine. The error I am getting is:

Cannot read stopword stream: /org/contentmine/ami/wordutil, ami3, version 2020/08/09_09/54-NEXT-SNAPSHOT/pmcstop.txt
Cannot read stopword stream: /org/contentmine/ami/wordutil, ami3, version 2020/08/09_09/54-NEXT-SNAPSHOT/stopwords.txt
PMC3561042 .PMC6517453 !wPMC6695746 PMC7102705 PMC7119083 PMC7120695 PMC7197577 PMC7241517 PMC7341712 !wPMC7395586 ..... create data tables Null pluginOption'

Dictionary Used: DICTIONARY_COUNTRY

Command:
ami -p ami_12_08_2020/try_for_ami_search_1 search --dictionary ami_12_08_2020/country_final.xml

Output: ALL OUTPUT FILES
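
To pin down exactly which output files are affected, a small helper can list them. This is only a sketch: it assumes "empty" means zero bytes (a file containing just a header row or an empty root element would not be caught), and the function name `find_empty_files` is hypothetical, not part of ami.

```python
from pathlib import Path


def find_empty_files(project_dir, suffixes=(".csv", ".xml", ".html")):
    """Return sorted relative paths of zero-byte files with the given suffixes.

    Assumption: an "empty" output file is one of size 0 bytes.
    """
    root = Path(project_dir)
    empty = []
    for path in root.rglob("*"):
        # Only regular files with one of the suffixes of interest are checked.
        if path.is_file() and path.suffix.lower() in suffixes:
            if path.stat().st_size == 0:
                empty.append(path.relative_to(root).as_posix())
    return sorted(empty)
```

Calling `find_empty_files("ami_12_08_2020/try_for_ami_search_1")` would return the relative paths of any zero-byte CSV, XML, or HTML files under the project directory used in the command above.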

@petermr
Owner

petermr commented Aug 13, 2020

The first paragraph refers to ami search but the second refers to amidict.
Can you please clarify which? And if there are two issues, please separate them.

@petermr
Owner

petermr commented Aug 13, 2020

Did you create a dictionary successfully with amidict? If so, this should be independent of using ami search. If the search fails with every dictionary, then the problem is with ami search; if it fails only with the one created by amidict, please make sure that dictionary is uploaded.
There seems to be a problem with the cooccurrence for some people and not others. This may be because the dictionary is not correct. Ideally I need:

  • ami -p P search --dictionary A // works
  • ami -p P search --dictionary B // does not work
    Then it's probably the dictionary.

P.
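
The A/B comparison described above could be scripted roughly as follows. This is a sketch, not a recommended workflow: it uses only the `ami -p P search --dictionary X` form quoted in this thread, the helper names are hypothetical, and a zero exit code does not prove the output is non-empty, so the generated files still need to be inspected.

```python
import subprocess


def build_search_command(project, dictionary):
    """Build the argument list for the `ami ... search` form quoted in this thread."""
    return ["ami", "-p", str(project), "search", "--dictionary", str(dictionary)]


def compare_dictionaries(project, dictionaries, runner=subprocess.run):
    """Run the same search once per dictionary and return {dictionary: exit code}.

    `runner` defaults to subprocess.run; it can be swapped for a stub in tests.
    """
    results = {}
    for dictionary in dictionaries:
        completed = runner(build_search_command(project, dictionary))
        results[dictionary] = completed.returncode
    return results
```

Running `compare_dictionaries("P", ["A", "B"])` keeps the project fixed and varies only the dictionary, which is the condition needed to attribute a failure to dictionary B.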

@AmbrineH
Collaborator Author

I am sorry, I copied the wrong command. It is supposed to be about ami search only. I have updated the issue. My apologies for the inconvenience.

The query works fine with the built-in dictionary but not with my own (which I created using a SPARQL query and then converted using amidict). I shall certainly try other variants as well, but I believe @Priya-Jk-15 has the same issue.

@petermr
Owner

petermr commented Aug 13, 2020 via email

@AmbrineH
Collaborator Author

I used the following query to create a new corpus and repeated it several times, varying the number of downloaded articles from 10 to 950.

GET PAPERS QUERY: getpapers -q "viral epidemics" -o ami_12_08_2020/try_for_ami_search_1 -f v_epid/log.txt -x -p -k 10
AMI SEARCH: ami -p ami_12_08_2020/try_for_ami_search_1 search --dictionary ami_12_08_2020/country_final.xml

I have committed the smaller folder here: TEST_FOLDER_WITH_RESULT
Dictionary Used: DICTIONARY_COUNTRY

@Priya-Jk-15
Collaborator

@Prasinus818 When I ran ami search, I gave neither the path nor the folder name of the dictionary disease. I only gave --dictionary disease, and I think it used only the built-in dictionary, because when I replaced disease with drug and virus, ami search didn't create DataTables. Can ami search find a dictionary from its folder name? I ask because I think you have used the country dictionary's folder name, country_final.xml. Please clarify.

@petermr
Owner

petermr commented Aug 14, 2020

PLEASE use one issue per topic.
This issue contains material on getpapers, ami search, and amidict.
An issue reporting a bug should contain the minimal information to reproduce the bug.

I suggest opening new issue(s) which contain a precise statement of the problem. It helps if all the files are small, e.g. 10 CTrees for a CProject.

@Priya-Jk-15
Collaborator

I created a small disease dictionary with 10 entries which is at https://github.com/petermr/openVirus/blob/master/dictionaries/diseases/amisearch%20issue/disease.xml . I validated the dictionary using the syntax amidict -v --dictionary dic display --fields --validate.

disease validate (10)

Then, I used the dictionary for ami search on a corpus of 5 CTrees. It created only empty DataTables, which are at https://github.com/petermr/openVirus/tree/master/dictionaries/diseases/amisearch%20issue/virepi

@petermr please check it.

@vaishaliarora277
Collaborator

@petermr ,

  • I am trying to use ami search for the customised dictionary funder which is committed at : https://github.com/petermr/openVirus/blob/master/dictionaries/funders/funder.xml

  • I tested it on a corpus of 10 articles which were downloaded using the getpapers query : getpapers -q "viral epidemic" -o minicorpus10 -x -k 10

  • The ami search command I used was : ami -p minicorpus10 search --dictionary C:\Users\me\funder.xml

  • It did not create full.datatables.html, and _cooccurrence was empty. When I searched the same corpus with the built-in dictionary funders, it worked well. This suggests that the corpus is fine and the problem lies in my dictionary.

@petermr
Owner

petermr commented Aug 21, 2020 via email

@Priya-Jk-15
Collaborator

@petermr I validated my created dictionary and got the following output:

Generic values (DictionaryDisplayTool)
================================
-v to see generic values
Specific values (DictionaryDisplayTool)
================================
--testString        : d      null
--wikilinks         : d [Lorg.contentmine.ami.tools.AbstractAMIDictTool$WikiLink;@14751b3
--fields            : d        []
--files             : d        []
--maxEntries        : d         3
--remote            : d [https://github.com/petermr/dictionary]
--suffix            : d       xml
--validate          : m      true
--help              : d     false
--version           : d     false
--dictionary        : d [disease]
--directory         : d       dic
Dictionary: disease
entries: 13814
    myopia
    psychosis
    psychosis
    ....
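
The listing above shows `psychosis` twice. An independent check of entry count and duplicate terms might look like the sketch below. It assumes the dictionary XML holds `entry` elements with a `term` attribute; that layout is an assumption and is not confirmed anywhere in this thread, and `summarize_dictionary` is a hypothetical helper, not an amidict feature.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_dictionary(xml_text):
    """Return (entry_count, sorted duplicated terms) for a dictionary XML string.

    Assumption: entries are <entry> elements carrying a `term` attribute.
    Terms are compared case-insensitively after trimming whitespace.
    """
    root = ET.fromstring(xml_text)  # raises ParseError if the XML is not well-formed
    terms = [entry.get("term", "").strip().lower() for entry in root.iter("entry")]
    counts = Counter(term for term in terms if term)
    duplicates = sorted(term for term, n in counts.items() if n > 1)
    return len(terms), duplicates
```

Comparing the returned entry count with the number of entries the dictionary was meant to contain is a quick way to confirm that the file being validated is the one that was created.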

Then, I used the dictionary to search a corpus of 10 CTrees. I got __cooccurrence, and the results folder has results.xml, but full.datatables.html has only frequencies. The results are at https://github.com/petermr/openVirus/tree/master/examples/Priya/amisearch_issue/virepi .

The tree of the corpus:

Folder PATH listing for volume OS
Volume serial number is 845F-351F
C:.
├───10.1101
│   ├───2020.06.10.20127597
│   ├───results
│   │   └───search
│   │       └───disease
│   └───search.openVirus
│       └───dictionaries
│           └───diseases
├───10.11012020.06.10.20127597
│   ├───results
│   │   └───search
│   │       └───disease
│   └───search.openVirus
│       └───dictionaries
│           └───diseases
├───PMC6517453
│   ├───results
│   │   └───search
│   │       └───disease
│   └───search.openVirus
│       └───dictionaries
│           └───diseases
├───PMC6695746
│   ├───results
│   │   ├───search
│   │   │   └───disease
│   │   └───word
│   │       └───frequencies
│   └───search.openVirus
│       └───dictionaries
│           └───diseases
├───PMC7119083
│   ├───results
│   │   ├───search
│   │   │   └───disease
│   │   └───word
│   │       └───frequencies
│   └───search.openVirus
│       └───dictionaries
│           └───diseases
├───PMC7409732
│   ├───results
│   │   ├───search
│   │   │   └───disease
│   │   └───word
│   │       └───frequencies
│   └───search.openVirus
│       └───dictionaries
│           └───diseases
├───search.openVirus
│   ├───dictionaries
│   │   └───diseases
│   └───search.openVirus
│       └───dictionaries
│           └───diseases
└───__cooccurrence
    ├───disease
    └───disease-disease
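
In the tree above, only some CTrees have a `results/word/frequencies` folder. A sketch for reporting this per CTree is shown below. The two expected sub-directories are taken from the listing, the function name `results_by_ctree` is hypothetical, and every top-level directory not starting with `__` is treated as a CTree, which is an assumption.

```python
from pathlib import Path


def results_by_ctree(project_dir, expected=("results/search", "results/word/frequencies")):
    """Map each CTree directory name to the expected sub-directories it lacks.

    Assumption: every top-level directory not starting with "__" is a CTree.
    """
    root = Path(project_dir)
    missing = {}
    for ctree in sorted(p for p in root.iterdir() if p.is_dir()):
        if ctree.name.startswith("__"):
            # Skip project-level folders such as __cooccurrence.
            continue
        missing[ctree.name] = [rel for rel in expected if not (ctree / rel).is_dir()]
    return missing
```

An empty list for a CTree means both folders are present; any other value names the folders that were not created for that paper.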

@AmbrineH
Collaborator Author

I was able to create the dictionary and validate it. @petermr, please let me know whether the validation results are correct:

Generic values (DictionaryDisplayTool)
================================
-v to see generic values

Specific values (DictionaryDisplayTool)
================================
--testString        : d      null
--wikilinks         : d [Lorg.contentmine.ami.tools.AbstractAMIDictTool$WikiLink;@2b4c1d96
--fields            : d        []
--files             : d        []
--maxEntries        : d         3
--remote            : d [https://github.com/petermr/dictionary]
--suffix            : d       xml
--validate          : m      true
--help              : d     false
--version           : d     false
--dictionary        : d [country]
--directory         : d ami_12_08_2020\amidict10

Dictionary: country

entries: 263
    Afghanistan
    Albania
    Algeria
    ....

I will update the results in the wiki if they are fine.

@petermr
Owner

petermr commented Aug 22, 2020 via email

@Priya-Jk-15
Collaborator

@petermr Please check my comment above regarding this issue. I still think full.datatables.html needs some changes.


4 participants