Cmine gives empty results #43

Open
simulsys opened this issue May 14, 2016 · 3 comments

Comments

@simulsys

Hi there, I am making progress. I narrowed my search down to 33 folders, but I do not seem to get any analyzed results. Here is a snapshot of my directory:
PMC4814072                                        14/05/16 08:26:44
File: commonest.dataTables.html           1 KB    14/05/16 08:26:44
File: count.dataTables.html               1 KB    14/05/16 08:26:44
File: entries.dataTables.html             1 KB    14/05/16 08:26:44
File: eupmc_fulltext_html_urls.txt        2 KB    14/05/16 08:26:04
File: eupmc_results.json                  405 KB  14/05/16 08:26:04
File: full.dataTables.html                1 KB    14/05/16 08:26:44
File: gene.human.count.xml                1 KB    14/05/16 08:26:35
File: gene.human.documents.xml            1 KB    14/05/16 08:26:35
File: gene.human.snippets.xml             1 KB    14/05/16 08:26:35
File: sequence.dnaprimer.count.xml        1 KB    14/05/16 08:26:33
File: sequence.dnaprimer.documents.xml    1 KB    14/05/16 08:26:33
File: sequence.dnaprimer.snippets.xml     1 KB    14/05/16 08:26:33
File: species.binomial.count.xml          1 KB    14/05/16 08:26:44
File: species.binomial.documents.xml      1 KB    14/05/16 08:26:44
File: species.binomial.snippets.xml       1 KB    14/05/16 08:26:44
File: species.genus.count.xml             1 KB    14/05/16 08:26:40
File: species.genus.documents.xml         1 KB    14/05/16 08:26:40
File: species.genus.snippets.xml          1 KB    14/05/16 08:26:40
File: word.frequencies.count.xml          1 KB    14/05/16 08:26:32
File: word.frequencies.documents.xml      1 KB    14/05/16 08:26:32
File: word.frequencies.snippets.xml       1 KB    14/05/16 08:26:32

The 1 KB files are all empty. What should they contain, please? Also, I cannot find scholarly.txt.

@petermr
Member

petermr commented May 14, 2016

Please can you give more information. To diagnose your problem we would need to know:

  • the input files, including their pathnames (of the form project/ctree/initialFiles)
  • the commands run
  • any console output

I don't see any scholarly.html, fulltext.xml, or other cproject contents.
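
A minimal sketch for gathering that information, assuming each PMC* subfolder of the getpapers output directory is one CTree. The function name `list_ctree_inputs` is hypothetical, and `fulltext.pdf` is an assumed filename for the PDF download:

```shell
# Hypothetical helper: report which expected input files each CTree contains.
# Assumes the CTrees are the PMC* subdirectories of the project directory.
list_ctree_inputs() {
  project="$1"
  for ctree in "$project"/PMC*/; do
    [ -d "$ctree" ] || continue          # the glob matched nothing
    name=$(basename "$ctree")
    found=""
    # fulltext.pdf is an assumed name; the other two appear in this thread
    for f in fulltext.xml fulltext.pdf scholarly.html; do
      [ -f "$ctree$f" ] && found="$found $f"
    done
    echo "$name:${found:- none}"
  done
}
```

Running `list_ctree_inputs test3` prints one line per CTree, with `none` for any folder that holds no full text to analyze.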

@simulsys
Author

Commands:
getpapers --query 'pollination honey bees canola' --outdir test3
Result output:
info: Searching using eupmc API
info: Found 33 open access results
Retrieving results [==============================] 100% (eta 0.0s)
info: Done collecting results
info: Saving result metadata
info: Full EUPMC result metadata written to eupmc_results.json
info: Individual EUPMC result metadata records written
info: Extracting fulltext HTML URL list (may not be available for all articles)
info: Fulltext HTML URL list written to eupmc_fulltext_html_urls.txt
andrew@andrew-Dimension-5000 ~ $

cmine test3
andrew@andrew-Dimension-5000 ~ $ cmine test3
0 [main] DEBUG org.xmlcml.ami2.plugins.CommandProcessor - running NORMA -i fulltext.xml -o scholarly.html --transform nlm2html --project test3
!.!!!!!!!!!!.!!!!!!!!!!.!!!!!!!!!!.!!running: word([frequencies])[{xpath:@count>20}, {w.stopwords:pmcstop.txt stopwords.txt}]
WS: test3 1752 [main] DEBUG org.xmlcml.ami2.wordutil.WordSetWrapper - symbol expands to: /org/xmlcml/ami2/wordutil/pmcstop.txt
1754 [main] DEBUG org.xmlcml.ami2.wordutil.WordSetWrapper - symbol expands to: /org/xmlcml/ami2/wordutil/stopwords.txt
1819 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC1892840
!1820 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract
!.1822 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC2718223
!1823 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract
!1824 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC2841636
!1824 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract
!1826 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC2994710
!1826 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract
!1828 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC3155332
!1828 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract
!1829 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC3250423
!1829 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract
!1831 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC3338325
!1831 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract
!1833 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC3338563
!1833 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract
!1835 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC3384620
!1835 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract
!1838 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC3478041
!1838 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract
!1850 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC3628874
!1850 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract
!.1852 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC3655217
!1853 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract
!1854 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC3806756
!1855 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract
!1856 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC3817108
!1856 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract
!1860 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC3869053
!1860 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract
!1863 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC3958374
!1866 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract
!1869 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC4046413
!1870 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract
!1875 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC4053381
!1875 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract
!1876 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC4217196
!1877 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract
!1878 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC4284392
!1878 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract
!1880 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC4284396
!1880 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract
!.1881 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC4312970
!1881 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract
!1883 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC4339550
!1883 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract
!1885 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC4364903
!1885 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract
!1886 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC4370578
!1887 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract
!1888 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC4426341
!1889 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract
!1890 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC4436261
!1891 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract
!1893 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC4552548
!1893 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract
!1895 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC4553434
!1896 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract
!1898 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC4689365
!1898 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract
!1900 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC4736462
!1900 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract
!.1901 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC4796003
!1902 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract
!1903 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC4814072
!1903 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract
!filter: word([frequencies])[{xpath:@count>20}, {w.stopwords:pmcstop.txt stopwords.txt}]
frequenciesfrequencies....summary: word([frequencies])[{xpath:@count>20}, {w.stopwords:pmcstop.txt stopwords.txt}]
C: frequencies....running: sequence([dnaprimer])[]
....filter: sequence([dnaprimer])[]
dnaprimerdnaprimer....summary: sequence([dnaprimer])[]
C: dnaprimer....running: gene([human])[]
....filter: gene([human])[]
humanhuman....summary: gene([human])[]
C: human....running: species([genus])[]
SP: test3....filter: species([genus])[]
genusgenus....summary: species([genus])[]
C: genus....running: species([binomial])[]
SP: test3....filter: species([binomial])[]
binomialbinomial....summary: species([binomial])[]
C: binomial....15967 [main] WARN org.xmlcml.ami2.plugins.ResultsAnalysis - Null pluginOption
15968 [main] WARN org.xmlcml.ami2.plugins.ResultsAnalysis - Null pluginOption
15969 [main] WARN org.xmlcml.ami2.plugins.ResultsAnalysis - Null pluginOption
15969 [main] WARN org.xmlcml.ami2.plugins.ResultsAnalysis - Null pluginOption
15971 [main] WARN org.xmlcml.ami2.plugins.ResultsAnalysis - Null pluginOption

Thanks for the help!
I do not have input files. What are they?

@petermr
Member

petermr commented May 14, 2016

You need to tell getpapers to retrieve the full text, with either -x (XML) or -p (PDF). Without one of those it creates CTrees but no fulltext.* files.

I am not sure whether this is a bug or a feature. I think it's reasonable to create CTrees with just the eupmc_result.json in them.
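
As a rough sketch of the rerun, assuming the same query and output directory as in the commands above. The wrapper name `refetch_and_mine` is hypothetical; only the `getpapers` and `cmine` invocations come from this thread:

```shell
# Hypothetical wrapper: re-fetch with full-text XML, then rerun the analysis.
refetch_and_mine() {
  query="$1"
  outdir="$2"
  # -x asks getpapers for full-text XML; use -p instead to request PDFs
  getpapers --query "$query" --outdir "$outdir" -x || return 1
  cmine "$outdir"
}

# Usage, with the query from this thread:
# refetch_and_mine 'pollination honey bees canola' test3
```

With full text present in each CTree, the NORMA step shown in the log should have a fulltext.xml to transform into scholarly.html.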
