petermr edited this page Jul 29, 2020 · 28 revisions

Please leave your questions and answers here.... We can only answer questions about software and the scientific literature on viruses, epidemics, etc. If you want general knowledge go to Wikipedia (I do!).

what does openVirus do?

It's a collection of tools and resources to help generate knowledge from the public scientific literature. The tools are generic but we concentrate on viral epidemics (not just COVID-19) and tools to manage them.

where is the introductory material?

what's the software?

  • getpapers queries scientific repositories
  • quickscrape scrapes publisher and other sites
  • ami is a novel toolkit for collecting, transforming, indexing, sectioning, searching, and re-using scientific documents.

where can I get the software?

how can we run it?

You need your own machine with install permissions, and you need to be comfortable with the command line. getpapers and quickscrape need Node.js and have installation instructions. ami3 is a Java toolkit. At present you download a JAR file (https://github.com/petermr/tigr2ess gives instructions on installation and running). We will release later JAR files.

what help do you want?

  • scraping, web stuff (JavaScript, Node, REST)

  • Academic and scholarly publishing, publisher sites, repositories.

  • document transformation. HTML, XML/XSLT, JATS, PDF, Pandoc, etc.

  • text searching.

  • documentation and tutorials.

  • liaison.

  • Spanish (we are starting to index Redalyc, Latin America)

  • Wikimedia, SPARQL

  • workflows, packaging and distribution

  • community engagement and management.

  • and lots more...

what sites do you extract from?

We only use openly visible sites.

also

  • https://crossref.org metadata and search for all publications including closed. Will normally give URLs and abstracts (if available). No fulltext.
  • https://doaj.org Directory of OA journals. Large dumps of metadata and text. Should use this more!

should I wear a face mask?

We can't answer medical and personal questions! But we can help search the literature for peer-reviewed Open Access papers to help organisations make policies and protocols.

Do you support languages other than English?

We try to index everything to Wikidata.org (the data extracted from Wikipedia with a lot more added). Here's "coronavirus" in 96 languages (bottom of page https://www.wikidata.org/wiki/Q82069695) and here's "cough" with 99 languages. So when we annotate pages with Wikidata there's a good chance that yours will be linked.

We are experimenting with adding Hindi equivalents to our dictionaries, using the links in Wikidata.

We also expect to download and extract ES and PT shortly when we start indexing Redalyc. For that we'll need native language speakers. Tasks include: processing of diacritics, creation of stopwords and vocabularies (probably openly available), knowledge of phrases, sentence structure, punctuation, synonymy, etc.

What about copyright?

It's a major friction in the system. We take a liberal view: that science is facts (not copyrightable), and that copying has widespread fair use permissions, especially for non-commercial or educational purposes. However, copyright is the law in most countries, and we don't knowingly break it. We're happy to find collaborators whose legal systems are permissive. In the UK it's legal to textmine documents you have legal access to, for non-commercial research purposes (which is what we do).

How to import getpapers output to run ami searches

The directory that getpapers creates when downloading papers can be used as the CProject directory for ami.

For example:

getpapers -q "masks" -o masks -f n95/log.txt -x -p

Here the output directory masks is then used as the project for ami:

ami -p masks/ search --dictionary country disease funders

What's a CProject and how do I create one?

A CProject is just a directory whose immediate child directories (CTrees) are individual documents. Many of the subdirectories have reserved names (e.g. __cooccurrence holds the co-occurrence results).

If you use getpapers, the output is already in CProject form, so no action is required. If you start with a bundle of PDFs, put them in a single directory (e.g. myproject, virusmasks; do NOT include spaces or uppercase in the name). Then run ami -p virus makeproject --rawfiletypes pdf (here virus is the project directory). This will rename the PDFs as:

foo&Bar.pdf => foo_bar/fulltext.pdf
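The renaming can be previewed before running makeproject. The following is a minimal Python sketch, not ami's own code: the helper names `ctree_name` and `planned_layout` are hypothetical, and the normalisation rule (lowercase, runs of other characters replaced by an underscore) is an assumption inferred from the single example above. ami's real rules may differ.

```python
import re
from pathlib import Path


def ctree_name(pdf_name: str) -> str:
    """Guess the CTree directory name for a PDF (assumed rule, see lead-in)."""
    stem = Path(pdf_name).stem.lower()  # drop ".pdf", remove uppercase
    # Assumption: every run of characters outside a-z and 0-9 becomes "_".
    return re.sub(r"[^a-z0-9]+", "_", stem).strip("_")


def planned_layout(pdf_names):
    """Map each original PDF name to its expected place in the CProject."""
    return {name: f"{ctree_name(name)}/fulltext.pdf" for name in pdf_names}


if __name__ == "__main__":
    for old, new in planned_layout(["foo&Bar.pdf"]).items():
        print(f"{old} => {new}")  # foo&Bar.pdf => foo_bar/fulltext.pdf
```

Two PDFs whose names normalise to the same string would collide under this rule, so it is worth checking the preview for duplicates first.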

Most ami commands take a -p or -t argument, to run on the whole project or on individual tree(s).

why doesn't ami-dictionary work? should I use amidict?

In 2020-05 we changed the style of ami dictionary commands. We probably forgot to announce this clearly.

Sorry.

The old command ami-dictionary has been moved to amidict. It has its own top-level options (e.g. there is no --cproject option).

amidict --help
Usage: amidict [OPTIONS] COMMAND

`amidict` is a command suite for managing dictionary:

Parameters:
===========
      [@<filename>...]   One or more argument files containing options.
Options:
========
  -d, --dictionary=<dictionaryList>...
                         input or output dictionary name/s. for 'create' must be singular; when 'display' or
                           'translate', any number. Names should be lowercase, unique. [a-z][a-z0-9._]. Dots can be
                           used to structure dictionaries intodirectories. Dictionary names are relative to
                           'directory'. If <directory> is absent then dictionary names are absolute.
      --directory=<directory>
                         top directory containing dictionary/s. Subdirectories will use structured names (NYI). Thus
                           dictionary 'animals' is found in '<directory>/animals.xml', while 'plants.parts' is found in
                           <directory>/plants/parts.xml. Required for relative dictionary names.
  -h, --help             Show this help message and exit.
  -V, --version          Print version information and exit.
General Options:
  -i, --input=FILE       Input filename (no defaults)
  -n, --inputname=PATH   User's basename for inputfiles (e.g. foo/bar/<basename>.png) or directories. By default this
                           is often computed by AMI. However some files will have variable names (e.g. output of
                           AMIImage) or from foreign sources or applications
  -L, --inputnamelist=PATH...
                         List of inputnames; will iterate over them, essentially compressing multiple commands into
                           one. Experimental.
  -f, --forcemake        Force 'make' regardless of file existence and dates.
  -N, --maxTrees=COUNT   Quit after given number of trees; null means infinite.
Logging Options:
  -v, --verbose          Specify multiple -v options to increase verbosity. For example, `-v -v -v` or `-vvv`. We map
                           ERROR or WARN -> 0 (i.e. always print), INFO -> 1(-v), DEBUG->2 (-vv)
      --log4j=(CLASS LEVEL)...
                         Customize logging configuration. Format: <classname> <level>; sets logging level of class, e.
                           g.
                          org.contentmine.ami.lookups.WikipediaDictionary INFO
Commands:
=========
  create     creates dictionaries from text, Wikimedia, etc..
  display    Displays AMI dictionaries. (Under Development)
  search     searches within dictionaries
  translate  translates dictionaries between formats
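The help text for --dictionary and --directory describes how a dotted dictionary name maps to a file. A minimal Python sketch of that mapping follows; `dictionary_path` is a hypothetical helper, not part of amidict, and the name pattern is one reading of "[a-z][a-z0-9._]" (a lowercase letter followed by any number of the listed characters). The help text marks structured names as NYI, so treat this as the documented intent rather than confirmed behaviour.

```python
import re
from pathlib import PurePosixPath

# Assumed interpretation of the naming rule quoted in the help text.
NAME_PATTERN = re.compile(r"[a-z][a-z0-9._]*")


def dictionary_path(directory: str, name: str) -> PurePosixPath:
    """Return the XML file a relative dictionary name is expected to refer to."""
    if not NAME_PATTERN.fullmatch(name):
        raise ValueError(f"not a valid dictionary name: {name!r}")
    parts = name.split(".")  # dots separate subdirectories
    return PurePosixPath(directory, *parts[:-1], parts[-1] + ".xml")


if __name__ == "__main__":
    print(dictionary_path("dictionaries", "animals"))
    print(dictionary_path("dictionaries", "plants.parts"))
```

Here "dictionaries" is just an illustrative directory name.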

subcommands

A common subcommand is amidict create

amidict create --help
Usage: amidict create [-hV] [--query[=query]] [--informat=input format]
                      [--linkcol=<linkCol>] [--termcol=<termCol>]
                      [--termfile=<termfile>] [--testString=<testString>]
                      [--wptype=<wptype>] [--wikilinks[=<wikiLinks>[,
                      <wikiLinks>...]...]]... [--namecol=<nameCol>...]
                      [--datacols=datacol[,datacol...]...]...
                      [--hrefcols=hrefcol[,hrefcol...]...]...
                      [--outformats=output format[,output format...]...]...
                      [--template=<templateNames>...]... [--terms=<terms>[,
                      <terms>...]...]...
creates dictionaries from text, Wikimedia, etc..
TBD
      --datacols=datacol[,datacol...]...
                            use these columns (by name) as additional data
                              fields in dictionary. datacols='foo,bar' creates
                              foo='fooval1' bar='barval1' if present. No
                              controlled use or vocabulary and no hyperlinks.
  -h, --help                Show this help message and exit.
      --hrefcols=hrefcol[,hrefcol...]...
                            external hyperlink column from table; might be
                              Wikidata or remote site(s)
      --informat=input format
                            input format (csv, list, mediawikitemplate,
                              wikicategory, wikipage, wikitable, wikitemplate)
      --linkcol=<linkCol>   column to extract link to internal pages. main use
                              Wikipedia. Defaults to the 'name' column
      --namecol=<nameCol>...
                            column(s) to extract name; use exact case (e.g.
                              Common name)
      --outformats=output format[,output format...]...
                            output format (xml, html, json)
      --query[=query]       generate query for cut and paste into EPMC or
                              similar. value sets size of chunks (too large
                              crashes EPMC). If missing, no query generated.
      --template=<templateNames>...
                            names of Wikipedia Templates, e.g.
                              Viral_systemic_diseases (note underscores not
                              spaces). Dictionaries will be created with
                              lowercasenames and all punctuation removed).
      --termcol=<termCol>   column(s) to extract term; use exact case (e.g.
                              Term). Could be same as namecol
      --termfile=<termfile> list of terms in file, line-separated
      --terms=<terms>[,<terms>...]...
                            list of terms (entries), comma-separated
      --testString=<testString>
                            String input for debugging; semantics depend on task
  -V, --version             Print version information and exit.
      --wikilinks[=<wikiLinks>[,<wikiLinks>...]...]
                            try to add link to Wikidata and/or Wikipedia page
                              of same name.
      --wptype=<wptype>     type of input (HTML , mediawiki)

How can a dictionary be modified?

A dictionary can be created, read, updated or deleted using the standard CRUD operations. There are many ways to perform these operations; one of the simplest is to use Excel spreadsheets.

How Can I Join OpenVirus Community?

Prerequisites for joining

• Basic Coding Skills

• Access to www

• Logical Thinking

• Good Communication

• Consistency

WHOM TO CONTACT

Dr. PETER MURRAY-RUST pm286@cam.ac.uk

Dr. GITANJALI YADAV gy@nipgr.ac.in

What does Out Of Memory Error mean and how do I correct it?

When working on a large project, each file requires a certain amount of memory; the more files there are, the more memory is needed, until the JVM runs out of Java heap space. The Java heap is the memory area, managed by the JVM, that holds a Java program's data.

This error arose when I ran amisearch on a CProject directory of 950 articles.

To fix this error, assign more memory to the JVM. On Windows, give this command in the Command Prompt:

set MAVEN_OPTS=-Xmx512m -XX:MaxPermSize=128m

(The -XX:MaxPermSize setting only has an effect on Java 7 and earlier.)

For other operating systems, see https://cwiki.apache.org/confluence/display/MAVEN/OutOfMemoryError

The error can also arise because a few files in the CProject are very bulky and consume most of the memory. In that case it can be tackled by deleting those files. To find them, look for the document ID next to the OutOfMemoryError in the output, for example:

.... unTransform] in --transform (OutOfMemoryError: Java heap space) PMC7286271 java.lang.reflect.InvocationTargetException at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ....

Now, if I delete this document, PMC7286271, from the CProject directory and then run the amisearch command again, it shows no error. In a CProject of, say, 1000 documents, it is common to have 4 or 5 such bulky files causing the 'OutOfMemory' error. Deleting these files solves the problem.
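Bulky documents can also be looked for before a search is run. The sketch below is a hedged Python helper, not part of ami: `ctree_sizes` and `largest_ctrees` are hypothetical names. It assumes that each immediate child directory of the CProject is a CTree and that directories whose names start with "__" (such as __cooccurrence) hold results rather than documents. Size on disk is only a rough proxy for memory use.

```python
from pathlib import Path


def ctree_sizes(cproject):
    """Total size in bytes of every CTree directory in a CProject."""
    sizes = {}
    for child in Path(cproject).iterdir():
        # Assumption: names starting with "__" are reserved result directories.
        if child.is_dir() and not child.name.startswith("__"):
            sizes[child.name] = sum(
                f.stat().st_size for f in child.rglob("*") if f.is_file()
            )
    return sizes


def largest_ctrees(cproject, top=5):
    """The `top` biggest CTrees as (name, bytes) pairs, largest first."""
    sizes = ctree_sizes(cproject)
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)[:top]


if __name__ == "__main__":
    import sys

    for name, size in largest_ctrees(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(f"{size:>12}  {name}")
```

Moving the listed directories out of the project, rather than deleting them, keeps the documents available for later inspection.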

What does "large document truncated" mean?

When ami search was run with a test dictionary on the CProject, it reported:

large document (1507) for PMC6824115 truncated to 500 sections

This means that the document PMC6824115 has 1507 sections (most documents have << 100). It's probably a review or a catalogue. It's so large that my browser is having difficulty. This bulky file was causing the Java 'OutOfMemory' error; deleting it from the CProject directory solved the problem.
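If the search output has been saved to a file, the truncation messages can be collected to list candidate documents. This is a minimal Python sketch that assumes the message always has exactly the wording quoted above; `truncated_documents` is a hypothetical helper, not an ami feature.

```python
import re

# Matches e.g. "large document (1507) for PMC6824115 truncated to 500 sections"
TRUNCATED = re.compile(
    r"large document \((\d+)\) for (\S+) truncated to (\d+) sections"
)


def truncated_documents(log_text):
    """Return (document id, original section count, kept sections) tuples."""
    return [
        (m.group(2), int(m.group(1)), int(m.group(3)))
        for m in TRUNCATED.finditer(log_text)
    ]


if __name__ == "__main__":
    sample = "large document (1507) for PMC6824115 truncated to 500 sections"
    for doc_id, sections, kept in truncated_documents(sample):
        print(doc_id, sections, kept)  # PMC6824115 1507 500
```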

what does "Git Desktop: fatal: the remote end hung up unexpectedly" mean?

From my experience (PMR) this often means that the volume of material is too much and the repository server hung up.

It may be possible to commit the material in small amounts, e.g. chunks of 300 rather than all 950. In the worst case just work with 300.
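Committing in chunks can be scripted. The following Python sketch is one possible approach under stated assumptions: the CProject lives inside a Git working copy, each CTree is an immediate child directory, and the helper names (`chunks`, `commit_plan`, `push_in_batches`) are hypothetical. It uses only the standard git add, git commit and git push commands.

```python
import subprocess
from pathlib import Path


def chunks(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def commit_plan(cproject, size=300):
    """Batches of CTree directory names, in sorted order."""
    trees = sorted(p.name for p in Path(cproject).iterdir() if p.is_dir())
    return chunks(trees, size)


def push_in_batches(cproject, size=300):
    """Add, commit and push one batch at a time (run inside a Git repository)."""
    for number, batch in enumerate(commit_plan(cproject, size), start=1):
        subprocess.run(["git", "add", "--", *batch], cwd=cproject, check=True)
        subprocess.run(
            ["git", "commit", "-m", f"add batch {number}"], cwd=cproject, check=True
        )
        subprocess.run(["git", "push"], cwd=cproject, check=True)
```

With 950 documents and the default size this gives four batches (300, 300, 300 and 50). A batch that contains nothing new makes git commit exit with an error, which check=True turns into an exception, so the loop stops rather than pushing silently.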

How do I fix GitHub's "Permission denied (publickey)." error?

There may be multiple reasons for this error while you are making a pull request on GitHub.

  1. You can try clearing your credentials (Windows PC): https://stackoverflow.com/questions/15381198/remove-credentials-from-git and then try signing in again.
  2. In case there are some cached credentials in that repo that became invalid, you can also clone the entire repository in a separate location and try again.

Why does BUILD FAILURE occur when updating ami3?

  • This error can occur when updating a copy of ami3 that was installed using Maven: running mvn clean install -DskipTests ends in BUILD FAILURE.
  • This happens sometimes on Windows. One thing to try is closing the Command Prompt window where you're running the command and trying again in a new window.
  • If the build fails again with the same error in the new Command Prompt, try closing all windows and trying again. If that doesn't work you may need to reboot your computer.
  • The likely cause is that another Command Prompt had a process running that was using the JAR file, so the old version couldn't be deleted.

Cannot run AMI: ami is not recognised as an internal or external command, operable program or batch file.

This probably means you haven't got a JAR file for ami or haven't set the PATH to point to it. See https://github.com/petermr/openVirus/wiki/INSTALLING-ami3 for help (there are indications of how to set your path).
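A quick way to check whether the PATH is the problem is to ask the operating system where it would find the command. This is a small Python sketch using the standard library's shutil.which; the function name `locate` is hypothetical, and it assumes the launcher is called ami as in the error message.

```python
import shutil


def locate(command="ami"):
    """Return the full path of `command` if it is on the PATH, else None."""
    return shutil.which(command)


if __name__ == "__main__":
    path = locate("ami")
    if path is None:
        print("ami was not found on the PATH; check the installation instructions")
    else:
        print(f"ami found at {path}")
```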

"fatal: the remote end hung up unexpectedly", why does it occur?

How do I know what version of ami I am running? Is it the latest?

See https://github.com/petermr/ami3/wiki/a_FAQ#how-do-i-know-what-version-of-ami-i-am-running-can-i-get-the-latest

Remko's package manager adds dates so that it should be clear when the package was released. When reporting bugs, always use the latest version unless instructed otherwise.
