Cannot read "annotation" in biocxml format #23

Open
pyramid20002000 opened this issue Oct 17, 2023 · 2 comments


pyramid20002000 commented Oct 17, 2023

I'm writing a Python script to convert a BioC XML file into a PubTator file.
I couldn't find an existing script for this, so I'm writing my own.

The BioC files were downloaded from:
https://ftp.ncbi.nlm.nih.gov/pub/lu/BioRED/BioRED.zip

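I tried to read "Test.BioC.XML" in two ways:

1. Loading the whole collection at once:

```python
from bioc import biocxml

fpath = 'Test.BioC.XML'
with open(fpath, 'r') as fp:
    collection = biocxml.load(fp)
docs = collection.documents
```

2. Streaming the documents one at a time:

```python
from bioc import biocxml

fpath = 'Test.BioC.XML'
with biocxml.iterparse(fpath) as reader:
    collection_info = reader.get_collection_info()
    for doc in reader:
        ...  # process each document here
```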

Strangely, all annotations are missing, while the relations are parsed correctly.


Any idea why this happens?

pyramid20002000 (author) commented:

[Screenshot: diff between the reference PubTator file and the one converted with bioc]
Above is the difference between the reference PubTator file and the one I converted with bioc.
I believe there is a bug somewhere in the XML parsing.


pyramid20002000 commented Oct 20, 2023

@ptlai Thanks to Dr. Lai for the help.

To help others, here is an explanation of the problem and the solution:
each document object has an empty annotation list, while its relation list is populated as expected.

The entity annotations are actually attached to each passage node, not to the document.
They can be retrieved with the following code:

```python
from bioc import biocxml

fpath = 'Test.BioC.XML'
with open(fpath, 'r') as fp:
    collection = biocxml.load(fp)
docs = collection.documents
for doc in docs:
    # Entity annotations live on each passage, not on the document itself
    for passage in doc.passages:
        for annotation in passage.annotations:
            print(annotation)
```
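For the original goal of converting BioC XML to PubTator, the same rule applies: take entity annotations from each passage and relations from the document. Below is a minimal sketch of that conversion. To stay runnable without extra dependencies, it uses the standard library's `xml.etree.ElementTree` instead of the bioc package, and it parses a small hypothetical sample (`SAMPLE`, with placeholder PMID `12345` and placeholder IDs `CHEM1`/`DIS1`) rather than the real BioRED file. The infon keys (`type`, `identifier`, `entity1`, `entity2`) and the relation-line layout (PMID, type, entity 1, entity 2) are assumptions, so compare the output against the reference PubTator file before relying on it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified BioC XML (placeholder PMID and IDs). Entity
# <annotation> elements sit inside each <passage>, while <relation>
# elements are direct children of <document>.
SAMPLE = """<collection>
  <document>
    <id>12345</id>
    <passage>
      <infon key="type">title</infon>
      <offset>0</offset>
      <text>Aspirin and asthma</text>
      <annotation id="0">
        <infon key="identifier">CHEM1</infon>
        <infon key="type">ChemicalEntity</infon>
        <location offset="0" length="7"/>
        <text>Aspirin</text>
      </annotation>
    </passage>
    <passage>
      <infon key="type">abstract</infon>
      <offset>19</offset>
      <text>Asthma was reported.</text>
      <annotation id="1">
        <infon key="identifier">DIS1</infon>
        <infon key="type">DiseaseOrPhenotypicFeature</infon>
        <location offset="19" length="6"/>
        <text>Asthma</text>
      </annotation>
    </passage>
    <relation id="R0">
      <infon key="entity1">CHEM1</infon>
      <infon key="entity2">DIS1</infon>
      <infon key="type">Association</infon>
    </relation>
  </document>
</collection>"""


def infons(elem):
    """Return an element's <infon key="...">value</infon> children as a dict."""
    return {i.get("key"): (i.text or "") for i in elem.findall("infon")}


def bioc_to_pubtator(xml_text):
    """Convert a BioC XML string into PubTator-style text (a sketch)."""
    root = ET.fromstring(xml_text)
    lines = []
    for doc in root.iter("document"):
        pmid = doc.findtext("id", default="")
        mentions = []
        for passage in doc.findall("passage"):
            # Assumption: the title passage has infon type="title"; any other
            # passage is written as an abstract ("|a|") line.
            tag = "t" if infons(passage).get("type") == "title" else "a"
            lines.append(f"{pmid}|{tag}|{passage.findtext('text', default='')}")
            # Entity annotations are children of the passage, not the document.
            for ann in passage.findall("annotation"):
                info = infons(ann)
                loc = ann.find("location")  # only the first location is used
                start = int(loc.get("offset"))
                end = start + int(loc.get("length"))
                mentions.append("\t".join([
                    pmid, str(start), str(end), ann.findtext("text", default=""),
                    info.get("type", ""), info.get("identifier", ""),
                ]))
        lines.extend(mentions)  # mention lines follow the title/abstract lines
        # Relations are direct children of <document>.
        for rel in doc.findall("relation"):
            info = infons(rel)
            lines.append("\t".join([
                pmid, info.get("type", ""),
                info.get("entity1", ""), info.get("entity2", ""),
            ]))
        lines.append("")  # a blank line separates documents
    return "\n".join(lines)


if __name__ == "__main__":
    print(bioc_to_pubtator(SAMPLE))
```

To run it on a real file, read the file into a string (for example `open('Test.BioC.XML', encoding='utf-8').read()`) and pass it to `bioc_to_pubtator`. The sketch uses only the first `<location>` of each annotation, so discontinuous mentions would need extra handling. With the bioc package, the equivalent loops are `doc.passages` then `passage.annotations` for entities, and `doc.relations` (reading each relation's `infons`) for relations.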
