Biomedical Discourse Treebanks

This repository contains biomedical discourse treebanks:

COVID19-DTB (Nishida and Matsumoto, 2021; to appear in TACL)

COVID19-DTB: COVID-19 Discourse Dependency Treebank

COVID19-DTB is a collection of manually annotated discourse dependency structures for scholarly paper abstracts on COVID-19 and related coronaviruses like SARS and MERS.

The following figure shows an actual example of discourse dependency structure for an abstract (Israeli et al., 2020) in COVID19-DTB.

Description

First, we sampled 300 abstracts randomly from the 2020 September snapshot of The COVID-19 Open Research Dataset (CORD-19). Then, the 300 abstracts were segmented into Elementary Discourse Units (EDUs) manually by the authors. Then, we employed two professional annotators to give gold discourse dependency structures to the 300 abstracts independently. We divided the results into development and test sets, each of which consists of 150 examples.

The further details on the annotation scheme, annotation procedure, and corpus statistics can be found in our paper.

Data format

We follow the same JSON schema with the SciDTB dataset (Yang and Li, 2018): Each discourse dependency structure is stored as a JSON file like the following:

{
    "root": [
        {
            "id": 0,
            "parent": -1,
            "text": "ROOT",
            "relation": "null"
        },
        {
            "id": 1,
            "parent": 3,
            "text": "SARS - CoV-2 genetic identification is based on viral RNA extraction prior to RT - qPCR assay ,",
            "relation": "BACKGROUND"
        },
        {
            "id": 2,
            "parent": 1,
            "text": "however recent studies support the elimination of the extraction step . <S>",
            "relation": "COMPARISON"
        },
        {
            "id": 3,
            "parent": 0,
            "text": "Herein , we assessed the RNA extraction necessity ,",
            "relation": "ROOT"
        },
        {
            "id": 4,
            "parent": 3,
            "text": "by comparing RT - qPCR efficacy in several direct approaches vs. the gold standard RNA extraction ,",
            "relation": "MANNER-MEANS"
        },
        {
            "id": 5,
            "parent": 3,
            "text": "in detection of SARS - CoV-2 from laboratory samples as well as clinical Oro - nasopharyngeal SARS - CoV-2 swabs . <S>",
            "relation": "SAME-UNIT"
        },
        {
            "id": 6,
            "parent": 3,
            "text": "Our findings show advantage for the extraction procedure ,",
            "relation": "FINDINGS"
        },
        {
            "id": 7,
            "parent": 6,
            "text": "however a direct no - buffer approach might be an alternative ,",
            "relation": "COMPARISON"
        },
        {
            "id": 8,
            "parent": 7,
            "text": "since it identified up to 70 % of positive clinical specimens . <S> <P>",
            "relation": "CAUSE-RESULT"
        }
    ]
}

"<S>" and "<P>" denote the sentence and paragraph boundaries, respectively.

Citation

The blank entries (i.e., volume, pages, doi) in the bibtex item below will be filled in after publication in TACL.

@article{nishida2021outofdomain,
    title={Out-of-Domain Discourse Dependency Parsing via Bootstrapping: {A}n Empirical Analysis on its Effectiveness and Limitation},
    author={Nishida, Noriki and Matsumoto, Yuji},
    journal={Transactions of the Association for Computational Linguistics},
    volume={},
    pages={},
    year={2021},
    doi={},
}

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
covid19-dtb		covid19-dtb
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

covid19-dtb

covid19-dtb

.gitignore

.gitignore

README.md

README.md

Repository files navigation

Biomedical Discourse Treebanks

COVID19-DTB: COVID-19 Discourse Dependency Treebank

Description

Data format

Citation

About

Releases

Packages

norikinishida/biomedical-discourse-treebanks

Folders and files

Latest commit

History

Repository files navigation

Biomedical Discourse Treebanks

COVID19-DTB: COVID-19 Discourse Dependency Treebank

Description

Data format

Citation

About

Topics

Resources

Stars

Watchers

Forks