OHDSI/PhenotypeChangesInVocabUpdate

Utility to compare cohorts run in different vocabulary versions by resolving their concept sets

Compares captured source codes, hierarchy changes, and domain changes; identifies non-standard concepts used in concept set expressions

Step by Step Example

# Install the package
remotes::install_github("OHDSI/PhenotypeChangesInVocabUpdate")

library(dplyr)
library(openxlsx)
library(readr)
library(tibble)
library(PhenotypeChangesInVocabUpdate)

# Set the baseUrl of your Atlas instance (replace the placeholder below with your own URL)
baseUrl <- "https://yourSecureAtlas.ohdsi.org/"

# If security is enabled, authorize use of the WebAPI
ROhdsiWebApi::authorizeWebApi(
  baseUrl = baseUrl,
  authMethod = "windows")

# Specify the cohorts you want to run the comparison for. In this example they are imported from a CSV file with one column containing cohortIds;
# the example file is located in "~/PhenotypeChangesInVocabUpdate/extras/Cohorts.csv".
# You can also define the cohorts directly as a vector:
#cohorts <- c(12822, 12824, 12825)

# You must specify the full name of the file with cohortIds
cohortsDF <- readr::read_delim("~/PhenotypeChangesInVocabUpdate/extras/Cohorts.csv", delim = "\t", show_col_types = FALSE)
cohorts <- cohortsDF[[1]]

# excludedNodes is a text string listing the nodes you want to exclude from the analysis; it's set to 0 by default.
# For example, some CPT4 and HCPCS codes are now mapped to Visit concepts, and we didn't implement this in the ETL,
# so we don't want these in the analysis (note that the tool doesn't look at the actual CDM, only at the mappings in the vocabulary).
# In that case, excludedNodes is defined like this:
#excludedNodes <- "9201, 9202, 9203"


# Set connectionDetails.
# You can use keyring to store your credentials;
# see ~/PhenotypeChangesInVocabUpdate/extras/KeyringSetup.R for how to configure keyring for the example below.

# You can also define connectionDetails directly; see the DatabaseConnector documentation: https://ohdsi.github.io/DatabaseConnector/

connectionDetails <- DatabaseConnector::createConnectionDetails(
  dbms = keyring::key_get("YourDatabase", "dbms"),
  connectionString = keyring::key_get("YourDatabase", "connectionString"),
  user = keyring::key_get("YourDatabase", "username"),
  password = keyring::key_get("YourDatabase", "password")
)

newVocabSchema <- 'vocab_schema_n1' # schema containing the new vocabulary version
oldVocabSchema <- 'vocab_schema_n0' # schema containing the older vocabulary version
resultSchema <- 'achilles_results' # schema containing Achilles results

# Create the data frame with concept set expressions using the getNodeConcepts function
Concepts_in_cohortSet <- getNodeConcepts(cohorts, baseUrl)

# Resolve the concept sets, compare the outputs on the different vocabulary versions, and write the results to the Excel file "PhenChange.xlsx", saved in the session root folder
resultToExcel(connectionDetails = connectionDetails,
              Concepts_in_cohortSet = Concepts_in_cohortSet,
              newVocabSchema = newVocabSchema,
              oldVocabSchema = oldVocabSchema,
              resultSchema = resultSchema)

#open the excel file
#Windows
shell.exec("PhenChange.xlsx")

#MacOS
#system(paste("open", "PhenChange.xlsx"))
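If you prefer to inspect the results in R rather than in Excel, the tabs can be read back with openxlsx, which is already loaded above. This is a minimal sketch: `readAllTabs` is a hypothetical helper written for illustration and is not part of the package.

```r
library(openxlsx)

# Hypothetical helper (not part of PhenotypeChangesInVocabUpdate):
# reads every tab of a workbook into a named list of data frames
readAllTabs <- function(path) {
  tabs <- openxlsx::getSheetNames(path)
  stats::setNames(
    lapply(tabs, function(tab) openxlsx::read.xlsx(path, sheet = tab)),
    tabs
  )
}

# Only attempt to read the file if it has already been written
if (file.exists("PhenChange.xlsx")) {
  results <- readAllTabs("PhenChange.xlsx")
  print(names(results))
}
```

Each element of the returned list is a regular data frame, so it can be filtered or joined with dplyr like any other table.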

The output description:

Writes an Excel file with a separate tab for each type of comparison.

Definitions/column names used:

"Node concept": a concept used directly in a concept set expression

"includedescendants": indicates whether descendants of the "Node concept" are included in the concept set; 0 stands for False, 1 stands for True

"isexcluded": indicates whether the "Node concept" (and its descendants, if "includedescendants" = 1) is excluded from the concept set; 0 stands for False, 1 stands for True

"drc": descendant record count, the total number of occurrences of descendants of a given concept

"source concept": the related source concept_id. Concept sets are usually defined through standard concepts, but when a mapping changes, the same standard concepts can capture different clinical events. This is why the tool tracks the related source concepts.

"Action": flags whether a concept or hierarchy branch was added or removed

The Excel file has the following tabs:

1. summaryTable

Sum of occurrences in the dataset of added or removed source concepts, per cohort.

  • For example, cohort_id 123 no longer picks up source codes X and Y when using the newer vocabulary version. X appears 10 times in the data and Y appears 15 times.

In this situation you'll get the following output:

cohortid 123
action Removed
sum 25
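The arithmetic behind this row can be reproduced with a small aggregation. This is only an illustration in base R: the `changes` data frame and its column names are assumptions made for the example, not the tool's internal structures.

```r
# Toy input (assumed layout): one row per source code a cohort gained or lost
changes <- data.frame(
  cohortid = c(123, 123),
  sourceCode = c("X", "Y"),
  action = c("Removed", "Removed"),
  sourceCodesCount = c(10, 15),
  stringsAsFactors = FALSE
)

# Sum the occurrence counts per cohort and action
summaryTable <- aggregate(sourceCodesCount ~ cohortid + action,
                          data = changes, FUN = sum)
names(summaryTable)[names(summaryTable) == "sourceCodesCount"] <- "sum"

print(summaryTable)
```

The single resulting row has cohortid 123, action Removed and sum 25, matching the output above.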

2. nonStNodes

Lists non-standard concepts used in the concept set definition.

Note that the concept set definition JSON isn't updated when the vocabulary is updated, so you will not see concept changes in Atlas. You therefore need to run this tool to see whether any concepts have become non-standard.

  • For example, cohort_id 10729 has the concept set 'Malignancies that spread to liver', which has Node concept = "4324190|History of malignant neoplasm of breast" with descendants included.

This concept is non-standard and is mapped this way:

Maps to "1340204|History of event"

Maps to value "4112853|Malignant tumor of breast".

In this situation you'll get the output below, which gives you the target concepts you need to use to capture the same clinical events with the new vocabulary version.

cohortid 10729
conceptsetname Malignancies that spread to liver
conceptsetid 15
isexcluded 0
includedescendants 1
nodeConceptId 4324190
nodeConceptName History of malignant neoplasm of breast
drc 20284048
mapsToConceptId 1340204
mapsToConceptName History of event
mapsToValueConceptId 4112853
mapsToValueConceptName Malignant tumor of breast
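To make the two relationships concrete, the lookup can be sketched on a toy stand-in for the OMOP concept_relationship table. The data frame below contains only the two rows from this example, and the filtering shown is an illustration, not necessarily how the tool queries the vocabulary.

```r
# Toy stand-in for concept_relationship, holding only the rows from the example
conceptRelationship <- data.frame(
  concept_id_1 = c(4324190, 4324190),
  relationship_id = c("Maps to", "Maps to value"),
  concept_id_2 = c(1340204, 4112853),
  stringsAsFactors = FALSE
)

nodeConceptId <- 4324190

# Keep the relationships that start from the non-standard node concept
rels <- conceptRelationship[conceptRelationship$concept_id_1 == nodeConceptId, ]

mapsToConceptId <- rels$concept_id_2[rels$relationship_id == "Maps to"]
mapsToValueConceptId <- rels$concept_id_2[rels$relationship_id == "Maps to value"]

print(c(mapsTo = mapsToConceptId, mapsToValue = mapsToValueConceptId))
```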

3. mapDif

This tab shows related source concepts that were added or removed, along with their mapping in both vocabulary versions.

Note that only source codes present in the user's database are included in the analysis.

This way the user knows why the difference in related source concepts occurs and can modify the concept set expression by adding or removing mapped concepts.

  • In the example below, events with the ICD9CM code "Neural hearing loss, unilateral" are now captured because of the mapping change. The OLD_MAPPED_CONCEPT "Unilateral neural hearing loss" didn't have the proper hierarchy and wasn't captured.
COHORTID 12822
CONCEPTSETNAME Cranial nerve disorder
CONCEPTSETID 28
ISEXCLUDED 0
INCLUDEDESCENDANTS 1
NODE_CONCEPT_ID 441848
NODE_CONCEPT_NAME Cranial nerve disorder
SOURCE_CONCEPT_ID 44823107
sourceCodesCount 7115
ACTION Added
SOURCE_CONCEPT_NAME Neural hearing loss, unilateral
SOURCE_VOCABULARY_ID ICD9CM
SOURCE_CONCEPT_CODE 389.13
OLD_MAPPED_CONCEPT_ID 379831
OLD_MAPPED_CONCEPT_NAME Unilateral neural hearing loss
OLD_MAPPED_VOCABULARY_ID SNOMED
OLD_MAPPED_CONCEPT_CODE 425601005
NEW_MAPPED_CONCEPT_ID 381312
NEW_MAPPED_CONCEPT_NAME Neural hearing loss
NEW_MAPPED_VOCABULARY_ID SNOMED
NEW_MAPPED_CONCEPT_CODE 73371001
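Conceptually, the Added and Removed flags come from comparing the source concepts a concept set resolves to under each vocabulary version. The sketch below shows that comparison as a plain set difference; the IDs 1001 and 1002 are made-up placeholders, and the variable names are assumptions for illustration only.

```r
# Source concepts reached by a concept set under each vocabulary version
# (1001 and 1002 are placeholder IDs; 44823107 is taken from the example)
oldSourceConcepts <- c(1001, 1002)
newSourceConcepts <- c(1001, 1002, 44823107)

added <- setdiff(newSourceConcepts, oldSourceConcepts)
removed <- setdiff(oldSourceConcepts, newSourceConcepts)

# Combine both directions into one table with an action flag
mapDif <- rbind(
  data.frame(source_concept_id = added,
             action = rep("Added", length(added)),
             stringsAsFactors = FALSE),
  data.frame(source_concept_id = removed,
             action = rep("Removed", length(removed)),
             stringsAsFactors = FALSE)
)

print(mapDif)
```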

4. peakDif

Hierarchy changes are reported at the "Peak concept" level: the common parent concept of the added or removed standard concepts, above which the hierarchy is changed.

  • In the example below, 375527|Headache disorder and all its descendants were added to the included concepts in the Headache concept set. This is quite a big change, since drc (descendant record count) = 34219562, and the researcher now has to decide whether the new, broader definition still fits.
cohortid 12825
conceptsetid 23
conceptsetname Headache
isexcluded 0
includedescendants 1
nodeConceptId 378253
nodeConceptName Headache
action Added
peakConceptId 375527
peakName Headache disorder
peakCode 230461009
drc 34219562
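One plausible way to read "Peak concept" is as the topmost concept of an added (or removed) branch, that is, an added concept none of whose ancestors is itself in the added set. The sketch below illustrates that reading on a toy hierarchy. It is an assumption about the idea, not the tool's actual implementation, and the IDs 1001 and 1002 are placeholders.

```r
# Toy ancestor table (self-pairs omitted): 375527 is above 1001, which is above 1002
conceptAncestor <- data.frame(
  ancestor_concept_id   = c(375527, 375527, 1001),
  descendant_concept_id = c(1001, 1002, 1002)
)

# Standard concepts newly included in the concept set
addedConcepts <- c(375527, 1001, 1002)

# Added concepts that sit below another added concept
belowAdded <- conceptAncestor$descendant_concept_id[
  conceptAncestor$ancestor_concept_id %in% addedConcepts
]

# A peak is an added concept with no added ancestor
peaks <- addedConcepts[!(addedConcepts %in% belowAdded)]

print(peaks)
```

Reporting only the peak keeps the tab short: one row stands for the whole branch instead of one row per descendant.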

5. domainChange

This tab shows included concepts that changed their domain, so a different event table should be used.

  • In the example below, the concept "2108163|Therapeutic apheresis; for plasma pheresis" changed its domain from Procedure to Measurement, so the concept set "Treatment or investigation for TMA" needs to be used with the Measurement table as well in order to include the events for this concept.
cohortid 10656
conceptsetname Treatment or investigation for TMA
conceptsetid 20
isexcluded 0
includedescendants 1
nodeConceptId 4182536
nodeConceptName Transfusion
conceptId 2108163
conceptName Therapeutic apheresis; for plasma pheresis
vocabularyId CPT4
conceptCode 36514
oldDomainId Procedure
newDomainId Measurement
drc 1010478
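A domain change can be detected by joining the concept table from both vocabulary versions on concept_id and keeping the rows where domain_id differs. The sketch below does this on toy data frames. Concept 1001 is a placeholder, and the column suffixes are choices made for the example rather than the tool's own names.

```r
# Toy extracts of the concept table from the old and new vocabulary versions
oldConcept <- data.frame(
  concept_id = c(2108163, 1001),
  domain_id = c("Procedure", "Procedure"),
  stringsAsFactors = FALSE
)
newConcept <- data.frame(
  concept_id = c(2108163, 1001),
  domain_id = c("Measurement", "Procedure"),
  stringsAsFactors = FALSE
)

# Join both versions on concept_id and keep concepts whose domain differs
both <- merge(oldConcept, newConcept, by = "concept_id",
              suffixes = c("_old", "_new"))
domainChange <- both[both$domain_id_old != both$domain_id_new, ]

print(domainChange)
```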
