
Design Proposal for Observation lastn Operation

IanMMarshall edited this page Sep 25, 2019 · 2 revisions

Introduction

The following design proposes a partial solution for implementing the Observation "lastn" operation. It was prepared in response to a request for an elegant and performant solution that would support retrieving and summarizing the most recent Observations recorded for a specified patient, as well as retrieving the most recent instances of a selected observation or category of observations.

A secondary requirement also addressed by this design is the ability to quickly retrieve a list of all Observation.code values currently recorded for all patients in a repository. This would allow a selection menu to be populated with the code values actually present in the repository.

Requirements

See http://hl7.org/fhir/observation-operation-lastn.html.

Use Cases

This design is intended to address a use case where a clinician wishes to see recent observations for a specified patient going back to a certain date and time. Clinicians may wish to see all Observations for a patient or may wish to filter observations based on a specific category, observation code, or other types of codes.

Assumptions

The design assumes that there is no immediate requirement to support search parameters other than those needed to implement the use cases above, i.e. "subject" and "date".

High Level Design

The proposed high level approach is as follows:

  1. Partial implementation of the FHIR "$lastn" operation for Observations as described in http://hl7.org/fhir/observation-operation-lastn.html. See the API Design section below for details of the lastn implementation.
  2. Documents will be created in Elasticsearch for Observations.
  3. Depending on the operation invoked, Elasticsearch will be used to:
    • Return a document listing Observation identifiers meeting the criteria of the $lastn request, grouped by Observation.code, or
    • Return a document listing all Observation code values in the repository.
  4. For the $lastn operation, the Observation resources identified by the Elasticsearch result document will be retrieved from the database, and a resource Bundle will be generated containing those Observations, ordered by Observation.code and then by Observation.effective ascending.
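The second Elasticsearch use above, listing all Observation code values in the repository, could be served by a composite aggregation over the "normalized_code" field, paged with the "after_key" that Elasticsearch returns with each page. The following is a minimal sketch under stated assumptions: `search` is a hypothetical caller-supplied function that sends a request body to the Observation index and returns the parsed JSON response, and the default page size of 1000 is arbitrary.

```python
from typing import Any, Callable, Dict, List, Optional


def build_distinct_codes_query(page_size: int = 1000,
                               after_key: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Request body for one page of distinct normalized_code values."""
    composite: Dict[str, Any] = {
        "size": page_size,
        "sources": [{"code": {"terms": {"field": "normalized_code"}}}],
    }
    if after_key is not None:
        # Resume after the last bucket returned by the previous page.
        composite["after"] = after_key
    # "size": 0 suppresses individual hits; only the aggregation is needed.
    return {"size": 0, "aggs": {"codes": {"composite": composite}}}


def list_all_codes(search: Callable[[Dict[str, Any]], Dict[str, Any]],
                   page_size: int = 1000) -> List[str]:
    """Collect every distinct code by paging until a page comes back empty."""
    codes: List[str] = []
    after_key: Optional[Dict[str, Any]] = None
    while True:
        agg = search(build_distinct_codes_query(page_size, after_key))["aggregations"]["codes"]
        if not agg["buckets"]:
            break
        codes.extend(bucket["key"]["code"] for bucket in agg["buckets"])
        after_key = agg.get("after_key")
        if after_key is None:
            break
    return codes
```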

High Level Implementation

API Design

Lastn Operation

A new operation, "$lastn", would be added to the HAPI FHIR Observation resource provider, following the guidance of http://hl7.org/fhir/observation-operation-lastn.html:

The request for a lastn query SHALL include:

- A $lastn operation parameter
- A subject using either the patient or subject search parameter
- A category parameter and/or a search parameter that contains a code element in its FHIRpath expression (e.g., code or code-value-concept)

Regarding "and/or a search parameter that contains a code element in its FHIRpath expression", the supported search parameters would correspond to the following elements:

  • Observation.meta.tag
  • Observation.status
  • Observation.category
  • Observation.code
  • Observation.interpretation
  • Observation.bodySite
  • Observation.method

In addition to the code values above, the partial "$lastn" implementation would also support filtering by Observation.effective.

No other Observation search parameters or modifiers would be supported at this time.
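The request rules above can be checked before any query is built. The sketch below is illustrative only: `patient`, `subject`, `category`, `code`, `status`, `date` and `_tag` are standard FHIR search parameter names and `max` is the $lastn operation parameter, while the names `interpretation`, `body-site` and `method` and the mapping from parameter to index field are assumptions made for this sketch, keyed to the fields in the index mapping later in this document.

```python
from typing import Dict, List

# Hypothetical mapping from search parameter name to Observation document field.
CODED_PARAMS: Dict[str, str] = {
    "_tag": "meta_tag",
    "status": "status",
    "category": "category.code_system_hash",
    "code": "code.coding.code_system_hash",
    "interpretation": "interpretation.code_system_hash",
    "body-site": "bodysite.code_system_hash",
    "method": "method.code_system_hash",
}
SUBJECT_PARAMS = {"patient", "subject"}
OTHER_PARAMS = {"date", "max"}  # effective[x] range and per-code limit


def validate_lastn_params(params: Dict[str, List[str]]) -> List[str]:
    """Return validation errors for a $lastn request; an empty list means it is acceptable."""
    names = set(params)
    errors: List[str] = []
    if not names & SUBJECT_PARAMS:
        errors.append("a patient or subject parameter is required")
    if not names & set(CODED_PARAMS):
        errors.append("a category or other coded parameter is required")
    supported = SUBJECT_PARAMS | set(CODED_PARAMS) | OTHER_PARAMS
    for name in sorted(names - supported):
        errors.append(f"unsupported parameter: {name}")
    return errors
```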

Elasticsearch Document Design

The Elasticsearch repository would include up to three indexes:

  • One for Patient Observation resources.
  • One for Observation.code CodeableConcept.
  • One for Observation.code CodeableConcept.coding.

The first index would be the main one used for lastn requests. The other two indexes could be used during updates of the Observation index to normalize the Observation.code values.

Observation Document

The Observation documents would be identified by the Observation resource ID and would consist of the following elements:

  • subject identifier
    • Keyword type.
    • e.g. Patient/851a7b59-feb4-4b4c-9f48-8cf5fad54213
  • meta tag
    • Array of Codings
  • status
    • Keyword type.
  • category
    • Array of Codings representing multiple Codeable Concepts.
  • code
    • Array of Codings representing a single Codeable Concept
  • normalized_code
    • Keyword type representing the entire Observation.code (all of the codings for the code are normalized into a single String value).
  • interpretation
    • Array of Codings representing a single Codeable Concept
  • bodySite
    • Array of Codings representing a single Codeable Concept
  • method
    • Array of Codings representing a single Codeable Concept
  • Effective Date
    • Date type used for sorting.
    • Text type used for display/output (e.g. logging or debugging)
  • Observation Identifier
    • Keyword type
    • Observation ID value

Codings other than those for Observation.code would be represented by a single field, "code_system_hash", a numeric hash value derived from the combined system and code values, expected to be distinct for each system/code pair. For Observation.code, codings would also include "code" and "system" fields so that they can be returned in a query response if necessary.

Codeable Concepts would be represented by a composite field consisting either of an array of Codings (represented by "code_system_hash") or a single text element.
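The hashed representation described in the two paragraphs above can be sketched as follows. The design does not name a hash function, so this sketch assumes the first 8 bytes of a SHA-256 digest of "system|code", read as a signed 64-bit integer (the example document below contains negative values, consistent with a signed long). `code_system_hash` and `index_codeable_concept` are hypothetical names.

```python
import hashlib
from typing import Any, Dict


def code_system_hash(system: str, code: str) -> int:
    """Signed 64-bit hash of a coding's system/code pair (assumed: truncated SHA-256)."""
    # "|" joins system and code as in FHIR token search (system|code); a URI system
    # cannot contain an unescaped "|", so the joined string is unambiguous.
    digest = hashlib.sha256(f"{system}|{code}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


def index_codeable_concept(concept: Dict[str, Any]) -> Dict[str, Any]:
    """Index a CodeableConcept as its coding hashes, or as its text when it has no codings."""
    hashes = [code_system_hash(c.get("system", ""), c["code"])
              for c in concept.get("coding", []) if "code" in c]
    if hashes:
        return {"code_system_hash": hashes}
    return {"text": concept.get("text", "")}
```

A 64-bit hash can collide in principle; for vocabularies of realistic size the probability is very small, but not zero.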

Index mappings for Observation documents would be defined as follows:

{
  "mappings" : {
    "properties" : {
      "bodysite" : {
        "properties" : {
          "code_system_hash" : { "type" : "keyword" },
          "text" : { "type" : "text" }
        }
      },
      "category" : {
        "properties" : {
          "code_system_hash" : { "type" : "keyword" },
          "text" : { "type" : "text" }
        }
      },
      "code" : {
        "properties" : {
          "coding" : {
            "properties" : {
              "code" : { "type" : "text" },
              "system" : { "type" : "text" },
              "code_system_hash" : { "type" : "keyword" }
            }
          }
        }
      },
      "normalized_code" : { "type" : "keyword" },
      "effectivedtm" : { "type" : "date" },
      "identifier" : { "type" : "keyword" },
      "interpretation" : {
        "properties" : {
          "code_system_hash" : { "type" : "keyword" },
          "text" : { "type" : "text" }
        }
      },
      "meta_tag" : { "type" : "keyword" },
      "method" : {
        "properties" : {
          "code_system_hash" : { "type" : "keyword" },
          "text" : { "type" : "text" }
        }
      },
      "status" : { "type" : "keyword" },
      "subject" : { "type" : "keyword" }
    }
  }
}

An example of an index document that might be generated for an Observation would be as follows:

{
  "identifier" : "093d5af6-7347-482a-aa32-53814767d85e",
  "subject" : "Patient/0000039a-636f-45eb-97aa-85582d8e9805",
  "category" : [
    {
      "code_system_hash" : [
        950854891576463190
      ]
    }
  ],
  "status" : "final",
  "code" : {
    "coding" : [
      {
        "code_system_hash" : 7417299403646902754
      }
    ]
  },
  "normalized_code" : "7417299403646902754",
  "interpretation" : {
    "code_system_hash" : [
      -1469111858374900926
    ]
  },
  "bodysite" : {
    "code_system_hash" : [
      4376320339240164549
    ]
  },
  "method" : {
    "code_system_hash" : [
      5670388653210975895
    ]
  },
  "effectivedtm" : 1569349466169,
  "meta_tag" : [ -5313397126116324049 ]
}
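A document like the one above could be produced from an Observation resource roughly as follows. This is a sketch, not the HAPI FHIR indexer: it reads FHIR JSON as a Python dict, assumes a truncated SHA-256 hash since the design does not name one, handles only `effectiveDateTime` (not `effectivePeriod` or the other effective[x] types), treats timestamps without an offset as UTC, accepts either one CodeableConcept or a list of them for interpretation, bodySite and method, and fills `normalized_code` with the smallest coding hash as a stand-in for the normalization mapping that the design leaves as TODO. All function names are hypothetical.

```python
import hashlib
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Optional

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def code_system_hash(system: str, code: str) -> int:
    # Assumed hash: first 8 bytes of SHA-256 over "system|code", as a signed long.
    digest = hashlib.sha256(f"{system}|{code}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


def coding_hashes(codings: List[Dict[str, Any]]) -> List[int]:
    return [code_system_hash(c.get("system", ""), c["code"]) for c in codings if "code" in c]


def concept_field(value: Any) -> Optional[Dict[str, Any]]:
    """Index a CodeableConcept (or a list of them) as hashes, falling back to text."""
    concepts = value if isinstance(value, list) else ([value] if value else [])
    hashes = [h for cc in concepts for h in coding_hashes(cc.get("coding", []))]
    if hashes:
        return {"code_system_hash": hashes}
    texts = [cc["text"] for cc in concepts if "text" in cc]
    return {"text": " ".join(texts)} if texts else None


def to_epoch_millis(value: str) -> int:
    # fromisoformat() rejects a trailing "Z" before Python 3.11, so rewrite it as an offset.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: no offset means UTC
    return (parsed - EPOCH) // timedelta(milliseconds=1)


def build_observation_document(obs: Dict[str, Any]) -> Dict[str, Any]:
    code_codings = [c for c in obs.get("code", {}).get("coding", []) if "code" in c]
    code_hashes = coding_hashes(code_codings)
    doc: Dict[str, Any] = {
        "identifier": obs["id"],
        "subject": obs["subject"]["reference"],
        "status": obs.get("status"),
        "category": [f for f in map(concept_field, obs.get("category", [])) if f],
        "code": {"coding": [{"code": c["code"], "system": c.get("system", ""), "code_system_hash": h}
                            for c, h in zip(code_codings, code_hashes)]},
        # Stand-in for the normalization mapping (TODO in the design): smallest coding hash.
        "normalized_code": str(min(code_hashes)) if code_hashes else None,
        "meta_tag": coding_hashes(obs.get("meta", {}).get("tag", [])),
    }
    for source, target in (("interpretation", "interpretation"),
                           ("bodySite", "bodysite"), ("method", "method")):
        field = concept_field(obs.get(source))
        if field is not None:
            doc[target] = field
    if "effectiveDateTime" in obs:
        doc["effectivedtm"] = to_epoch_millis(obs["effectiveDateTime"])
    return doc
```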

Observation.code CodeableConcept Document

TODO

Observation.code CodeableConcept.coding

TODO

Elasticsearch Queries

For "lastn" requests, an Elasticsearch query would be dynamically generated that would include the following elements:

  • A query section that filters the result set by Observation.subject, Observation.category, Observation.effective[x] and/or other coded search parameters as listed above.
  • A series of nested aggregation sections:
    • Aggregate first by Observation.code ("normalized_code" field in the document).
    • Within each Observation.code grouping, sort by the "effectivedtm" field in descending order and include only the top "N" Observation identifiers, where N is specified by the "max" query parameter.
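A query of this shape could be assembled from the validated request parameters. In this sketch, `build_lastn_query` is a hypothetical helper; it generalizes the single `term` filter of the example below to a `terms` filter so that a parameter with several values matches any of them, expresses date bounds as epoch milliseconds on `effectivedtm`, and assumes a default of 1 Observation per code when `max` is absent.

```python
from typing import Any, Dict, List, Optional


def build_lastn_query(subject: str,
                      filters: Dict[str, List[Any]],
                      max_per_code: int = 1,
                      date_from_ms: Optional[int] = None,
                      date_to_ms: Optional[int] = None,
                      max_codes: int = 100) -> Dict[str, Any]:
    """Build a lastn request body: filter first, then group by code and keep the newest N."""
    must: List[Dict[str, Any]] = [{"term": {"subject": subject}}]
    for field, values in filters.items():
        # "terms" matches when the field holds any one of the values.
        must.append({"terms": {field: values}})
    date_range: Dict[str, int] = {}
    if date_from_ms is not None:
        date_range["gte"] = date_from_ms
    if date_to_ms is not None:
        date_range["lte"] = date_to_ms
    if date_range:
        must.append({"range": {"effectivedtm": date_range}})
    return {
        "size": 0,
        "query": {"bool": {"must": must}},
        "aggs": {
            "group_by_code": {
                "composite": {
                    "size": max_codes,
                    "sources": [{"code": {"terms": {"field": "normalized_code"}}}],
                },
                "aggs": {
                    "most_recent_effective": {
                        "top_hits": {
                            "sort": [{"effectivedtm": {"order": "desc"}}],
                            "_source": {"includes": ["identifier"]},
                            "size": max_per_code,
                        }
                    }
                },
            }
        },
    }
```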

An example query is shown below:

{
  "size": 0,
  "query": {
    "bool" : {
      "must" : [
        {
          "term": {
            "subject": "Patient/91b8b433-5546-42f1-a74f-224949c045c0"
          }
        },
        {
          "term": {
            "category.code_system_hash": 4658901908624248865
          }
        }
      ]
    }
  },
  "aggs": {
    "group_by_code" : {
      "composite" : {
        "size" : 100,
        "sources" : [
          { "code" : { "terms" : { "field" : "normalized_code" } } }
        ]
      },
      "aggs" : {
        "most_recent_effective": {
          "top_hits" : {
            "sort" : [
              {
                "effectivedtm" : {
                  "order" : "desc"
                }
              }
            ],
            "_source" : {
              "includes" : [ "identifier" ]
            },
            "size" : 3
          }
        }
      }
    }
  }
}

The above query would return the 3 most recent Observations for each Observation.code (up to a maximum of 100 codes) in the specified category for "Patient/91b8b433-5546-42f1-a74f-224949c045c0".
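On the HAPI FHIR side, the identifiers then have to be extracted from the aggregation response in the order described in the High Level Design (by Observation.code, then by Observation.effective ascending). Composite buckets are returned sorted by key, and each `top_hits` list is newest first because of the `desc` sort, so the sketch below reverses each list. The response shape follows Elasticsearch's composite and top_hits output; `extract_lastn_ids` is a hypothetical name.

```python
from typing import Any, Dict, List, Optional, Tuple


def extract_lastn_ids(response: Dict[str, Any]) -> Tuple[List[str], Optional[Dict[str, Any]]]:
    """Return Observation IDs ordered by code, then effective date ascending, plus after_key."""
    agg = response["aggregations"]["group_by_code"]
    ids: List[str] = []
    for bucket in agg["buckets"]:  # composite buckets arrive sorted by key
        hits = bucket["most_recent_effective"]["hits"]["hits"]  # newest first
        # Reverse so that each code group is ordered by effective date ascending.
        ids.extend(hit["_source"]["identifier"] for hit in reversed(hits))
    return ids, agg.get("after_key")
```

A non-null `after_key` means more code groups may exist beyond the composite `size` (100 in the example); passing it back as the composite `after` value would fetch the next page.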

Elasticsearch Configuration

TODO: A mechanism will be required to configure server settings (i.e. host and port) as well as authentication for connecting to Elasticsearch.

Elasticsearch API

See https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high.html.

Detail Design

Design Decisions

The following section documents specific challenges in the design where multiple options were considered and the rationale for the proposed approaches.

Filtering Observations by Search Parameters

An early challenge was determining how or whether to filter Observations by Search Parameters. Per http://hl7.org/fhir/observation-operation-lastn.html:

The request for a lastn query MAY include:

- Other Observation search parameters and modifiers

The challenge in supporting search parameters with "lastn" requests is that Observation results must first be filtered by the search parameters before the additional "lastn" processing can be done (i.e. grouping by Observation.code, sorting by Observation.effective[x] within each Observation.code group, and finally returning the "N" most recent Observations within each group).
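The order of these steps matters: if the top N were chosen before filtering, a group could come back with fewer than N matching Observations even though more exist. An in-memory reference version of the semantics, which could for instance serve as a test oracle for the Elasticsearch query, might look like this. It is a sketch over plain dicts using the document field names from this design, and `lastn_reference` is a hypothetical name.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List


def lastn_reference(observations: List[Dict[str, Any]],
                    predicate: Callable[[Dict[str, Any]], bool],
                    n: int) -> Dict[str, List[str]]:
    """Filter, group by normalized code, then keep the n most recent IDs per group."""
    groups: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for obs in observations:
        if predicate(obs):  # 1. filter by the search parameters first
            groups[obs["normalized_code"]].append(obs)  # 2. group by code
    result: Dict[str, List[str]] = {}
    for code in sorted(groups):
        # 3. sort each group by effective date, newest first
        newest_first = sorted(groups[code], key=lambda o: o["effectivedtm"], reverse=True)
        result[code] = [o["identifier"] for o in newest_first[:n]]  # 4. keep N
    return result
```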

Currently, HAPI FHIR includes extensive database-backed (SQL) support for Observation search parameters. However, SQL databases are not well suited to efficiently serving requests like "lastn", which combine grouping with a per-group top-N selection. Elasticsearch is much better able to handle requests like "lastn", but duplicating support for the search parameters in Elasticsearch would be time consuming and would create maintenance challenges in the future.

As there does not appear to be an immediate requirement to support filtering by all search parameters, it is proposed that filtering only be supported for Observation.subject, Observation.category, Observation.effective[x] and other coded values as listed earlier in this document.

Aggregating Observations by Observation.code when Observation.code has multiple codings

The FHIR specification for Observation resources, http://hl7.org/fhir/observation.html, defines Observation.code as a CodeableConcept, which may contain an array of one or more coding elements and/or a single text element. According to http://hl7.org/fhir/observation-operation-lastn.html, an Observation.code that has multiple codings should be considered a match to another Observation.code if any of the codings in the first Observation.code match any of the codings in the second. Although Elasticsearch does support this type of matching, using Observation.code for aggregation can lead to duplicated query results. Specifically, if an Observation.code has multiple codings, the aggregated results for this Observation.code will be repeated for each coding. This behavior is by design in Elasticsearch: an aggregation on a multi-valued field assigns a document to one bucket per value.
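The duplication is easy to reproduce outside Elasticsearch. The simulation below is a simplified model of how a terms aggregation treats a multi-valued field, not Elasticsearch code; the coding strings "A|1" and "B|2" are placeholders for two codings of the same Observation.code.

```python
from collections import defaultdict
from typing import Dict, List


def terms_aggregation(docs: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Mimic a terms aggregation: a document lands in one bucket per distinct field value."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for doc_id, values in docs.items():
        for value in set(values):
            buckets[value].append(doc_id)
    return dict(buckets)
```

With `{"obs1": ["A|1", "B|2"], "obs2": ["A|1"]}`, `obs1` appears in both the "A|1" and "B|2" buckets, so its results would be returned twice.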

A couple of options considered to address or avoid this problem included:

  1. Adding a post-processing step in HAPI FHIR to consolidate the duplicated results.
  2. Normalizing Observation.code values in the Elasticsearch index when Observations are first created, and updating the Observation index if codings are later added to an existing Observation.code.

The first option would be the simpler of the two, but would result in an ongoing and possibly growing performance impact as codeable concepts evolve.

The second option would be more complicated because it would require a mechanism to map codings to Observation.code (i.e. CodeableConcept) elements, and a separate scheduled task to periodically update the normalized Observation.code values in existing Observation documents when new codings are added to an existing Observation.code. This option would have the advantage of negligible impact on production requests, as most of the additional work could be done during creation of Observation resources and/or through a scheduled task running in the background.

Given the longer-term advantages of the second option, it is proposed that Observation.code values be normalized in the Elasticsearch index.
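One way to compute normalized codes, assumed here rather than specified by the design, is a disjoint-set (union-find) structure: each time an Observation.code is indexed, its codings are merged into one group, and the group's smallest member serves as the normalized value. The sketch also illustrates why a background update is needed: when a later Observation.code links two previously separate groups, the normalized value of one group changes, and documents already indexed with the old value become stale. `CodingNormalizer` is a hypothetical name, and codings are represented as "system|code" strings for readability.

```python
from typing import Dict, List


class CodingNormalizer:
    """Union-find over codings: codings that share any Observation.code join one group."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def _find(self, coding: str) -> str:
        self.parent.setdefault(coding, coding)
        root = coding
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the walked path directly at the root.
        while self.parent[coding] != root:
            next_coding = self.parent[coding]
            self.parent[coding] = root
            coding = next_coding
        return root

    def add_concept(self, codings: List[str]) -> None:
        """Record that these codings appear together in one Observation.code."""
        roots = {self._find(c) for c in codings}
        if not roots:
            return
        # Each group's root is its smallest member, so the merged root is the overall minimum.
        canonical = min(roots)
        for root in roots:
            self.parent[root] = canonical

    def normalized_code(self, coding: str) -> str:
        return self._find(coding)
```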

Indexing Codings in Observations

Codings are defined as being composed of two elements, "code" and "system". Elasticsearch does support indexing of fields like codings as "nested fields" and allows filtering query results by nested fields. Nested indexing however complicates the queries and requires more memory as nested fields are indexed as separate documents. Also, Elasticsearch has limits on the number of nested fields allowed in a document.

Rather than indexing codings as separate "code" and "system" fields, it is proposed that a distinct hash value for each code/system pair be used in the Observation documents for indexing coding elements.
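The difference between the approaches shows up in both the query and the matching semantics. Elasticsearch flattens arrays of plain (non-nested) objects, so a document with codings A/1 and B/2 would match a query for system A and code 2; nested mapping avoids this at the cost described above, and hashing each pair before indexing avoids it more cheaply. In the sketch below, `flattened_match` is a simplified model of non-nested matching rather than Elasticsearch code, the nested query would require a `"type": "nested"` mapping that the proposed mapping does not use, and the hash function is an assumed truncated SHA-256 since the design does not name one. All function names are hypothetical.

```python
import hashlib
from typing import Any, Dict, List


def code_system_hash(system: str, code: str) -> int:
    # Assumed hash: first 8 bytes of SHA-256 over "system|code", as a signed long.
    digest = hashlib.sha256(f"{system}|{code}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


def nested_coding_query(path: str, system: str, code: str) -> Dict[str, Any]:
    """Nested alternative: both terms must match within the same nested coding."""
    return {"nested": {"path": path, "query": {"bool": {"must": [
        {"term": {f"{path}.system": system}},
        {"term": {f"{path}.code": code}},
    ]}}}}


def hashed_coding_query(field: str, system: str, code: str) -> Dict[str, Any]:
    """Proposed alternative: a single term query on the precomputed pair hash."""
    return {"term": {field: code_system_hash(system, code)}}


def flattened_match(codings: List[Dict[str, str]], system: str, code: str) -> bool:
    """Simplified model of a non-nested object array: each field is matched on its own."""
    return (system in {c["system"] for c in codings}
            and code in {c["code"] for c in codings})


def hashed_match(codings: List[Dict[str, str]], system: str, code: str) -> bool:
    """The pair is hashed before indexing, so system and code cannot mix across codings."""
    return code_system_hash(system, code) in {code_system_hash(c["system"], c["code"]) for c in codings}
```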

Detailed Technical Design

TODO.

Performance

Proof-of-concept testing has been done using an Elasticsearch Repository with the following:

  • One thousand distinct observation code values.
  • One million patients, each with 25 observations:
    • 15 with Observation codes randomly chosen from the one thousand available.
    • 10 sharing a single common Observation code randomly chosen from the one thousand available.
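A scaled-down generator for the data set described above could look like this. It reads "a single common Observation code" as one code chosen per patient and repeated across that patient's 10 observations, which is an interpretation rather than something the test description states; timestamps are random epoch milliseconds, and `generate_patient_observations` is a hypothetical name.

```python
import random
from typing import Any, Dict, List


def generate_patient_observations(patient_id: str, code_pool: List[str],
                                  rng: random.Random) -> List[Dict[str, Any]]:
    """25 observations: 15 with random codes, 10 sharing one randomly chosen code."""
    common_code = rng.choice(code_pool)  # interpretation: one shared code per patient
    codes = [rng.choice(code_pool) for _ in range(15)] + [common_code] * 10
    return [
        {
            "identifier": f"{patient_id}-{i}",
            "subject": f"Patient/{patient_id}",
            "normalized_code": code,
            "effectivedtm": rng.randint(0, 1_600_000_000_000),  # random epoch millis
        }
        for i, code in enumerate(codes)
    ]
```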

The following tests were performed:

  • Query to retrieve all distinct Observation codes from all Observation documents:
    • Test repeated 1000 times.
    • Average response time from Elasticsearch was 16ms per query execution.
  • Query to retrieve all Observations for a randomly selected patient:
    • 1000 patients were selected at random from the Elasticsearch repository.
    • Query to retrieve all Observations was repeated for each of the 1000 patients.
    • Average response time from Elasticsearch was 5ms per query execution.
  • A "lastn" type query that retrieves the 5 most recent Observations per Observation.code for a randomly selected patient:
    • 1000 patients were selected at random from the Elasticsearch repository.
    • Query to retrieve last 5 Observations/code was repeated for each of the 1000 patients.
    • Average response time from Elasticsearch was 5ms per query execution.

Upgrade Considerations

TODO: A mechanism may be needed to create and populate an Elasticsearch repository for an existing HAPI FHIR installation.