Tissue Microarrays

One of the main uses of QuPath is high-throughput biomarker studies using Tissue Microarrays (TMAs) in cancer research. This section gives a very quick overview of the background, as an introduction to how QuPath can be applied to such studies.

Background

A TMA image contains multiple tissue samples, usually from different patients, which have all been stained in the same way. Each tissue sample is known as a (TMA) core, and each core is usually a circle of approximately 0.6-1.5 mm in diameter.

The purpose of studies involving TMAs is to rapidly and efficiently evaluate biomarker expression across a large group of patients with a relatively small number of slides. Because the tissue samples are very small, and provide only a snapshot of one part of a typically much larger (and possibly heterogeneous) tumor, it is common for multiple TMA cores to be taken from the same patient - perhaps 3-4. Therefore we might end up with 1500 TMA cores created from tissue samples of 500 patients, and imaged across 15 slides with 100 cores per slide.

Goals

Commonly, TMA images are brightfield images stained with hematoxylin and DAB (H-DAB). Alternatives are possible (for example, cores can be stained with other chromogenic stains such as hematoxylin and eosin (H&E), or imaged using fluorescence), but here we focus on the common case of H-DAB.

The hematoxylin is the bluish 'counterstain' that should highlight every nucleus. The DAB is the brown color that indicates 'positivity' for the biomarker. Therefore a very informal definition of the problem is that we want to quantify the 'brownness' of the staining for each TMA core, which relates to the biomarker expression.

Having done this, we then want to relate this back to information about the patients, such as how long they lived after the tissue samples were recovered. This enables inferences to be made concerning how the biomarker expression relates to the prognosis for the particular type (or subtype) of cancer under investigation.

Challenges & Solutions

This introduction may have made the analysis of TMAs sound rather a lot simpler than it is in practice. Some of the main practical challenges are described below, along with the approaches offered by QuPath to address them.

TMA dearraying

Firstly, it is not trivial to match up patient information with the tissue cores present in an image. During TMA creation in the lab, a map is made that associates TMA core locations with patient identifiers. An 'ideal' TMA would contain a perfect grid of cores arranged in a predefined order matching the map - with perhaps some asymmetry introduced by design (e.g. some extra or missing cores) to help make sure that it's clear if the slide is upside down. However, for various technical reasons, tissue cores that are present on the map might not have made it onto the slide - or they might be shifted from their expected location, or the tissue quality for a core may be poor and fragmented. Therefore the first challenge of analysis is TMA dearraying: figuring out the intended TMA grid to match with the map, and identifying which cores are missing or unsuitable for analysis. QuPath has its own automated algorithm to do this - along with the ability to manually adjust the results if required.
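
The bookkeeping side of this problem can be sketched in a few lines of Python. This is not QuPath's dearraying algorithm (which works on the image itself); it only illustrates how a dearrayed grid might be linked to a TMA map. The grid labels, patient identifiers and the link_cores_to_map helper are all hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical TMA map made in the lab: grid position -> patient identifier.
# None marks a position left empty by design (an orientation marker).
tma_map: Dict[Tuple[str, int], Optional[str]] = {
    ("A", 1): "P001", ("A", 2): "P002", ("A", 3): None,
    ("B", 1): "P003", ("B", 2): "P001", ("B", 3): "P002",
}

# Hypothetical dearraying result: grid position -> was usable tissue found?
detected: Dict[Tuple[str, int], bool] = {
    ("A", 1): True, ("A", 2): False, ("A", 3): False,
    ("B", 1): True, ("B", 2): True, ("B", 3): True,
}


def link_cores_to_map(tma_map, detected):
    """Return (usable, missing): cores that can be analyzed, and
    mapped cores for which no usable tissue was found."""
    usable, missing = {}, []
    for position, patient_id in tma_map.items():
        if patient_id is None:
            continue  # intentionally empty position, nothing to link
        if detected.get(position, False):
            usable[position] = patient_id
        else:
            missing.append(position)
    return usable, missing


usable, missing = link_cores_to_map(tma_map, detected)
print("usable cores:", usable)
print("missing cores:", missing)
```

Keeping the intentionally empty positions in the map, rather than dropping them, means that a grid matched upside down would show up as a mismatch between expected and detected cores.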

Identifying cells

It's often not enough to just measure 'brownness', regardless of where it is. Depending on the biomarker, it may be expressed in a different location - typically the nucleus, membrane or cytoplasm of each cell. Scoring it well depends upon accurately identifying individual cells.

This can be achieved within QuPath by first detecting the cell nuclei (based upon the hematoxylin staining), and then expanding the nuclei to approximate the full cell area. Intensity measurements can then be made in all three regions: nucleus, cytoplasm and membrane.
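
To make the idea of nucleus expansion concrete, here is a minimal sketch on a tiny pixel grid. It assumes a fixed expansion by repeated 4-neighbor dilation and ignores the membrane and neighboring cells; QuPath's own implementation is not described here and may differ. The expand function and the intensity values are hypothetical.

```python
from statistics import mean


def expand(nucleus, steps, shape):
    """Grow a set of (row, col) pixels by `steps` rounds of 4-neighbor dilation."""
    rows, cols = shape
    cell = set(nucleus)
    for _ in range(steps):
        grown = set(cell)
        for r, c in cell:
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    grown.add((rr, cc))
        cell = grown
    return cell


# Hypothetical 5x5 DAB intensity image (arbitrary units):
# weak staining in the nucleus, stronger staining around it.
dab = [
    [0.0, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.4, 0.5, 0.4, 0.0],
    [0.1, 0.5, 0.1, 0.5, 0.1],
    [0.0, 0.4, 0.5, 0.4, 0.0],
    [0.0, 0.0, 0.1, 0.0, 0.0],
]

nucleus = {(2, 2)}                       # detected nucleus (one pixel here)
cell = expand(nucleus, steps=1, shape=(5, 5))
cytoplasm = cell - nucleus               # pixels added by the expansion

nucleus_mean = mean(dab[r][c] for r, c in nucleus)
cytoplasm_mean = mean(dab[r][c] for r, c in cytoplasm)
print(f"nucleus mean: {nucleus_mean:.2f}, cytoplasm mean: {cytoplasm_mean:.2f}")
```

Measuring the compartments separately is what allows a biomarker with cytoplasmic staining, as in this toy example, to be scored differently from one with nuclear staining.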

Accurate cell detection can be tricky in the cases where the biomarker expression is particularly high, so that the nuclei are obscured. Nevertheless, QuPath offers a number of adjustable parameters to help get good results even when this happens, in addition to a separate cell detection algorithm to deal with biomarkers where positive staining is located primarily in the membrane (e.g. HER2).

Classifying cells

Quantifying the biomarker in all cells is usually not enough: rather, the type of cell makes a difference. For many common biomarkers (e.g. estrogen receptor, Ki67, P53) it's desirable to restrict scoring only to tumor cells - although in other cases different cell types may be of interest.

QuPath facilitates this by providing trainable cell classification. The user browses the image, manually adding training annotations and classifying them to indicate how the cells within each annotation should be classified, and then QuPath uses this information to figure out likely classifications for all the cells in the image. Through the use of efficient data structures and a fast random forest classifier, this can be done interactively - with hundreds of thousands of cells within a slide being reclassified within seconds of adding a new training region, and all results updated accordingly. Classifiers created this way can then be saved, and applied in batch across other images.

An approach taken by many commercial platforms offering biomarker analysis is to identify regions of interest (i.e. tumor regions, usually) at an early stage, and then later detect cells within these regions.

QuPath can operate like this... although it's not the preferred QuPath way. Rather, a better approach with QuPath is to instead take advantage of fast cell detection algorithms and identify all cells up-front - regardless of where they are located. Cells are then classified afterwards, and only the required cells are scored for the biomarker.

There are two major benefits of QuPath's approach:

Cell information is available when regions are classified, not only pixel information or textures... so the classifier has more useful information to work with

Cell detection is comparatively time-consuming, but classification in QuPath is fast. By doing the slow bit first, it's very quick to see the final results immediately after training the classifier.

Computing scores

Having completed the above steps, you typically end up with a huge number of data points: many thousands of cells, each with a classification and multiple measurements related to quantified biomarker expression. It is then necessary to distill these into scores for each individual patient.

QuPath provides the ability to set up to 3 intensity thresholds in order to subcategorize each cell according to staining intensity (negative, weak, moderate or strong positive). These are then converted into common summary measurements used in pathology, including positive percentage, H-score and Allred score. Furthermore, through scripting it is possible to recombine the information into a wide range of alternative scores if required.
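
As an illustration of how thresholds turn per-cell intensities into summary scores, the sketch below computes the positive percentage and the H-score (1 x % weak + 2 x % moderate + 3 x % strong, giving a value between 0 and 300). The threshold values, the intensities and the function names are hypothetical, and treating a value equal to a threshold as meeting it is an assumption rather than a statement about QuPath's behavior.

```python
from bisect import bisect_right
from collections import Counter


def intensity_bin(value, thresholds):
    """Return 0-3: how many of the ascending thresholds `value` meets or exceeds.
    0 = negative, 1 = weak, 2 = moderate, 3 = strong."""
    return bisect_right(thresholds, value)


def summarize(values, thresholds):
    """Return (positive percentage, H-score) for a list of per-cell intensities."""
    total = len(values)
    if total == 0:
        return 0.0, 0.0
    counts = Counter(intensity_bin(v, thresholds) for v in values)
    pct = [100.0 * counts[i] / total for i in range(4)]
    positive_pct = pct[1] + pct[2] + pct[3]
    h_score = 1 * pct[1] + 2 * pct[2] + 3 * pct[3]  # ranges from 0 to 300
    return positive_pct, h_score


thresholds = [0.2, 0.4, 0.6]  # hypothetical weak / moderate / strong cutoffs
tumor_cell_dab = [0.05, 0.10, 0.25, 0.30, 0.45, 0.50, 0.70, 0.15, 0.65, 0.35]

positive_pct, h_score = summarize(tumor_cell_dab, thresholds)
print(f"positive: {positive_pct:.1f}%, H-score: {h_score:.1f}")
```

Only the cells selected for scoring (for example, those classified as tumor) would be passed to such a function.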

Survival analysis

Having performed the image analysis, it's usually desirable to apply some further analysis - typically survival analysis. QuPath can help with this in two ways:

QuPath supports exporting the results in the form of a CSV file that can be easily imported elsewhere, and optionally a collection of JPEG images that provide visualizations of the original cores and the detected cells color-coded according to classification.
If desired, patient IDs (either in TMA maps or spreadsheet columns) can be imported into QuPath, as can some additional metadata and survival information associated with TMA cores. By including this information in any exported results, QuPath can help simplify the often-troublesome task of linking up image analysis and patient data afterwards.
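
Once patient IDs travel with the exported results, combining several cores into one score per patient becomes straightforward. The sketch below assumes a CSV with hypothetical column names (core, patient_id, h_score), not QuPath's actual export format, and uses the median across cores as one possible way to combine them.

```python
import csv
import io
from collections import defaultdict
from statistics import median

# Hypothetical export with one row per TMA core.
exported = io.StringIO(
    "core,patient_id,h_score\n"
    "A-1,P001,120\n"
    "A-2,P002,40\n"
    "B-1,P001,150\n"
    "B-2,P002,\n"      # missing core: no score available
    "B-3,P001,90\n"
)

scores_by_patient = defaultdict(list)
for row in csv.DictReader(exported):
    if row["h_score"]:  # skip cores without a score
        scores_by_patient[row["patient_id"]].append(float(row["h_score"]))

# One score per patient: here, the median across that patient's cores.
patient_scores = {pid: median(s) for pid, s in scores_by_patient.items()}
print(patient_scores)
```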

In addition to the above, QuPath also provides its own user interface for basic survival analysis. While not a replacement for a dedicated statistical package, this has the advantage of linking back the survival information with the exported TMA cores - in addition to offering a range of tools to explore the impact of different censoring times, scoring methods and cutoff thresholds. This can greatly assist with having a 'first look' at results immediately after export.
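
For readers unfamiliar with survival analysis, the sketch below shows one widely used technique, the Kaplan-Meier estimator, applied to two groups split by a score cutoff. The text above does not say which methods QuPath uses, so the choice of estimator is an assumption, and the patient data, the cutoff and the kaplan_meier function are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time where a death occurred.

    times:  follow-up time for each patient
    events: True if the patient died at that time, False if censored
    """
    curve, survival = [], 1.0
    at_risk = len(times)
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical patients: (H-score, follow-up in months, died?)
patients = [(150, 10, True), (30, 40, False), (200, 15, True),
            (20, 35, True), (180, 25, False), (10, 60, False)]

cutoff = 100  # hypothetical H-score cutoff separating 'high' from 'low'
high = [(t, e) for s, t, e in patients if s >= cutoff]
low = [(t, e) for s, t, e in patients if s < cutoff]

km_high = kaplan_meier(*zip(*high))
km_low = kaplan_meier(*zip(*low))
print("high expression:", km_high)
print("low expression:", km_low)
```

Changing the cutoff moves patients between the two groups and so changes both curves, which is why being able to explore different thresholds interactively is useful for a first look.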

Amount of data

In our hypothetical example above with 500 patients, 1500 cores were produced across 15 slides. At a conservative estimate of 20 GB per slide (scanned at x40 magnification), this represents 300 GB of raw image data (although it may require less space when saved due to image compression).

This is only for one biomarker. In a single study, many more biomarkers can be required - multiplying the size of the data requiring analysis accordingly.
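
The arithmetic behind these figures is easy to adapt to other study designs. In the sketch below, the number of biomarkers is a hypothetical value, and it is assumed that each biomarker is stained and scanned on its own set of slides.

```python
import math

patients = 500
cores_per_patient = 3
cores_per_slide = 100
gb_per_slide = 20   # conservative estimate from the text (x40 scan)
biomarkers = 10     # hypothetical number of biomarkers in the study

cores = patients * cores_per_patient            # 1500 cores
slides = math.ceil(cores / cores_per_slide)     # 15 slides
gb_one_biomarker = slides * gb_per_slide        # 300 GB
gb_total = gb_one_biomarker * biomarkers        # 3000 GB

print(f"{cores} cores on {slides} slides: {gb_one_biomarker} GB per biomarker, "
      f"{gb_total} GB for {biomarkers} biomarkers")
```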

QuPath includes fast, efficient algorithms that support parallel processing using all processor cores available in a modern computer. Furthermore, QuPath logs the commands that are run interactively, to help with turning these into scripts for batch processing later. Running in batch mode, it typically only takes a few minutes to analyze each slide on a desktop computer - and because QuPath is free and open source, you can scale this up to as many computers and processors as you have access to, to increase throughput as necessary.

These docs are for QuPath ≤ v0.1.2.

For more up-to-date information, see https://qupath.readthedocs.io