
Preprocessing

Pete edited this page Nov 30, 2016 · 3 revisions

Estimate stain vectors

Estimate stain vectors is used to help improve the stain separation in brightfield images.

The example image here is from the OpenSlide freely-distributable test data.

Background

QuPath uses the color deconvolution method for stain separation, as described by Ruifrok and Johnston:

Ruifrok, A C, and D A Johnston. 2001. “Quantification of Histochemical Staining by Color Deconvolution.” Anal Quant Cytol Histol. 23 (4): 291–99. http://www.ncbi.nlm.nih.gov/pubmed/12610362.

Gabriel Landini has also provided a very useful ImageJ plugin to implement color deconvolution, and has written a description of the uses (and abuses) of the technique here. This is highly recommended reading, particularly as a warning against over-interpreting measurements made from DAB staining.

Fundamentally, color deconvolution works to digitally separate up to three stains from an RGB image. To do so, it is necessary to know:

  • The background values for each RGB channel (red, green and blue)
  • A stain vector, which characterizes the color for each stain

The purpose of Estimate stain vectors is to help identify these values, so that color deconvolution can do its job well.
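To make the role of these two inputs concrete, here is a minimal Python sketch of color deconvolution for a single pixel. It is an illustration of the general Ruifrok and Johnston approach, not QuPath's own code. The function names are made up for this example, the third 'residual' vector is assumed to be the cross product of the two stain vectors, and the numeric values are taken from the example script further down this page.

```python
import math

def optical_density(rgb, background):
    """Per-channel optical density: OD = -log10(I / I_background)."""
    return [-math.log10(max(c, 1e-6) / b) for c, b in zip(rgb, background)]

def cross(a, b):
    """Cross product, used here to build a third 'residual' vector (an assumption)."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def invert_3x3(m):
    """Invert a 3x3 matrix (given as a list of rows) via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    return [[(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
            [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
            [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det]]

def deconvolve(rgb, background, stain1, stain2):
    """Return the amounts of stain 1, stain 2 and the residual for one pixel."""
    stains = [stain1, stain2, cross(stain1, stain2)]  # residual is not renormalized
    inv = invert_3x3(stains)
    od = optical_density(rgb, background)
    # od = amounts @ stains, so amounts = od @ inverse(stains)
    return [sum(od[k] * inv[k][j] for k in range(3)) for j in range(3)]

background = [252, 251, 247]
hematoxylin = [0.54736, 0.74187, 0.38732]
eosin = [0.18043, 0.95500, 0.23540]

# Synthesize a pixel containing 0.8 units of hematoxylin and 0.3 units of eosin
od_true = [0.8 * h + 0.3 * e for h, e in zip(hematoxylin, eosin)]
pixel = [b * 10 ** (-od) for b, od in zip(background, od_true)]

amounts = deconvolve(pixel, background, hematoxylin, eosin)
print([round(x, 3) for x in amounts])
```

The synthetic pixel is built from known stain amounts, so the deconvolved amounts should come back very close to 0.8, 0.3 and 0. If either the background values or the stain vectors passed to `deconvolve` are changed, the recovered amounts drift away from the true ones.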

Running the command

Viewing the default stain vectors

Stain estimation

The stain vectors are supposed to give a normalized representation of the color of each 'pure' stain in the image, without regard for staining intensity.
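As a rough illustration of what 'normalized' means here, the Python sketch below converts a pixel to optical densities and scales the result to unit length. The helper names are hypothetical and the numbers come from the example script later on this page. This is an assumption about how such a vector can be computed, not a description of QuPath's internals.

```python
import math

def stain_vector(rgb, background):
    """Unit-length optical density vector for a pixel containing a single stain."""
    od = [-math.log10(c / b) for c, b in zip(rgb, background)]
    norm = math.sqrt(sum(x * x for x in od))
    return [x / norm for x in od]

def pixel_for(vector, amount, background):
    """Synthesize the RGB value of a pure stain present at a given amount."""
    return [b * 10 ** (-amount * v) for b, v in zip(background, vector)]

background = [252, 251, 247]
hematoxylin = [0.54736, 0.74187, 0.38732]

weak = pixel_for(hematoxylin, 0.2, background)    # lightly stained pixel
strong = pixel_for(hematoxylin, 1.0, background)  # heavily stained pixel

v_weak = stain_vector(weak, background)
v_strong = stain_vector(strong, background)
print([round(x, 4) for x in v_weak])
print([round(x, 4) for x in v_strong])
```

The two pixels have very different RGB values, but their normalized vectors agree, which is why a stain vector describes the color of a stain rather than how much of it is present.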

Default stain vectors are set whenever the Image type is Brightfield H&E or Brightfield H-DAB. You can see these (along with the image type) in the Image panel on the left.

Sometimes these defaults are ok, but often they are not a good match for the stains (or scanner) that were used - and so they should always be checked.

If the 'Image type' is incorrect, double-click on the entry in the Image panel to change it.

Find a representative region

Stain estimation region selection

Before running Estimate stain vectors, you should first find a representative region containing relatively clear examples of the stains that you want - along with an area of background, if possible. Then draw a rectangle annotation around this region.

If you choose a very large region, QuPath will have to downsample it to look for the stains. Since downsampling means averaging adjacent pixels - which dilutes the useful information - it's best to avoid it as far as possible.

In other words, you should try to choose a small region containing all the information you need.
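The effect of averaging can be sketched with two made-up pixels. In the Python example below, the RGB values for the 'hematoxylin' and 'eosin' pixels are hypothetical, a plain white background is assumed, and averaging two neighbours stands in for downsampling.

```python
import math

def unit_od(rgb, background=(255, 255, 255)):
    """Unit-length optical density vector of a pixel (white background assumed)."""
    od = [-math.log10(c / b) for c, b in zip(rgb, background)]
    n = math.sqrt(sum(x * x for x in od))
    return [x / n for x in od]

def angle_deg(u, v):
    """Angle in degrees between two unit vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))

hematoxylin_px = (70, 60, 140)   # hypothetical bluish-purple pixel
eosin_px = (230, 120, 190)       # hypothetical pink pixel

# Downsampling by a factor of 2 would average neighbouring pixels like these
averaged_px = tuple((a + b) / 2 for a, b in zip(hematoxylin_px, eosin_px))

angle_h_e = angle_deg(unit_od(hematoxylin_px), unit_od(eosin_px))
angle_h_avg = angle_deg(unit_od(hematoxylin_px), unit_od(averaged_px))
angle_e_avg = angle_deg(unit_od(eosin_px), unit_od(averaged_px))
print(round(angle_h_e, 1), round(angle_h_avg, 1), round(angle_e_avg, 1))
```

The averaged pixel points in a direction between the two stains and matches neither, so it no longer gives a clear example of either one.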

Check the color deconvolution before changes

While not essential, it can be useful to have a look at the stain separation by color deconvolution prior to making any changes - to get a feeling for how well (or not) it is performing.

More information on how to do this is provided in the Changing colors section. However, the quickest approach is to click on the image and press '1', '2', '3', or '4' to see 1) the original image, 2) the first stain (here, hematoxylin), 3) the second stain (here, eosin), and 4) the third stain (or residual information, if only two stains are present).

Stain 1 (hematoxylin)

Stain estimation original hematoxylin

Stain 2 (eosin)

Stain estimation original eosin

Stain 3 (residual)

Stain estimation original residual

Run Estimate stain vectors

Now you are ready to actually run the Analyze → Preprocessing → Estimate stain vectors command.

If the background in the region you have drawn does not match the background values QuPath is currently using, it will ask whether you want to update the stored values.

Assuming that the region you have drawn does contain a representative area of background, you should probably click Yes. If not, click No.

Update background values

Stain estimation original hematoxylin

Check scatterplots

QuPath now builds scatterplots to help view the relationships between the red, green and blue values for each pixel.

Because it's quite hard to work with a 3D scatterplot, QuPath shows this in the form of three separate 2D scatterplots, systematically showing each color plotted against one of the others.

Additionally, QuPath draws colored lines to indicate the existing stain vectors.

Scatterplots for original stain vectors

Stain estimation original scatterplot

The scatterplots for the original stain vectors in the example image are shown above. Ideally, the stain vectors should tightly surround the majority of the scattered points.

However, here that is not the case. Quite a few points seem to be pulled too far towards the 'green' axis compared to the stain vectors, and the vectors seem too widely separated in the Red vs. Blue plot.

The vectors do not appear to be wildly inaccurate, but there is some room for improvement.

Scatterplots for updated stain vectors

Stain estimation updated scatterplot

Pressing the Auto button tells QuPath to try to make a better choice of stain vectors based upon the information in the selected region. The vectors will adjust automatically, and the changes are shown in the scatterplots.

QuPath makes its decision based on the parameters given below the plots. Hovering the cursor over the parameters will show some more information about what they do, and you can try adjusting them and pressing Auto again to see their effect.

Set a name for the updated stains

Stain estimation - naming stains

If you are happy with the results, press OK and then enter a name to identify your new stain vectors when prompted.

This name can help identify your stains later, e.g. within scripts. For this reason, it is strongly recommended to add an informative (and unique) name at this point.

View the results

Stain estimation region updated

Now, the dialog window will disappear and you will be returned to the original image. Typically this does not look different at all. However, the stain vectors in the Image panel will be updated to reflect the changes.

To see a bit more of what has happened, you can use the number keys (or Brightness/Contrast tool) again to view the color-deconvolved channels.

Stain 1 (hematoxylin, updated)

Stain estimation updated hematoxylin

Stain 2 (eosin, updated)

Stain estimation updated eosin

Stain 3 (residual, updated)

Stain estimation updated residual

Questions & Answers

Why does Estimate stain vectors matter?

If the stain vectors are sufficiently wrong, then commands that make use of color deconvolution (e.g. cell detection) may perform badly, because information from different stains is being mixed up.

This can also lead to strange or impossible results, such as cells being measured as having 'negative' amounts of particular stains.
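The Python sketch below shows how a 'negative' amount can arise. It works directly on optical densities, uses the stain values from the example script further down this page, and deliberately constructs a poor eosin vector that is pulled towards hematoxylin. The function names and the badly estimated vector are hypothetical, and the residual is again assumed to be the cross product of the two stain vectors.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cross(a, b):
    """Cross product, used as the residual (third) vector."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def invert_3x3(m):
    """Invert a 3x3 matrix (given as a list of rows) via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    return [[(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
            [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
            [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det]]

def amounts_from_od(od, stain1, stain2):
    """Solve od = a1*stain1 + a2*stain2 + a3*residual for (a1, a2, a3)."""
    inv = invert_3x3([stain1, stain2, cross(stain1, stain2)])
    return [sum(od[k] * inv[k][j] for k in range(3)) for j in range(3)]

hematoxylin = [0.54736, 0.74187, 0.38732]
eosin_true = [0.18043, 0.95500, 0.23540]
# Hypothetical poor estimate: an 'eosin' vector pulled halfway towards hematoxylin
eosin_wrong = normalize([h + e for h, e in zip(hematoxylin, eosin_true)])

# Optical densities of a pixel that contains only eosin (amount 0.5)
od = [0.5 * e for e in eosin_true]

good = amounts_from_od(od, hematoxylin, eosin_true)
bad = amounts_from_od(od, hematoxylin, eosin_wrong)
print("good vectors:", [round(x, 3) for x in good])
print("poor vectors:", [round(x, 3) for x in bad])
```

With the correct vectors the pixel is reported as pure eosin. With the poor eosin vector, the only way to reproduce the pixel's color is to subtract hematoxylin, so the hematoxylin amount comes out negative.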

When should Estimate stain vectors be applied?

This command should be used at the very first stage of analysis, before detecting or measuring anything.

This is because currently all measurements are made based upon whatever stain vectors were set when the measurement command was run (i.e. the measurements don't automatically update).

Do I need to apply Estimate stain vectors to every image?

No. Firstly, running the command at all only makes sense for brightfield images with chromogenic stains (e.g. H&E, or hematoxylin and DAB).

Secondly, if you have a large image set containing multiple images acquired with similar staining, then you are likely to want to estimate the stain vectors only on one 'typical' image, and then use the same vectors across all images.

To do this, you can create a script that sets the stain vectors, and then run it across the images. Select the Workflow panel and choose Create script.

This should result in a script something like the following:

setImageType('BRIGHTFIELD_H_E');
setColorDeconvolutionStains('{"Name" : "H&E updated", "Stain 1" : "Hematoxylin", "Values 1" : "0.54736 0.74187 0.38732", "Stain 2" : "Eosin", "Values 2" : "0.18043 0.95500 0.23540", "Background" : "252 251 247"}');

Depending on what commands you have run, you may see that there are other lines in the automatically-generated script. These should be removed.

By setting a suitable name whenever you update the stain vectors, it should be possible to track down the lines you need.

You can then use Run → Run for project to set the stain vectors across multiple images within a QuPath project using your script.
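Before applying a script like this across a project, it can be worth checking the values it contains. The Python sketch below is a standalone check run outside QuPath. It parses the JSON text passed to setColorDeconvolutionStains in the example above and computes the length of each stain vector, on the assumption that stain vectors are meant to be (close to) unit length. The helper name `parse_values` is made up for this example.

```python
import json
import math

# The JSON text copied from the setColorDeconvolutionStains line above
stains_json = ('{"Name" : "H&E updated", "Stain 1" : "Hematoxylin", '
               '"Values 1" : "0.54736 0.74187 0.38732", "Stain 2" : "Eosin", '
               '"Values 2" : "0.18043 0.95500 0.23540", "Background" : "252 251 247"}')

def parse_values(text):
    """Turn a space-separated string such as '252 251 247' into a list of floats."""
    return [float(x) for x in text.split()]

stains = json.loads(stains_json)

lengths = {}
for key in ("Values 1", "Values 2"):
    vector = parse_values(stains[key])
    lengths[key] = math.sqrt(sum(x * x for x in vector))
    print(key, vector, "length =", round(lengths[key], 4))

background = parse_values(stains["Background"])
print("Background:", background)
```

If a vector's length is far from 1, or a background value falls outside the 0 to 255 range, that suggests the line was edited by hand or copied incorrectly.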

How do I know if I am using sensible parameters?

If a large number of points fall outside the stain vectors in the scatterplots - or one or both of the stain vectors lie quite far wide of the scattered points - this is an indication that something has gone wrong.

Additionally, the colors of the stain vectors should approximately match the color of the stains.

A bad choice of stain vectors

Stain estimation outlier

In the scatterplot above, the stains have not been well-estimated. The reason is that some of the pixels within the selected region did not belong to either stain or background - quite likely from pen annotations drawn on the slide.

The Ignore extrema parameter is used to eliminate some outliers, to help improve the robustness against this kind of problem. However, in the example above this has a very low value (0.1%), which means that the outliers still influence the estimated stain vectors.

There are two potential ways to solve this:

  • Increase the value of the Ignore extrema parameter, or
  • Choose another region within which to run the Estimate stain vectors command.
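The first option can be pictured with a simplified Python sketch. This is not QuPath's algorithm: it assumes that each pixel has been reduced to a single number (here a made-up 'angle' describing its color), and that the stains are taken from the smallest and largest values left after a percentage has been discarded at each end. The function name and the data are hypothetical.

```python
def robust_extremes(values, ignore_percent):
    """Return (min, max) after discarding ignore_percent of values at each end."""
    ordered = sorted(values)
    k = int(round(len(ordered) * ignore_percent / 100.0))
    trimmed = ordered[k:len(ordered) - k] if k > 0 else ordered
    return trimmed[0], trimmed[-1]

# 995 'tissue' pixels spread between 20 and 50, plus 5 outliers (e.g. pen marks)
tissue = [20 + 30 * i / 994 for i in range(995)]
pen_marks = [85.0] * 5
values = tissue + pen_marks

low_strict, high_strict = robust_extremes(values, 0.1)  # only 1 value dropped per end
low_robust, high_robust = robust_extremes(values, 1.0)  # 10 values dropped per end

print("Ignore extrema 0.1%:", round(low_strict, 2), round(high_strict, 2))
print("Ignore extrema 1.0%:", round(low_robust, 2), round(high_robust, 2))
```

At 0.1% only one value is removed from each end, so one of the five outliers still defines the upper extreme. At 1% all five outliers are discarded and the extremes fall within the tissue pixels. The cost is that a few genuine pixels are discarded as well, which is why the value should not be raised further than necessary.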

Simple tissue detection

Still to come...
