EBSD Dictionary Indexing

Please note that a more extensive tutorial for dictionary indexing can be found in the Complete Examples section (item 7) of this wiki collection.

Dictionary Indexing, or DI for short, is a recently developed technique for diffraction pattern indexing that employs the complete pattern instead of extracting features from the pattern. In the traditional Hough-based indexing for EBSD patterns, individual Kikuchi band orientations and locations are extracted from the pattern and then processed to extract interplanar or interzonal angles that are then compared against a lookup table. In the DI approach, no feature extraction is performed and complete patterns are compared to simulated patterns; hence the need for a good forward model to predict patterns.

In the sections below, we attempt to describe the complete process of taking experimental patterns and indexing them using the EMEBSDDI program. This process is somewhat involved since there are a lot of input parameters needed to make things work.

Acquiring and preparing the data

This is the most crucial step for the whole DI approach. As you prepare to acquire your experimental data, you should add one step to your usual procedure, namely the acquisition of a single full size high quality pattern taken from near the center of your region of interest (ROI). This means no binning, and likely a longer exposure time. It is important that you do not change any microscope settings after you acquire this reference pattern; if you do change things, then the detector parameters determined by the fitting routine will be incorrect.

So, the suggested experimental procedure is as follows:

locate your ROI;
set up all the parameters for your data acquisition;
record a full high quality reference pattern (and store it in a separate file);
set the proper binning and exposure time for your indexing run;
and execute the run.

You can let the acquisition software index the patterns and generate an .ang or .ctf file, as usual; that way you can compare the DI results with the Hough-based indexing.

Dictionary indexing relies on the availability of experimental patterns, stored in some convenient format, so you must store all the patterns; make sure you have plenty of disk space to do so. EMsoft supports the following data formats:

Binary: this format was originally implemented to convert individual pattern files (jpeg, tiff or bmp) into a single file with extension .data. We do not recommend that this format be used, but if you really have no other way to convert thousands of pattern files to any of the formats below, then this will be your only remaining option. Contact us to get a MATLAB script that will generate the binary file for you.
TSLup1 and TSLup2: The EDAX/TSL pattern acquisition software allows you to export patterns in .up1 (1 byte per pattern pixel) or .up2 (2 bytes per pattern pixel) formats; EMsoft can read both file formats.
TSLHDF: Recent versions of the EDAX/TSL acquisition software allow you to export the pattern to an HDF5 formatted file; this is really the preferred way of transporting pattern data, and EMsoft can readily read this format.
EMEBSD: When the EMEBSD program is used with the makedictionary parameter set to 'y', then the output file will be an HDF5 file that can subsequently be read by the indexing program.
BrukerHDF: Recent versions of the Bruker acquisition program can export HDF5 files that can be read by EMsoft.

At the time of writing of this document (May 2018), there is no option to read any of the Oxford/Aztec output formats, except for the individual pattern files after they have been assembled in the Binary format described above.
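
To get a feel for the disk space involved, the following minimal Python sketch estimates the raw size of a pattern file from the map dimensions and the pattern size. The function name and the example scan are illustrative assumptions, and any file header or HDF5 container overhead is ignored; the bytes-per-pixel values follow the .up1 and .up2 descriptions above.

```python
def raw_pattern_bytes(ipf_wd, ipf_ht, numsx, numsy, bytes_per_pixel):
    """Raw pattern storage in bytes, ignoring any header or container overhead."""
    return ipf_wd * ipf_ht * numsx * numsy * bytes_per_pixel


if __name__ == "__main__":
    # Hypothetical scan: 600 x 400 map points, 640 x 480 pixel patterns.
    for label, bpp in (("up1", 1), ("up2", 2)):
        size = raw_pattern_bytes(600, 400, 640, 480, bpp)
        print(f"{label}: {size / 1e9:.1f} GB")
```

Even a moderately sized map with unbinned patterns reaches tens of gigabytes, which is why the binning and pattern size should be chosen with the available storage in mind.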

Computing the master pattern

For each of the phases present in your sample, you need to compute a master pattern using the EMMCOpenCL program (with the correct sample tilt, microscope voltage, and crystal structure) followed by the EMEBSDmaster program for the actual master pattern. You will use these files as input to the indexing program. Details of these programs can be found in other help pages; there is also a worked-out example available.

Fitting the detector parameters

This step is very important and makes use of the reference pattern that you recorded earlier. Details of the process are described on a separate help page.

Set up the DI parameters

The EMEBSDDI program takes a lot of parameters via the usual name list mechanism; execute the following command:

EMEBSDDI -t

to generate the template file, which will have the following content (broken up into blocks with explanations for clarity):

 &EBSDIndexingdata
! The line above must not be changed
!
! The values below are the default values for this program
!
!###################################################################
! INDEXING MODE
!###################################################################
!
! 'dynamic' for on the fly indexing or 'static' for pre calculated dictionary
 indexingmode = 'dynamic',
!

The indexingmode can take two values: dynamic or static. In static mode, the program will use an existing dictionary file created by the EMEBSD program. This mode is only recommended if you have a lot of data sets to index and they all have the same detector parameters (for instance, multiple slices from a FIB experiment). It requires a computer with a lot of memory (many tens of GB of RAM) and lots of disk space. In the dynamic indexing mode, the dictionary patterns will be generated on-the-fly during the indexing process.

!###################################################################
! DICTIONARY PARAMETERS: COMMON TO 'STATIC' AND 'DYNAMIC'
!###################################################################
!
! do you want Email or Slack notification when the run has completed?
 Notify = 'Off',
! height of data set in pattern input file
 ipf_ht = 100,
! width of data set in pattern input file
 ipf_wd = 100,
! define the region of interest as x0 y0 w h;  leave all at 0 for full field of view
! region of interest has the point (x0,y0) as its lower left corner and is w x h patterns
 ROI = 0 0 0 0,
! X and Y sampling step sizes
 stepX = 1.0,
 stepY = 1.0,
! number of top matches to keep from the dot product results
 nnk = 50,
! number of top matches to use for orientation averaging
 nnav =  20,
! number of top matches to use for Orientation Similarity Map computation
 nosm = 20,
! to use a custom mask, enter the mask filename here; leave undefined for standard mask option
 maskfile = 'undefined',
! mask or not
 maskpattern = 'n',
! mask radius (in pixels, AFTER application of the binning operation)
 maskradius = 240,
! hi pass filter w parameter; 0.05 is a reasonable value
 hipassw = 0.05,
! number of regions for adaptive histogram equalization
 nregions = 10,

The parameters in this block are common to dynamic and static indexing modes. Since the final output of indexing is usually an inverse pole figure (ipf) map, you must specify the ipf width and height (in pixels) for the complete data set; let's say that our data region is 600x400 pixels. You can then select a sub-region via the ROI parameter, which has four integers; if all integers are set to 0, then the complete 600x400 ipf is indexed. If the integers are 60 100 200 200, then a square area of 200x200 pixels is selected with its lower left corner located at the point (60,100). The sampling step size is next and is specified in microns. The next three integers (nnk, nnav, and nosm) define, respectively, how many of the top matches should be kept in the output file (typically 30-50 would be ok); how many of the top matches should be used to generate an IPF with orientations averaged over the top nnav matches; and how many top matches should be used to generate the orientation similarity map (OSM). Then the user can specify the filename for an optional mask file; this is an experimental option in which one can define an arbitrary mask to be applied to the patterns before indexing (note that this option has been disabled starting in version 5.0.3). For details of the file format, see the bottom of this help page. If maskpattern is set to 'y', then a circular mask of radius maskradius will be applied before indexing; this can be used to exclude the outer portion of the patterns. Finally, the hipassw and nregions parameters define the preprocessing parameters for the high pass filtering and adaptive histogram equalization steps that all patterns (both experimental and simulated) undergo before indexing. See the manual page for the EMEBSDDIpreview program for an explanation of how to determine these parameters.
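
The ROI convention described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the bounds check (which treats x0 and y0 as zero-based offsets) is an assumption, not a statement about how EMEBSDDI validates its input.

```python
def roi_extent(ipf_wd, ipf_ht, roi):
    """Return (x0, y0, w, h) of the region to index; all zeros selects the full map."""
    x0, y0, w, h = roi
    if (x0, y0, w, h) == (0, 0, 0, 0):
        return (0, 0, ipf_wd, ipf_ht)
    if w <= 0 or h <= 0 or x0 < 0 or y0 < 0:
        raise ValueError("ROI needs a non-negative origin and a positive size")
    # Assumption: x0 and y0 are zero-based offsets into the map.
    if x0 + w > ipf_wd or y0 + h > ipf_ht:
        raise ValueError("ROI extends beyond the map")
    return (x0, y0, w, h)


def roi_pattern_count(ipf_wd, ipf_ht, roi):
    """Number of experimental patterns that will be indexed."""
    _, _, w, h = roi_extent(ipf_wd, ipf_ht, roi)
    return w * h


if __name__ == "__main__":
    print(roi_pattern_count(600, 400, (0, 0, 0, 0)))         # full field of view
    print(roi_pattern_count(600, 400, (60, 100, 200, 200)))  # 200 x 200 sub-region
```

For the 600x400 example, the full map contains 240,000 patterns while the 200x200 sub-region contains 40,000, so a small ROI is a convenient way to test a parameter set before committing to a full run.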

!###################################################################
! ONLY SPECIFY WHEN INDEXINGMODE IS 'DYNAMIC'
!###################################################################
!
! number of cubochoric points to generate list of orientations
 ncubochoric = 100,
! distance between scintillator and illumination point [microns]
 L = 15000.0,
! tilt angle of the camera (positive below horizontal, [degrees])
 thetac = 10.0,
! CCD pixel size on the scintillator surface [microns]
 delta = 50.0,
! number of CCD pixels along x and y
 numsx = 640,
 numsy = 480,
! pattern center coordinates in units of pixels
 xpc = 0.0,
 ypc = 0.0,
! angle between normal of sample and detector
 omega = 0.0,
! minimum and maximum energy to use for interpolation [keV]
 energymin = 10.0,
 energymax = 20.0,
! energy averaging method (0 for exact, 1 for approximate)
 energyaverage = 0,
! spatial averaging method ('y' or 'n' ;can't be used with approximate energy average)
 spatialaverage = 'n',
! incident beam current [nA]
 beamcurrent = 150.0,
! beam dwell time [micro s]
 dwelltime = 100.0,
! binning mode (1, 2, 4, or 8)
 binning = 1,
! intensity scaling mode 'not' = no scaling, 'lin' = linear, 'gam' = gamma correction
 scalingmode = 'not',
! gamma correction factor
 gammavalue = 1.0,
!

In this block we define the detector parameters and the orientation sampling. The ncubochoric parameter defines the angular step size in orientation space; typically a value of 100 will produce good results. The detector parameters are L (distance to detector), thetac (detector tilt from vertical), delta (CCD pixel size), numsx and numsy (the number of pixels along x and y), xpc and ypc (the pattern center in units of pixel size; for the definition, see the EBSD patterns simulation help page), omega (sample misalignment along RD axis), the energy range to be used in the pattern interpolation, beam current and dwell time (values don't really matter for indexing, as long as they are both non-zero), binning, scalingmode (typically you would use gamma scaling), and the gamma value (0.33 is a good value). The parameters energyaverage and spatialaverage are experimental and should not be used; they will be removed in a later version.
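
To illustrate what gamma scaling does to pattern intensities, here is a minimal Python sketch of a power-law scaling with the suggested gamma value of 0.33. The function name is hypothetical, and the normalization to the maximum intensity is an assumption made for this illustration; the scaling performed inside EMsoft may differ in its details.

```python
def gamma_scale(pattern, gammavalue=0.33):
    """Apply a power-law intensity scaling to a 2D list of non-negative intensities."""
    peak = max(max(row) for row in pattern)
    if peak <= 0:
        return [[0.0 for _ in row] for row in pattern]
    # Assumption: intensities are normalized to [0, 1] before the power law is applied.
    return [[(v / peak) ** gammavalue for v in row] for row in pattern]


if __name__ == "__main__":
    toy_pattern = [[0.0, 1.0], [4.0, 8.0]]
    for row in gamma_scale(toy_pattern):
        print(["%.3f" % v for v in row])
```

A gamma value below 1 raises the weaker intensities relative to the strongest ones, which compresses the dynamic range of the pattern.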

CHANGES in VERSION 5.0.3
There are two new parameters in the namelist file starting with version 5.0.3:

!
! size of the *Experimental* patterns in pixels, and the binning factor to be used.
! Note that the binning factor is *only* applied to the experimental patterns. The 
! dictionary patterns will be declared to have the binned size. 
 exptnumsx = 640,
 exptnumsy = 480,
 binning = 1, 
! size of the *Dictionary* patterns in pixels; this will be set to be equal to the size
! of the experimental patterns divided by the binning factor by the EMEBSDDI program.
! You can set the next two parameters to the correct values if you wish, but these 
! parameters will be overwritten by the program; you can also comment out these two lines. 
 numsx = 640,
 numsy = 480,

The new parameters exptnumsx and exptnumsy take over the role of the old numsx and numsy parameters which are now obsolete (but still present in the file). The reasoning behind this is that one often has large experimental patterns, say 1244x1024, but the indexing should be done on smaller patterns, say with 8x binning. The dictionary pattern size will automatically be set to (exptnumsx,exptnumsy)/binning.

In addition to these changes, the pattern center coordinates (xpc, ypc) and detector pixel size delta also need to be modified. The assumption is that the PC coordinates were determined for the full size pattern before binning; to account for the binning, the pixel size delta must be multiplied by binning, and the (xpc,ypc) values need to be divided by binning. In the template file this is indicated as follows:

! the following three parameters require some careful consideration.  
!=================================================================================================!
! Please check the wiki EMEBSDDI help page for detailed information on how to set these parameters!
! This process has changed starting with EMsoft 5.0.3.                                            !
!=================================================================================================!
! detector pixel size [microns]
 delta = 50.0,
! pattern center coordinates in units of pixels, origin at center of detector
 xpc = 0.0,
 ypc = 0.0,
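
The binning adjustment described above can be sketched in Python as follows. The function name and the example values are hypothetical; the sketch simply multiplies delta by the binning factor, divides the pattern center coordinates by it, and computes the dictionary pattern size (which EMEBSDDI sets itself).

```python
def binned_detector_parameters(exptnumsx, exptnumsy, binning, delta, xpc, ypc):
    """Convert full-size detector parameters to the values used with binned patterns."""
    if binning not in (1, 2, 4, 8):
        raise ValueError("binning must be 1, 2, 4, or 8")
    return {
        # Integer division; how EMEBSDDI treats sizes that are not divisible
        # by the binning factor is not covered by this sketch.
        "numsx": exptnumsx // binning,
        "numsy": exptnumsy // binning,
        "delta": delta * binning,  # binned pixels are larger
        "xpc": xpc / binning,      # pattern center in units of binned pixels
        "ypc": ypc / binning,
    }


if __name__ == "__main__":
    # Hypothetical fit result for a full-size 640 x 480 pattern, indexed with 4x binning.
    params = binned_detector_parameters(640, 480, 4, delta=50.0, xpc=12.0, ypc=-20.0)
    for name, value in params.items():
        print(f" {name} = {value},")
```

In this example the namelist entries would become delta = 200.0, xpc = 3.0, and ypc = -5.0, with dictionary patterns of 160 by 120 pixels.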

END CHANGES in VERSION 5.0.3

!###################################################################
! INPUT FILE PARAMETERS: COMMON TO 'STATIC' AND 'DYNAMIC'
!###################################################################
!
! name of datafile where the patterns are stored; path relative to EMdatapathname
 exptfile = 'undefined',
! input file type parameter: Binary, EMEBSD, TSLHDF, TSLup1, TSLup2, OxfordHDF, OxfordBinary, BrukerHDF
 inputtype = 'Binary',
! here we enter the HDF group names and data set names as individual strings (up to 10)
! enter the full path of a data set in individual strings for each group, in the correct order,
! and with the data set name as the last name; leave the remaining strings empty (they should all
! be empty for the Binary and TSLup1/2 formats)
 HDFstrings = '' '' '' '' '' '' '' '' '' '',
!

Next we have information about the pattern input file. There are several types (described above) and the correct type should be entered in the inputtype variable. The filename goes in the exptfile parameter (along with the appropriate partial path). If the input file is an HDF5 file, then you must define the complete path inside this file. For instance, if the pattern data set is called EBSDpatterns, and it is located inside a nested group Scan 1/data/EBSD, then you would enter four strings for HDFstrings: 'Scan 1', 'data', 'EBSD', and the last one is the data set name 'EBSDpatterns'. Note that these strings are all case sensitive, so make sure you get them right. You can use the HDFView program from the HDF Group to figure out what the correct strings are. Leave the other strings (there are 10 in total) empty.
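
Since it is easy to miscount the empty strings, a small Python helper can build the HDFstrings line from the path of the data set inside the HDF5 file. The helper name is hypothetical and the sketch only performs string formatting; it does not open or check the HDF5 file.

```python
def hdf_strings_line(dataset_path, nslots=10):
    """Format a namelist HDFstrings entry from a '/'-separated HDF5 data set path."""
    parts = [p for p in dataset_path.split("/") if p]
    if len(parts) > nslots:
        raise ValueError(f"path has more than {nslots} components")
    parts += [""] * (nslots - len(parts))  # pad the unused strings with empty entries
    return " HDFstrings = " + " ".join(f"'{p}'" for p in parts) + ","


if __name__ == "__main__":
    # The nested-group example used above.
    print(hdf_strings_line("Scan 1/data/EBSD/EBSDpatterns"))
```

The printed line can be pasted into the namelist file in place of the default HDFstrings entry; remember that the group and data set names are case sensitive.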

!###################################################################
! OTHER FILE PARAMETERS: COMMON TO 'STATIC' AND 'DYNAMIC'
!###################################################################
!
! temporary data storage file name ; will be stored in $HOME/.config/EMsoft/tmp
 tmpfile = 'EMEBSDDict_tmp.data',
 keeptmpfile = 'n',
! output file ; path relative to EMdatapathname
 datafile = 'undefined',
! ctf output file ; path relative to EMdatapathname
 ctffile = 'undefined',
! average ctf output file ; path relative to EMdatapathname
 avctffile = 'undefined',
! ang output file ; path relative to EMdatapathname [NOT AVAILABLE UNTIL RELEASE 3.2!!!]
! angfile = 'undefined',
! euler angle input file
 eulerfile = 'undefined'

This block defines where all the results and temporary files will be kept. The indexing program uses a temporary file with the pre-processed patterns in the standard tmp folder (usually in the .config/EMsoft/tmp folder in your user home directory). You need to define the name of this temporary file in the tmpfile variable (no path necessary); it is important to pick a unique name if you are running multiple simultaneous indexing runs. You can keep the file if you want by setting keeptmpfile to 'y'. The indexing output is stored in two files: datafile is an HDF5 output file that has all the program output in it, whereas ctffile is a standard Oxford .ctf output file that can be read by most EBSD analysis programs. If you define the avctffile parameter (optional), the program will also generate a .ctf file with the averaged orientations, using the top nnav best matches. In a future version we will also have the option to output an EDAX/TSL .ang file. If you set the eulerfile parameter to anything other than 'undefined', then the program will use the orientations in that file instead of the cubochoric sampling of orientations controlled by the ncubochoric parameter. This can be useful if you know that all the orientations are clustered around some orientation; you can then use the EMsampleRFZ program to generate a uniform sampling around that orientation instead of sampling the complete Rodrigues fundamental zone.

!###################################################################
! ONLY IF INDEXINGMODE IS STATIC
!###################################################################
!
 dictfile = 'undefined',
!

In static indexing mode, this is where you define the file that has the complete dictionary in it. Dictionary files can get very, very large, so be careful if you decide to use static indexing. It can be useful for serial sectioning data sets, where you use the same dictionary for all consecutive slices. In our experience, it is usually best to use the dynamic indexing mode.

!###################################################################
! ONLY IF INDEXINGMODE IS DYNAMIC
!###################################################################
!
! master pattern input file; path relative to EMdatapathname
 masterfile = 'undefined',
!

In this block you define the master pattern file from which all the dictionary patterns are computed.

!###################################################################
! SYSTEM PARAMETERS: COMMON TO 'STATIC' AND 'DYNAMIC'
!###################################################################
!
! number of dictionary files arranged in column for dot product on GPU (multiples of 16 perform better)
 numdictsingle = 1024,
! number of experimental files arranged in column for dot product on GPU (multiples of 16 perform better)
 numexptsingle = 1024,
! number of threads for parallel execution
 nthreads = 1,
! platform ID for OpenCL portion of program
 platid = 1,
! if you are running EMEBSDDI, EMECPDI, EMTKDDI, then define the device you wish to use 
 devid = 1,
! if you are running EMEBSDDImem on multiple GPUs, enter their device ids (up to eight) here; leave others at zero
 multidevid = 0 0 0 0 0 0 0 0,
! how many GPU devices do you want to use?
 usenumd = 0,
 /

This final block controls the computational resources. Dictionary indexing requires a GPU (graphics processing unit); use the EMOpenCLinfo program to figure out the platform and device IDs for the GPU that you intend to use for indexing. In the present version of the code, only a single GPU can be used, but the namelist file already allows for multiple devices. If the GPU you want to use is part of platform 2, and is device number 4 (you should be so lucky...) then set platid to 2 and devid to 4; also, set usenumd to 1 and the first entry of multidevid to the same number as devid. The nthreads parameter defines how many CPU cores (threads) you wish to use for the pattern computations; the GPU takes care of computing the pattern dot products while the threads do their thing. Finally, the numdictsingle and numexptsingle parameters define how big the memory chunks are that the program will send to the GPU; for optimal performance, each of these numbers should be a multiple of 16. If you set these parameters too large, then the GPU will not have sufficient global memory to perform the computations, and the program will likely abort with an error message. So it can take a bit of experimenting to figure out what the best values are; it is suggested that you keep both numbers set to the same value. If your pattern size is 640 by 480, then the patterns will be organized as 1D vectors of length 640x480=307,200 and the GPU will receive two arrays of single precision floating point numbers, one of dimensions 307,200 by numdictsingle for the dictionary patterns and one of dimensions 307,200 by numexptsingle for the experimental patterns.
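
As a rough guide for choosing numdictsingle and numexptsingle, the following Python sketch estimates the memory taken by the two pattern arrays described above, assuming 4 bytes per single precision value. The function names and the 2 GiB memory budget are hypothetical, and any additional buffers that the program allocates on the GPU are not counted.

```python
def gpu_array_bytes(numsx, numsy, ncolumns, bytes_per_value=4):
    """Size in bytes of one (numsx*numsy) x ncolumns single precision array."""
    return numsx * numsy * ncolumns * bytes_per_value


def max_columns(numsx, numsy, memory_budget_bytes, bytes_per_value=4, multiple=16):
    """Largest multiple of `multiple` for which both pattern arrays fit in the budget.

    Assumes numdictsingle and numexptsingle are set to the same value.
    """
    per_column_pair = 2 * numsx * numsy * bytes_per_value
    columns = memory_budget_bytes // per_column_pair
    return (columns // multiple) * multiple


if __name__ == "__main__":
    one = gpu_array_bytes(640, 480, 1024)  # 1,258,291,200 bytes per array
    print(f"one array:  {one / 1e9:.2f} GB")
    print(f"two arrays: {2 * one / 1e9:.2f} GB")
    # Hypothetical budget of 2 GiB for the two pattern arrays.
    print("suggested value:", max_columns(640, 480, 2 * 1024**3))
```

The estimate is only a starting point; use the global memory size reported by EMOpenCLinfo as the upper limit and leave some headroom when experimenting.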

Note regarding the temporary files

It should be noted that the temporary pattern files that are generated by the indexing program can become very large, in some cases tens to hundreds of GB, depending on your pattern size and how many patterns you have. If for some reason the indexing program aborts, or you decide to cancel the run, then this file will likely not be deleted. So, from time to time, you may want to make sure that the $HOME/.config/EMsoft/tmp folder on your drive is emptied, just so you won't run out of disk space. It is very easy to fill entire hard drives with these indexing runs...
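
One way to keep an eye on this folder is a short Python script that lists the files it contains, largest first. This is a minimal sketch with a hypothetical function name; it only reports file sizes and deliberately does not delete anything, so that files belonging to a run that is still in progress are not removed by accident.

```python
from pathlib import Path


def list_tmp_files(tmpdir):
    """Return (size_in_bytes, path) for each regular file in tmpdir, largest first."""
    folder = Path(tmpdir)
    if not folder.is_dir():
        return []
    files = [(p.stat().st_size, p) for p in folder.iterdir() if p.is_file()]
    return sorted(files, key=lambda item: item[0], reverse=True)


if __name__ == "__main__":
    # Standard location of the temporary files, as described above.
    tmpdir = Path.home() / ".config" / "EMsoft" / "tmp"
    for size, path in list_tmp_files(tmpdir):
        print(f"{size / 1e9:8.2f} GB  {path.name}")
```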

Format for optional mask file

You can define your own pattern mask by generating a text file with the mask defined by strings of 1s and 0s. Let's assume that your patterns are 16 by 16 pixels (unlikely to happen, but this is just an example); then you generate a text file that has 16 strings of 16 characters each, with character 0 meaning no intensity will be allowed in that pixel, and 1 the opposite. So, your text file will look something like this:

0000001111000000
0000001111000000
0000001111000000
0000001111000000
0000001111000000
0000001111000000
1111111111111111
1111111111111111
1111111111111111
1111111111111111
0000001111000000
0000001111000000
0000001111000000
0000001111000000
0000001111000000
0000001111000000

This mask consists of a horizontal and a vertical band, each 4 pixels wide. You can play around with these masks to see how much pattern information you can remove and still be able to index the patterns. Obviously, the mask must have the correct size for your patterns (after binning).
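
For realistic pattern sizes it is easier to generate such a file with a script than to type it by hand. The Python sketch below reproduces the 16 by 16 example above; the function names and the output filename are hypothetical, and writing one row per line with a trailing newline is an assumption about what the mask reader accepts.

```python
def cross_mask_rows(size=16, band=4):
    """Rows of '0'/'1' characters forming a centered horizontal and vertical band."""
    lo = (size - band) // 2
    hi = lo + band
    rows = []
    for y in range(size):
        rows.append("".join(
            "1" if (lo <= x < hi or lo <= y < hi) else "0" for x in range(size)
        ))
    return rows


def write_mask(filename, rows):
    """Write the mask as a text file, one row of characters per line."""
    with open(filename, "w") as f:
        for row in rows:
            f.write(row + "\n")


if __name__ == "__main__":
    rows = cross_mask_rows(16, 4)
    print("\n".join(rows))
    write_mask("mask16.txt", rows)  # hypothetical output filename
```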

THIS OPTION HAS BEEN DISABLED STARTING IN VERSION 5.0.3

Executing the indexing program

Once you have set up the name list file, you can run the program in the usual way:

EMEBSDDI inputfile.nml

Indexing runs can take a long time, and produce only a little bit of output; the dictionary is divided into chunks of numdictsingle patterns, and the GPU then computes the dot products between all experimental patterns and this chunk of the dictionary. When that is completed, a single line of output is produced that shows the largest dot product found for this chunk. Every ten chunks, an update of the 'time remaining' is shown as well. At the end of the run, all the requested output files are generated. At this point in time, it is not possible to interrupt the program and have it restart from where it was interrupted; this would obviously be a useful feature to have, but it is actually rather difficult to implement, given the complexity of the GPU + multi-threads coding.
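
The chunked dot product search described above can be illustrated with a small pure-Python sketch. It is a toy model with hypothetical names, not the EMEBSDDI implementation: patterns are short lists of numbers, the pre-processing steps (high pass filtering and adaptive histogram equalization) are omitted, and everything runs on the CPU. What it does show is how the top nnk matches for each experimental pattern are accumulated while the dictionary is processed in chunks of numdictsingle patterns.

```python
import heapq
import math


def normalize(vec):
    """Scale a pattern vector to unit length so dot products are comparable."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def index_patterns(expt, dictionary, numdictsingle, nnk):
    """For each experimental pattern, return the nnk best (dot product, dictionary index) pairs."""
    expt_n = [normalize(e) for e in expt]
    best = [[] for _ in expt_n]
    for start in range(0, len(dictionary), numdictsingle):
        chunk = [normalize(d) for d in dictionary[start:start + numdictsingle]]
        chunk_max = float("-inf")
        for i, e in enumerate(expt_n):
            scores = [(sum(a * b for a, b in zip(e, d)), start + j)
                      for j, d in enumerate(chunk)]
            # Merge this chunk's scores with the running list and keep the top nnk.
            best[i] = heapq.nlargest(nnk, best[i] + scores)
            chunk_max = max(chunk_max, max(s for s, _ in scores))
        print(f"chunk starting at {start}: largest dot product {chunk_max:.4f}")
    return best


if __name__ == "__main__":
    expt = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    dictionary = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
    for matches in index_patterns(expt, dictionary, numdictsingle=2, nnk=2):
        print([(round(score, 3), idx) for score, idx in matches])
```

Because only the running top nnk list is kept for each experimental pattern, the full dictionary never has to be held in memory at once; this is the idea behind the dynamic indexing mode.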

Wiki pages are maintained by M. De Graef; they are part of the EMsoft package and fall under the same copyright (BSD2).