This repository has been archived by the owner on Nov 29, 2020. It is now read-only.
daa2018 edited this page May 4, 2018 · 3 revisions

1. What does the question mark on a thumbnail in the preview panel mean?

It means that this page has not yet passed the current stage, so ST cannot display it as processed at this stage. For example, if the current stage is DESKEW, non-processed pages are displayed without slope compensation and with a question mark. It is also possible that the page has passed the current stage, but changes made afterwards require this stage to be reprocessed. For example, the page had passed the SPLIT PAGES stage, but after that you made changes at FIX ORIENTATION.


2. I can't run Output: ST says that I need to complete the previous stages, but I've already passed them!

It means that after processing all the previous stages, you went several steps back and changed something. In this case, you will have to run batch processing at the "SELECT CONTENT" or "MARGINS" ("Page Layout") stage again. Do not worry: reprocessing will finish much faster than the first time, because it will probably affect only the few changed pages.


3. I started working on a Scan Tailor project, but missed some pages. Can I add them to the existing project?

At every processing stage you can add files to the project or delete them from it using the context menu, which opens when you right-click a page thumbnail. The new files don't have to be in the same directory. In fact, they are not even required to have names distinct from those of the files already in the project. If you try to add a file with an obviously wrong DPI, the program will open the Fix DPI dialog.


4. Scan Tailor's auto mode often fails on my scans. Why?

If errors occur very often, it means that your source material violates the assumptions made by Scan Tailor. Here are some of them:

  • there are margins around the content, and the larger they are, the better. Anything touching the edge is assumed to be garbage;
  • there are at least two lines of text on the page;
  • the DPI written in the images or specified manually is close to the real one;
  • letters are not too small (the text is readable, not degraded); too-small letters sometimes occur in camera shots.

Problems frequently arise when scanner software pre-processes ("improves") raw scans. Such material is an inexhaustible source of problems. In general, good results on such scans should be perceived as a miracle, and bad results as the norm. Unless you have a good reason not to, feed Scan Tailor raw scans or photos. Scan Tailor likes its input to be as raw as possible: the fewer "enhancements" done by the scanner software, the better.


5. What does the top right button (located over the thumbnails panel) do? What is it useful for?

It has the tooltip "keep current page in view". This button scrolls the thumbnails, bringing the current page (the one also opened in the central window) back into view in the middle of the thumbnails panel. Usage scenarios:

  1. You can inspect arbitrary thumbnails on the panel during batch processing (in search of errors). After you scroll through the thumbnails manually, the button switches to the non-pressed position. If you press the button, the strip of thumbnails scrolls back to the image currently being processed. You will see how much is already done and how much is left. So in batch mode this button works as an auto-scroll switch.
  2. This button is also very useful if you have ordered the thumbnails by increasing width or by increasing height and are making corrections. In that case, the page moves out of view after its margins or content area change in the central window. Pressing this button brings it back into view. So in manual mode this button works as a "go to the current page" command.

6. On some pages the content is reduced to a fraction of its original size. Is there a workaround for this problem?

Check whether the DPI setting for that page is wrong. Most likely the current DPI, whether provided by the file itself or entered by you manually, is much higher than the real value. Call the "Fix DPI" command from the menu and find this page in the list. Don't do it in the "Needs Fixing" tab, as it only contains obviously wrong pages; use the "All Pages" tab instead. If you inspect the DPI of each page in the "Fix DPI" dialog, you should be able to locate and fix such files there. Note that this doesn't change the files themselves; it just saves the correct values to Scan Tailor's project file.


7. I don't understand the DPI concept. How do I estimate an unknown DPI?

DPI stands for "dots per inch". It defines the correspondence between the physical dimensions of an image (inches, centimeters) and its size in pixels. Pixels here are the points that make up a digital image. The horizontal and vertical components of the DPI are usually the same.

For example, suppose you have scans with a size of 2816 x 2112 pixels. If we assume that their DPI is 300 (300 horizontal and 300 vertical), then the physical dimensions are (2816 / 300) x (2112 / 300) ≈ 9.4 x 7.0 inches ≈ 23.8 x 17.9 cm. Resolution is determined by the scanner settings, and scanning software usually records the correct DPI. However, it can be lost during further manipulations, in which case you have to restore the correct value. The most common DPIs are 300 and 600. If you know the media size roughly and the image size in pixels exactly, comparing them with the pixel sizes of A4 and half-A4 pages at different DPIs may help you estimate the scanning resolution; for instance, an A4 page (210 x 297 mm) is about 2480 x 3508 pixels at 300 DPI.
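The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the paper sizes used are the standard A4 (210 x 297 mm) and A5 (148 x 210 mm, half of A4) dimensions.

```python
MM_PER_INCH = 25.4

# Standard ISO paper sizes in millimetres (A5 is half of A4).
PAPER_MM = {"A4": (210, 297), "A5 (half of A4)": (148, 210)}


def physical_size_inches(width_px, height_px, dpi):
    """Physical size of an image, in inches, for an assumed DPI."""
    return width_px / dpi, height_px / dpi


def pixels_for_paper(width_mm, height_mm, dpi):
    """Pixel size a sheet of the given physical size has at the given DPI."""
    return (round(width_mm / MM_PER_INCH * dpi),
            round(height_mm / MM_PER_INCH * dpi))


# The 2816 x 2112 example from the text, assuming 300 DPI.
w_in, h_in = physical_size_inches(2816, 2112, 300)
print(f"{w_in:.1f} x {h_in:.1f} in = "
      f"{w_in * MM_PER_INCH / 10:.1f} x {h_in * MM_PER_INCH / 10:.1f} cm")

# Pixel sizes of A4 and half-A4 pages at a few common resolutions.
for name, (w_mm, h_mm) in PAPER_MM.items():
    for dpi in (150, 200, 300, 400, 600):
        w_px, h_px = pixels_for_paper(w_mm, h_mm, dpi)
        print(f"{name:16s} {dpi:4d} DPI: {w_px} x {h_px} px")
```

If your image's pixel size is close to one of the printed rows and the original is roughly that paper size, the corresponding DPI is a reasonable guess.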

For digital camera images you always have to set the DPI manually. A camera can't give you the correct DPI, as it knows neither the real size of the object nor the distance to it. You may need to measure the physical dimensions of the photographed area and then divide the number of pixels by the length in inches to obtain the DPI. Keep this in mind before you get rid of a book, though you can often find the book's physical dimensions online.

If you no longer have access to the physical book to measure it, there is still a way to roughly estimate the DPI. Load one page into a graphics editor (Gimp will do) and use a rectangular selection tool to select 6 lines of text. Gimp displays the pixel size of your selection while you are selecting it. The height of that selection is roughly your DPI, because most books are printed in such a way that 6 lines of text fit vertically into one inch. Of course, no one forces book publishers to fit exactly 6 lines into one inch. Still, it's a reasonable estimate that's expected to produce good results with Scan Tailor. By the way, for books with a small font size, it might be beneficial to make Scan Tailor think the font is larger than it is by specifying a lower DPI.
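Both estimation methods described above come down to one division each. A minimal sketch, assuming the rule of thumb of 6 text lines per inch from the paragraph above (the function and parameter names are invented for this example):

```python
def dpi_from_measurement(length_px, length_inches):
    """DPI from a measured physical length and the matching pixel length."""
    return length_px / length_inches


def dpi_from_text_lines(selection_height_px, lines_selected=6,
                        lines_per_inch=6.0):
    """Rough DPI from the pixel height of a block of text lines.

    Assumes the book prints about `lines_per_inch` lines per inch, which is
    only a rule of thumb, so treat the result as an estimate.
    """
    return selection_height_px * lines_per_inch / lines_selected


# A photographed area 9.4 inches wide that is 2816 pixels wide in the image.
print(round(dpi_from_measurement(2816, 9.4)))

# A selection of 6 text lines that is 410 pixels high.
print(round(dpi_from_text_lines(410)))
```

Selecting more lines (say 12, passed as `lines_selected=12`) averages out small selection errors.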

When changing the DPI in the "Fix DPI" dialog, work on the "All Pages" tab rather than on "Needs Fixing", because some of these images look normal to Scan Tailor and are not listed there.


8. Everything is OK with the DPI in my scans. Why does ST ask me to fix it?

ST declares that the image resolution (DPI) requires correction when one of the following checks fails, horizontally or vertically:

  1. The resolution is within the range 150 to 9999. If the DPI is below 150, it is too small to get acceptable results. If the DPI is above 9999, it is most likely wrong.
  2. The physical image size for the given DPI does not exceed 50 cm (that is, 25.4 * pixel_size / dpi <= 500, in millimeters). If this check fails, strictly speaking it is not the DPI that is wrong, but the image that is too big. This may really mean that you are working with very large print materials, but most likely it indicates an incorrect (too low) DPI. An underestimated DPI can easily lead to running out of memory, especially in a 32-bit environment.
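The two checks can be written down as a short Python sketch. It follows the description in this answer, not Scan Tailor's actual source code, and all names in it are made up for illustration.

```python
MM_PER_INCH = 25.4
MIN_DPI = 150        # below this, results are not acceptable
MAX_DPI = 9999       # above this, the value is most likely wrong
MAX_SIZE_MM = 500.0  # physical size limit of 50 cm per dimension


def check_axis(name, size_px, dpi):
    """Return a problem description for one axis, or None if it passes."""
    if dpi < MIN_DPI:
        return f"{name}: DPI {dpi} is below {MIN_DPI}"
    if dpi > MAX_DPI:
        return f"{name}: DPI {dpi} is above {MAX_DPI}"
    size_mm = MM_PER_INCH * size_px / dpi
    if size_mm > MAX_SIZE_MM:
        return f"{name}: {size_mm:.0f} mm exceeds {MAX_SIZE_MM:.0f} mm"
    return None


def dpi_problems(width_px, height_px, xdpi, ydpi):
    """Apply both checks horizontally and vertically; empty list means OK."""
    checks = (check_axis("horizontal", width_px, xdpi),
              check_axis("vertical", height_px, ydpi))
    return [problem for problem in checks if problem is not None]


print(dpi_problems(2480, 3508, 300, 300))   # an A4 page at 300 DPI
print(dpi_problems(10000, 3508, 300, 300))  # too wide for 300 DPI
```

Under these assumptions, an image 10000 pixels wide fails the size check at 300 DPI but passes it at 600 DPI.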

Hint: In this case you can try to trick Scan Tailor by entering fake, higher DPI values in the Fix DPI dialog. But then you need to raise the DPI proportionally for the other pages as well.


9. There is a strange light-exposure effect on white-on-black scans. Cover images are sometimes not output well either. What is this? How do I process such pages?

Scan Tailor wasn't designed to work with light text on a dark background. Output such a text page in "Color / Grayscale" mode and then binarize the text in Gimp, Photoshop or some other software. ST's binarization doesn't work well on non-document color images, so cover pages should also be output in "Color / Grayscale" mode. Don't enable illumination equalization for them.


10. How do I correct geometric distortions for the photo-page below?

First of all, if you have two pages on a single output image, your "Split Pages" settings are wrong. Dewarping is not going to work on two pages at a time. Beyond that, your case is too hard for automatic dewarping to handle, so you will need to adjust the dewarping grid manually.


11. How can Scan Tailor be automated? I have programmed an application ... [calling some others in a batch]. Is there a command-line version of Scan Tailor?

The CLI branch (Scan Tailor Command Line Interface) was merged into the main version some time ago. So, if you are on Windows, you can just download the ST distribution package and you'll find scantailor-cli.exe inside. The CLI version can process images from the command line. You can also give it an existing project file. If output_directory is specified as the last argument, it overrides the one in the project file.

scantailor-cli [options] <project_file> [output_directory]

scantailor-cli [options] <image, image, ...> <output_directory>
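For batch automation, the second form above can be driven from a script. The following Python sketch only assembles the argument list in that documented order and hands it to the executable. The helper names and the "*.tif" pattern are made up for this example, and no specific options are assumed: anything passed in `options` is forwarded unchanged, so check the CLI's own help for what it accepts.

```python
import subprocess
from pathlib import Path


def build_command(images, output_dir, options=(),
                  executable="scantailor-cli"):
    """Build: scantailor-cli [options] <image, image, ...> <output_directory>."""
    return [executable, *options, *(str(i) for i in images), str(output_dir)]


def run_batch(input_dir, output_dir, pattern="*.tif", options=()):
    """Run the CLI on every file in input_dir that matches pattern."""
    images = sorted(Path(input_dir).glob(pattern))
    if not images:
        raise ValueError(f"no files matching {pattern!r} in {input_dir}")
    # Creating the output directory up front is harmless if it exists.
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if the CLI exits with an error.
    return subprocess.run(build_command(images, output_dir, options),
                          check=True)


# Show the command line that would be executed for two scans.
print(build_command(["page001.tif", "page002.tif"], "out"))
```

Calling `run_batch("scans", "out")` requires scantailor-cli to be on the PATH; on Windows, pass the full path to scantailor-cli.exe via a modified `build_command` call if it is not.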


12. How do I build ST for Windows from the source files?

There is a README file in the source archive. It contains the required build instructions.


13. How does the output TIFF type correlate with the input type? Will a 16-bit TIFF fed to ST produce better results, or is it not necessary?

Scan Tailor's output TIFFs are 1 bit in B/W output mode. In "Color / Grayscale" mode they are RGB or grayscale, depending on the original: 8 bits, 24 bits, or even 32 bits if the original image had transparency. The output never goes beyond 8 bits per channel, though. In Mixed mode ST outputs grayscale or color images and never B/W ones, even for pages without any pictures. That's an implementation detail. The format is always the same as in "Color / Grayscale" mode, that is 8-bit grayscale, 8+8+8-bit RGB or 8+8+8+8-bit RGBA. There is no benefit in exporting 16-bit images as input for Scan Tailor. Since all internal processing is done in 8 bits per channel, the very first thing done to an input image is converting it to that format.


14. Is it possible to get separate output for the mask and background layers in ST instead of assigning the task of dividing them to the DJVU encoder?

Separate output of picture images and text images in ST is not planned, although it would not be difficult to implement, since this separation already takes place in Mixed mode. (FOR REFERENCE: such export of separated text/pictures output exists in the "Featured" branch.) ST's Mixed output, and in fact all other output modes, follow a certain convention that makes it easy to separate B/W content from pictures. The convention is to never use pure black and pure white in pictures. The pure colors are reserved for text areas only. In picture areas of 8-bit gray images, color 0x00 is replaced with 0x01 and color 0xFF with 0xFE. In picture areas of both RGB32 and ARGB32 images, color 0x00000000 is replaced with 0xFF010101 and color 0x00FFFFFF with 0xFFFEFEFE. This behaviour makes it possible to detect the picture areas later and treat them differently, for example by encoding them as a background layer in DjVu format. That's how external utilities such as PDFMaker and djvubind separate text and pictures using ST output.
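To illustrate the convention for 8-bit grayscale output, here is a minimal Python sketch. It is not taken from Scan Tailor, PDFMaker or djvubind; the function names are invented, and a page is represented simply as a list of rows of integer gray values.

```python
BLACK, WHITE = 0x00, 0xFF


def reserve_black_and_white(picture_rows):
    """Keep pure black and white out of a picture area: 0x00 -> 0x01, 0xFF -> 0xFE."""
    return [[min(max(p, 0x01), 0xFE) for p in row] for row in picture_rows]


def split_layers(page_rows, fill=WHITE):
    """Split a page into a text layer and a picture layer.

    Pixels that are pure black or pure white are treated as text area;
    every other value is treated as picture. Pixels that belong to the
    other layer are replaced with `fill` (an arbitrary choice here).
    """
    text_layer, picture_layer = [], []
    for row in page_rows:
        text_layer.append(
            [p if p in (BLACK, WHITE) else fill for p in row])
        picture_layer.append(
            [fill if p in (BLACK, WHITE) else p for p in row])
    return text_layer, picture_layer


# First row: text area (pure colors). Second row: picture area.
page = [[0xFF, 0x00, 0xFF],
        [0x01, 0x80, 0xFE]]
text, picture = split_layers(page)
print(text)
print(picture)
```

A real tool would read the TIFF with an imaging library and would probably work on connected regions rather than single pixels; the per-pixel test here only shows why reserving the two pure colors makes the split possible.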
