Processing Modules

The processing modules in Parsr play a central role in cleaning and enriching the raw extracted output. Each module performs one operation on the document representation, produces a new valid Document, and passes it to the next module in the pipeline. Each module can expose a set of configurable parameters, which are described in the per-module documentation pages below:

1. Current Processing Modules

  1. Drawing Detection
  2. Header and Footer Detection
  3. Heading Detection
  4. Hierarchy Detection
  5. Image Detection
  6. Key-Value Pair Detection
  7. Lines to Paragraph
  8. Link Detection
  9. List Detection
  10. Number Correction
  11. Out of Page Removal
  12. Page Number Detection
  13. Reading Order Detection
  14. Redundancy Detection
  15. Regex Matcher
  16. Remote Module
  17. Separate Words
  18. Table Detection
  19. Table of Contents Detection
  20. Whitespace Removal
  21. Words To Line
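
To illustrate how modules like these chain together, here is a minimal sketch of the "each module returns a new Document and hands it to the next" idea. The `Doc` type, the two example modules, and `runPipeline` are all hypothetical stand-ins for illustration; they are not Parsr's actual classes.

```typescript
// Stand-in type for illustration only; Parsr's real Document is far richer.
interface Doc {
  pages: string[][]; // each page is modelled as a list of text lines
}

// A module is modelled here as a function from one valid document to another.
type ProcessingModule = (doc: Doc) => Doc;

// Hypothetical module: trims surrounding whitespace from every line.
const trimLines: ProcessingModule = (doc) => ({
  pages: doc.pages.map((page) => page.map((line) => line.trim())),
});

// Hypothetical module: drops lines that are empty.
const dropEmptyLines: ProcessingModule = (doc) => ({
  pages: doc.pages.map((page) => page.filter((line) => line.length > 0)),
});

// Run the modules in order, feeding each output to the next module.
function runPipeline(doc: Doc, modules: ProcessingModule[]): Doc {
  return modules.reduce((current, mod) => mod(current), doc);
}

const cleaned = runPipeline(
  { pages: [["  Title ", "   ", "Body text"]] },
  [trimLines, dropEmptyLines],
);
console.log(JSON.stringify(cleaned)); // {"pages":[["Title","Body text"]]}
```

Order matters in this sketch: running `dropEmptyLines` before `trimLines` would keep the whitespace-only line, because it is not yet empty at that point.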

2. Create your own Processing Module

A custom module lets you add your own processing step to the document pipeline.

There are two ways to do this:

  1. Use the Remote Module, which sends the document JSON over HTTP and expects the modified JSON in response
  2. Create a TypeScript module and add it to the pipeline
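
For the first option, the core of a remote service is a function that turns the received JSON into the modified JSON. The sketch below shows only that step, under stated assumptions: the `RemoteDoc` shape and the `handleRemoteRequest` name are hypothetical, and the HTTP wiring (server, route, status codes) is left out because it depends on the framework you choose.

```typescript
// Hypothetical, simplified document shape; the real JSON has more fields.
interface RemoteDoc {
  pages: { elements: { type: string; content: string }[] }[];
}

// Takes the request body (document JSON) and returns the response body
// (modified document JSON). This example upper-cases every heading.
function handleRemoteRequest(requestBody: string): string {
  const doc = JSON.parse(requestBody) as RemoteDoc;
  for (const page of doc.pages) {
    for (const element of page.elements) {
      if (element.type === "heading") {
        element.content = element.content.toUpperCase();
      }
    }
  }
  return JSON.stringify(doc);
}

const sampleRequest = JSON.stringify({
  pages: [
    {
      elements: [
        { type: "heading", content: "Intro" },
        { type: "paragraph", content: "Hello" },
      ],
    },
  ],
});
console.log(handleRemoteRequest(sampleRequest));
```

Keeping the transformation in a plain string-to-string function makes it easy to test without starting a server.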

2.1. Creating and Naming your TypeScript Module

The template module folder shows how a module tree needs to be structured. The folder name, the module's filename and the class's name need to follow the PascalCase naming convention.

You can copy the entire folder to use as boilerplate. The template code also contains some handy comments to help you get started.
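
As a rough illustration of the naming convention, the sketch below shows a module whose folder, file, and class would all be called `LinePrefixModule`. Everything here is an assumption made for the example: the `Doc` and `ProcessingModule` stand-in types, the `main` method name, and the options shape. The template folder remains the reference for the real structure.

```typescript
// Stand-in types so the sketch compiles on its own (assumptions, not Parsr's API).
interface Doc {
  pages: string[][]; // each page is modelled as a list of text lines
}
interface ProcessingModule {
  main(doc: Doc): Doc;
}

// Hypothetical options for this module.
interface LinePrefixModuleOptions {
  prefix: string;
}

// Would live in LinePrefixModule/LinePrefixModule.ts:
// folder, file, and class all share the same PascalCase name.
class LinePrefixModule implements ProcessingModule {
  private readonly options: LinePrefixModuleOptions;

  constructor(options: LinePrefixModuleOptions) {
    this.options = options;
  }

  // Returns a new document rather than mutating the input.
  main(doc: Doc): Doc {
    return {
      pages: doc.pages.map((page) =>
        page.map((line) => this.options.prefix + line),
      ),
    };
  }
}

const prefixed = new LinePrefixModule({ prefix: "> " }).main({
  pages: [["first", "second"]],
});
console.log(JSON.stringify(prefixed)); // {"pages":[["> first","> second"]]}
```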

2.2. Add to Register

To add your newly created module to the register, simply open the Cleaner file /server/src/Cleaner.ts and add your module class to the Cleaner.cleaningToolRegister attribute.
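
Conceptually, a register like this is a list of module classes that the cleaner can look up by name. The sketch below is a self-contained approximation, not the contents of `Cleaner.ts`: `CleanerSketch`, `findModule`, the static `moduleName` field, and both example modules are hypothetical.

```typescript
// Stand-in types (assumptions made for this example).
interface Doc {
  pages: string[][];
}
interface ModuleInstance {
  main(doc: Doc): Doc;
}
interface ModuleClass {
  new (): ModuleInstance;
  moduleName: string; // assumed static identifier used for lookup
}

class WhitespaceTrimModule {
  static moduleName = "whitespace-trim";
  main(doc: Doc): Doc {
    return { pages: doc.pages.map((p) => p.map((l) => l.trim())) };
  }
}

class MyCustomModule {
  static moduleName = "my-custom";
  main(doc: Doc): Doc {
    return doc; // placeholder: makes no changes
  }
}

class CleanerSketch {
  // Adding a module means appending its class to this list.
  static cleaningToolRegister: ModuleClass[] = [
    WhitespaceTrimModule,
    MyCustomModule, // the newly added module
  ];

  static findModule(moduleName: string): ModuleClass | undefined {
    return CleanerSketch.cleaningToolRegister.find(
      (candidate) => candidate.moduleName === moduleName,
    );
  }
}

console.log(CleanerSketch.findModule("my-custom") === MyCustomModule); // true
```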

2.3. Add it to the Configuration

For your module to run, you need to enable it in your configuration.

Add an entry to the cleaner array with the name of your module and, if needed, its options.
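
The sketch below models such a cleaner array as a TypeScript value and shows one way to read it. The entry shape (a bare name, or a name paired with an options object), the module names, and the `readEntry` helper are assumptions for illustration; check an existing configuration file for the exact format.

```typescript
// Assumed entry shape: a bare module name, or a [name, options] pair.
type CleanerEntry = string | [string, Record<string, unknown>];

interface PipelineConfig {
  cleaner: CleanerEntry[];
}

const config: PipelineConfig = {
  cleaner: [
    "whitespace-trim", // hypothetical module, no options
    ["my-custom", { prefix: "> " }], // hypothetical module with options
  ],
};

// Normalise every entry to a { name, options } record.
function readEntry(entry: CleanerEntry): {
  name: string;
  options: Record<string, unknown>;
} {
  return typeof entry === "string"
    ? { name: entry, options: {} }
    : { name: entry[0], options: entry[1] };
}

const enabledModules = config.cleaner.map(readEntry).map((e) => e.name);
console.log(enabledModules.join(", ")); // whitespace-trim, my-custom
```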

2.4. Run it!

That's it! Your new awesome processing module should run and modify the document according to your needs!