DDEx Project

Extracting data independently of file formats

DDEx - Document Data Extractor - is a framework that allows applications to transparently open documents and extract their content, regardless of file format.

We are working to provide support for:

OLE2 file formats [.doc, .xls, .ppt]
OOXML file formats [.docx, .xlsx, .pptx]
ODF file formats [.odt, .ods, .odp]
CSV
PDF
Google Docs (minimal support)

Goals, Challenges, Differentiators

DDEx is based on the Builder design pattern and can be easily extended to support other formats. It aims to decouple content extraction from content processing: DDEx handles the diversity of file formats and gives applications access to a document's content independently of its format.

DDEx manages the intersection between multiple APIs (such as Apache POI and ODFDOM) by offering a common interface. It encapsulates the extraction and performs it independently of format, so applications can use a document's content in other contexts.
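
The separation described above can be illustrated with a minimal Java sketch of the Builder pattern. This is not the DDEx API: the names `TableBuilder`, `CsvReader`, `ListTableBuilder`, and `ExtractionDemo` are hypothetical and chosen only for illustration, and the CSV handling is deliberately naive (no quoted fields). The idea is that a format-specific reader emits content events, while the builder that consumes them knows nothing about the file format.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo entry point; not part of DDEx.
class ExtractionDemo {
    public static void main(String[] args) {
        ListTableBuilder builder = new ListTableBuilder();
        new CsvReader("species,count\nPanthera onca,3").extract(builder);
        System.out.println(builder.result());
        // prints [[species, count], [Panthera onca, 3]]
    }
}

// Hypothetical builder interface: receives content events and is
// unaware of the file format they came from.
interface TableBuilder {
    void startRow();
    void cell(String value);
    void endRow();
}

// Hypothetical reader for one format (naive CSV, no quoting support).
// A reader for .xls or .ods would instead wrap a library such as
// Apache POI or ODFDOM and emit the same events.
final class CsvReader {
    private final String content;

    CsvReader(String content) {
        this.content = content;
    }

    void extract(TableBuilder builder) {
        for (String line : content.split("\n")) {
            builder.startRow();
            // Limit -1 keeps trailing empty fields.
            for (String field : line.split(",", -1)) {
                builder.cell(field.trim());
            }
            builder.endRow();
        }
    }
}

// One possible concrete builder: collects the rows in memory.
final class ListTableBuilder implements TableBuilder {
    private final List<List<String>> rows = new ArrayList<>();
    private List<String> current = new ArrayList<>();

    @Override
    public void startRow() {
        current = new ArrayList<>();
    }

    @Override
    public void cell(String value) {
        current.add(value);
    }

    @Override
    public void endRow() {
        rows.add(current);
    }

    List<List<String>> result() {
        return rows;
    }
}
```

In this sketch, supporting another format means writing another reader that calls the same `TableBuilder` methods, and processing the content differently means writing another builder; neither change touches the other side.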

Who is using DDEx?

DDEx was born in academia and has since been used by other Ph.D. and MSc students in their research. It is also used by other projects and is associated with academic publications, such as:

Project BioSpread - Integrating data from Web spreadsheets
2graph - An API for abstracting graph databases
Paper: "Automatic Interpretation of Biodiversity Spreadsheets Based on the Recognition of Construction Patterns"
Paper: "Extracting and Semantically Integrating Implicit Schemas from Multiple Spreadsheets"
Paper: "Introducing shadows: Flexible document representation and annotation on the Web"
