WadeaHijjawi/EmailFeaturesExtraction

EmailFeaturesExtraction

This repository implements an open-source, flexible tool for extracting the features mentioned in the previous section from any email corpus whose emails are stored as EML files. The tool splits each email into its parts (From, To, CC, BCC, Subject, Text Body, and HTML Body) and writes them to one output file, then writes the selected features to a second output file in CSV format. If errors occur during extraction, an error file is generated as well.
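The splitting and extraction flow described above can be sketched in Python with the standard `email` and `csv` modules. This is a minimal, hypothetical illustration, not the tool's own .NET implementation: the function names, the output column names, and the three features (`SubjectLength`, `TextBodyWords`, `HasHTMLBody`) are assumptions chosen for the example, because the actual feature set is defined elsewhere.

```python
import csv
import email
from email import policy
from pathlib import Path

# Column order for the email-parts file, mirroring the parts listed above.
PART_FIELDS = ["File", "From", "To", "CC", "BCC", "Subject", "TextBody", "HTMLBody"]
# Hypothetical example features; the real tool's feature set is not shown here.
FEATURE_FIELDS = ["File", "SubjectLength", "TextBodyWords", "HasHTMLBody"]


def split_email(path: Path) -> dict:
    """Parse one .eml file and return its header fields and body parts."""
    with path.open("rb") as fh:
        msg = email.message_from_binary_file(fh, policy=policy.default)
    # get_body() picks the best text/plain or text/html part (or None).
    text_part = msg.get_body(preferencelist=("plain",))
    html_part = msg.get_body(preferencelist=("html",))
    return {
        "File": path.name,
        "From": str(msg.get("From", "")),
        "To": str(msg.get("To", "")),
        "CC": str(msg.get("Cc", "")),
        "BCC": str(msg.get("Bcc", "")),
        "Subject": str(msg.get("Subject", "")),
        "TextBody": text_part.get_content() if text_part is not None else "",
        "HTMLBody": html_part.get_content() if html_part is not None else "",
    }


def extract_features(parts: dict) -> dict:
    """Compute a few illustrative (hypothetical) features from the parts."""
    return {
        "File": parts["File"],
        "SubjectLength": len(parts["Subject"]),
        "TextBodyWords": len(parts["TextBody"].split()),
        "HasHTMLBody": int(bool(parts["HTMLBody"])),
    }


def process_corpus(corpus_dir, parts_csv, features_csv, errors_txt) -> int:
    """Split every .eml file in corpus_dir, write parts and features as CSV,
    and write an error file only if some emails failed. Returns the error count."""
    errors = []
    with open(parts_csv, "w", newline="", encoding="utf-8") as pf, \
         open(features_csv, "w", newline="", encoding="utf-8") as ff:
        parts_writer = csv.DictWriter(pf, fieldnames=PART_FIELDS)
        feat_writer = csv.DictWriter(ff, fieldnames=FEATURE_FIELDS)
        parts_writer.writeheader()
        feat_writer.writeheader()
        for path in sorted(Path(corpus_dir).glob("*.eml")):
            try:
                parts = split_email(path)
                features = extract_features(parts)
            except Exception as exc:  # record the failure and continue
                errors.append(f"{path.name}: {exc}")
                continue
            parts_writer.writerow(parts)
            feat_writer.writerow(features)
    if errors:
        Path(errors_txt).write_text("\n".join(errors) + "\n", encoding="utf-8")
    return len(errors)
```

Usage, with hypothetical paths: `process_corpus("corpus", "parts.csv", "features.csv", "errors.txt")`. Failures are collected per file instead of aborting the whole run, so a few malformed emails only end up in the error file.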

EML is one of the most common file extensions for emails exported from many email applications, such as Outlook, Thunderbird, and Gmail. Email files in a corpus can easily be converted to EML by renaming them with the .eml extension, provided the files already contain raw email messages (headers followed by the body).
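The renaming step can be scripted. The sketch below is a hypothetical helper (`add_eml_extension` is not part of the tool) that appends `.eml` to every file in a folder that does not already have that extension, skipping any rename that would overwrite an existing file. It only changes file names; it assumes each file already holds a raw message.

```python
from pathlib import Path


def add_eml_extension(corpus_dir) -> list:
    """Append '.eml' to each regular file in corpus_dir that lacks it.

    The suffix is appended rather than substituted, so names such as
    'notes.txt' become 'notes.txt.eml' and stay unique. Returns the new paths.
    """
    renamed = []
    # sorted() materializes the listing before any file is renamed.
    for path in sorted(Path(corpus_dir).iterdir()):
        if path.is_file() and path.suffix.lower() != ".eml":
            target = path.with_name(path.name + ".eml")
            if target.exists():
                continue  # never overwrite an existing file
            path.rename(target)
            renamed.append(target)
    return renamed
```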

The tool uses the following external packages:

1. HTML Agility Pack: an external package for parsing HTML. Available at: https://htmlagilitypack.codeplex.com/

2. IKVM.NET: an external package that enables Java and .NET interoperability. Available at: https://www.ikvm.net/

3. Stanford.NLP.CoreNLP: an external package that provides a set of natural language analysis tools. Available at: https://www.nuget.org/packages/Stanford.NLP.CoreNLP/
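HTML Agility Pack is a .NET library, so it cannot be shown directly in a Python sketch. As a stand-in, the example below uses Python's standard `html.parser` module to pull visible text and link targets out of an HTML email body. The class and function names are hypothetical, and this is not how the tool itself parses HTML; it only illustrates the kind of work an HTML parser does for the HTML Body part.

```python
from html.parser import HTMLParser


class HTMLBodyParser(HTMLParser):
    """Collects visible text and <a href> targets from an HTML email body."""

    def __init__(self):
        super().__init__()
        self.text_chunks = []
        self.links = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not script or style content.
        if self._skip_depth == 0 and data.strip():
            self.text_chunks.append(data.strip())


def parse_html_body(html: str):
    """Return (visible_text, links) for an HTML body string."""
    parser = HTMLBodyParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join(parser.text_chunks), parser.links
```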
