Fix regexp to handle datetime lines starting with a few characters before `On...` (#79)
Nuranto committed May 1, 2021
1 parent ce2d030 commit 4b96a55
Showing 4 changed files with 54 additions and 5 deletions.
10 changes: 5 additions & 5 deletions src/EmailReplyParser/Parser/EmailParser.php
@@ -31,11 +31,11 @@ class EmailParser
  * @var string[]
  */
  private $quoteHeadersRegex = array(
- '/^\s*(On(?:(?!^>*\s*On\b|\bwrote:).){0,1000}wrote:)$/ms', // On DATE, NAME <EMAIL> wrote:
- '/^\s*(Le(?:(?!^>*\s*Le\b|\bécrit:).){0,1000}écrit(\s|\xc2\xa0):)$/ms', // Le DATE, NAME <EMAIL> a écrit :
- '/^\s*(El(?:(?!^>*\s*El\b|\bescribió:).){0,1000}escribió:)$/ms', // El DATE, NAME <EMAIL> escribió:
- '/^\s*(El(?:(?!^>*\s*El\b|\bha escrit:).){0,1000}ha escrit:)$/ms', // El DATE, NAME <EMAIL> ha escrit:
- '/^\s*(Il(?:(?!^>*\s*Il\b|\bscritto:).){0,1000}scritto:)$/ms', // Il DATE, NAME <EMAIL> ha scritto:
+ '/^.{0,5}(On(?:(?!\bOn\b|\bwrote(\s|\xc2\xa0)?:).){0,1000}wrote(\s|\xc2\xa0)?:)$/ms', // On DATE, NAME <EMAIL> wrote:
+ '/^.{0,5}(Le\b(?:(?!\bLe\b|\bécrit(\s|\xc2\xa0)?:).){0,1000}écrit(\s|\xc2\xa0)?:)$/ms', // Le DATE, NAME <EMAIL> a écrit :
+ '/^.{0,5}(El(?:(?!\bEl\b|\bescribió\s?:).){0,1000}escribió\s?:)$/ms', // El DATE, NAME <EMAIL> escribió:
+ '/^.{0,5}(El(?:(?!\bEl\b|\bha escrit\s?:).){0,1000}ha escrit\s?:)$/ms', // El DATE, NAME <EMAIL> ha escrit:
+ '/^.{0,5}(Il(?:(?!\bIl\b|\bscritto(\s|\xc2\xa0)?:).){0,1000}scritto(\s|\xc2\xa0)?:)$/ms', // Il DATE, NAME <EMAIL> ha scritto:
  '/^[\S\s]+ (написа(л|ла|в)+)+:$/msu', // Everything before написал: not ending on wrote:
  '/^\s*(Op\s.+?(schreef|geschreven).+:)$/ms', // Op DATE schreef NAME <EMAIL>:, Op DATE heeft NAME <EMAIL> het volgende geschreven:
  '/^\s*((W\sdniu|Dnia)\s.+?(pisze|napisał(\(a\))?):)$/msu', // W dniu DATE, NAME <EMAIL> pisze|napisał:
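
Note on the hunk above: the effect of the change on the English pattern can be checked in isolation. The sketch below copies the old and new `On ... wrote:` patterns from the diff and runs them against a shortened version of the `email_zimbra_free_en.txt` fixture. The helper `quoteHeader()` and the variable names are hypothetical and exist only for this illustration; this is not how `EmailParser` itself applies the patterns.

```php
<?php
// Patterns copied verbatim from the diff (English quote header, before and after).
$old = '/^\s*(On(?:(?!^>*\s*On\b|\bwrote:).){0,1000}wrote:)$/ms';
$new = '/^.{0,5}(On(?:(?!\bOn\b|\bwrote(\s|\xc2\xa0)?:).){0,1000}wrote(\s|\xc2\xa0)?:)$/ms';

// Shortened stand-in for tests/Fixtures/email_zimbra_free_en.txt.
$body = "On this line is the issue\n\nMichael Scott\n\n"
      . "- On 2021-04-09 23:25, Toby Flenderson a wrote:\n> [1]\n";

// Hypothetical helper: returns capture group 1 (the header text) or null.
function quoteHeader(string $pattern, string $body): ?string
{
    return preg_match($pattern, $body, $m) === 1 ? $m[1] : null;
}

$oldHeader = quoteHeader($old, $body);
$newHeader = quoteHeader($new, $body);

// Old pattern: cannot anchor on "- On", so it starts at "On this line..."
// and swallows the paragraph and the signature up to "wrote:".
var_export($oldHeader);
echo PHP_EOL;

// New pattern: `.{0,5}` skips the "- " prefix and `\bOn\b` in the lookahead
// stops a match that began at an earlier "On".
var_export($newHeader);
echo PHP_EOL;
```

With this input the old pattern captures everything from `On this line is the issue` through `wrote:`, while the new one captures only `On 2021-04-09 23:25, Toby Flenderson a wrote:`.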
16 changes: 16 additions & 0 deletions tests/EmailReplyParser/Tests/Parser/EmailParserTest.php
@@ -273,6 +273,22 @@ public function testEmailGmailNo()
  $this->assertEquals(static::COMMON_FIRST_FRAGMENT, trim($fragments[0]));
  }

+ public function testEmailFreeZimbraFr()
+ {
+ $email = $this->parser->parse($this->getFixtures('email_zimbra_free_fr.txt'));
+ $fragments = $email->getFragments();
+ $this->assertStringContainsString('Michael Scott', $fragments[0]);
+ $this->assertStringNotContainsString('Toby Flenderson', $fragments[0]);
+ }
+
+ public function testEmailFreeZimbraEn()
+ {
+ $email = $this->parser->parse($this->getFixtures('email_zimbra_free_en.txt'));
+ $fragments = $email->getFragments();
+ $this->assertStringContainsString('Michael Scott', $fragments[0]);
+ $this->assertStringNotContainsString('Toby Flenderson', $fragments[0]);
+ }
+
  public function testReadsEmailWithCorrectSignature()
  {
  $email = $this->parser->parse($this->getFixtures('correct_sig.txt'));
16 changes: 16 additions & 0 deletions tests/Fixtures/email_zimbra_free_en.txt
@@ -0,0 +1,16 @@
Bonjour Toby,

​Text paragraph 1​

On this line is the issue

​Text paragraph 2​

Michael Scott

- On 2021-04-09 23:25, Toby Flenderson a wrote:
> [1]
>
> BONJOUR Mike,
>
> Vous venez de recevoir un message. Le visiteur vous l'envoie depuis:
17 changes: 17 additions & 0 deletions tests/Fixtures/email_zimbra_free_fr.txt
@@ -0,0 +1,17 @@
Bonjour ​Test​,

​Text paragraph 1​


Le paragraphe qui pose problème.

​Text paragraph 2​

Michael Scott

- Le 2021-04-09 23:25, Toby Flenderson a écrit :
> [1]
>
> BONJOUR TEST,
>
> Vous venez de recevoir un message. Le visiteur vous l'envoie depuis:
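
Note on the French fixture: the new `Le ... écrit :` pattern makes the separator before the colon optional, so a regular space, a no-break space, or no separator should all be accepted. The sketch below is a rough standalone check of that reading, using the pattern copied from the diff and a shortened stand-in for `email_zimbra_free_fr.txt`. The variable names and the loop are illustrative assumptions, not code from the library, and the file is assumed to be saved as UTF-8.

```php
<?php
// New French pattern, copied verbatim from the diff.
$fr = '/^.{0,5}(Le\b(?:(?!\bLe\b|\bécrit(\s|\xc2\xa0)?:).){0,1000}écrit(\s|\xc2\xa0)?:)$/ms';

$nbsp = "\xc2\xa0"; // UTF-8 bytes of U+00A0 (no-break space)

// Three spellings of the header, differing only in what precedes the colon.
$headers = [
    'space' => "Le 2021-04-09 23:25, Toby Flenderson a écrit :",
    'nbsp'  => "Le 2021-04-09 23:25, Toby Flenderson a écrit{$nbsp}:",
    'none'  => "Le 2021-04-09 23:25, Toby Flenderson a écrit:",
];

$captured = [];
foreach ($headers as $name => $header) {
    // The decoy paragraph starts with "Le" and must not begin the match;
    // the real header carries the "- " prefix seen in the fixture.
    $body = "Le paragraphe qui pose problème.\n\nMichael Scott\n\n"
          . "- " . $header . "\n> [1]\n";

    $captured[$name] = preg_match($fr, $body, $m) === 1 ? $m[1] : null;

    // True when the capture is exactly the header line without its prefix.
    var_export($captured[$name] === $header);
    echo PHP_EOL;
}
```

In each case capture group 1 should be the header line alone, without the `- ` prefix and without the decoy paragraph or the signature.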
