MsDoc reader borks up Central European encoding #2565

Open
verybigelephants opened this issue Feb 2, 2024 · 0 comments

@verybigelephants

Describe the Bug

Hello, when trying to read a .doc file with Central European characters (example file: srncik.zip), the MsDoc reader messes up all the diacritics.

Steps to Reproduce

    use PhpOffice\PhpWord\IOFactory;

    // $working_file_path holds the path to the .doc file being read
    $type_word_reader = IOFactory::createReader('MsDoc');
    $text = "";

    $word = $type_word_reader->load($working_file_path);
    foreach ($word->getSections() as $section) {
        $els = $section->getElements();
        foreach ($els as $el) {
            // only some element types expose getText()
            if (method_exists($el, 'getText')) {
                // i have tried everything, nothing works:
                // \PhpOffice\PhpWord\Shared\Text::toUTF8($el->getText());
                // \ForceUTF8\Encoding::fixUTF8($el->getText());
                $text .= $el->getText() . "\n";
            } else {
                $text .= "\n";
            }
        }
    }
    // dump the extracted text so the broken characters can be inspected
    file_put_contents('test.log', $text);

Expected Behavior

The reader should not mess up the characters: the diacritics should come out intact in the extracted text.

Current Behavior

The reader messes up the characters: all diacritics in the extracted text are garbled.

Context

Environment information:

  • PHP Version: 8.2
  • PHPWord Version: ^1.2