Question marks in Cyrillic file names #31

Open
KonstantinKorepin opened this issue Aug 5, 2022 · 2 comments

Comments

@KonstantinKorepin

Hi all!

I have a problem with the Archive7z class. I wrote this method:

public function unzip(string $pathToSource): DirectoryIterator
{
    ...

    $obj = new Archive7z($pathToSource);
    $obj->setOutputDirectory($pathToDestination);
    $obj->extract();

    return new DirectoryIterator($pathToDestination);
}

It returns a DirectoryIterator over the directory the files were unpacked into.

Next, I collect information about the unpacked files:

$iterator = $this->zipper->unzip($zipFile->getRealPath());
foreach ($iterator as $unzippedFile) {
    if (!in_array($unzippedFile->getFilename(), ['.', '..'])) {
        $encoding = mb_detect_encoding($unzippedFile->getFilename()); // ASCII
        $fileName = $unzippedFile->getFilename(); // 12_1_?????.txt
    }
}

On my server, the encoding is detected as ASCII, and the file names contain question marks instead of Cyrillic letters.

In my local environment (Docker), everything is displayed correctly: mb_detect_encoding($unzippedFile->getFilename())
returns UTF-8 and the file names are correct.

I also managed to reproduce this error in Docker by following https://zalinux.ru/?p=5740.
I commented out the en_US.UTF-8 UTF-8 line in /etc/locale.gen in the PHP container and ran
locale-gen, which left only the ru_RU.UTF-8 UTF-8 locale. After that, the encoding of the unpacked file names
was also detected as ASCII instead of UTF-8, and question marks appeared in place of the Cyrillic characters.

If I restore the en_US.UTF-8 UTF-8 line in the container and run locale-gen again, everything works fine.
What can I do so that the unpacked file names show Cyrillic characters rather than question marks?
How can I get the file names extracted as UTF-8 and not ASCII?

@Gemorroj
Owner

Gemorroj commented Aug 5, 2022

Yes, we have some problems with Cyrillic characters. See https://github.com/Gemorroj/Archive7z/blob/5.4.0/tests/Archive7zTest.php#L841 for an example. Right now I don't have any ideas for how to fix this.
See #15.

@KonstantinKorepin
Author

KonstantinKorepin commented Sep 6, 2022

Solution:

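use Archive7z\Archive7z;
use Symfony\Component\Process\Process;

class Archive7zRu extends Archive7z
{
    /**
     * Runs the 7z process with LC_ALL forced to a UTF-8 locale so that
     * Cyrillic file names are not replaced with question marks.
     *
     * Exit codes
     * 0 - Normal (no errors or warnings detected)
     * 1 - Warning (Non fatal error(s)). For example, some files cannot be read during compressing. So they were not compressed
     * 2 - Fatal error
     * 7 - Bad command line parameters
     * 8 - Not enough memory for operation
     * 255 - User stopped the process with control-C (or similar).
     *
     * @throws \Symfony\Component\Process\Exception\ProcessFailedException
     */
    protected function execute(Process $process): Process
    {
        // This locale must be generated on the server (see /etc/locale.gen).
        $locale = 'ru_RU.UTF-8';

        // Keep the existing environment and override only LC_ALL.
        $env = $process->getEnv();
        $env['LC_ALL'] = $locale;
        $process->setEnv($env);

        return $process->mustRun();
    }
}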
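
The class above hard-codes ru_RU.UTF-8, which only helps if that locale is actually generated on the server. Here is a minimal sketch for probing which UTF-8 locale is installed, using only PHP's built-in setlocale(). The helper name firstAvailableLocale and the candidate list are assumptions made for illustration; they are not part of Archive7z. It also assumes a glibc-based system, where setlocale() returns false for a locale that has not been generated. Other C libraries may accept any name.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: returns the first locale from $candidates that this
 * system can activate, or null when none of them is available.
 */
function firstAvailableLocale(array $candidates): ?string
{
    // Passing '0' only queries the current LC_CTYPE setting; nothing changes.
    $previous = setlocale(LC_CTYPE, '0');

    $found = null;
    foreach ($candidates as $candidate) {
        // setlocale() returns false when the requested locale cannot be set.
        if (setlocale(LC_CTYPE, $candidate) !== false) {
            $found = $candidate;
            break;
        }
    }

    // Restore the original setting so the probe has no lasting side effects.
    if ($previous !== false) {
        setlocale(LC_CTYPE, $previous);
    }

    return $found;
}

// The candidate list is an assumption; adjust it to the locales you expect.
$locale = firstAvailableLocale(['ru_RU.UTF-8', 'en_US.UTF-8', 'C.UTF-8']);
echo $locale ?? 'no UTF-8 locale installed', PHP_EOL;
```

If this prints a locale name, that value could be used for LC_ALL in execute() in place of the hard-coded string. If it prints the fallback message, the locale most likely needs to be generated with locale-gen first.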
