Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why do I get "Data at the root level is invalid. Line 1, position 1" for this file? #1388

Open
nmg196 opened this issue Mar 29, 2024 · 5 comments
Assignees
Labels
bug Something isn't working enhancement New feature or request

Comments

@nmg196
Copy link

nmg196 commented Mar 29, 2024

EPPlus usage

Commercial use (I have a commercial license)

Environment

Windows console app

Epplus version

7.1.0

Spreadsheet application

Exce;

Description

Why do I get "Data at the root level is invalid. Line 1, position 1" for this file?

Notworking.xlsx

The file opens OK in Excel. If I make any change and click Save then EPPlus can then read it OK. But why can't it read the original file? I'm trying to avoid having to open each file and click save as I have hundreds to process.

Unfortunately i have no control of the process which generates the files and I'm unable to spot what's wrong with the unmodified file.

@nmg196 nmg196 added the bug Something isn't working label Mar 29, 2024
@swmal
Copy link
Contributor

swmal commented Apr 3, 2024

Hi @nmg196,

an xlsx file is just a zipped library of xml-files - to view the internal files of any *.xlsx file you can just rename the file from *.xlsx to *zip and open it.

The reason for the "Data at the root level is invalid" error is that the xml files inside this particular xlsx-file are encoded with UNICODE whereas EPPlus uses UTF-8 when reading them.

Since EPPlus assumes that the xml-files are encoded in UTF-8 the result of the file becomes like this:

��<\0?\0x\0m\0l\0 \0v\0e\0r\0s\0i\0o\0n\0=\0\"\01\0.\00\0\"\0?\0>\0\r\0\n\0 \0<\0T\0y\0p\0e\0s\0 \0x\0m\0l\0n\0s\0=\0\"\0h\0t\0t\0p\0:\0/\0/\0s\0c\0h\0e\0m\0a\0s\0.\0o\0p\0e\0n\0x\0m\0l\0f\0o\0r\0m\0a\0t\0s\0.\0o\0r\0g\0/\0p\0a\0c\0k\0a\0g\0e\0...

...and the "Data at the root level..." exception is thrown when EPPlus tries to load the xml-string into an XmlDocument.

The reason that it works when you have resaved the file with Excel is most likely converts the files to UTF-8.

I don't think that we have had this reported before, but we will have a look at it and see if we can handle this internally in EPPlus and provide a fix in a coming version.

With the current version EPPlus will unfortunately not be able to read these files.

swmal added a commit that referenced this issue Apr 4, 2024
@swmal
Copy link
Contributor

swmal commented Apr 4, 2024

@nmg196 - May I ask how your xlsx-files were created? By Excel or another tool? If Excel, do you know which version of it?

@swmal swmal self-assigned this Apr 4, 2024
@swmal swmal added the enhancement New feature or request label Apr 4, 2024
@nmg196
Copy link
Author

nmg196 commented Apr 5, 2024

The files are output by a widely used trading platform called MetaTrader 5 (https://www.metatrader5.com/). If this is an edge case, then perhaps I can extract and repack the files if I detect this error. However that said, Excel is able to open them OK which indicates this is a supported file encoding for XML/xlsx files.

@swmal
Copy link
Contributor

swmal commented Apr 10, 2024

I did some testing and we can add support for this in EPPlus by checking the BOM of the files and then process the internal files according to the format. I will add this as a feature to implement in one of the coming versions.

@nmg196
Copy link
Author

nmg196 commented Apr 15, 2024

That would be great :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants