Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Formatting loss after save_as in xlsx, xls, ods #248

Open
jdaniele71 opened this issue Sep 22, 2021 · 3 comments
Open

Formatting loss after save_as in xlsx, xls, ods #248

jdaniele71 opened this issue Sep 22, 2021 · 3 comments

Comments

@jdaniele71
Copy link

jdaniele71 commented Sep 22, 2021

Hi,
I wrote a script to do some elaboration of a ods file and i noticed it looses the cells formatting.
So i tried with xlsx and xls and it seems the issue is present there as well, with a little difference in xls (that keeps the date format).

The versions of the packages are the following:

pyexcel          0.6.7
pyexcel-ezodf    0.3.4
pyexcel-io       0.6.4
pyexcel-ods3     0.5.3
pyexcel-xls      0.6.2
pyexcel-xlsx     0.6.0
python           3.9.7

I wrote this python code to test the issue:

import pyexcel as pe

book_xlsx = pe.load_book(file_name="test2.xlsx")
book_xlsx.save_as("test2_saved.xlsx")

book_xls = pe.load_book(file_name="test2.xls")
book_xls.save_as("test2_saved.xls")

book_ods = pe.load_book(file_name="test2.ods")
book_ods.save_as("test2_saved.ods")

ODS before saving:
Test2_ODS_before_save

ODS after saving:
Test2_ODS_after_save

XLSX before saving:
Test2_XLSX_before_save

XLSX after saving:
Test2_XLSX_after_save

XLS before saving:
Test2_XLS_before_save

XLS after saving:
Test2_XLS_after_save

What i noticed more is that BEFORE installing the XLS package, the XLSX package was adding the time 00:00:00 to the date too.
After having installed the XLS package, the XLSX changed his behavior, stopping to add the time to the date (as you can see in the picture).
Maybe the XLS package has the priority over the XLSX and it has fixed the date format issue.

I have also extracted the folders from the good original ODS file and the (bad) saved one.
They differ a lot, in terms of number of files and contents.

ODS Original
Test2_ODS_before_save_folders

ODS Saved
Test2_ODS_after_save_folders

Why those plugins heavily modify the files just for loading and saving, without even touching a cell?

I have some Excel files containing tons of data, formatting/conditional formatting, graphs and so on, then I'm not able to use pyexcel at the moment because obviously i have no time to re-format everything manually over 20 sheets, every time i simply run a python script against the file.
I was so much happy to get rid of VBA but I think I was wrong...

Thanks for your kind attention and time.
I hope you can fix those issues.

Daniele

@chfw
Copy link
Member

chfw commented Sep 23, 2021

Thanks for using pyexcel. I am afraid that so far formatting is one of the pyexcel disabilities and I am not sure if we will ever fullfil your request.

But if it is only date format, we can assign it to individual plugins to sort them out one by one.

@jdaniele71
Copy link
Author

Hi chfw, thanks for your prompt reply.
Unfortunately, as you can clearly see from my screenshots, any type of cells formatting is lost:

  • Date
  • Time
  • Percent
  • Number
  • Word wrap
  • Column height
  • Column width
  • Font color
  • Background color
  • ... and probably much more not yet tested

You know how much time it needs to format any single property of many columns on dozen of sheets....
If every time I apply a simple Pyexcel script it will destroy everything, I'm sorry to say it is useless for my needs.

Do not misunderstand me: Pyexcel is a great package to access data on Excel files, just in READ-ONLY, to migrate to SQL or elaborate and present them in some other ways.
The only thing it is NOT able to do properly is to SAVE files, and that's a pity!

What I really cannot understand is where the loss happens.
Files saved from Excel or LibreOffice, for sure contain information about formatting for every cell/row/column.
So I wonder if pyexcel discards them during the load of the book or writing back to a file.
Of course, if pyexcel is not able to manage the cell formatting, why the plugins for loading or saving data should care about it?

Could you estimate how much time (man hours) would it take to upgrade the packages to resolve the "saving" issue?
How much time on pyexcel?
How much time on plugins?
I would help you, but I have no time.

LibreOffice has the worse documentation about the "BASIC" programming I have ever seen and it is not fully compatible with VBA. It's a hell.
After 2 hours of struggling and searching on internet, I gave up.
I would like to avoid installing the Microsoft Office suite bloatware (paying for it).
So that's why i would use pyexcel.

Looking at other alternatives, if found the openpyxl python package.
It should not be able to read ODS files, but trying to load and save the test2.xlsx file, that's the result:

Test2_XLSX_after_save_openpyxl

It looks keeping any cell formatting, at least what is present inside the test sheet.
I will test it extensively before applying it to my Excel files, but it is a good starting point and alternative for now.

Anyway, I hope you will improve your library, because i like the modular approach it has.

Thanks for your time.
Cheers.
Daniele

@brianjmurrell
Copy link

Thanks for using pyexcel. I am afraid that so far formatting is one of the pyexcel disabilities and I am not sure if we will ever fullfil your request.

@chfw Is that to say that losing all formatting is an expected outcome -- that pyexcel simply doesn't currently support preserving the formatting?

Just want to clarify this situation as opposed to this being a bug.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants