Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Styler.to_excel does not export styles and formats correctly #42276

Open
3 of 5 tasks
attack68 opened this issue Jun 28, 2021 · 17 comments
Open
3 of 5 tasks

BUG: Styler.to_excel does not export styles and formats correctly #42276

attack68 opened this issue Jun 28, 2021 · 17 comments
Labels
Bug IO Excel read_excel, to_excel Master Tracker High level tracker for similar issues Styler conditional formatting using DataFrame.style
Milestone

Comments

@attack68
Copy link
Contributor

attack68 commented Jun 28, 2021

This is a tracker / explainer for the various issues:

Essentially these issues record 3 things:

Set_table_styles

Styler.set_table_styles is not exported to excel. This will not be changed (at least by me). To write xlsx, excel styling needs to be attached on a per-cell basis, whereas in HTML indirect references can be created in the CSS language that the browser will parse, for example thead th will apply to all header cells in the header section, and tbody th:nth-child(3n+0) will apply to every third header cell in the body section starting with first. Without writing our own HTML to cell translator for the CSS language it is therefore impossible to map and account for all the complex CSS rules that can be used within set_table_styles. This is well documented.

Todo:

Exporting formatting

The number of possible formatting constructs allowed by Python is greater than what excel offers. Excel also has specified structures that differ from Pythons structures. It is impossible to dynamically code these relationships in some ambiguous cases. Therefore this will not be implemented.

Todo:

  • There is currently a pseudo CSS attribute: number-format which can be used to apply specific Excel based formatting. This should be much better documented with examples.

Border styles bug

Borders in CSS can be specified in many different ways, and the parsing code to translate this into excel's border structure is broken

Todo

Hiding and Concatening

The Styler uses the base implementation of DataFrame.to_excel. It does not do any preliminary filtering and/or alteration of the ctx object to format cells in the right place. It also doesnt react to hidden indexes and/or elements.

Todo

  • document some of the missing features in Styler.to_excel.
  • review how this can be implemented or insert a series of small PRs gradually improving the consistency.
@jnothman
Copy link
Contributor

At one point I tried changing the backend to use existing CSS parsing libraries so that issues like border specification would be more generic, but those I considered required substantial modification, and I ran out of time to pursue this... What's out there might have changed a lot in four years.

@tehunter
Copy link
Contributor

tehunter commented Jan 7, 2022

Regarding border styles, the error in #30008 appears when a border color is defined but border style is not (or border style is none). The source of the error is in pd.io.excel._XlsxStyler where the style mapping (("top", "style"), "top") causes the prop dictionary to set props["top"] = {"color": "#000000"}.

The root cause however is in io.formats.excel under _border_style. Whenever the function returns None, it should instead return "none" as a string. The reason for this is that "none" is a valid border style. When it is set to None, the style gets removed in remove_none during build_xlstyle, which leads to the TypeError.

In summary, I believe replacing each return None with return "none" in _border_style will fix the error. Alternatively, set color to None if the style is none in build_border.

@attack68
Copy link
Contributor Author

attack68 commented Jan 7, 2022

do you know with this solution how to format your styles, e.g.:

  • border: 2px solid red;
  • border-bottom: 2px solid red + border-top: 2px solid red; etc..
  • border-bottom-color: red; + border-bottom-style: solid; + border-bottom-width: medium; + ...

also with xlsxwriter and openpyxl as different writers?

@tehunter
Copy link
Contributor

tehunter commented Jan 8, 2022

take

@jreback jreback modified the milestones: Contributions Welcome, 1.5 Jan 16, 2022
@tehunter tehunter removed their assignment Mar 21, 2022
@vovavili
Copy link

vovavili commented Aug 9, 2022

Any updates on this issue?

@attack68
Copy link
Contributor Author

the updates have been provided where merged prs have been linked. any issues not currnetly being worked on do not have links or merged prs.

@Freud16
Copy link

Freud16 commented Dec 19, 2022

Just for anyone who still want to style their table index, here is my solution:

Note: This is a method which focuses on exporting your multi-index table as an excel with alternating colours for primal index, as long as it looks like what it is inside python, I dont care whether certain column in excel is really 'column' or 'index' inside python.

First, do the routine:

from io import BytesIO
from openpyxl import load_workbook
filename = r'<your local path>'
wb = load_workbook(filename, read_only=False, keep_vba=False)
writer = pd.ExcelWriter(filename, engine='openpyxl')

Divide your dataframe df into two parts - coloured half and blank half. I am using light blue for this example.

lightblue_primalindex=sorted(set([_[0] for _ in df.index]), key=[_[0] for _ in df.index].index)[::2]
lightblue_index=[df.index.get_loc(i) for i in lightblue_primalindex]

Here is the tricky part. Reset your dataframe as single index but preserve the structure by filling NaN to extra primal index column, this way they will not show in the styled dataframe nor in the exported excel later.

y=df.reset_index()
y.loc[y.duplicated('<primal index column name>'),'<primal index column name>']=np.nan

Then apply the colouring with style and we are golden.

lightblue_df=pd.concat([y[lightblue_index[i]] for i in range(len(lightblue_index))])
y.style.apply(lambda x: pd.Series(['background-color: lightblue' 
                            if i in list(lightblue_df.index)
                            else '' for i in range(0, len(y))],index=y.index)).to_excel(writer, sheet_name='<your sheet name>', startrow=0 , startcol=0, index=False)
writer.save()

Then you will get one excel with table looking exactly the same as that in python.

111

222

References:

  1. Style single-index dataframe using style.applymap
  2. Style multi-index dataframe using Styler.set_table_styles

@attack68
Copy link
Contributor Author

or just use styler.applymap_index which is one of the PRs that addressed part of these issues

@Freud16
Copy link

Freud16 commented Dec 19, 2022

or just use styler.applymap_index which is one of the PRs that addressed part of these issues

didn't know we have that, tks!

@mroeschke mroeschke modified the milestones: 2.0, 3.0 Feb 8, 2023
@keelung-yang
Copy link

This issue is fired on Jun 28, 2021. And it still not fix today in 2024.
Following lines doesn't work if you saving df to excel for pandas v2.2.2
df.style.background_gradient(axis=0)

@attack68
Copy link
Contributor Author

Background gradient is HTML and LaTeX only. This will not be ported to Excel.

You are welcome to submit your own PR to fix pandas if the 3y timeline from software volunteers for a free library does not suit your requirements. Thanks for the message.

@keelung-yang
Copy link

Background gradient is HTML and LaTeX only. This will not be ported to Excel.

Thanks for your info!
I checked https://pandas.pydata.org/docs/reference/style.html, it does say Helps style a DataFrame or Series according to the data with HTML and CSS.
So, I shouldn't expect applying styles to excel.
It's better to document it in more APIs or output a warning while apply styles to excel.

@attack68
Copy link
Contributor Author

If you use background_gradient it can create HTML or LaTeX. If you create HTML then some of those styles, in particular background-color CSS attribute can be exported to Excel. You can see which CSS attributes can be exported to Excel
here.

If you do this:

import pandas as pd
df = pd.DataFrame([[1,2],[3,4]])
df.style.background_gradient().to_excel("my_wb.xlsx")

you get this in Excel:

Screenshot 2024-05-11 at 17 55 08

If you feel you can document this better than currently, PRs are welcome and appreciated.

@keelung-yang
Copy link

@attack68 ,
Thanks for you demo! It works!
I'd like to contribute. But contributions need to understand the design and some relations of class/api.
For me, such as,

  1. Why a Style need an API to_excel() which is duplicated as in DataFrame?
  2. Sicne the code df.sytle.set_xxx() is setting the style of df straight forward, why there is no inplace parameter in Sytle.set_xxx()? If inplace is added in future, it should be True or False by default?
  3. What's about applying sytle lazily? I mean applying style only when it need to be shown or saved. Such as no to_excel() in Style, the style is only applied while calling to_excel() or to_html() and so on.

As you can see, I'm the beginer of panas, I cannot impove it before understanding.

@attack68
Copy link
Contributor Author

  1. Because a Styler is a completely different class to a DataFrame.
  2. inplace serves no purpose in Styler.
  3. Almost everything is done lazily already.

@keelung-yang
Copy link

@attack68
Thanks very very much!
Now I've only one question which I should dig it myself firstly, but I'm not familar with Jupyter Notebook too.

Why calling df.style.background_gradient() works in Notebook without calling .to_html() ?
On the other hand, to_excel() must be called to make it show gradient content in excel.

@attack68
Copy link
Contributor Author

Becuase Jupyter Notebook and Excel are different software applications and they work in completely different ways.
More obviously becuase to_html (which Jupyter calls automatically) returns a string which Jupyter knows to render in a web browser in the appropriate way, whilst to_excel returns a system file which has to be opened and interpreted by the application - excel.

Please submit any further user questions to appropriate forums, e.g. stack overflow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug IO Excel read_excel, to_excel Master Tracker High level tracker for similar issues Styler conditional formatting using DataFrame.style
Projects
None yet
Development

No branches or pull requests

8 participants