XSSFColumn class implementation #1329

Open
wants to merge 43 commits into
base: master
from


artemkoloskov

XSSFColumn class

This PR adds an IColumn interface (similar to the IRow interface) and implements it in an XSSFColumn object. XSSFSheet is refactored to use the new object for all column operations, much as it already handles rows: copying, cutting, pasting, shifting (with attention to formulas), formatting (styles, width, hidden state, grouping and ungrouping), adding new columns, and removing cells by column. Nearly everything that is possible with XSSFRow is now possible with XSSFColumn, through a more fluent and easier-to-use API than ColumnHelper provided.

The major hurdle was the way columns are stored in sheet.xml: CT_Col objects are not individual columns but "spans" of columns. If columns 5 to 10 have the same style, width, hidden status and outline level, they are represented as one CT_Col object with a min field of 5 and a max field of 10, so it is one "span" covering columns 5 to 10 rather than 6 separate columns. This makes it hard to work with individual columns. For this reason the PR also changes the way CT_Col objects are parsed and stored in XSSFSheet: when a workbook is read from a file, the "spans" are broken down into individual CT_Col objects whose min and max fields are set to the same value, so that each CT_Col object represents a single column. When the workbook is written to a file, the CT_Col objects are merged back into "spans" according to their style, width, hidden status and outline level, to match the way Excel stores columns in sheet.xml.
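The split-on-read and merge-on-write round trip can be illustrated with a small, language-neutral sketch. This is not the NPOI implementation: `ColSpan`, `split_spans` and `merge_spans` are hypothetical names, and the record only models the four attributes the description mentions.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class ColSpan:
    """Hypothetical stand-in for a CT_Col record (not the NPOI class)."""
    min: int
    max: int
    width: Optional[float] = None
    style: int = 0
    hidden: bool = False
    outline_level: int = 0


def same_format(a: ColSpan, b: ColSpan) -> bool:
    """True when two records agree on every attribute except min/max."""
    return (a.width, a.style, a.hidden, a.outline_level) == (
        b.width, b.style, b.hidden, b.outline_level)


def split_spans(spans: List[ColSpan]) -> List[ColSpan]:
    """On read: expand each span into one record per column (min == max)."""
    singles = []
    for span in spans:
        for index in range(span.min, span.max + 1):
            singles.append(replace(span, min=index, max=index))
    return singles


def merge_spans(singles: List[ColSpan]) -> List[ColSpan]:
    """On write: fold adjacent columns with equal attributes back into spans."""
    merged: List[ColSpan] = []
    for col in sorted(singles, key=lambda c: c.min):
        last = merged[-1] if merged else None
        if last is not None and last.max + 1 == col.min and same_format(last, col):
            merged[-1] = replace(last, max=col.max)
        else:
            merged.append(col)
    return merged
```

Under these assumptions, a span covering columns 5 to 10 expands into six single-column records, and merging them back reproduces the original span.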

This still had a drawback. If all columns on the sheet have the same formatting all the way to the end of the sheet, the last CT_Col object in sheet.xml has its max field set to the maximum number of columns in Excel (16384). This would make the sheet.xml file very large and slow to parse, even if only one column is actually used in the sheet. A compromise was made: if the last CT_Col object has its max field set to the maximum number of columns in Excel, the max field is instead set to the larger of:

- the min field + 1 of the last CT_Col object (the one whose max field is set to the maximum number of columns in Excel)
- the column index of the cell with the highest column index in the sheet

This is a compromise because, if it is applied to a workbook where all columns really were meant to be formatted the same way, the formatting information for columns beyond the calculated max field is lost. The good news is that this is usually not intended: a last CT_Col object whose max field is set to the maximum number of columns in Excel is typically just the result of formatting hastily applied to the whole sheet.
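The clamping rule described above can be written as a one-line calculation. The sketch below is an illustration only: `clamp_last_span_max` is a hypothetical helper, and it assumes the span bounds and the last used column index are expressed in the same numbering, which the description does not spell out.

```python
EXCEL_MAX_COLUMNS = 16384  # maximum number of columns, as stated in the PR text


def clamp_last_span_max(span_min: int, span_max: int, last_used_col: int) -> int:
    """Return the max field to keep for the last column span.

    Hypothetical helper: only a span that reaches the Excel column limit
    is shortened; any other span keeps its max field unchanged.
    """
    if span_max < EXCEL_MAX_COLUMNS:
        return span_max
    # Keep at least min + 1, and never cut below the last column holding a cell.
    return max(span_min + 1, last_used_col)
```

For example, a trailing span starting at column 3 on a sheet whose right-most cell is in column 7 would be cut to end at 7, while on a sheet with no cells beyond column 1 it would end at 4.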

The PR comes with a large set of tests covering the new XSSFColumn object and the refactored XSSFSheet object.

This PR replaces #1261.

Artem Koloskov added 30 commits February 8, 2024 15:29
This is needed for ease of column shifting.
…r class

These update formulas according to the new positions of the columns. Similar to UpdateRowFormulas and ShiftFormula for IRow.
Add SheetUtil.CopyColumn method
Similar to SheetUtil.CopyRow.

Implement column related methods in XSSFSheet
…ncluded in the column

And delete it from there too.
… fields into individual "CT_Col"s

This drastically hinders performance if used with ColumnHelper in its current state, so ColumnHelper needs to be deprecated and its functionality replaced with XSSFColumn.
XSSFColumn is changing its cell storage to one generated dynamically from rows, so there is no more need for a double check.
…edSpecifiedField to false.

It doesn't make any sense for them to be true by default. The OOXML XSD specifies the collapsed attribute as false by default too.
It used to hold all cells in its own list. Now it just looks them up from rows.
These methods now use XSSFColumn to perform their tasks.
…and collapsed properties into their fields, to not trigger setters.
…larity before writing CT_Cols node.

This will make sure the CT_Cols follow Excel's rules for storing these objects, where adjacent columns with equal attributes are combined into CT_Col spans.
Tests are specifically designed to check that the merged regions are treated correctly during shift.
The bug was that lastCol was almost always 0, so the check for removal of the merged region returned false and the regions stayed in place.
… area when dealing with merged regions.

The overwritten area is the one where the row/column will be once moved.
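The last few commits concern merged regions that sit in the area a shifted block lands on. The check can be sketched roughly as follows, as a simplified illustration rather than the PR's code. The function names are hypothetical, and the overwritten area is taken to be the destination range of the moved columns, as the commit message describes it.

```python
from typing import Tuple


def overwritten_columns(first: int, last: int, n: int) -> Tuple[int, int]:
    """Columns that the block first..last will occupy after shifting by n."""
    return first + n, last + n


def region_needs_removal(region_first_col: int, region_last_col: int,
                         first: int, last: int, n: int) -> bool:
    """True when a merged region intersects the overwritten area.

    Simplified assumption: any overlap with the destination columns counts.
    Both bounds of the region are needed for the check to be meaningful.
    """
    dest_first, dest_last = overwritten_columns(first, last, n)
    return region_first_col <= dest_last and region_last_col >= dest_first
```

In this sketch, shifting columns 2 to 4 by 3 overwrites columns 5 to 7, so a merged region spanning columns 6 to 8 would be flagged for removal while one spanning columns 8 to 9 would not. The same region reported with a last column of 0 would never be flagged, which is consistent with the symptom described in the bug-fix commit.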

2 participants