Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP Excel Adding At-Signs to Functions #3962

Draft
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

oleibman
Copy link
Collaborator

@oleibman oleibman commented Mar 27, 2024

This has come up a number of times, most recently with issue #3901, and also issue #3659, and issue #1834. It will certainly come up more often in days to come. Excel is changing formulas which PhpSpreadsheet has output as =UNIQUE(A1:A19); Excel is processing the formula as it were =@UNIQUE(A1:A19). This behavior is explained, in part, by #3659 (comment). It is doing so in order to ensure that the function returns only a single value rather than an array of values, in case the spreadsheet is being processed (or possibly was created) by a less current version of Excel which cannot handle the array result.

PhpSpreadsheet follows Excel to a certain extent; it defaults to returning a single calculated value when an array would be returned. Further, its support for outputting an array even when that default is overridden is incomplete. I am not prepared to do everything that Excel does for the array functions (details below), but this PR is a start in that direction. If the default is changed via:

use PhpOffice\PhpSpreadsheet\Calculation\Calculation;
Calculation::setArrayReturnType(Calculation::RETURN_ARRAY_AS_ARRAY);

When that is done, getCalculatedValue will return an array (no code change necessary). However, Writer/Xlsx will now be updated to look at that value, and if an array is returned in that circumstance, will indicate in the Xml that the result is an array and will include a reference to the bounds of the array. This gets us close, although not completely there, to what Excel does, and may be good enough for now. Excel will still mess with the formula, but now it will treat it as {=UNIQUE(A1:A19)}. This means that the spreadsheet will now look correct; there will be superficial differences, but all cells will have the expected value.

Technically, the major difference between what PhpSpreadsheet will output now, and what Excel does on its own, is that Excel supplies values in the xml for all the cells in the range. That would be difficult for PhpSpreadsheet to do; that could be a project for another day. Excel will treat the output from PhpSpreadsheet as "Array Formulas" (a.k.a. CSE (control shift enter) formulas because you need to use that combination of keys to manually enter them in older versions of Excel). Current versions of Excel will instead use "Dynamic Array Formulas". Dynamic Array Formulas can be changed by the user; Array Formulas need to be deleted and re-entered if you want to change them. I don't know what else might have to change to get Excel to use the latter for PhpSpreadsheet formulas, and I will probably not even try to look now, saving it for a future date.

Unit testing of this change uncovered a bug in Calculation::calculateCellValue. That routine saves off ArrayReturnType, and may change it, and is supposed to restore it. But it does not do the restore if the calculation throws an exception. It is changed to do so.

This is:

  • a bugfix
  • a new feature
  • refactoring
  • additional unit tests

Checklist:

  • Changes are covered by unit tests
    • Changes are covered by existing unit tests
    • New unit tests have been added
  • Code style is respected
  • Commit message explains why the change is made (see https://github.com/erlang/otp/wiki/Writing-good-commit-messages)
  • CHANGELOG.md contains a short summary of the change and a link to the pull request if applicable
  • Documentation is updated as necessary

Why this change is needed?

Provide an explanation of why this change is needed, with links to any Issues (if appropriate).
If this is a bugfix or a new feature, and there are no existing Issues, then please also create an issue that will make it easier to track progress with this PR.

This has come up a number of times, most recently with issue PHPOffice#3901, and also issue PHPOffice#3659. It will certainly come up more often in days to come. Excel is changing formulas which PhpSpreadsheet has output as `=UNIQUE(A1:A19)`; Excel is processing the formula as it were `=@unique(A1:A19)`. This behavior is explained, in part, by PHPOffice#3659 (comment). It is doing so in order to ensure that the function returns only a single value rather than an array of values, in case the spreadsheet is being processed (or possibly was created) by a less current version of Excel which cannot handle the array result.

PhpSpreadsheet follows Excel to a certain extent; it defaults to returning a single calculated value when an array would be returned. Further, its support for outputting an array even when that default is overridden is incomplete. I am not prepared to do everything that Excel does for the array functions (details below), but this PR is a start in that direction. If the default is changed via:
```php
use PhpOffice\PhpSpreadsheet\Calculation\Calculation;
Calculation::setArrayReturnType(Calculation::RETURN_ARRAY_AS_ARRAY);
```
When that is done, `getCalculatedValue` will return an array (no code change necessary). However, Writer/Xlsx will now be updated to look at that value, and if an array is returned in that circumstance, will indicate in the Xml that the result is an array *and* will include a reference to the bounds of the array. This gets us close, although not completely there, to what Excel does, and may be good enough for now. Excel will still mess with the formula, but now it will treat it as `{=UNIQUE(A1:A19)}`. This means that the spreadsheet will now look correct; there will be superficial differences, but all cells will have the expected value.

Technically, the major difference between what PhpSpreadsheet will output now, and what Excel does on its own, is that Excel supplies values in the xml for all the cells in the range. That would be difficult for PhpSpreadsheet to do; that could be a project for another day. Excel will treat the output from PhpSpreadsheet as "Array Formulas" (a.k.a. CSE (control shift enter) formulas because you need to use that combination of keys to manually enter them in older versions of Excel). Current versions of Excel will instead use "Dynamic Array Formulas". Dynamic Array Formulas can be changed by the user; Array Formulas need to be deleted and re-entered if you want to change them. I don't know what else might have to change to get Excel to use the latter for PhpSpreadsheet formulas, and I will probably not even try to look now, saving it for a future date.

Unit testing of this change uncovered a bug in Calculation::calculateCellValue. That routine saves off ArrayReturnType, and may change it, and is supposed to restore it. But it does not do the restore if the calculation throws an exception. It is changed to do so.
@oleibman oleibman marked this pull request as draft March 27, 2024 00:24
@MarkBaker
Copy link
Member

There was a lot of code that I'd implemented in #2787 that was designed to provide this full support for array formulae, including the handling of ranged returns and support for dynamic ranging with all the relevant tests and some of the new dynamic array functions.
Unfortunately it got too wrapped with other issues and resolutions making it difficult to extract just those parts of the code, and did require a change to the Read Filters that made it a BC break.

Thinking about PHPOffice#3958 - user wondered if unsupported formulas with array results could be handled better. I said that the answer was "no", but I think Xlsx Reader can make use of the dimensions of the result after all, so the answer is actually "sometimes". This is an initial attempt to do that. Implementing it revealed a bug in how Xlsx Reader handles array formula attributes, and that is now corrected. Likewise, Xlsx Writer did not indicate a value for the first cell in the array, and does now.
@oleibman
Copy link
Collaborator Author

@MarkBaker Thanks, I will look over your code. We took very different approaches. I'll see if I can can reconcile some of the differences. Did you ever figure what was needed to make Excel open the PhpSpreadsheet-generated file so that the array functions are treated as dynamic rather than CSE?

@jr212
Copy link

jr212 commented Apr 18, 2024

The solution above doesn't work for me.
My test files(php and a json for the data) can be found on my website https://janr.be/excel.zip. I leave it there for ±1month

Address sample code submitted by @jr212 which was not working correctly.
@oleibman
Copy link
Collaborator Author

@jr212 Thank you for the sample code. Please try again against this PR, which I have changed based on your example. That demonstrated at least 2 problems with the existing code. One, relatively easily handled, was that, since your code used auto-sized columns, the PhpSpreadsheet code wound up formatting the array results, and the formatting code was not expecting array inputs. I will note the array calculation supplies a value only to the first cell of the array, so the auto-sizing is ineffective anyhow. I will think about this some more; perhaps it is something we can live with, or perhaps the code can do better.

The second problem is a little more troublesome. A formula like =sheet!cell winds up returning a 1-element array (row->column->value). With "normal" array handling, this winds up reduced to value. But, with RETURN_ARRAY_AS_ARRAY, it remains as an array. I have added code to recognize this situation and return just the value part, but I'm not entirely convinced that this change won't fail due to false positives or negatives.

Finally, even though the Excel spreadsheet generated by your code is now correct (I think - please verify), I don't think that getCalculatedValue for cell A2 on sheet 'Gesorteerd verlorenpunten' agrees with what Excel shows.

All in all, your example is very useful. But it may cause me to leave this change in draft status for quite a bit longer than I had planned.

@jr212
Copy link

jr212 commented Apr 19, 2024 via email

@oleibman
Copy link
Collaborator Author

@jr212 It is lucky then that I didn't realize your initial problem was with Composer, otherwise I might not have noticed the code problems. I am also not expert in Composer, but I think the following should work. Assume that you have a main directory called "git" (change to whatever is appropriate). Go to that directory and issue the following command:

git clone -b atsign --single-branch https://github.com/oleibman/PhpSpreadsheet.git atsign3

This will create a new directory "git/atsign3". Change to that directory and run:

composer install --prefer-source

It would not surprise me in the least if there is a simpler way to do this, but this should work.

@oleibman
Copy link
Collaborator Author

Merged master to eliminate a no-longer-valid dependency in composer.lock, not as prep work before installing.

@jr212
Copy link

jr212 commented Apr 21, 2024

I've finally succeeded to build everything 😊
Never worked with git before but no problem there.
Test is now complete, with success and in production. Remains only the width of columns. (I can live with that) You wrote before you look at the header of the column (first cell) but people’s names are bigger than only “Name” or in nl-be “Naam”

Questions now
• What files/folder are needed for all functions? I’ve placed now the vendor and src folder but that is 6.98 GiB.
The file PhpSpreadsheet_2.0\vendor\phpstan\phpstan.git\objects\pack\pack-35b3e82d34e6138656d21f8ef47f303a6a20f372.pack alone is 4.8 Gib.
• Do you still need my test code
• Is this normal behaviour? first_open
Above when opening the file all is 0 except the first cell. After click "allow edit" (or something like that) all is fine(recalculated).

My complements for all the hard work in this project and the help to me.

Jan

@oleibman
Copy link
Collaborator Author

@jr212 I believe you can get a much smaller directory by using the following install command rather than the one I initially suggested:

composer install --no-dev

Please bear in mind that the code you will be using is some way from being "production ready". I will be glad to continue to work with you, but this code may never make it into production, and, even if it does, I don't know how soon that might happen.

On Windows systems, when a spreadsheet is downloaded from a browser, the behavior you described is normal.

I don't need you to maintain the zip file you uploaded any longer. However, I will eventually need to add a test case based on what it showed me. Do you have a problem with my basing that test case on the data and code you supplied?

@jr212
Copy link

jr212 commented Apr 22, 2024 via email

@jr212
Copy link

jr212 commented May 2, 2024 via email

@oleibman
Copy link
Collaborator Author

@jr212 Your spreadsheet is no longer available at the address you supplied, probably because I took too long to download it. If you have something you want me to look at, please make it available again.

@jr212
Copy link

jr212 commented May 13, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants