New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP Excel Adding At-Signs to Functions #3962
base: master
Are you sure you want to change the base?
Conversation
This has come up a number of times, most recently with issue PHPOffice#3901, and also issue PHPOffice#3659. It will certainly come up more often in days to come. Excel is changing formulas which PhpSpreadsheet has output as `=UNIQUE(A1:A19)`; Excel is processing the formula as it were `=@unique(A1:A19)`. This behavior is explained, in part, by PHPOffice#3659 (comment). It is doing so in order to ensure that the function returns only a single value rather than an array of values, in case the spreadsheet is being processed (or possibly was created) by a less current version of Excel which cannot handle the array result. PhpSpreadsheet follows Excel to a certain extent; it defaults to returning a single calculated value when an array would be returned. Further, its support for outputting an array even when that default is overridden is incomplete. I am not prepared to do everything that Excel does for the array functions (details below), but this PR is a start in that direction. If the default is changed via: ```php use PhpOffice\PhpSpreadsheet\Calculation\Calculation; Calculation::setArrayReturnType(Calculation::RETURN_ARRAY_AS_ARRAY); ``` When that is done, `getCalculatedValue` will return an array (no code change necessary). However, Writer/Xlsx will now be updated to look at that value, and if an array is returned in that circumstance, will indicate in the Xml that the result is an array *and* will include a reference to the bounds of the array. This gets us close, although not completely there, to what Excel does, and may be good enough for now. Excel will still mess with the formula, but now it will treat it as `{=UNIQUE(A1:A19)}`. This means that the spreadsheet will now look correct; there will be superficial differences, but all cells will have the expected value. Technically, the major difference between what PhpSpreadsheet will output now, and what Excel does on its own, is that Excel supplies values in the xml for all the cells in the range. That would be difficult for PhpSpreadsheet to do; that could be a project for another day. Excel will treat the output from PhpSpreadsheet as "Array Formulas" (a.k.a. CSE (control shift enter) formulas because you need to use that combination of keys to manually enter them in older versions of Excel). Current versions of Excel will instead use "Dynamic Array Formulas". Dynamic Array Formulas can be changed by the user; Array Formulas need to be deleted and re-entered if you want to change them. I don't know what else might have to change to get Excel to use the latter for PhpSpreadsheet formulas, and I will probably not even try to look now, saving it for a future date. Unit testing of this change uncovered a bug in Calculation::calculateCellValue. That routine saves off ArrayReturnType, and may change it, and is supposed to restore it. But it does not do the restore if the calculation throws an exception. It is changed to do so.
There was a lot of code that I'd implemented in #2787 that was designed to provide this full support for array formulae, including the handling of ranged returns and support for dynamic ranging with all the relevant tests and some of the new dynamic array functions. |
Thinking about PHPOffice#3958 - user wondered if unsupported formulas with array results could be handled better. I said that the answer was "no", but I think Xlsx Reader can make use of the dimensions of the result after all, so the answer is actually "sometimes". This is an initial attempt to do that. Implementing it revealed a bug in how Xlsx Reader handles array formula attributes, and that is now corrected. Likewise, Xlsx Writer did not indicate a value for the first cell in the array, and does now.
@MarkBaker Thanks, I will look over your code. We took very different approaches. I'll see if I can can reconcile some of the differences. Did you ever figure what was needed to make Excel open the PhpSpreadsheet-generated file so that the array functions are treated as dynamic rather than CSE? |
The solution above doesn't work for me. |
Address sample code submitted by @jr212 which was not working correctly.
@jr212 Thank you for the sample code. Please try again against this PR, which I have changed based on your example. That demonstrated at least 2 problems with the existing code. One, relatively easily handled, was that, since your code used auto-sized columns, the PhpSpreadsheet code wound up formatting the array results, and the formatting code was not expecting array inputs. I will note the array calculation supplies a value only to the first cell of the array, so the auto-sizing is ineffective anyhow. I will think about this some more; perhaps it is something we can live with, or perhaps the code can do better. The second problem is a little more troublesome. A formula like Finally, even though the Excel spreadsheet generated by your code is now correct (I think - please verify), I don't think that getCalculatedValue for cell A2 on sheet 'Gesorteerd verlorenpunten' agrees with what Excel shows. All in all, your example is very useful. But it may cause me to leave this change in draft status for quite a bit longer than I had planned. |
I did the test but nothing changed ☹ I composed twice with
* composer require --prefer-source phpoffice/phpspreadsheet
* composer require phpoffice/phpspreadsheet:dev-master
If that’s wrong remember that I know about nothing of composer.
I did redownloaded the PhpSpreadsheet-2.0-Development.zip code
Jan
|
@jr212 It is lucky then that I didn't realize your initial problem was with Composer, otherwise I might not have noticed the code problems. I am also not expert in Composer, but I think the following should work. Assume that you have a main directory called "git" (change to whatever is appropriate). Go to that directory and issue the following command:
This will create a new directory "git/atsign3". Change to that directory and run:
It would not surprise me in the least if there is a simpler way to do this, but this should work. |
Merged master to eliminate a no-longer-valid dependency in composer.lock, not as prep work before installing. |
@jr212 I believe you can get a much smaller directory by using the following install command rather than the one I initially suggested:
Please bear in mind that the code you will be using is some way from being "production ready". I will be glad to continue to work with you, but this code may never make it into production, and, even if it does, I don't know how soon that might happen. On Windows systems, when a spreadsheet is downloaded from a browser, the behavior you described is normal. I don't need you to maintain the zip file you uploaded any longer. However, I will eventually need to add a test case based on what it showed me. Do you have a problem with my basing that test case on the data and code you supplied? |
No problem its anonymised.
|
There are 2 new problems. Still related I think therefore I use the same
ticket.
Code and example to find https://janr.be/excel/excel.zip. free to use.
1. When running the code the file is created but I got an error from
Excel.(sorry can't translate) but the file seems fine.
2. In the function doTS there are 2 functions. the one with CHOOSECOLS
doesn't work. Not implemented yet?
Op ma 22 apr 2024 om 09:05 schreef ***@***.***>:
… No problem its anonymised.
--
Met vriendelijke groeten
Jan
***@***.***
|
@jr212 Your spreadsheet is no longer available at the address you supplied, probably because I took too long to download it. If you have something you want me to look at, please make it available again. |
Vind it on https://janr.be/excel.zip
|
This has come up a number of times, most recently with issue #3901, and also issue #3659, and issue #1834. It will certainly come up more often in days to come. Excel is changing formulas which PhpSpreadsheet has output as
=UNIQUE(A1:A19)
; Excel is processing the formula as it were=@UNIQUE(A1:A19)
. This behavior is explained, in part, by #3659 (comment). It is doing so in order to ensure that the function returns only a single value rather than an array of values, in case the spreadsheet is being processed (or possibly was created) by a less current version of Excel which cannot handle the array result.PhpSpreadsheet follows Excel to a certain extent; it defaults to returning a single calculated value when an array would be returned. Further, its support for outputting an array even when that default is overridden is incomplete. I am not prepared to do everything that Excel does for the array functions (details below), but this PR is a start in that direction. If the default is changed via:
When that is done,
getCalculatedValue
will return an array (no code change necessary). However, Writer/Xlsx will now be updated to look at that value, and if an array is returned in that circumstance, will indicate in the Xml that the result is an array and will include a reference to the bounds of the array. This gets us close, although not completely there, to what Excel does, and may be good enough for now. Excel will still mess with the formula, but now it will treat it as{=UNIQUE(A1:A19)}
. This means that the spreadsheet will now look correct; there will be superficial differences, but all cells will have the expected value.Technically, the major difference between what PhpSpreadsheet will output now, and what Excel does on its own, is that Excel supplies values in the xml for all the cells in the range. That would be difficult for PhpSpreadsheet to do; that could be a project for another day. Excel will treat the output from PhpSpreadsheet as "Array Formulas" (a.k.a. CSE (control shift enter) formulas because you need to use that combination of keys to manually enter them in older versions of Excel). Current versions of Excel will instead use "Dynamic Array Formulas". Dynamic Array Formulas can be changed by the user; Array Formulas need to be deleted and re-entered if you want to change them. I don't know what else might have to change to get Excel to use the latter for PhpSpreadsheet formulas, and I will probably not even try to look now, saving it for a future date.
Unit testing of this change uncovered a bug in Calculation::calculateCellValue. That routine saves off ArrayReturnType, and may change it, and is supposed to restore it. But it does not do the restore if the calculation throws an exception. It is changed to do so.
This is:
Checklist:
Why this change is needed?
Provide an explanation of why this change is needed, with links to any Issues (if appropriate).
If this is a bugfix or a new feature, and there are no existing Issues, then please also create an issue that will make it easier to track progress with this PR.