Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recursive iteration #36

Open
wants to merge 16 commits into
base: master
Choose a base branch
from
Open

Recursive iteration #36

wants to merge 16 commits into from

Conversation

halaxa
Copy link
Owner

@halaxa halaxa commented Oct 22, 2022

Enable iteration inside iteration to lazily iterate any structure.

foreach (RecursiveJsonMachine::fromFile('universe.json', '/results') as $gallaxy) {
    foreach($gallaxy as $solarSystem) {
        foreach($solarSystem as $solarSystemObject) {
            // process star, planet, asteroid ...
        }
    }
}

The solution lies in yield from. (It doesn't. yield from also tries to rewind the generator. It must be a good old iteration via Iterator methods.) "Child" generator will be given a reference to the top level SyntaxCheckedTokens/Tokens generator and will yield from yield from it until the end of its iteration level. Then the control will be returned back to the higher-level generator. All will be happening on the same instance of SyntaxCheckedTokens/Tokens

How this will reflect in public API must be well thought through.

@halaxa halaxa added this to the 1.0.0 milestone Nov 10, 2020
@halaxa halaxa added the help wanted Extra attention is needed label Nov 10, 2020
@halaxa halaxa removed this from the 1.0.0 milestone Mar 26, 2021
@halaxa halaxa removed the help wanted Extra attention is needed label Nov 2, 2021
@halaxa halaxa added the enhancement New feature or request label Sep 2, 2022
@pfructuoso
Copy link

I like (and need) this. How is this going? Can I help? Thanks

@halaxa
Copy link
Owner Author

halaxa commented Oct 20, 2022

Thank you. No progress whatsoever :) The idea persists though.

@halaxa
Copy link
Owner Author

halaxa commented Oct 20, 2022

Could a nested JSON pointer - as explained in the readme - help you in the meantime?

@halaxa
Copy link
Owner Author

halaxa commented Oct 22, 2022

I tried to implement it and turned this Issue into PR. Can you try it and throw in some suggestions?

@codecov-commenter
Copy link

codecov-commenter commented Oct 22, 2022

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (5147f38) 100.00% compared to head (01fc434) 99.67%.
Report is 1 commits behind head on master.

Files Patch % Lines
src/Parser.php 95.65% 2 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@              Coverage Diff              @@
##              master      #36      +/-   ##
=============================================
- Coverage     100.00%   99.67%   -0.33%     
- Complexity       185      220      +35     
=============================================
  Files             17       19       +2     
  Lines            526      607      +81     
=============================================
+ Hits             526      605      +79     
- Misses             0        2       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@halaxa
Copy link
Owner Author

halaxa commented Oct 22, 2022

I ponder that a better name for this feature might be something like lazy_vectors rather than recursive. Because for example lazy_scalars option (as stream resources) may arrive later. Moreover, recursive is somehow vague and does not exactly express what the option does. Any ideas on the name anyone? Ping @fwolfsjaeger @cerbero90

@cerbero90
Copy link
Contributor

Hey @halaxa, I was looking into a very similar solution a couple of days ago. Great timing!

I'd say that what we are leveraging here are technically nested iterators, so probably naming them as what they actually are may cause less confusion.

I see that you generate iterators for objects too, wouldn't iterators for arrays be enough?

@fwolfsjaeger
Copy link
Contributor

I agree that recursive is not the right word to use. As @cerbero90 suggested, nested iterator would make sense.

Also, by returning an actual Iterator object it would be possible to use SPL iterators to further filter the data.

@halaxa
Copy link
Owner Author

halaxa commented Oct 24, 2022

So the name of the feature and the option would be nested iterators?

The point about not iteratig objects made me think of two optinons instead of just one. One for arrays as iterators and second for objects as iterators. That way users can decide what they want.

Also, what about providing a method called for example $nestedIterator->materialize() so users can decide which level/item should be accessible via object/array?

The nested iterator should also be RecursiveIterator, not just Generator instance so it's compatible with spl recursive iterators.

@halaxa
Copy link
Owner Author

halaxa commented Oct 24, 2022

Looking at the recursive example in the readme in this PR, I think a very useful method might also be $user->advanceTo('friends') with support for syntax sugar $user['friends']. The example would then simplify to something like

foreach ($users as $user) {
    foreach ($user['friends'] as $friend) {
        $friend = $friend->materialize(); // or maybe $friend->decode()?
        $friend['name'], $friend['avatar'];
    }
}

@pfructuoso
Copy link

First of all, thanks for get back to this. It would help a lot.
Regarding the naming, nested iterators, makes sense to me.

The example you show in previous commit with syntax sugar looks promising. However, the ideal would be being able to materialize one item without materialize its soons. EG:

foreach ($users as $user) {
    echo $user['name']; // $user['name']->materialize() should work too
    foreach ($user['friends'] as $friend) { // 'friends' instanceof Traversable, not an array/object
        $friend = $friend->materialize();
        $friend['name'], $friend['avatar'];
    }
}

So, I would differenciate between materialize() and decode():

  • materialize: In case of an array/object, if items/props aren't parsed, only scalar data types.
  • decode: Equals to json_decode()

From here, it shouldn't be difficult decode omiting one or more props $user->decode(['omit' => ['friends']])

@halaxa
Copy link
Owner Author

halaxa commented Oct 25, 2022

Thanks for the feedback.

I'm not sure about $user['name']->materialize() as it's expected to return string and not an object.

As for the decode() method, I like it less and less. Its first meaning - simply decoding a value - is superfluous as decoding is implicit and transparent from the moment user specifies a decoder option. Its possible second meaning - similarity to the word materialize - may be misleading. Especially so, because I plan to stack decoding as another generator on top of Parser, and then the decode option might be null as PassthruDecoder will be a thing of the past. What would the user think the method decode() does when they explicitly specified no decoding?

What I'm currently wondering about is the format of the key of advanceTo($key). Should it be in the expected decoded format according to a decoder provided or should it be always in pure JSON string format, so it does not change when a decoder is swapped?

Additionally, if we later stack decoding on top of Parser, the decoding traversable should know methods advanceTo() and materialize(). Probably not. Not SRP enough. So we can put another generator on top of it to handle those syntax sugar methods. Unfortunately, the problems don't end there. Too late to explain myself for today. Good night ;)

@halaxa
Copy link
Owner Author

halaxa commented Mar 12, 2023

So, after some thinking about it, I would lean towards the separate RecursiveItems facade class (similar to Items) to wrap this functionality in. It will better fit the single responsibility principle. It itself may also implement PHP's RecursiveIterator and Items won't become a mess with the additional recursive option or with some other API extensions that might come in the future. Any takes on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants