Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enh: Introduce FindInstanceInterface - Part 2: Implementing throughout the code #6463

Open
wants to merge 26 commits into
base: statable
Choose a base branch
from

Conversation

martin-rueegg
Copy link
Contributor

@martin-rueegg martin-rueegg commented Jul 25, 2023

Introduce FindInstanceInterface

Part 2: Implementing throughout the code

FindInstanceInterface series:

PR Admin

What kind of change does this PR introduce?**

  • Feature
  • Refactor

Does this PR introduce a breaking change?

  • No

The PR fulfills these requirements:

  • It's submitted to the develop branch
  • All tests are passing
  • New/updated tests are included
  • Changelog was modified

Other information:

#6457 (comment) and following

@martin-rueegg
Copy link
Contributor Author

martin-rueegg commented Jul 25, 2023

#6457 (comment)

One small point, if possible, I would prefer when the method Membership::findMembership(Space $space, User $user): ?membership would have strict types and does not allow int argument. What do you think?

Thank you for asking for my opinion.
I personally would suggest to create at least two corresponding methods for the User and Space classes (potentially with caching, too):

  • User::findUser($user): ?User
  • User::findUserAsId($user): ?int
  • Space::findSpace($space): ?Space
  • Space::findSpaceAsId($space): ?int

The User::findUser and Space::findSpace would do the real work, the other two only get the $object->id from the main methods, unless the result is null.
The code in https://github.com/metaworx/humhub/blob/fbc093a9358bb02fe4d8dcdc1309f74396855f85/protected/humhub/modules/space/models/Membership.php#L413-L425 would then be moved into the respective methods.
This would allow to simplify all these countless occurrences of similar code to be reduced to one function call and you know exactly that you either get a User object - or a UserID respectively - or null, no matter what you put in (int, User, or null). In case of an invalid data type, an exception is thrown.
Happy to create a PR for this, if that would be in your interest.

So if we could introduce two methods:

  • Membership::findMembership(Space $space, User $user): ?self
  • Membership::findMembershipByIds(int $spaceId, int $userId): ?self

I'm fine with this too.

But not sure if we need User::findUser or User::findUserAsId. There is already a method User::findByPk()`. Or did I missed something here?

If you're referring to User::findIdentity(), than yes, that's kind of what I meant. And it is not typed at all, neither with the input, not the output.

I would personally allow User|int|string|null for input. There are so many places in the code, where this could be used:

$userId = ($user instanceof User) ? $user->id : $user;

$userId = ($user instanceof User) ? $user->id : $user;

if (!($user instanceof User)) {
$user = User::findOne(['id' => $user]);

if (!($user instanceof User)) {
$user = User::findOne(['id' => $user]);

$user = User::findOne(['id' => $this->id]);

if ($user == null) {
$user = Yii::$app->user->getIdentity();
}
$userId = ($user instanceof User) ? $user->id : $user;

And there are potentially many more if considering modules.

Having those functions, for the Membership class we only require one as the rest is taken care of in the User and Space classes.

What is the advantage of having a strongly typed input, if the type is verified in the function and throws an exception, as invoking the function with a wrong type would?


#6457 (comment)

@martin-rueegg Ok, I see, if with such methods we can avoid such duplicate code and get more type strict, I highly support.

For me, it's just one important thing to keep the code base especially models as compact, consistent, and simple as possible. Currently we try to reduce the codebase of e.g. User/Space models, so I'm a little conservative with new methods :-)

But if we introduce such "helpers", we should ensure to use the same convemntion in all core Models (e.g. User/Space) and also in the modules.

@martin-rueegg
Copy link
Contributor Author

The alternative to add those methods to User/Space classes would be to add them to either one helper class (in /humhub/libs ! :-D ) or to two separate helper classes in the respective modules.

However, performance-wise it's probably better to have them in the User/Space classes, as the methods are very likely to be called and those classes already loaded. Loading another or two other classes would create more overhead.

The User::findIdentity() method is part of the \yii\web\IdentityInterface and hence cannot be changed. But it could either be used as the base function, oder it's alias (my preference as the main function would be called according to the same pattern in the different classes)

I did not want to use get* method names, as they imply a property of the object. So I opted for the find* "namespace".

We could opt for a more generic pattern, as in:

findInstance($identifier, bool cached = true): ?self:
findInstanceAsId($identifier, bool cached = true): ?int:

In case of a multi-row-pk (as with Membership), the $identifier could be an array [$space, $user]. However, in this case the second function would rather have to be findInstanceAsId($identifier, bool cached = true): ?array. Or left out.

Having this generic approach, this would also allow to use an interface to define the methods.

Otherwise we could use

find{$ClassName}($identifier, bool cached = true): ?self:
find{$ClassName}AsId($identifier, bool cached = true): ?int:

Any other suggestion, @luke- ?

@martin-rueegg
Copy link
Contributor Author

martin-rueegg commented Jul 25, 2023

In case of a multi-row-pk (as with Membership), the $identifier could be an array [$space, $user]. However, in this case the second function would rather have to be findInstanceAsId($identifier, bool cached = true): ?array. Or left out.

In order to avoid this problem, two interfaces FindInstanceInterfaceSimplePK and FindInstanceInterfaceMultiPK could inherit from FindInstanceInterface.

Just an idea. Probably overkill.

Actually, I've just set no return type on the findInstanceAsId() but added it to the implementation. So one interface is enough and the type is still checked.

@martin-rueegg martin-rueegg changed the title Enh: Introduce generic find* methods to get (cached) instances of Users and Spaces Enh: Introduce FindInstanceInterface with methods to get (cached) instances of e.g. Users, Spaces, and Memberships Jul 26, 2023
@martin-rueegg
Copy link
Contributor Author

Note for reviewers:

This PR is now building upon and including PR #6457. Hence code review would be easier if either 6457 is already merged into develop, or both are applied to a separate branch.

@martin-rueegg martin-rueegg marked this pull request as ready for review July 26, 2023 08:02
@martin-rueegg martin-rueegg mentioned this pull request Aug 4, 2023
5 tasks
@martin-rueegg martin-rueegg force-pushed the enh/new-find-methods branch 7 times, most recently from 0f72796 to 79bc572 Compare August 11, 2023 14:05
@martin-rueegg martin-rueegg force-pushed the enh/new-find-methods branch 7 times, most recently from 9fb9185 to 9d984e4 Compare August 12, 2023 00:13
@martin-rueegg martin-rueegg changed the base branch from develop to statable August 12, 2023 08:15
@martin-rueegg martin-rueegg changed the title Enh: Introduce FindInstanceInterface with methods to get (cached) instances of e.g. Users, Spaces, and Memberships Enh: Introduce FindInstanceInterface with methods to get (cached) instances - Part 2 Aug 12, 2023
@martin-rueegg
Copy link
Contributor Author

@luke- as laid out in the introduction to part 1 of this PR:

Since #6457 was only a partial solution for #6375 as it

  1. uses a non-standard name for findMembership and
  2. deletes the cache only when memberships are updated, and
  3. does the latter in an insufficient way (ignores both deleteAll() and saveAll() and linked content),

there is an improved attempt in #6463 to address this in a broader way, and ultimately #6482, which addresses some more issues.

To break #6463 down, I've created this PR for review, and I've submitted towards the statable branch, as #6430 also bilds on top of #6463. Once, #6463 is merged in statable too, both can be merged int develop, which I recommend to happen before v1.15 is released.

I know it is a significant change and not something you like to add to v1.15. However, I do feel that the big change is really the introduction of the runtime cache and this might help mitigating undesired side-effects. However, it could also be ready for a hotfix in v15.1 if the runtime cache proofs to be tricky.

@martin-rueegg
Copy link
Contributor Author

@luke- From #6512 (comment):

Maybe something to consider in the code style, too. Accessing array entries, particularly if the array has many keys or the value is an array on its own, is expensive.
Also, accessing magic methods is, and the getter function should be used instead (unless it is an uncached relation definition).
And such result should be cached within the function or block.
IMO. :-)

I need to think about the caching topic a bit. I understand that there is really room for improvement here, but I also want to stay as close as possible to the Yii2 framework concepts and also not add too much complexity.

Yes, I appreciate that. I mean the cached relation can probably be undone.

However, for me arises the question: why not using the default db-caching mechanism from Yii2 and just deal with it's clearing as we do in the runtimeCache?

Unless you would opt for the New RuntimeCache engine #6482 that brings the advantage to store the records under different cache keys, e.g to be looked up both bi id as well as guid or email ....

@luke-
Copy link
Contributor

luke- commented Sep 13, 2023

@martin-rueegg Just some thoughts to perhaps simplify it further:

  • Do we really need the CacheableActiveQuery logic? Can't we just extend CachedActiveRecord::afterFind() and cache the record here on load?
  • Is there anything agsinst to integrate the following files directly into CachedActiveRecord:
    • FindInstanceTrait - Do we really need this separation?
    • RuntimeCacheHelper - It's primarily used for CachedActiveRecord or? Ideally integrate to CachedAR or maybe also rename it to CachedActiveRecordHelper or CachedActiveRecordService.
    • FindInstanceInterface
    • UniqueIdentifiersInterface - Maybe simply abstract methods in CachedActiveRecord?
    • CachableActiveQuery - See above

@martin-rueegg martin-rueegg changed the title Enh: Introduce FindInstanceInterface with methods to get (cached) instances - Part 2 Enh: Introduce FindInstanceInterface - Part 2: Implementing throughout the code Sep 13, 2023
@martin-rueegg
Copy link
Contributor Author

@luke- Thanks for your your input!

  • Do we really need the CacheableActiveQuery logic? Can't we just extend CachedActiveRecord::afterFind() and cache the record here on load?

Good idea. Done.

Is there anything against to integrate the following files directly into CachedActiveRecord:

> FindInstanceTrait

Do we really need this separation?

Please refer to my explanation in #6503 (comment):

This could be a good solution, yes. There is one implication: \humhub\modules\content\models\ContentContainer extends \yii\db\ActiveRecord, rather than \humhub\components\ActiveRecord. Is there a reason for this? We have two options:

  1. We keep the the code in FindInstanceTrait and link this both to ContentContainer and CachedActiveRecord. From the latter we then derive the other classes like ContentActiveRecord, Space, User, etc.
  2. We derive ContentContainer also from CachedActiveRecord (and hence humhub\components\ActiveRecord) but would have to check if there are some methods we have to direct to \yii\db\ActiveRecord (jumping humhub\components\ActiveRecord).

> RuntimeCacheHelper

It's primarily used for CachedActiveRecord or? Ideally integrate to CachedAR or maybe also rename it to CachedActiveRecordHelper or CachedActiveRecordService.

I'm not trying to be hairsplitting here, but it's mainly used to access and modify the RuntimeCache (RTC) at Yii::$app->runtimeCache. At least that was my approach, although I agree that most functions do consider ActiveRecords (not only CachedAR!) in a particular way.

But since the RTC is just an ArrayCache, it does not provide any additional functionality in terms of cache-management. However, we could also extend a class from the ArrayCache, use this as the RTC implementation, and then move the helper functions into that class. The downside of this is, that the RTC could not be easily disabled in the configuration by replacing it with DummyCache, but we would have to create a parameter to disable it instead.

So rather than integrating it into CachedAR, I'd leave it as a helper class, or service class for that matter (but related to the RuntimeCache) and then update other places where the RTC is accessed to use this Helper. I just did not want to add more code to this PR but happy to create another PR for this.

> FindInstanceInterface

The purpose of findInstance() is, in my view, not primarily the caching, but the possibility of providing different identifiers but always receive an object instance or null. Whether caching is done or not does not alter the outcome of this function (MUST not!). As such, the could also be implemented by non-caching models or any object for that matter.

However, if we can solve the issue regarding FindInstanceTrait mentioned above, we could also abandon this interface for now. Maybe it would be useful to make it part of \humhub\components\ActiveRecord but use a non-cached implementation there.

> UniqueIdentifiersInterface

Maybe simply abstract methods in CachedActiveRecord?
Similar to the question regarding RuntimeCacheHelper, this is something that any object that needs to be stored in the RTC could implement. That is why I have separated it from FindInstanceInterface. But maybe it could be something like RuntimeCachableInterface, to be more generic and add other requirements for caching if they arise?

And again, if we do actually derive all cachable objects from CachedActiveRecord, then yes, this can be abolished and a abstract method would do the trick!

@luke-
Copy link
Contributor

luke- commented Sep 14, 2023

Is there anything against to integrate the following files directly into CachedActiveRecord:

> FindInstanceTrait

Do we really need this separation?

Please refer to my explanation in #6503 (comment):

Missed that. There is no reason that it is not derived from our standard ActiveRecord. I've changed that in next.

> RuntimeCacheHelper

It's primarily used for CachedActiveRecord or? Ideally integrate to CachedAR or maybe also rename it to CachedActiveRecordHelper or CachedActiveRecordService.

I'm not trying to be hairsplitting here, but it's mainly used to access and modify the RuntimeCache (RTC) at Yii::$app->runtimeCache. At least that was my approach, although I agree that most functions do consider ActiveRecords (not only CachedAR!) in a particular way.

But since the RTC is just an ArrayCache, it does not provide any additional functionality in terms of cache-management. However, we could also extend a class from the ArrayCache, use this as the RTC implementation, and then move the helper functions into that class. The downside of this is, that the RTC could not be easily disabled in the configuration by replacing it with DummyCache, but we would have to create a parameter to disable it instead.

So rather than integrating it into CachedAR, I'd leave it as a helper class, or service class for that matter (but related to the RuntimeCache) and then update other places where the RTC is accessed to use this Helper. I just did not want to add more code to this PR but happy to create another PR for this.

I would prefer the code base of CachedActiveRecord feature to be as isolated as possible. CachedActiveRecord uses the RuntimeCache+Helper, but the RuntimeCacheHelper should not contain any parts of CachedActiveRecord.

The RTC is a standard Yii array cache including most features + good documentation. What specific management features are you missing?

> FindInstanceInterface

The purpose of findInstance() is, in my view, not primarily the caching, but the possibility of providing different identifiers but always receive an object instance or null. Whether caching is done or not does not alter the outcome of this function (MUST not!). As such, the could also be implemented by non-caching models or any object for that matter.

However, if we can solve the issue regarding FindInstanceTrait mentioned above, we could also abandon this interface for now. Maybe it would be useful to make it part of \humhub\components\ActiveRecord but use a non-cached implementation there.

Hmm, I actually see CachedActiveRecord::findInstance more Caching Only and more as an optional PowerUser/Core feature to make certain places like Polymorophic Relations much more efficient.

As before, most (module-) devs should still work with standard Yii 2 AR & ActiveQuery methods.


> UniqueIdentifiersInterface

Maybe simply abstract methods in CachedActiveRecord?
Similar to the question regarding RuntimeCacheHelper, this is something that any object that needs to be stored in the RTC could implement. That is why I have separated it from FindInstanceInterface. But maybe it could be something like RuntimeCachableInterface, to be more generic and add other requirements for caching if they arise?

And again, if we do actually derive all cachable objects from CachedActiveRecord, then yes, this can be abolished and a abstract method would do the trick!

@martin-rueegg
Copy link
Contributor Author

Is there anything against to integrate the following files directly into CachedActiveRecord:

> FindInstanceTrait

Please refer to my explanation in #6503 (comment):
Missed that. There is no reason that it is not derived from our standard ActiveRecord. I've changed that in next.

Thanks, I've seen that. However, as mentioned repeatedly and at different places :-) I consider the current implementation and usage of RTC not fit for production. Why? Simply because there are failing tests as soon as you enable RTC during tests. Why this should be not a problem of the testing environment is also explained at various locations, including

I have at those locations argued, that

  • Enable RuntimeCache during tests #6527 should be looked into before releasing 1.15
  • and then it possibly turns out that findInstance() might be a welcome remedy for the RTC issues that are currently not addressed

As such, it is the question, if the change you made to next applies to this PR or not. There are two ways to proceed:

  • if this PR series is certainly not going into develop, there is no problem. We can merge next into statable and remove the trait
  • if this PR might still be required for the release of 1.15, we can leave the PR including the trait in the two classes, and then for 1.16 we apply your change and remove the then obsolete trait by moving it's code to CachedAR

I would prefer the code base of CachedActiveRecord feature to be as isolated as possible. CachedActiveRecord uses the RuntimeCache+Helper, but the RuntimeCacheHelper should not contain any parts of CachedActiveRecord.

The RTC is a standard Yii array cache including most features + good documentation. What specific management features are you missing?

I guess the main question boils down to wheter the RTC should be exclusivly be used for record caching (i.e. CachedActiveRecord), or should it be possible to use it for other objects as well (that may have similar requirements in key variations).

So for example, with the current implementation in RuntimeCacheHelper it would be possible to also have some Settings in the RTC as they are currently read from file cache also several times per request.

Putting all this functionality into CachedActiveRecord would prevent the future use of RTC for other objects than CachedActiveRecord without having to duplicate that code.

So, we want to have the RTC available for CachedActiveRecord only, we can move the functionality to CachedActiveRecord. If, on the other hand, we want to have RTC available for other use cases with similar requirements, then it makes sense to keep RuntimeCacheHelper as it is.

The functionality that is currently in RuntimeCacheHelper:

  • creating a unique key from an object (could also be moved into e.g. \humhub\libs\StringHelper)
  • creating additional key variants to retrieve an object on different search keys (same here)
  • setting and deleting keys and at the same time manage the different key variants
  • deleting objects based on an instance or a class name

The "parts of CachedActiveRecord" that are in RuntimeCacheHelper are only a special treatment during the evaluation of the key variants. But in that, CachedActiveRecord is only one case among others: ActiveRecordInterface and general objects.

Relying on the code of RuntimeCacheHelper/StringHelper to create the UniqueId would in my understanding not bust the principle of having "CachedActiveRecord feature to be as isolated as possible." Similar to the use of let's say ArrayHelper::merge() would.

> FindInstanceInterface

The purpose of findInstance() is, in my view, not primarily the caching, but the possibility of providing different identifiers but always receive an object instance or null. Whether caching is done or not does not alter the outcome of this function (MUST not!). As such, the could also be implemented by non-caching models or any object for that matter.
However, if we can solve the issue regarding FindInstanceTrait mentioned above, we could also abandon this interface for now. Maybe it would be useful to make it part of \humhub\components\ActiveRecord but use a non-cached implementation there.

Hmm, I actually see CachedActiveRecord::findInstance more Caching Only and more as an optional PowerUser/Core feature to make certain places like Polymorophic Relations much more efficient.

As before, most (module-) devs should still work with standard Yii 2 AR & ActiveQuery methods.

I disagree on this one. IMO, it must be in the interest of the software provider, that the software is running as performant as possible. So it should not depend on the knowledge of a module developer whether or not looking up a user is performant. I would say, it usually depends more on the record type, rather than on the developer writhing the code. For example, looking up a User should by default be cached, particularly on single lookup (we do now cache every record of CachedActiveRecord since we cache them in the afterFind() method).

From that perspective, I wonder if it would not be better to abandon the special method findInstance() and use ActiveRecord::findOneCached() instead. And for the CachedActiveRecord we use findOneCached() as a default for findOne(). Or we simply add a second, optional flag $cached to findOne():

humhub\components\ActiveRecord::findOne($condition, $cached = false): ?self;
humhub\components\CachedActiveRecord::findOne($condition, $cached = true): ?self {
    return parent::findOne($condition, $cached);
}

This would allow us to make some sane defaults (as caching space, user, contentcontainer, membershio, etc.) and give the advanced programmers the option to used cached access on objects that are not cached by default as well as get non-cached objects. And we are much closer to the Yii standard.

I guess, using this method would be a much more elegant solution!

\yii\db\ActiveRecordInterface::findOne() has a very interesting definition, very close to the one we use for findInstance(). The only difference to findInstance() are:

  • and array of IDs is not treated as multiple values of a multi-pk or multi-key (e.g. membership) but all as one-out-of-many-possible IDs.
  • $condition must also be a simple condition, but is not separate from our $identifier. I think this would not be a big deal as we store and retrieve the object from the cache based on predefined attributes. Everything else is just forwarded to the DB.
  • null values and $instance are not supported. But this is not a problem, as $condition is un-typed and the PHP Contravariance allows to implement broader types in child classes (which would sort-of apply here).
    /**
     * Returns a single active record model instance by a primary key or an array of column values.
     *
     * The method accepts:
     *
     *  - a scalar value (integer or string): query by a single primary key value and return the
     *    corresponding record (or `null` if not found).
     *  - a non-associative array: query by a list of primary key values and return the
     *    first record (or `null` if not found).
     *  - an associative array of name-value pairs: query by a set of attribute values and return a single record
     *    matching all of them (or `null` if not found). Note that `['id' => 1, 2]` is treated as a non-associative array.
     *    Column names are limited to current records table columns for SQL DBMS, or filtered otherwise to be limited to simple filter conditions.
     *  - a yii\db\Expression: The expression will be used directly. (@since 2.0.37)
     *
     * That this method will automatically call the `one()` method and return an [[ActiveRecordInterface|ActiveRecord]]
     * instance.
     *
     * > Note: As this is a short-hand method only, using more complex conditions, like ['!=', 'id', 1] will not work.
     * > If you need to specify more complex conditions, use [[find()]] in combination with [[ActiveQuery::where()|where()]] instead.
     *
     * (...)
     *
     * @param mixed $condition primary key value or a set of column values
     * @return static|null ActiveRecord instance matching the condition, or `null` if nothing matches.
     */
    public static function findOne($condition);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants