Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize authority class constructor #1310

Open
SubJunk opened this issue Jan 13, 2022 · 27 comments
Open

Optimize authority class constructor #1310

SubJunk opened this issue Jan 13, 2022 · 27 comments

Comments

@SubJunk
Copy link
Contributor

SubJunk commented Jan 13, 2022

I did some profiling of zenphoto using xdebug and kcachegrind, and it revealed this query, which for me (since there are many admins) is a big operation, taking about 30% of the load time of the request
https://github.com/zenphoto/zenphoto/blob/master/zp-core/class-authority.php#L32

Is it really necessary to populate this admin_all variable in the constructor? Surely most users don't need that information, unless I misunderstand?

image

@sbillard
Copy link
Contributor

This is pretty ingrained in the structure. Of course anything is possible. Would you be interested in helping test out such a changer? (It is likely to be a bit destabilizing.)

@SubJunk
Copy link
Contributor Author

SubJunk commented Jan 14, 2022

@sbillard yes, I would be very happy to help test any performance improvements

@acrylian
Copy link
Member

@SubJunk Yes, this $admin_all property is pretty tied into lots of places via the getAdministrators() methods. But changes are possible as sbillard (who is not involved in Zenphoto anymore) rightly wrote.

I assume the main problem is the huge SELECT * query which should generally be avoided and we do have in several other places. Sp that query should be limited to the columns really needed. Basically the username and/or id only should be sufficient. If more/full data of an admin is needed it should be fetched as an object via the administrator class. Such a change in the return values may likely break third party code.

@SubJunk
Copy link
Contributor Author

SubJunk commented Jan 14, 2022

@acrylian I can test whether there is a difference in performance if I limit the columns, do you think it will be enough to just change the query for the benchmarking? If so, which columns?

@acrylian
Copy link
Member

@acrylian I can test whether there is a difference in performance if I limit the columns, do you think it will be enough to just change the query for the benchmarking? If so, which columns?

The constructor ifself only uses three columns: id, valid and user . If you can try taht isolated as that it will be useful. It iikely will break code as getAdministrators8) returns the full and I have not yet reviewed how that is used in all places.

But maybe we don't need the full array already at all. Additionally I will review to relocated the query stuff to getAdiminstrators() itself so the data is only fetched when actually requested.

@acrylian
Copy link
Member

I review usages and we use around half of the columns in many places with getAdministrators():

  • id
  • user
  • rights
  • name,
  • group
  • email
  • pass
  • custom_data
  • valid
  • date

If you can try those as well to see if that make a difference. The best in the long run is to only use id, valid and user in general and use new Administrator (user, valid)N within loops instead of a huge initial array to get full data of admins when needed.

acrylian added a commit that referenced this issue Jan 14, 2022
…he getAdministrators() methods itself and for the master_users name to new method getMasterUserName()
@acrylian
Copy link
Member

@SubJunk A start: I did not optimize the query itself but I moved fetching he data to the getAdministrators() method limiting the query to the actual requested data. This should not break anything yet anywhere. Next step is to limit the columns to fetch.

@sbillard
Copy link
Contributor

@SubJunk

First path implementation is at https://github.com/sbillard/netPhotoGraphics-DEV/releases/tag/Dev_2.0%2B

As above, the load happens only on request, not in the constructor. This does optimize what subclass gets loaded (valid users, groups/templates, other users) but I have not attempted to filter the fields that are loaded. Doing that would be a future "gotcha" as the function proports to return the users, so any use should be able to assume all user data is present.

Obviously, future discussions of this particular implementation should be moved to my development github repository.

@acrylian
Copy link
Member

If you don’t intend to actively contribute to Zenphoto please refrain from commenting on our tickets in the future…

@sbillard
Copy link
Contributor

Sorry, but by placing your code on github you make it public. I will make comments as appropriate.

I did suggest to @SubJunk that the conversation go forward on my netPhotoGraphics development repository which is appropriate for this topic as it will be specific tp netPhotoGraphics. But for conversations that are not specific I will post as appropriate.

@acrylian
Copy link
Member

Sorry, but by placing your code on github you make it public. I will make comments as appropriate.

Of course and no problem with that at all.

I did suggest to @SubJunk that the conversation go forward on my netPhotoGraphics development repository which is appropriate for this topic as it will be specific tp netPhotoGraphics. But for conversations that are not specific I will post as appropriate.

Exactly that is the point as that has no place here at all.

@SubJunk
Copy link
Contributor Author

SubJunk commented Jan 16, 2022

I'll hopefully do some testing tonight and will post the results

@SubJunk
Copy link
Contributor Author

SubJunk commented Jan 17, 2022

@acrylian that change has improved the stats on that route, here's the new one:
image
So that query has gone from taking 28.07% of the time to 20.96%, partly/mostly due to receiving about 300 fewer results AFAICT.

@acrylian @sbillard as a long-time user of zenphoto I'm aware of the split in the development team and I'm sorry for it. You're both talented developers and have both helped me a lot before, both directly on the forums and indirectly by developing this codebase.

@sbillard I don't have a copy of netPhotoGraphics running in production so it would be difficult for me to run profiling on it

@acrylian
Copy link
Member

Thanks, that's better although not that much. I will try a further change with limited columns in the result soon.

@sbillard
Copy link
Contributor

@SubJunk:

There will be some improvement by limiting the columns returned to just the required ones. But really that will not amount to much. Mostly the performance improvement comes from not fetching the admins when they are not required. So loading the front-end will be where you see the most improvements.

Anyway, I am more interested in stability, so any general testing would be useful.

@acrylian:

Your implementation of passing a parameter to getAdministrators() with the desired columns (at least in its current implementation) has a problem. The function caches the list of items it finds. So if it is called with a particular subset of fields, then called a second time with a different subset the second caller will not get the results he needs.

You could either cache by field set or not cache. Either choice probably cost more than the performance savings. There is a defined required set of fields that the function should return. See auth.php.

@acrylian
Copy link
Member

acrylian commented Jan 17, 2022

Yes, thanks, I am aware of the caching in the properties. If this is kept I will add an array index depending on the option to differ. Probably the full array will be deprecated in the long run and just the base columns with the administrator class will be used.

@acrylian
Copy link
Member

@SubJunk: I sadly didn't get around to implement the limited column set for testing. I hopefully be able tomorrow and let you know if there is more to test for you.

@zenphoto zenphoto deleted a comment from sbillard Jan 18, 2022
@fretzl
Copy link
Member

fretzl commented Jan 18, 2022

@sbillard: We have to be blunt. This ticket is about Zenphoto and not about your project.
Therefore your last comment has been removed. If @SubJunk decides to use your project, he will surely do so.
Do not use our platforms for your own project.

acrylian added a commit that referenced this issue Jan 19, 2022
@acrylian
Copy link
Member

@SubJunk Before I rework everything to use the "basedata" option with the administrator object just for testing with currently lacking time, please how a limited "coredata" column set works for you. Those are the columns basically I noted above that are used by core (I forgot one). Also as noted correctly above I have fixed the internal caching issue.

@SubJunk
Copy link
Contributor Author

SubJunk commented Jan 22, 2022

@acrylian I'll get back to this soon, I've had some problems with the zenphoto instance in the last 3 days that have taken my attention. I'll make issues for those too

@acrylian
Copy link
Member

@SubJunk Take your time!

@SubJunk
Copy link
Contributor Author

SubJunk commented Jan 31, 2022

Sorry for the delay, I finally got to this. The numbers look a little worse this time but I guess it's within the expected variation

image

@acrylian
Copy link
Member

Thanks, actually that is not really that expected. The queries are now done on request only and they have less columns/data. Sure this isn't a temp variation…? I will next try the version with onky the base cols and using the administratr class to get the actualy data per admin when needed. That's a little more work in serveral places so might take some days until I get to it. Of course if you have really a lot users that will not reduce these ;-)

@sbillard
Copy link
Contributor

I would not expect a significant difference just for the getAdministrators() queries. For your site the prior numbers would have been dominated by the number of users--groups and templates would be noise level additions. The real benefit of this change was to avoid doing the getAdministrators() call whenever possible.

@SubJunk
Copy link
Contributor Author

SubJunk commented Jan 31, 2022

The server does seem a lot healthier after the recent changes. The way I'm doing the profiling means I won't see the routes that are no longer using the getAdministrators call - I'm running the profiler in production and finding the matching reports manually, as opposed to directly testing.
Given that, this change might be helping greatly.

@sbillard
Copy link
Contributor

More on the performance difference between SELECT * and SELECT field list. From what I have read, there are three components that would suggest using SELECT field list over SELECT *. The first is that for SELECT * the database server must look in its configuration tables to get the field names to retrieve. The second is that there may with specific fields called out the server may be able to optimize the select using index keys. The third is that (when the database server is separate from the HTTP server) the data transport would be less.

The first case is unlikely to be significant when there are a large number of rows fetched--the lookup happens only once so its cost is drowned out by the cost of the select.

The second is probably not relevant to the administrator table. All the records will be in the key tables anyway.

The data transfer could be an issue depending on things like the address fields (if enabled.) But I really doubt that in practice there would be a measurable difference between returning all the fields and just the subset of commonly needed ones.

Still, it is probably good practice to limit the fields fetched to those most used. It certainly does not add to the cost.

@acrylian:

I would caution about your idea of fetching minimal fields then instantiating an administrator object to get the data when it is needed. Object instantiation is fairly expensive. If the code will instantiate the object anyway it won't matter, but for all the places where all that is needed is a list of administrators and the key data fields your proposal will degrade performance.

What really is needed is profiling the whole PHP process. Focusing on a single area can be a missing forest for the trees.

@acrylian
Copy link
Member

Glad to see it helped in general at least. Another point about SELECT * queries is that they take more memory if fetching a lot data. And yes, I agree, creating an administator object always is surely much overhead if just needing limited data and not the full admin object. Which probably applies to many usages.

acrylian added a commit that referenced this issue Feb 20, 2022
@acrylian acrylian added this to the 1.6 milestone May 26, 2022
@acrylian acrylian removed this from the 1.6 milestone Aug 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants