Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit parameter is confusing #259

Open
roysmith opened this issue Jul 17, 2020 · 7 comments
Open

Limit parameter is confusing #259

roysmith opened this issue Jul 17, 2020 · 7 comments

Comments

@roysmith
Copy link

I've got version 0.10.0. Running this code:

import mwclient

site = mwclient.Site('en.wikipedia.org')
results = site.usercontributions('RoySmith-Mobile', limit=5)
print(len(list(results)))

prints:

91

Expected result: it should print "5". As far as I can tell, the limit parameter is ignored.

@roysmith
Copy link
Author

Sigh. Did some more digging. The limit parameter is just passed down to the API level where it really means the chunk size over which the List object iterates. The point of a high-level API is to provide a higher level abstraction, not just expose all the low level details in a slightly different skin.

@danmichaelo
Copy link
Member

danmichaelo commented Jul 21, 2020

For what it's worth, I agree that the parameter name is quite confusing! Wouldn't really mind changing it to chunk_size, except it's a breaking change, so the old parameter name would have to deprecated and kept for some time.

Also agree it's not very elegant having to do something like this:

import itertools
contribs = list(itertools.islice(site.usercontributions('RoySmith-Mobile'), 5))

or

results_generator = site.usercontributions('RoySmith-Mobile')
results = [next(results_generator) for _ in range(5)]

just for a common operation like fetching a limited set of results. Thinking about it, one way to make the API a bit more fluent without breaking backwards compatibility, could be to add a new method on the List class, that would return the first N results:

>>> results = site.usercontributions('RoySmith-Mobile').take(5)
>>> print(len(results))
5

The method could look something like this:

def take(self, limit):
    """Return a list of up to `limit` results"""
    return list(itertools.islice(self, limit))

@danmichaelo danmichaelo reopened this Jul 21, 2020
@danmichaelo danmichaelo changed the title Site.usercontributions() ignores limit Limit parameter is confusing Jul 21, 2020
@roysmith
Copy link
Author

I'm unsure we want to start adding methods like that; we already have the general-purpose itertools library which does all of these sorts of things.

But, more than that, returning an iterator is good. We certainly don't want to listify the results. What if you looked at the contributions for [[User:User:Ser Amantio di Nicolao]], with over 3 million edits?

I noticed this is called out in a debian release note so I guess I'm not the first person to be bitten by this.

@danmichaelo
Copy link
Member

danmichaelo commented Jul 21, 2020

Absolutely agree that iterators should be the standard. This would just be a convenience method, to use when requesting a small number of results, which I think is quite a common use case. But it would only be a minor convenience over using itertools.islice directly, so it might not be worth it.

I see that we already have mentioned islice at https://mwclient.readthedocs.io/en/latest/user/page-ops.html#listing-page-revisions , but it could perhaps be mentioned more places. Let me know if you have specific ideas for improving the documentation (or even better: a PR). Perhaps adding a warning to some of the docstrings would help?

@roysmith
Copy link
Author

Looking through the code, I see listing.List already has support for a max_items parameter which looks like it does the right thing. There's also a mysterious "# NOTE: Fix limit" comment. I haven't dug far enough back in the history, but I'm assuming these are related.

So, maybe just expose max_items at the Site.usercontributions() level?

As much as I argued against duplicating the islice() functionality inside mwclient, there is at least the self-documenting feature. If I was a lazy programmer and didn't read all the documentation before doing anything (yeah, right), at least I would see that the method had both a "max_items" and a "limit" and think to myself, "Oh, that's weird, how do these two differ from each other?", and go digging a bit deeper.

And, to be honest

site.usercontributions(user, max_items=n)

is easier to read than

islice(site.usercontributions(user), n)

@roysmith roysmith reopened this Jul 23, 2020
@roysmith
Copy link
Author

Ooops, sorry, didn't mean to close that. Clicked the wrong button.

@danmichaelo
Copy link
Member

Interesting, I have actually never noticed the max_items parameter before :D I'm open to the idea of exposing that on the various list methods. It's not ideal, but as you say, it might cause some users to stop, think and read the documentation when they see both limit and max_items :) Would you be interested in creating a PR for that?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants