Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add n argument to TakeFirst processor #38

Open
kmike opened this issue Apr 8, 2014 · 5 comments
Open

Add n argument to TakeFirst processor #38

kmike opened this issue Apr 8, 2014 · 5 comments
Labels
enhancement New feature or request

Comments

@kmike
Copy link
Member

kmike commented Apr 8, 2014

What do you think about adding "n" argument to TakeFirst processor?

  • By default it would be None, meaning the existing behaviour.
  • TakeFirst(3) would mean "return a list with at most 3 non-empty values".
@Digenis
Copy link
Member

Digenis commented Apr 13, 2014

It lets the return type depend on an argument, TakeFirst(None)(List) would return just an element of the list while TakeFirst(1)(List) an element of the list wrapped in a list.
Personally I don't like making the types less predictable but python does this already:

  • operator.itemgetter() from the builtins behaves similarly, the return type implicitly depends on the number of its arguments.
  • "str %" formatting expects a single right term or a tuple of them depending on the number of % placeholders in the string.

On the one hand TakeFirst(3) looks like natural language but on the other it implies a different return type.

I would rather see a function, like an argument to filter, passed to TakeFirst() allowing for customization of the default "value is not None and value != '' " (I am just saying this here/now because it conflicts with the above proposal)

@Digenis
Copy link
Member

Digenis commented Apr 13, 2014

From my perspective, the loader.processor lacks two loaders.

I 'd go a bit further, since your TakeFirst(-1) made me think of:
lambda n: lambda l: filter(None, l)[:n]
In the snippet notice that the first argument to filter() could also be customized, (without making the return type depend on the argument to the factory):
lambda f=None: lambda l: filter(f, l)
(Of course I oversimplify it here with None just to keep the snippet simple)
Let us call it "Filter" since "TakeFirst" return a single element.

Now "Slice":
Slice = lambda a, b: lambda l: l[a, b]
the operator module lacks a slicegetter anyway
Now the example with TakeFirst(3) would be written: Compose(Filter(), Slice(None, 3))
and why not name the composition as processor.TakeSlice or TakeFirstN, a shortcut.

Most processors offer practical shortcuts for long compositions of lambdas, "functools" and "operator". I see TakeFirst as a shortcut of shorcuts.

To say this in plain code Digenis/scrapy@1f088d4
and doctests to demonstrate their behaviour https://github.com/Digenis/scrapy/blob/0e9c95dfcccb1628d51bf500eca60b683ab95fb5/scrapy/contrib/loader/processor.py#L52-L100

@redapple
Copy link
Contributor

(just digging out old issues)
If we really want that (disclaimer: I don't use loaders myself), I'd favor a seperate processor TakeFirstN(n), always returning a list of at most n elements

@cathalgarvey
Copy link

@kmike - Is this something you'd still like to see? We might resolve some of @Digenis' concerns with a new processor, TakeFirstN, which would simplify typing and clarity issues a lot?

@kmike
Copy link
Member Author

kmike commented Feb 20, 2018

TakeFirstN sounds fine to me, though I haven't used item loaders for a long time.

@Gallaecio Gallaecio transferred this issue from scrapy/scrapy Oct 30, 2020
@Gallaecio Gallaecio added the enhancement New feature or request label Feb 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants