HN Search API limits number of hits to 1000, regardless of page parameter #230

Open
harabat opened this issue Aug 29, 2022 · 4 comments

harabat commented Aug 29, 2022

I am trying to fetch all stories posted in a given period. I expected to be able to get all 5k results, but am only able to get 1k.

This limit is not made explicit on the HN Search API reference.

The issue has already been raised in #125, where using the page parameter was suggested as a workaround: this no longer works.

The issue has also been mentioned in a StackOverflow question, with no answer specific to Algolia's HN Search API.

This might be expected behaviour, but it is not documented anywhere as far as I know.


My query:

http://hn.algolia.com/api/v1/search_by_date?tags=story&numericFilters=created_at_i%3E1661122800.0,created_at_i%3C1661727600.0&hitsPerPage=100

The output for page 9 of results:

{
  "hits": [...],
  "nbHits": 5562,
  "page": 9,
  "nbPages": 10,
  "hitsPerPage": 100,
  "exhaustiveNbHits": true,
  "exhaustiveTypo": true,
  "query": "",
  "params": "advancedSyntax=true&analytics=true&analyticsTags=backend&hitsPerPage=100&numericFilters=created_at_i%3E1661122800.0%2Ccreated_at_i%3C1661727600.0&page=9&tags=story",
  "processingTimeMS": 5,
  "processingTimingsMS": {...}
}

The output for page 10 of results:

{
  "hits": [],
  "page": 10,
  "nbHits": 0,
  "nbPages": 0,
  "hitsPerPage": 100,
  "exhaustiveNbHits": true,
  "exhaustiveTypo": true,
  "exhaustive": {
    "nbHits": true,
    "typo": true
  },
  "processingTimeMS": 1,
  "message": "you can only fetch the 1000 hits for this query. You can extend the number of hits returned via the paginationLimitedTo index parameter or use the browse method. You can read our FAQ for more details about browsing: https://www.algolia.com/doc/guides/sending-and-managing-data/manage-your-indices/how-to/export-an-algolia-index/#exporting-the-index-using-an-api-client",
  "query": "",
  "params": "advancedSyntax=true&analytics=true&analyticsTags=backend&hitsPerPage=100&numericFilters=created_at_i%3E1661122800.0%2Ccreated_at_i%3C1661727600.0&page=10&tags=story"
}
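
Judging from the two responses above, the cap can be spotted from a single response: either `nbHits` exceeds what `nbPages * hitsPerPage` can ever return, or the response carries a `message` field. Below is a minimal sketch of that check. The helper names `reachable_hits` and `is_truncated` are hypothetical, and treating any `message` field as a sign of the cap is an assumption based only on the page 10 output shown here.

```python
def reachable_hits(response: dict) -> int:
    """Upper bound on the hits that pagination can return for this query."""
    return response.get("nbPages", 0) * response.get("hitsPerPage", 0)


def is_truncated(response: dict) -> bool:
    """Return True when the response suggests the 1000-hit cap was reached.

    Assumption: a "message" field only shows up when the API refuses to
    return more hits, as in the page 10 output above.
    """
    if "message" in response:
        return True
    # More matches were counted than the available pages can hold.
    return response.get("nbHits", 0) > reachable_hits(response)
```

With the page 9 response above (`nbHits` 5562, `nbPages` 10, `hitsPerPage` 100), only 1000 of the 5562 hits are reachable, so the check flags the query as truncated before page 10 is ever requested.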
harabat changed the title from "Query limits number of hits to 1000, does not take into account" to "HN Search API limits number of hits to 1000, regardless of page parameter" on Aug 29, 2022
@AleksandarJeftic

This makes the API useless for my project, where I have to fetch all hits.


cmgchess commented Nov 4, 2022

I guess this is because paginationLimitedTo is set to 1000 by default.
To get more than 1000 hits, you will need to use browse instead of search, which also requires access to a key with browse capability, afaik.


harabat commented Nov 5, 2022

> This makes the API useless for my project, where I have to fetch all hits.

My workaround was to write a script that splits whatever period I'm querying day by day (so a for loop that queries each day from Mon to Sun separately, instead of the full week in one request).
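
For anyone who wants to try the same approach, here is a rough sketch of the day-by-day split, not the actual script. It assumes the `search_by_date` endpoint and the `tags`, `numericFilters`, `hitsPerPage` and `page` parameters from the query in the original post. The helper names (`windows`, `build_url`, `http_get_json`, `fetch_stories`) are made up for illustration, and using `>=` on the lower bound (instead of `>` as in the original query) is a choice made here so that adjacent windows neither overlap nor leave a gap.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

API_URL = "http://hn.algolia.com/api/v1/search_by_date"
DAY = 86_400      # seconds in one day
MAX_HITS = 1000   # cap quoted in the API's "message" field


def windows(start: int, end: int, step: int = DAY) -> Iterator[tuple[int, int]]:
    """Yield half-open (lo, hi) timestamp windows covering [start, end)."""
    lo = start
    while lo < end:
        hi = min(lo + step, end)
        yield lo, hi
        lo = hi


def build_url(lo: int, hi: int, page: int = 0, hits_per_page: int = 100) -> str:
    """Build a story query for stories created in [lo, hi)."""
    params = {
        "tags": "story",
        "numericFilters": f"created_at_i>={lo},created_at_i<{hi}",
        "hitsPerPage": hits_per_page,
        "page": page,
    }
    return f"{API_URL}?{urllib.parse.urlencode(params)}"


def http_get_json(url: str) -> dict:
    """Fetch a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def fetch_stories(start: int, end: int,
                  get_json: Callable[[str], dict] = http_get_json) -> list:
    """Collect all hits between start and end, one window at a time."""
    stories = []
    for lo, hi in windows(start, end):
        page = 0
        while True:
            data = get_json(build_url(lo, hi, page))
            if data["nbHits"] > MAX_HITS:
                # Even this window is too large; a smaller step is needed.
                raise RuntimeError(f"window {lo}-{hi} has more than {MAX_HITS} hits")
            stories.extend(data["hits"])
            page += 1
            if page >= data["nbPages"]:
                break
    return stories
```

Calling `fetch_stories(1661122800, 1661727600)` would cover the week from the original query in seven requests' worth of windows. The `RuntimeError` is there so that a day with more than 1000 stories fails loudly rather than being silently cut off; passing a smaller `step` to `windows` would be the obvious next thing to try in that case.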


harabat commented Nov 5, 2022

> I guess this is because paginationLimitedTo is set to 1000 by default. To get more than 1000 hits, you will need to use browse instead of search, which also requires access to a key with browse capability, afaik.

Thanks @cmgchess for looking into this. I had found that resource before posting, but that endpoint seems to be meant for Algolia's own customers: it's unlikely that everyone trying to query the HN Search API could request such a key, especially if the key needs to be renewed every X weeks.

My workaround (#230 (comment)) is fine for me for now, but I thought I'd keep the issue open as this is still unexpected and undocumented behaviour (as demonstrated by my sources).
