Rate limited too soon #51

Open
mxmzb opened this issue Apr 26, 2024 · 2 comments

Comments

@mxmzb

mxmzb commented Apr 26, 2024

I'm running the script locally like this:

yarn index example.com

❯ yarn index example.com
🔎 Processing site: sc-domain:example.com
👉 Found 1189 URLs in 2 sitemap
📦 Batch 1 of 24 complete
📦 Batch 2 of 24 complete
📦 Batch 3 of 24 complete
📦 Batch 4 of 24 complete
📦 Batch 5 of 24 complete
📦 Batch 6 of 24 complete
📦 Batch 7 of 24 complete
📦 Batch 8 of 24 complete
📦 Batch 9 of 24 complete
📦 Batch 10 of 24 complete
📦 Batch 11 of 24 complete
📦 Batch 12 of 24 complete
📦 Batch 13 of 24 complete
📦 Batch 14 of 24 complete
📦 Batch 15 of 24 complete
📦 Batch 16 of 24 complete
📦 Batch 17 of 24 complete
📦 Batch 18 of 24 complete
📦 Batch 19 of 24 complete
📦 Batch 20 of 24 complete
📦 Batch 21 of 24 complete
📦 Batch 22 of 24 complete
📦 Batch 23 of 24 complete
📦 Batch 24 of 24 complete

👍 Done, here's the status of all 1189 pages:
• ✅ Submitted and indexed: 410 pages
• 👀 Crawled - currently not indexed: 151 pages
• 👀 Discovered - currently not indexed: 2 pages
• 🔀 Page with redirect: 2 pages
• 🚦 RateLimited: 506 pages
• ❌ Server error (5xx): 9 pages
• ❌ Alternate page with proper canonical tag: 1 pages
• ❌ Duplicate, Google chose different canonical than user: 108 pages

✨ Found 659 pages that can be indexed.

[... list of urls]

📄 Processing url: https://example.com/foo/bar
🕛 Indexing already requested previously. It may take a few days for Google to process it.

📄 Processing url: https://example.com/foo/bar1
🚦 Rate limit exceeded, try again later.

The rate limit is hit after only around 100-120 URLs. If I rerun the script, it starts from the beginning and aborts on the rate limit again after around 100-120 URLs, so I'm not able to request indexing for any of the URLs that come later.

What am I doing wrong?

@goenning
Owner

I think this was fixed by a recent PR, want to try again?

@ostwilkens

The cache isn't written to while the URLs are being processed. When the rate limit is exceeded, the program exits with a cache full of "RateLimited" entries. When you run it again, it starts from the beginning and gets rate limited at the same place.

I added this to index.ts:168 and it can now pick up where it left off:

    statusPerUrl[url] = { status: Status.SubmittedAndIndexed, lastCheckedAt: new Date().toISOString() };
    writeFileSync(cachePath, JSON.stringify(statusPerUrl, null, 2));
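
For anyone who wants to see the idea in isolation, here is a minimal, self-contained sketch of the same "write the cache after every URL" pattern. It is not the project's actual code: `UrlStatus`, `loadCache`, `recordStatus`, `processUrls` and the synchronous `requestIndexing` callback are all hypothetical names made up for this illustration, and the real script calls Google's API asynchronously.

```typescript
import { existsSync, readFileSync, writeFileSync } from "node:fs";

// Hypothetical status values for this sketch; the real project has its own Status enum.
type UrlStatus = "Requested" | "RateLimited";

interface CacheEntry {
  status: UrlStatus;
  lastCheckedAt: string;
}

type StatusCache = Record<string, CacheEntry>;

// Load the cache from disk, or start with an empty one if the file does not exist yet.
function loadCache(cachePath: string): StatusCache {
  if (!existsSync(cachePath)) return {};
  return JSON.parse(readFileSync(cachePath, "utf-8")) as StatusCache;
}

// Record one URL's outcome and flush the whole cache to disk right away,
// so progress survives an early exit.
function recordStatus(
  cache: StatusCache,
  cachePath: string,
  url: string,
  status: UrlStatus,
): void {
  cache[url] = { status, lastCheckedAt: new Date().toISOString() };
  writeFileSync(cachePath, JSON.stringify(cache, null, 2));
}

// requestIndexing is a stand-in for the real API call (assumption):
// it returns false when the rate limit has been hit.
// Returns the number of URLs requested during this run.
function processUrls(
  urls: string[],
  cachePath: string,
  requestIndexing: (url: string) => boolean,
): number {
  const cache = loadCache(cachePath);
  let processed = 0;
  for (const url of urls) {
    // Skip URLs that an earlier run already handled.
    if (cache[url]?.status === "Requested") continue;
    if (!requestIndexing(url)) {
      recordStatus(cache, cachePath, url, "RateLimited");
      break; // stop here; the next run resumes from this URL
    }
    recordStatus(cache, cachePath, url, "Requested");
    processed++;
  }
  return processed;
}
```

Because the cache is flushed after each URL, a second run skips everything already marked as requested and continues with the first URL that was rate limited. Rewriting the whole JSON file on every URL is wasteful for very large sitemaps, but at roughly a thousand URLs it is probably not a concern.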
