Splash memory leak #312

Open
Ethan353 opened this issue Jan 13, 2024 · 0 comments
I use scrapy-splash for requests in my crawling service. After some time, my services' RAM usage increases continuously, and eventually they consume all the RAM of the VM. The weird thing is that the Splash service itself works properly, but the services that use Splash for requests have a memory leak. For more detail, here is my code snippet and the Splash config I use:
code:

if condition_to_use_splash:
    return SplashRequest(url, errback=self.errback, callback=self.parse, meta=metadata, args={'wait': 7})
else:
    return FormRequest(url, dont_filter=True, errback=self.errback, method=method, formdata=parameter, meta=metadata)

config:

SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

DOWNLOADER_MIDDLEWARES = {
    'solaris_scrapy.solaris_scrapy.middlewares.ProxyMiddleware': 100,
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': None,
    'scrapy_fake_useragent.middleware.RandomUserAgentMiddleware': 400,
    'scrapy_fake_useragent.middleware.RetryUserAgentMiddleware': 401,
}
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
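
Since the leak shows up in the crawling services rather than in Splash, one diagnostic step (a sketch, not part of the original config; the megabyte values are example assumptions) is Scrapy's built-in memory-usage extension, which logs RSS and can close the spider cleanly before the VM is exhausted:

```python
# Sketch: Scrapy MemoryUsage extension settings (Unix-only; values are
# examples). Add these to the same settings module as the middleware config.
MEMUSAGE_ENABLED = True      # enable the MemoryUsage extension
MEMUSAGE_WARNING_MB = 1024   # log a warning once RSS passes ~1 GiB
MEMUSAGE_LIMIT_MB = 2048     # close the spider once RSS passes ~2 GiB
```

The warning threshold helps correlate the growth with spider activity (e.g. whether it tracks the number of SplashRequests issued) before deciding on a hard limit.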

I use Splash 3.1 as the Splash image, and this is my Splash service's docker-compose file:

services:
  splash:
    image: scrapinghub/splash:3.1
    ports:
      - "port:port"
    networks:
      - net

Note that I run my code in a Docker container on a VM.
What do you think I should do about this? I'm also aware of the memory limit, maxrss, and slots options for preventing Splash from using lots of RAM, but that approach causes my crawling service to miss a bunch of websites. How should I handle it in my code?
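
For context, one common pattern that bounds Splash without permanently losing sites (a sketch only, assuming Splash's standard `--maxrss` CLI flag; the 8050 port mapping and 3000 MB cap are example assumptions, not from the issue) is to let Splash exit at a memory cap and have Docker restart it, so requests fail only briefly instead of the whole VM running dry:

```yaml
services:
  splash:
    image: scrapinghub/splash:3.1
    # Exit once resident memory passes ~3000 MB; Docker then restarts
    # the container automatically.
    command: --maxrss 3000
    restart: unless-stopped
    ports:
      - "8050:8050"   # example mapping; substitute the real ports
    networks:
      - net
```

Requests that fail during a restart can then be retried on the Scrapy side (in the config above, `scrapy_fake_useragent.middleware.RetryUserAgentMiddleware` stands in for the disabled stock RetryMiddleware) rather than being dropped.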
