User error? #129

Open
claytondukes opened this issue Oct 10, 2023 · 2 comments

@claytondukes

Firstly, I love that you've made this. However, I'm having some trouble getting it to work properly, and I think it may just be a user or documentation error. I don't quite understand the difference between these targets:

make companies
or
make random
or
make byname

Like, which one is for what?
If I run make companies, I get the following log. When I connect over VNC, I complete the security check and it passes. The scraper then logs in, loads my LinkedIn home page, and just sits there doing nothing until the scrapy container exits.

Successfully built 4d01d326b5743b603e198a3c558391123e315f4965f96722a5d4b4703b967ab7
docker-compose up scrapy_companies
selenium is up-to-date
Starting linkedin_scrapy_companies_1 ... done
Attaching to linkedin_scrapy_companies_1
scrapy_companies_1  | --2023-10-10 18:38:35--  http://selenium:4444/wd/hub
scrapy_companies_1  | Resolving selenium (selenium)... 172.21.0.2
scrapy_companies_1  | Connecting to selenium (selenium)|172.21.0.2|:4444... connected.
scrapy_companies_1  | HTTP request sent, awaiting response... 302 Found
scrapy_companies_1  | Location: http://selenium:4444/wd/hub/static/resource/hub.html [following]
scrapy_companies_1  | --2023-10-10 18:38:35--  http://selenium:4444/wd/hub/static/resource/hub.html
scrapy_companies_1  | Reusing existing connection to selenium:4444.
scrapy_companies_1  | HTTP request sent, awaiting response... 200 OK
scrapy_companies_1  | Length: 160 [text/html]
scrapy_companies_1  | Saving to: ‘STDOUT’
scrapy_companies_1  |
scrapy_companies_1  |      0K                                                       100%<!DOCTYPE html>
scrapy_companies_1  | <title>WebDriver Hub</title>
scrapy_companies_1  | <link rel="stylesheet" href="style.css">
scrapy_companies_1  | <script src="client.js"></script>
scrapy_companies_1  | <body>
scrapy_companies_1  | <script>init();</script>
scrapy_companies_1  | </body>
scrapy_companies_1  |  29.7M=0s
scrapy_companies_1  |
scrapy_companies_1  | 2023-10-10 18:38:35 (29.7 MB/s) - written to stdout [160/160]
scrapy_companies_1  |
scrapy_companies_1  | Selenium is up - executing command
scrapy_companies_1  | INFO:root:***** SECURITY CHECK IN PROGRESS *****
scrapy_companies_1  | INFO:root:Please perform the security check on selenium, you have 30 seconds...
scrapy_companies_1  | INFO:root:***** SECURITY CHECK COMPLETED *****
linkedin_scrapy_companies_1 exited with code 0
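For context on the first half of that log: the container polls the Selenium hub with wget (following the 302 redirect to hub.html) and only prints "Selenium is up - executing command" once the hub answers. A minimal Python sketch of the same kind of readiness check is below, assuming the hub URL from the log; `wait_for_hub` is a hypothetical helper, not part of this project. It only confirms that the hub is reachable, so the hang after the security check is a separate problem.

```python
import time
import urllib.request


def wait_for_hub(url: str, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll `url` until it answers with HTTP 2xx, or give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # urlopen follows redirects, e.g. the 302 to hub.html seen in the log.
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if 200 <= resp.status < 300:
                    return True
        except OSError:
            # URLError, HTTPError, refused connections and socket timeouts are
            # all OSError subclasses: the hub is not ready yet, so retry.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Inside the compose network this would be called as `wait_for_hub("http://selenium:4444/wd/hub")` before starting the scraper.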
@claytondukes

claytondukes commented Oct 10, 2023

Just to be sure it wasn't something I had changed, I checked out the repo again, set up my conf.py, and then ran make companies.
NB: On the first run, it fails with:

scrapy_companies_1  |   File "sequential_run.py", line 42, in <module>
scrapy_companies_1  |     open(file_name, "w").close()
scrapy_companies_1  | FileNotFoundError: [Errno 2] No such file or directory: 'data/companies/data.csv'

So I ran mkdir data/companies and touch data/companies/data.csv, then ran it again.
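The traceback points at line 42 of sequential_run.py, where `open(file_name, "w").close()` truncates the output CSV but fails when data/companies does not exist yet. A minimal sketch of a possible fix, assuming `file_name` is the path from the error message, is to create the parent directory before that call. `reset_output_file` is a hypothetical helper name, not the project's actual code.

```python
import os


def reset_output_file(file_name: str) -> None:
    """Truncate (or create) `file_name`, creating missing parent directories first."""
    parent = os.path.dirname(file_name)
    if parent:
        # Equivalent of `mkdir -p data/companies`; no error if it already exists.
        os.makedirs(parent, exist_ok=True)
    # Same call as in the traceback: opening for writing and closing
    # leaves an empty file.
    open(file_name, "w").close()
```

With something like this in place, `reset_output_file("data/companies/data.csv")` would replace the manual mkdir/touch step on a fresh checkout.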

Now it's "running", but the VNC session just sits on my user's home page and never clicks or does anything. The log for make companies stays at the following and never exits:

Recreating linkedin_scrapy_companies_1 ... done
Attaching to linkedin_scrapy_companies_1
scrapy_companies_1  | --2023-10-10 18:49:44--  http://selenium:4444/wd/hub
scrapy_companies_1  | Resolving selenium (selenium)... 172.21.0.2
scrapy_companies_1  | Connecting to selenium (selenium)|172.21.0.2|:4444... connected.
scrapy_companies_1  | HTTP request sent, awaiting response... 302 Found
scrapy_companies_1  | Location: http://selenium:4444/wd/hub/static/resource/hub.html [following]
scrapy_companies_1  | --2023-10-10 18:49:44--  http://selenium:4444/wd/hub/static/resource/hub.html
scrapy_companies_1  | Reusing existing connection to selenium:4444.
scrapy_companies_1  | HTTP request sent, awaiting response... 200 OK
scrapy_companies_1  | Length: 160 [text/html]
scrapy_companies_1  | Saving to: ‘STDOUT’
scrapy_companies_1  | <!DOCTYPE html>
scrapy_companies_1  | <title>WebDriver Hub</title>
scrapy_companies_1  | <link rel="stylesheet" href="style.css">
scrapy_companies_1  | <script src="client.js"></script>
scrapy_companies_1  | <body>
scrapy_companies_1  | <script>init();</script>
scrapy_companies_1  | </body>
scrapy_companies_1  |
scrapy_companies_1  |      0K                                                       100% 32.7M=0s
scrapy_companies_1  |
scrapy_companies_1  | 2023-10-10 18:49:44 (32.7 MB/s) - written to stdout [160/160]
scrapy_companies_1  |
scrapy_companies_1  | Selenium is up - executing command

@raithedavion

Mine does exactly the same thing. I've put the company URL in the "companies.txt" file, and nothing happens, either in VNC or anywhere else.
