Support Downloads (edited title) #161

Open
andynuss opened this issue Oct 21, 2021 · 4 comments · May be fixed by ulixee/secret-agent#332

Comments

@andynuss

andynuss commented Oct 21, 2021

When I was scraping a user's requestUrl with Playwright, I relied on Playwright to tell me whether the document was a PDF. Some URLs that do not end with the suffix '.pdf' ARE in fact PDFs and, in even rarer cases, some URLs that end with '.pdf' are actually text/html.

That is, I looked at the 'content-type' header on the response returned by Playwright's page.goto() and made sure it was 'text/html' before doing anything further with the visited document.
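For illustration, the content-type check described above could be sketched as follows. This is a minimal sketch, not code from the original report: `classifyContentType` and `DocumentKind` are hypothetical names, and the Playwright calls appear only in comments.

```typescript
// Hypothetical helper (not part of Playwright or SecretAgent): decide what
// kind of document a response carries from its content-type header alone.
type DocumentKind = 'html' | 'pdf' | 'other';

function classifyContentType(header: string | undefined): DocumentKind {
  if (!header) return 'other';
  // Drop parameters such as "; charset=utf-8" and normalize case.
  const [first = ''] = header.split(';');
  const mediaType = first.trim().toLowerCase();
  if (mediaType === 'text/html') return 'html';
  if (mediaType === 'application/pdf') return 'pdf';
  return 'other';
}

// Usage with Playwright (not executed here):
//   const response = await page.goto(requestUrl);
//   const kind = classifyContentType(response?.headers()['content-type']);
//   if (kind !== 'html') { /* skip further processing of this document */ }
```

Classifying by header rather than by URL suffix covers both cases mentioned above: PDFs served from URLs without '.pdf', and HTML served from URLs that end with '.pdf'.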

But when I use the agent.goto function to visit any pdf in secret-agent, I get something like the following exception:

Error: net::ERR_ABORTED
    at Page.navigate (/Users/andynuss/repos/stag-secret-agent/app/node_modules/puppet-chrome/lib/Page.ts:212:45)
    at runNextTicks (internal/process/task_queues.js:58:5)
    at processImmediate (internal/timers.js:434:9)
    at Timer.waitForPromise (/Users/andynuss/repos/stag-secret-agent/app/node_modules/commons/Timer.ts:56:20)
    at Tab.goto (/Users/andynuss/repos/stag-secret-agent/app/node_modules/core/lib/Tab.ts:369:5)
    at CommandRecorder.runCommandFn (/Users/andynuss/repos/stag-secret-agent/app/node_modules/core/lib/CommandRecorder.ts:73:16)
    at ConnectionToClient.executeCommand (/Users/andynuss/repos/stag-secret-agent/app/node_modules/core/server/ConnectionToClient.ts:324:14)
    at ConnectionToClient.handleRequest (/Users/andynuss/repos/stag-secret-agent/app/node_modules/core/server/ConnectionToClient.ts:70:14)
------REMOTE CORE---------------------------------
    at Function.reviver (/Users/andynuss/repos/stag-secret-agent/app/node_modules/commons/TypeSerializer.ts:208:26)
    at JSON.parse (<anonymous>)
    at Function.parse (/Users/andynuss/repos/stag-secret-agent/app/node_modules/commons/TypeSerializer.ts:24:17)
    at WebSocket.<anonymous> (/Users/andynuss/repos/stag-secret-agent/app/node_modules/client/connections/RemoteConnectionToCore.ts:67:42)
    at WebSocket.emit (events.js:315:20)
    at Receiver.receiverOnMessage (/Users/andynuss/repos/stag-secret-agent/app/node_modules/ws/lib/websocket.js:983:20)
    at Receiver.emit (events.js:315:20)
    at Receiver.dataMessage (/Users/andynuss/repos/stag-secret-agent/app/node_modules/ws/lib/receiver.js:517:14)
    at /Users/andynuss/repos/stag-secret-agent/app/node_modules/ws/lib/receiver.js:468:23
    at /Users/andynuss/repos/stag-secret-agent/app/node_modules/ws/lib/permessage-deflate.js:308:9
------CONNECTION----------------------------------
    at new Resolvable (/Users/andynuss/repos/stag-secret-agent/app/node_modules/commons/Resolvable.ts:17:18)
    at Object.createPromise (/Users/andynuss/repos/stag-secret-agent/app/node_modules/commons/utils.ts:68:10)
    at RemoteConnectionToCore.createPendingResult (/Users/andynuss/repos/stag-secret-agent/app/node_modules/client/connections/ConnectionToCore.ts:328:31)
    at RemoteConnectionToCore.internalSendRequestAndWait (/Users/andynuss/repos/stag-secret-agent/app/node_modules/client/connections/ConnectionToCore.ts:253:43)
    at RemoteConnectionToCore.sendRequest (/Users/andynuss/repos/stag-secret-agent/app/node_modules/client/connections/ConnectionToCore.ts:156:17)
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
    at Object.cb (/Users/andynuss/repos/stag-secret-agent/app/node_modules/client/lib/CoreCommandQueue.ts:104:26)
    at Queue.next (/Users/andynuss/repos/stag-secret-agent/app/node_modules/commons/Queue.ts:82:19)
------CORE COMMANDS-------------------------------
    at Queue.run (/Users/andynuss/repos/stag-secret-agent/app/node_modules/commons/Queue.ts:35:19)
    at CoreCommandQueue.run (/Users/andynuss/repos/stag-secret-agent/app/node_modules/client/lib/CoreCommandQueue.ts:100:8)
    at CoreTab.goto (/Users/andynuss/repos/stag-secret-agent/app/node_modules/client/lib/CoreTab.ts:92:36)
    at Tab.goto (/Users/andynuss/repos/stag-secret-agent/app/node_modules/client/lib/Tab.ts:160:36)
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
    at testUrl (/Users/andynuss/repos/stag-secret-agent/app/src/test.js:51:30)
    at /Users/andynuss/repos/stag-secret-agent/app/src/test.js:130:5
@blakebyrnes
Contributor

blakebyrnes commented Oct 22, 2021

I'm guessing this is triggering a file download, which Chrome will sometimes redirect to load in the browser. I got halfway through the downloads PR, and then the dev who was interested in implementing it kind of disappeared.

@andynuss
Author

I noticed that I previously found a similar behavior with puppeteer, and that this puppeteer issue indicated it had something to do with chromium?

puppeteer/puppeteer#2794

@blakebyrnes
Contributor

I don't see any confirmation in the thread, but it sounds to me like this is because headless Chrome triggers downloads when it encounters PDFs. That makes sense, because headless Chrome has no "plugins" installed, and the plugins are what know how to render PDFs. Like I said, I need to finish (or get someone's help?!?! hint, hint) the PR I linked to above. I've been wrapped up in some things for the new Hero project, so I haven't been able to get to this.

@blakebyrnes blakebyrnes linked a pull request Oct 30, 2021 that will close this issue
9 tasks
@blakebyrnes blakebyrnes linked a pull request Nov 15, 2021 that will close this issue
9 tasks
@blakebyrnes blakebyrnes transferred this issue from ulixee/secret-agent Oct 14, 2022
@blakebyrnes
Contributor

Existing PR in SecretAgent

There's a PR that was mostly completed against SecretAgent, and most of it can be applied to the Agent repo. HOWEVER, I came away thinking that the best approach was actually to allow Downloads to behave like normal resources.

Request Interception

I think to achieve this, we might want "request interception" with an ability to "stream" the response body as it becomes available.
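As a rough illustration of what "streaming" a response body could look like to a consumer, here is a minimal sketch. Everything in it is an assumption: `StreamedResource` and `readStreamedBody` are hypothetical names and do not exist in SecretAgent, Hero, or the linked PR.

```typescript
// Hypothetical shape of an intercepted resource whose body arrives in chunks.
interface StreamedResource {
  url: string;
  headers: Record<string, string>;
  body: AsyncIterable<Uint8Array>; // chunks are yielded as they become available
}

// Drain a streamed body into a single buffer, reporting progress per chunk.
async function readStreamedBody(
  resource: StreamedResource,
  onChunk?: (bytesSoFar: number) => void,
): Promise<Uint8Array> {
  const chunks: Uint8Array[] = [];
  let total = 0;
  for await (const chunk of resource.body) {
    chunks.push(chunk);
    total += chunk.length;
    if (onChunk) onChunk(total);
  }
  // Concatenate the collected chunks in arrival order.
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.length;
  }
  return out;
}
```

With a shape like this, a download would be consumed the same way as any other resource: the caller inspects `headers` first and then decides whether to read the body at all.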

@blakebyrnes blakebyrnes changed the title from "an error: NET::ABORTED is returned when visiting a pdf" to "Support Downloads (edited title)" Oct 14, 2022