Running Puppeteer in Headless Mode for Captcha Solving #49

Closed
alpgul opened this issue May 8, 2024 · 5 comments

alpgul commented May 8, 2024

When I enable headless mode, Puppeteer can't solve the captcha. Is there a way to run it in headless mode?

Additionally, I think the following approach is more reliable. Since we block Puppeteer's access to the Cloudflare iframe targets, Puppeteer cannot reach into the iframe, so it makes more sense to modify the intercepted response and inject a script that interacts with the captcha from inside the iframe.

const {
  RequestInterceptionManager,
} = require("puppeteer-intercept-and-modify-requests");
const puppeteer = require("puppeteer-core");
const script = `<script>const targetSelector = 'input[type="checkbox"]';
const observer = new MutationObserver((mutationsList) => {
  for (const mutation of mutationsList) {
    if (mutation.type === 'childList') {
      const addedNodes = Array.from(mutation.addedNodes);
      for (const addedNode of addedNodes) {
        const node = addedNode.querySelector(targetSelector);
        if (node) {
          setTimeout(()=>{node.parentElement.click();},1000);
        }
      }
    }
  }
});

const targetElement = document.documentElement;
const observerOptions = {
  childList: true,
  subtree: true,
};
observer.observe(targetElement, observerOptions);</script>`;
function targetFilter(target) {
  if (target._getTargetInfo().type !== "iframe") {
    return true;
  }
  return false;
}
const main = async () => {
  const browser = await puppeteer.launch({
    executablePath:
      "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe",

    targetFilter,
    headless: false,
  });
  const page = await browser.newPage();

  const client = await page.target().createCDPSession();
  const interceptManager = new RequestInterceptionManager(client);
  await interceptManager.intercept({
    urlPattern: `https://challenges.cloudflare.com/*`,
    resourceType: "Document",
    modifyResponse({ body }) {
      return {
        body: body.replaceAll("<head>", "<head>" + script),
      };
    },
  });
  console.log("Connected to browser");
  await page.goto("https://nopecha.com/demo/cloudflare", {
    waitUntil: "domcontentloaded",
  });
  try {
    await page.waitForSelector(".link_row", {
      timeout: 100000,
    });
  } catch (error) {
    console.error(error);
  }
  await page.screenshot({ path: "example.png" });
  await browser.close();
};
main();
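
The response rewrite in the code above comes down to a string replacement on the HTML body. Here is a minimal sketch of that step in isolation, assuming a hypothetical helper named injectIntoHead (not part of any library) and a made-up HTML string. It passes a replacer function instead of a string, so any `$` sequences in the injected snippet are inserted literally rather than interpreted as replacement patterns:

```javascript
// Hypothetical helper: insert `snippet` right after every literal "<head>" tag.
// A replacer function is used so "$&", "$1", etc. in `snippet` stay literal.
function injectIntoHead(html, snippet) {
  return html.replaceAll("<head>", () => "<head>" + snippet);
}

// Made-up document and snippet, for illustration only.
const html = "<html><head><title>t</title></head><body></body></html>";
const out = injectIntoHead(html, "<script>window.injected = true;</script>");

console.log(out);
```

Note that a literal `<head>` match misses a tag that carries attributes, such as `<head lang="en">`, so a real page may need a more tolerant match.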

zfcsoftware (Owner) commented

Hi, this approach works for simple Cloudflare challenges but not in a situation like this one. That may be because the captcha is clicked with JavaScript and the targetFilter function alone is not enough. Since the library is not meant only for Cloudflare (it also covers cases such as Google login), we do it this way. On Windows, you can use WSL and Docker to start the browser incognito.

2024-05-08.22-17-40.mp4

alpgul commented May 10, 2024

I found out why headless mode does not work: the user agent inside the iframes does not change, because the program skips the iframes. When I changed the user agents through remote debugging with DevTools, headless mode started working. The only remaining problem is how to change the user agent inside the iframes without using Puppeteer.

alpgul commented May 10, 2024

https://hmaker.github.io/selenium-detector/
You can use this tool to test whether headless mode is being detected.
kaliiiiiiiiii/Selenium-Driverless#86
This issue also explains how the detection works.

alpgul commented May 10, 2024

const {
  RequestInterceptionManager,
} = require("puppeteer-intercept-and-modify-requests");
const puppeteer = require("puppeteer-core");
function targetFilter(target) {
  const session = target._session();
  if (session) {
    session.send = new Proxy(session.send, {
      apply(target, thisArg, args) {
        if ("Runtime.enable" === args[0]) {
          return Promise.resolve();
        } else {
          const result = Reflect.apply(target, thisArg, args);
          return result;
        }
      },
    });
  }
  return true;
}
const script = `<script>
Element.prototype._addEventListener = Element.prototype.addEventListener;
Element.prototype.addEventListener = function () {
    let args = [...arguments]
    let temp = args[1];
    args[1] = function () {
        let args2 = [...arguments];
        args2[0] = Object.assign({}, args2[0])
        args2[0].isTrusted = true;
        return temp(...args2);
    }
    return this._addEventListener(...args);
}
const targetSelector = 'input[type=checkbox]';
const observer = new MutationObserver((mutationsList) => {
  for (const mutation of mutationsList) {
    if (mutation.type === 'childList') {
      const addedNodes = Array.from(mutation.addedNodes);
      for (const addedNode of addedNodes) {
        if (addedNode.nodeType === addedNode.ELEMENT_NODE) {
          const node = addedNode?.querySelector(targetSelector);
          if (node) {
            setTimeout(()=>{
              node.parentElement.click();
            },1000);
          }
        }
      }
    }
  }
});

const targetElement = document.documentElement;
const observerOptions = {
  childList: true,
  subtree: true
};
observer.observe(targetElement, observerOptions);
//document.querySelector('script').remove();
</script>`;
async function main() {
  const browser = await puppeteer.launch({
    ignoreDefaultArgs: ["--enable-automation"],
    args: ["--disable-blink-features=AutomationControlled"],
    defaultViewport: null,
    executablePath:
      "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe",
    headless: "new",
    debuggingPort: 9222,
    targetFilter,
  });
  let page = (await browser.pages())[0];
  await page.setViewport({
    width: 1920,
    height: 1080,
  });
  await page.setUserAgent(
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
  );

  const client = await page.target().createCDPSession();
  const interceptManager = new RequestInterceptionManager(client);
  await interceptManager.intercept({
    urlPattern: `https://challenges.cloudflare.com/*`,
    resourceType: "Document",
    modifyResponse({ body }) {
      return {
        body: body.replaceAll("<head>", "<head>" + script),
      };
    },
  });

  console.log("Connected to browser");
  await page.goto("https://nopecha.com/demo/cloudflare", {
    waitUntil: "domcontentloaded",
  });

  await page.waitForNavigation();
  await page.waitForNavigation();
  await page.waitForNavigation();
  await page.screenshot({ path: "example.png" });
  console.log("Closing browser");
  await browser.close();
}
main();

I wrote the code above to bypass Runtime.enable in headless mode, but it is a temporary workaround rather than an optimal solution, and it disables certain features. I added isTrusted to the injected script to solve captchas, and I also performed checks in the code. If you try it in a different environment, test it with a debugger first, because any error in the code stops it from working; that is why the previous code failed. Also, the captcha solve won't work during remote debugging, because DevTools is open and that can be detected as a bot.
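
The targetFilter above relies on a Proxy with an `apply` trap to intercept calls to `session.send`. Here is a minimal sketch of that pattern on its own, assuming a hypothetical stand-in object named fakeSession and a hypothetical helper named blockMethods; neither is Puppeteer's real session class or API, and nothing here talks to a browser:

```javascript
// Hypothetical stand-in for a session object; it only records what it is asked to send.
const forwarded = [];
const fakeSession = {
  send(method, params = {}) {
    forwarded.push(method);
    return Promise.resolve({ method, params });
  },
};

// Wrap `send` so calls for blocked method names never reach the original function.
function blockMethods(session, blocked) {
  session.send = new Proxy(session.send, {
    apply(original, thisArg, args) {
      if (blocked.has(args[0])) {
        // Resolve immediately without forwarding the call.
        return Promise.resolve();
      }
      return Reflect.apply(original, thisArg, args);
    },
  });
  return session;
}

blockMethods(fakeSession, new Set(["Runtime.enable"]));
fakeSession.send("Runtime.enable"); // swallowed by the trap
fakeSession.send("Page.enable");    // forwarded to the original send

console.log(forwarded); // [ 'Page.enable' ]
```

Because the blocked call resolves with `undefined`, any caller that expects a real response from that method gets nothing back, which fits the remark above that this workaround disables certain features.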

zfcsoftware (Owner) commented

This is how the error occurs. It can take minutes to pass a captcha.
[Screenshot - 2024-05-11 02-23-28]
