Are forms like this supported? #337

Open
Erudition opened this issue Mar 20, 2024 · 4 comments

Comments

@Erudition

https://utilitiesinfo.conservice.com/

The login page on this site has input fields named "Username" and "Password", but they are not wrapped in a traditional <form> element. I targeted ".login-box" as the form selector, but this is all I get:

2024-03-20 02:22:12.345 INFO (MainThread) [custom_components.multiscrape.button] Multiscrape triggered by button
2024-03-20 02:22:12.348 DEBUG (MainThread) [custom_components.multiscrape.coordinator] Conservice Account History # New run: start (re)loading data from resource
2024-03-20 02:22:12.349 DEBUG (MainThread) [custom_components.multiscrape.coordinator] Conservice Account History # Deleting logging files from previous run
2024-03-20 02:22:12.364 DEBUG (MainThread) [custom_components.multiscrape.coordinator] Conservice Account History # Rendered resource template into: https://utilitiesinfo.conservice.com/Tenant/AccountHistory
2024-03-20 02:22:12.364 DEBUG (MainThread) [custom_components.multiscrape.form] Conservice Account History # Starting with form-submit
2024-03-20 02:22:12.365 DEBUG (MainThread) [custom_components.multiscrape.form] Conservice Account History # Requesting page with form from: https://utilitiesinfo.conservice.com/
2024-03-20 02:22:12.365 DEBUG (MainThread) [custom_components.multiscrape.http] Conservice Account History # Executing form_page-request with a GET to url: https://utilitiesinfo.conservice.com/ with headers: {}
2024-03-20 02:22:12.371 DEBUG (MainThread) [custom_components.multiscrape.http] Conservice Account History # request_headers written to file: form_page_request_headers.txt
2024-03-20 02:22:12.377 DEBUG (MainThread) [custom_components.multiscrape.http] Conservice Account History # request_body written to file: form_page_request_body.txt
2024-03-20 02:22:12.630 DEBUG (MainThread) [custom_components.multiscrape.http] Conservice Account History # Response status code received: 200
2024-03-20 02:22:12.633 DEBUG (MainThread) [custom_components.multiscrape.http] Conservice Account History # response_headers written to file: form_page_response_headers.txt
2024-03-20 02:22:12.638 DEBUG (MainThread) [custom_components.multiscrape.http] Conservice Account History # response_body written to file: form_page_response_body.txt
2024-03-20 02:22:12.638 DEBUG (MainThread) [custom_components.multiscrape.form] Conservice Account History # Parse page with form with BeautifulSoup parser html.parser
2024-03-20 02:22:12.714 DEBUG (MainThread) [custom_components.multiscrape.form] Conservice Account History # The page with the form parsed by BeautifulSoup has been written to file: form_page_soup.txt
2024-03-20 02:22:12.715 DEBUG (MainThread) [custom_components.multiscrape.form] Conservice Account History # Try to find form with selector .login-box
2024-03-20 02:22:12.718 DEBUG (MainThread) [custom_components.multiscrape.form] Conservice Account History # Form looks like this: 
<div class="login-box">
<div class="row powered-by-image-container">
<img alt="Powered by Conservice" class="powered-by-conservice-img" src="/Images/Logos/ConserviceLogoWhiteColor.png"/>
</div>
<div class="row text-center margin-top-1rem">
<input autocomplete="off" class="form-control" id="Username" name="Username" placeholder="Username" style="height:35px; display: inline;" type="text"/>
</div>
<div class="row text-center margin-top-1rem">
<input autocomplete="off" class="form-control" id="Password" name="Password" placeholder="Password" style="height:35px; display: inline;" type="Password"/>
</div>
<div class="row text-center showPassword" onclick="togglePasswordVisibility()">
<i class="fa fa-eye" id="toggleIcon"></i> Show/Hide Password
                        </div>
<div class="row text-center">
<div id="errorMessageDiv" style="display:none;">
<p class="alert alert-danger" id="errorMessage" role="alert" style="margin: 1rem 1.5rem 0 1.5rem; padding:10px;"></p>
</div>
</div>
<div class="row text-center margin-top-2rem" id="membership-submit">
<button class="btn btn-success g-recaptcha width-50-percent" data-callback="onSubmit" data-sitekey="6LfVygoaAAAAAFtMGUR7bEniEKPB5lqjTAQZ3eDp" id="sign-in">SIGN IN</button>
</div>
<div class="row text-center margin-top-1rem">
<button class="loginLinkBtn" data-target="#firstTimeLoginModal" data-toggle="modal">First Time Logging In?</button>
<button class="loginLinkBtn" onclick="location.href='/Login/ForgotUsernameRequest';">Recover Username</button>
<button class="loginLinkBtn" onclick="location.href='/ForgotPassword/ForgotPasswordRequest';">Forgot Password?</button>
</div>
<div><h5 class="alert alert-warning"><span><strong>Notice:</strong></span> This site will undergo planned maintenance and be unavailable on Sunday, March 24th from 8:00 PM to 9:00 PM Central Time. We thank you for your patience and apologize for any inconvenience.</h5></div>
<!-- Start of conservice Zendesk Widget script -->
<script id="ze-snippet" src="https://static.zdassets.com/ekr/snippet.js?key=2cc20aee-1cf3-465f-a68a-4034f2428d2d"></script>
<!-- End of conservice Zendesk Widget script -->
</div>
2024-03-20 02:22:12.723 DEBUG (MainThread) [custom_components.multiscrape.form] Conservice Account History # Finding all input fields in form
2024-03-20 02:22:12.723 DEBUG (MainThread) [custom_components.multiscrape.form] Conservice Account History # Found the following input fields: {'Username': None, 'Password': None}
2024-03-20 02:22:12.724 DEBUG (MainThread) [custom_components.multiscrape.form] Conservice Account History # Found form action None and method None
2024-03-20 02:22:12.724 DEBUG (MainThread) [custom_components.multiscrape.form] Conservice Account History # Merged input fields with input data in config. Result: {'Username': 'MYNAME', 'Password': 'MYPASSWORD'}
2024-03-20 02:22:12.724 DEBUG (MainThread) [custom_components.multiscrape.form] Conservice Account History # Determined the url to submit the form to: https://utilitiesinfo.conservice.com/
2024-03-20 02:22:12.724 DEBUG (MainThread) [custom_components.multiscrape.form] Conservice Account History # Submitting the form
2024-03-20 02:22:12.724 DEBUG (MainThread) [custom_components.multiscrape.http] Conservice Account History # Executing form_submit-request with a POST to url: https://utilitiesinfo.conservice.com/ with headers: {}
2024-03-20 02:22:12.729 DEBUG (MainThread) [custom_components.multiscrape.http] Conservice Account History # request_headers written to file: form_submit_request_headers.txt
2024-03-20 02:22:12.733 DEBUG (MainThread) [custom_components.multiscrape.http] Conservice Account History # request_body written to file: form_submit_request_body.txt
2024-03-20 02:22:12.802 DEBUG (MainThread) [custom_components.multiscrape.http] Conservice Account History # Response status code received: 200
2024-03-20 02:22:12.807 DEBUG (MainThread) [custom_components.multiscrape.http] Conservice Account History # response_headers written to file: form_submit_response_headers.txt
2024-03-20 02:22:12.812 DEBUG (MainThread) [custom_components.multiscrape.http] Conservice Account History # response_body written to file: form_submit_response_body.txt
2024-03-20 02:22:12.812 DEBUG (MainThread) [custom_components.multiscrape.form] Conservice Account History # Form seems to be submitted succesfully (to be sure, use log_response and check file). Now continuing to retrieve target page.
2024-03-20 02:22:12.812 DEBUG (MainThread) [custom_components.multiscrape.coordinator] Conservice Account History # Request data from https://utilitiesinfo.conservice.com/Tenant/AccountHistory
2024-03-20 02:22:12.813 DEBUG (MainThread) [custom_components.multiscrape.http] Conservice Account History # Executing page-request with a get to url: https://utilitiesinfo.conservice.com/Tenant/AccountHistory with headers: {}
2024-03-20 02:22:12.819 DEBUG (MainThread) [custom_components.multiscrape.http] Conservice Account History # request_headers written to file: page_request_headers.txt
2024-03-20 02:22:12.825 DEBUG (MainThread) [custom_components.multiscrape.http] Conservice Account History # request_body written to file: page_request_body.txt
2024-03-20 02:22:12.957 DEBUG (MainThread) [custom_components.multiscrape.http] Conservice Account History # Response status code received: 200
2024-03-20 02:22:12.961 DEBUG (MainThread) [custom_components.multiscrape.http] Conservice Account History # response_headers written to file: page_response_headers.txt
2024-03-20 02:22:12.965 DEBUG (MainThread) [custom_components.multiscrape.http] Conservice Account History # response_body written to file: page_response_body.txt
2024-03-20 02:22:12.965 DEBUG (MainThread) [custom_components.multiscrape.scraper] Conservice Account History # Loading the content in BeautifulSoup.
2024-03-20 02:22:13.045 DEBUG (MainThread) [custom_components.multiscrape.scraper] Conservice Account History # page_soup written to file: page_soup.txt
2024-03-20 02:22:13.045 DEBUG (MainThread) [custom_components.multiscrape.coordinator] Conservice Account History # Data succesfully refreshed. Sensors will now start scraping to update.
2024-03-20 02:22:13.045 DEBUG (MainThread) [custom_components.multiscrape.coordinator] Finished fetching multiscrape data in 0.697 seconds (success: True)
2024-03-20 02:22:13.046 DEBUG (MainThread) [custom_components.multiscrape.sensor] Conservice Account History # Latest Electricity Charge Amount # Start scraping to update sensor
2024-03-20 02:22:13.049 DEBUG (MainThread) [custom_components.multiscrape.scraper] Conservice Account History # Latest Electricity Charge Amount # Tag selected: None
2024-03-20 02:22:13.050 DEBUG (MainThread) [custom_components.multiscrape.form] Conservice Account History # Exception occurred while scraping, will try to resubmit the form next interval.
2024-03-20 02:22:13.051 ERROR (MainThread) [custom_components.multiscrape.sensor] Conservice Account History # Latest Electricity Charge Amount # Unable to scrape data: Could not find a tag for given selector 
Consider using debug logging and log_response for further investigation.
2024-03-20 02:22:13.051 DEBUG (MainThread) [custom_components.multiscrape.sensor] Conservice Account History # Latest Electricity Charge Amount # On-error, set value to None
2024-03-20 02:22:13.052 DEBUG (MainThread) [custom_components.multiscrape.entity] Conservice Account History # Latest Electricity Charge Amount # Sensor updated and state written to HA

Based on the log files, the page I get back after the form submit is the same login page.
This is with the following config:

multiscrape:
  - name: Conservice Account History
    resource: https://utilitiesinfo.conservice.com/Tenant/AccountHistory
    scan_interval: 360000
    authentication: basic
    log_response: true
    parser: html.parser
    button:
      - unique_id: refresh_conservice
        name: "Refresh Conservice Data"
    form_submit:
      resource: https://utilitiesinfo.conservice.com/
      submit_once: True
      select: ".login-box"
      input:
        Username: MYNAME
        Password: MYPASSWORD
    sensor:
      - unique_id: conservice_latest_charge_amount
        name: Latest Electricity Charge Amount
        select: "td"

It looks like the page may be submitting the form via JavaScript, and it may use an invisible Google reCAPTCHA. Is this too hard for multiscrape?

@danieldotnl
Owner

Interesting, I never saw this before. But indeed, this will not work with multiscrape.
If it were just the form, you might have been able to work around it by omitting the select in form_submit while still providing the input values. Multiscrape will then just submit the inputs without first scraping the form.
Anyway, the captcha is the killer...
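
For reference, the "omit the select" workaround mentioned above might look roughly like the sketch below. It is the config from the original post with only the `select` key under `form_submit` removed. The `resource` under `form_submit` is an assumption: the thread does not establish which URL the site's JavaScript actually posts the credentials to, so the site root is reused here purely as a placeholder. As noted above, this is not expected to get past the captcha on this particular site.

```yaml
multiscrape:
  - name: Conservice Account History
    resource: https://utilitiesinfo.conservice.com/Tenant/AccountHistory
    scan_interval: 360000
    authentication: basic
    log_response: true
    parser: html.parser
    button:
      - unique_id: refresh_conservice
        name: "Refresh Conservice Data"
    form_submit:
      # Assumption: the real login endpoint is unknown from this thread;
      # the site root is kept only because it is the URL in the original config.
      resource: https://utilitiesinfo.conservice.com/
      submit_once: True
      # No `select` key here: the form page is not scraped first,
      # and the input values below are submitted as they are.
      input:
        Username: MYNAME
        Password: MYPASSWORD
    sensor:
      - unique_id: conservice_latest_charge_amount
        name: Latest Electricity Charge Amount
        select: "td"
```

With `log_response: true` still set, the form_submit_response_body.txt file can be checked after a run to see whether the response is still the login page.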

@Erudition
Author

Is it definitely the captcha? I don't see any failed-captcha message on the result page (the HTML saved after the form submit).

@danieldotnl
Owner

The page receives some kind of token from the captcha and includes it in subsequent requests. I assume that token gets validated, but I don't have valid credentials, so I can't see anything more than this.

@Erudition
Author

Would providing you with valid credentials be helpful? Or is logging in to this portal a lost cause?
