Scrape Idox tabs #19

Open
4 tasks
adrianshort opened this issue Oct 2, 2018 · 2 comments

@adrianshort
Owner

adrianshort commented Oct 2, 2018

  • details tab (Further Information)
  • contacts tab
  • dates tab
  • bump minor version number

See also #20.

@adrianshort adrianshort added the enhancement New feature or request label Oct 2, 2018
@adrianshort adrianshort self-assigned this Oct 2, 2018
@KeithP
Contributor

KeithP commented Oct 2, 2018

      # Idox requires a separate scrape per tab. The summary tab URL looks like:
      # "https://planningregister.sutton.gov.uk/online-applications/applicationDetails.do?keyVal=PC6337KC08T00&activeTab=summary"
      # Swap only the activeTab parameter so a 'summary' elsewhere in the URL is left alone.
      ret = DataFetch::DetailIdox.new.scrape( detail_page_link.sub( 'activeTab=summary', 'activeTab=details' ) )
      sleep(10) # pause between requests
      ret.merge!( DataFetch::DetailIdox.new.scrape( detail_page_link.sub( 'activeTab=summary', 'activeTab=contacts' ) ) )
      sleep(10)
      ret.merge!( DataFetch::DetailIdox.new.scrape( detail_page_link.sub( 'activeTab=summary', 'activeTab=dates' ) ) )
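
The per-tab URLs could also be built by rewriting the query string rather than substituting text. A minimal sketch, assuming the tab is always selected by the `activeTab` parameter as in the example URL above; `idox_tab_url` is a hypothetical helper name, not something in the project:

```ruby
require "uri"

# Hypothetical helper: return a copy of an Idox detail URL with the
# activeTab query parameter set to the given tab name.
def idox_tab_url(url, tab)
  uri = URI.parse(url)
  # Drop any existing activeTab parameter, keep everything else (e.g. keyVal).
  params = URI.decode_www_form(uri.query.to_s).reject { |key, _| key == "activeTab" }
  params << ["activeTab", tab]
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

summary_url = "https://planningregister.sutton.gov.uk/online-applications/applicationDetails.do?keyVal=PC6337KC08T00&activeTab=summary"
tab_urls = %w[details contacts dates].map { |tab| idox_tab_url(summary_url, tab) }
```

Each entry of `tab_urls` could then be passed to `scrape` in turn, with the same pause between requests.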

@KeithP
Contributor

KeithP commented Oct 3, 2018

The table content differs depending on whether an agent was involved, hence this approach of checking each row's header text.

# DataFetch::DetailIdox

def scrape( url )
        agent = Mechanize.new
        page = agent.get url
        app_hash = {}
        # "details"
        detail_heads = ["Application Type","Expected Decision Level","Case Officer","Parish","Ward","District Reference",
                        "Applicant Name","Agent Name","Agent Company Name","Agent Address","Agent Phone Number",
                        "Environmental Assessment Requested"]
        # "dates"
        date_heads = ["Application Received Date","Application Validated Date","Actual Committee Date",
                      "Neighbour Consultation Expiry Date","Statutory Expiry Date","Agreed Expiry Date",
                      "Decision Issued Date","Permission Expiry Date","Temporary Permission Expiry Date"]
        # Assumes the nth <th> on the page labels the nth <td>
        heads = page.search( "th" )
        cols = page.search( "td" )
        heads.each_with_index do |head, index|
          next unless cols[index]
          label = head.text.strip
          key = label.parameterize.underscore.to_sym
          if detail_heads.include?( label )
            app_hash[key] = cols[index].text.strip
          elsif date_heads.include?( label )
            app_hash[key] = parse_date( cols[index].text )
          end
        end

        # Map cols to ours; keys with no entry in key_map map to nil and are dropped
        key_map = {:case_officer=>:officer,
                   :neighbour_consultation_expiry_date=>:comments_close_at,
                   ...}
        ret = app_hash.map {|k, v| [key_map[k], v] }.to_h
        ret.except!(nil)
        ret
end

private

# str e.g. "Mon 17 Sep 2018"; returns "" for a blank cell
def parse_date( str )
        return "" if str.blank?
        Time.zone.parse( str )
end
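
The key mapping and date parsing can also be sketched in plain Ruby (2.5 or later) without ActiveSupport. This is only an illustration: `remap_keys` and `parse_idox_date` are hypothetical names, the sample values are made up, and `key_map` holds just the two pairs shown above. Unlike `parse_date`, the date helper returns `nil` for a blank cell and a `Date` rather than a time in the application's zone.

```ruby
require "date"

# Hypothetical helper: keep only the keys named in key_map and rename them.
def remap_keys(app_hash, key_map)
  app_hash.slice(*key_map.keys).transform_keys { |key| key_map[key] }
end

# Hypothetical variant of parse_date: strict format, nil for blank input.
def parse_idox_date(str)
  text = str.to_s.strip
  return nil if text.empty?
  Date.strptime(text, "%a %d %b %Y") # e.g. "Mon 17 Sep 2018"
end

# Only the two mappings shown in the comment above; the rest are not known here.
key_map = { case_officer: :officer,
            neighbour_consultation_expiry_date: :comments_close_at }

# Made-up sample of what the scrape might collect.
scraped = { case_officer: "A. Person",
            ward: "Example Ward",
            neighbour_consultation_expiry_date: parse_idox_date("Mon 17 Sep 2018") }

mapped = remap_keys(scraped, key_map)
```

Here `mapped` keeps the officer and the consultation date under the new names, and `:ward` is dropped because it has no entry in `key_map`.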

@adrianshort adrianshort added this to the 1.0.0 milestone Oct 9, 2018