
Permission denied - connect(2) for "web.archive.org" port 443 #273

Open
lacertrader opened this issue Jan 2, 2024 · 15 comments · May be fixed by #280

Comments

@lacertrader

lacertrader commented Jan 2, 2024

It doesn't work anymore; every time I try (with any site) I get the same error:

C:/Ruby31-x64/lib/ruby/3.1.0/net/http.rb:1018:in `initialize': No connection could be made because the target machine actively refused it. - connect(2) for "web.archive.org" port 443 (Errno::ECONNREFUSED)

@jtwill98

jtwill98 commented Jan 4, 2024

I think the archive.org site is adding a limit on the number of files that can be downloaded in a given time period. I can download maybe 6 or 7 files before I see the error. I added a sleep 60 into a script after downloading 6 files and it seems to be working... slowly. :-(
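
For what it's worth, the batching can be scripted. A rough Ruby sketch of that idea (assumptions: a files.txt with one URL per line, shelling out to the CLI; the file name and batch size are illustrative):

# Download in batches of 6, sleeping 60 seconds between batches
# to stay under the apparent rate limit.
File.readlines("files.txt", chomp: true).each_slice(6) do |batch|
  batch.each { |url| system("wayback_machine_downloader", url, "-e") }
  sleep(60)
end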

@davidpfister

I get the same error every now and then. My two cents on this issue: why not retry after a failure (i.e. ECONNREFUSED) and sleep only in that case?
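
Roughly something like this (a minimal sketch only, not the gem's actual code; fetch_with_retry and its parameters are invented for illustration):

require 'net/http'

# Retry a request a few times, sleeping only when the connection
# is actively refused (i.e. when we are likely being throttled).
def fetch_with_retry(uri, max_retries: 5, pause: 10)
  attempts = 0
  begin
    Net::HTTP.get_response(uri)
  rescue Errno::ECONNREFUSED
    attempts += 1
    raise if attempts > max_retries
    sleep(pause) # back off only after a refused connection
    retry
  end
end

puts fetch_with_retry(URI("https://web.archive.org/")).code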

@rustam

rustam commented Jan 5, 2024

Got the same error; solved it by sleeping 2 seconds between every download.

@setiawan-chandra

How do I add a 2-second sleep between every download? @rustam

@davidpfister

@setiawan-chandra you can do it with a simple script. In PowerShell, I used:

wayback_machine_downloader <website> -l -a --from 2023 > files.txt
Get-Content .\files.txt | ForEach-Object {Start-Sleep -Seconds 2; wayback_machine_downloader $_ --from 2023 -e}

The -l option in the first line gets you the list of files.
Then you simply download them one by one with a pause in between.

@setiawan-chandra

setiawan-chandra commented Jan 10, 2024

I don't think it's working @davidpfister
wayback_machine_downloader https://www.tes.com/ -l -a --from 20230303085651 > files.txt
Get-Content .\files.txt | ForEach-Object {Start-Sleep -Seconds 2; wayback_machine_downloader $_ --from 20230303085651 -e}
Getting snapshot pages.. found 10 snaphots to consider.

ForEach-Object: command not found
Get-Content: command not found
/home/jacky/.asdf/installs/ruby/3.3.0/lib/ruby/gems/3.3.0/gems/wayback_machine_downloader-2.3.1/bin/wayback_machine_downloader:64:in `<top (required)>': invalid option: -} (OptionParser::InvalidOption)
	from /home/jacky/.asdf/installs/ruby/3.3.0/bin/wayback_machine_downloader:25:in `load'
	from /home/jacky/.asdf/installs/ruby/3.3.0/bin/wayback_machine_downloader:25:in `<main>'
/home/jacky/.asdf/installs/ruby/3.3.0/lib/ruby/gems/3.3.0/gems/wayback_machine_downloader-2.3.1/bin/wayback_machine_downloader:64:in `<top (required)>': invalid option: } (OptionParser::InvalidOption)
	from /home/jacky/.asdf/installs/ruby/3.3.0/bin/wayback_machine_downloader:25:in `load'
	from /home/jacky/.asdf/installs/ruby/3.3.0/bin/wayback_machine_downloader:25:in `<main>'

@davidpfister

@setiawan-chandra, do you have PowerShell, and which version?
Open your PowerShell terminal with elevated permissions and run the commands one by one.
If that still does not work, then you have an issue with your PowerShell install. Anyway, the PS script is just one way to do it. You could do the same in bash or Python, or anything else you feel more comfortable with.
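
For example, a rough Ruby equivalent of the PowerShell loop above (same assumptions: files.txt comes from the -l run and holds one URL per line):

# Fetch each listed file with a 2-second pause in between,
# mirroring the PowerShell loop.
File.readlines("files.txt", chomp: true).each do |url|
  sleep(2)
  system("wayback_machine_downloader", url, "--from", "2023", "-e")
end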

@setiawan-chandra

[Screenshot 2024-01-11 091802] Is it right if I run the command like this? @davidpfister

@davidpfister

@setiawan-chandra if you are on Windows, just start PowerShell from the Start menu, or start a PowerShell terminal rather than using it from WSL. If you still want to use it from your WSL shell, then you might need to import some PS modules. Try:

Import-Module Microsoft.PowerShell.Management -Force -Verbose

That should get you the Get-... cmdlets.

@rustam

rustam commented Jan 11, 2024

@setiawan-chandra, just go to the lib directory of wayback_machine_downloader, then add sleep(3) to the download_files method of wayback_machine_downloader.rb (in my case at line 213); I've since bumped it from 2 secs to 3 secs:

cd /var/lib/gems/3.0.0/gems/wayback_machine_downloader-2.3.1/lib

def download_files
  start_time = Time.now
  puts "Downloading #{@base_url} to #{backup_path} from Wayback Machine archives."
  puts

  if file_list_by_timestamp.count == 0
    puts "No files to download."
    puts "Possible reasons:"
    puts "\t* Site is not in Wayback Machine Archive."
    puts "\t* From timestamp too much in the future." if @from_timestamp and @from_timestamp != 0
    puts "\t* To timestamp too much in the past." if @to_timestamp and @to_timestamp != 0
    puts "\t* Only filter too restrictive (#{only_filter.to_s})" if @only_filter
    puts "\t* Exclude filter too wide (#{exclude_filter.to_s})" if @exclude_filter
    return
  end

  puts "#{file_list_by_timestamp.count} files to download:"

  threads = []
  @processed_file_count = 0
  @threads_count = 1 unless @threads_count != 0
  @threads_count.times do
    threads << Thread.new do
      until file_queue.empty?
        file_remote_info = file_queue.pop(true) rescue nil
        download_file(file_remote_info) if file_remote_info
        sleep(3) # <--- added: pause between downloads
      end
    end
  end
  # ... (rest of the method, joining the threads and printing the summary, is unchanged)
end

@jere-co

jere-co commented Jan 20, 2024

I also needed to add a sleep to the get_all_snapshots_to_consider function:

 def get_all_snapshots_to_consider
    # Note: Passing a page index parameter allows us to get more snapshots,
    # but from a less fresh index
    print "Getting snapshot pages"
    snapshot_list_to_consider = []
    snapshot_list_to_consider += get_raw_list_from_api(@base_url, nil)
    print "."
    unless @exact_url
      @maximum_pages.times do |page_index|
        snapshot_list = get_raw_list_from_api(@base_url + '/*', page_index)
        break if snapshot_list.empty?
        snapshot_list_to_consider += snapshot_list
        print "."
        sleep(3) # <--- Added this
      end
    end
    puts " found #{snapshot_list_to_consider.length} snaphots to consider."
    puts
    snapshot_list_to_consider
  end

@gingerbeardman

gingerbeardman commented Jan 23, 2024

@rustam instead of putting sleep(3) in the threads/file_queue loop, which will slow down every access even if the files have already been downloaded, put it in the download_file function just before the else branch for files that already exist.

Between lines 295 and 296, it will pause only when doing some real downloading, so you can run it on an existing directory and it will be very fast, pausing only while downloading new files.

   semaphore.synchronize do
     @processed_file_count += 1
     puts "#{file_url} -> #{file_path} (#{@processed_file_count}/#{file_list_by_timestamp.size})"
   end
 else
   semaphore.synchronize do
     @processed_file_count += 1
     puts "#{file_url} # #{file_path} already exists. (#{@processed_file_count}/#{file_list_by_timestamp.size})"
   end
 end

should look like:

   semaphore.synchronize do 
     @processed_file_count += 1 
     puts "#{file_url} -> #{file_path} (#{@processed_file_count}/#{file_list_by_timestamp.size})" 
   end
   sleep(3) 
 else 
   semaphore.synchronize do 
     @processed_file_count += 1 
     puts "#{file_url} # #{file_path} already exists. (#{@processed_file_count}/#{file_list_by_timestamp.size})" 
   end 
 end

The only new line is the sleep(3).

@lacertrader
Author

The problem is that the download doesn't even start. I run the command to download a page, and after a few seconds of searching the error appears.

@gingerbeardman

gingerbeardman commented Jan 29, 2024

The problem is that the download doesn't even start. I run the command to download a page, and after a few seconds of searching the error appears.

@lacertrader that's a different problem, and it seems specific to you. Is the error the same? Did you try a different network connection? Disabling your VPN, etc.? It might be worth starting a new issue.

@gingerbeardman

The best solution is to use the updated files in PR #280.
