Why is curl in Ruby slower than command-line curl?


This could be a fitting task for Typhoeus.

Something like this (untested):

    require 'typhoeus'

    def write_file(filename, data)
      file = File.new(filename, "wb")
      file.write(data)
      file.close
      # ... some other stuff
    end

    hydra = Typhoeus::Hydra.new(:max_concurrency => 20)

    batch_urls.each do |url_info|
      req = Typhoeus::Request.new(url_info[:url])
      req.on_complete do |response|
        write_file(url_info[:file], response.body)
      end
      hydra.queue req
    end

    hydra.run

Come to think of it, you might run into a memory problem because of the enormous number of files. One way to prevent that would be to never store the data in a variable but instead stream it directly to the file. You could use em-http-request for that.

    EventMachine.run {
      http = EventMachine::HttpRequest.new('http://www.website.com/').get
      http.stream { |chunk| print chunk }
      # ...
    }
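To make the streaming idea concrete without any gems, here is a minimal sketch that uses Ruby's standard Net::HTTP instead of em-http-request (a swapped-in library, not what the snippet above uses). The helper name stream_to_file is made up for illustration. It writes each chunk to disk as it arrives, so the full body never sits in a variable. It assumes a plain GET and does not follow redirects.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: stream the body of +url+ into +path+ chunk by chunk,
# so the whole response is never held in memory at once.
# Returns the number of bytes written.
def stream_to_file(url, path)
  uri = URI(url)
  bytes = 0
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    # With a block, request_get yields the response before the body is read.
    http.request_get(uri.request_uri) do |response|
      response.value # raises unless the status is 2xx
      File.open(path, 'wb') do |file|
        response.read_body do |chunk|
          bytes += file.write(chunk)
        end
      end
    end
  end
  bytes
end
```

Called as stream_to_file(url_info[:url], url_info[:file]), memory use should stay roughly constant per download regardless of file size.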


If you don't set an on_body handler, curb will buffer the download. If you're downloading files, you should use an on_body handler. If you want to download multiple files using Ruby Curl, try the Curl::Multi.download interface.

    require 'rubygems'
    require 'curb'

    urls_to_download = [
      'http://www.google.com/',
      'http://www.yahoo.com/',
      'http://www.cnn.com/',
      'http://www.espn.com/'
    ]

    path_to_files = [
      'google.com.html',
      'yahoo.com.html',
      'cnn.com.html',
      'espn.com.html'
    ]

    Curl::Multi.download(urls_to_download, {:follow_location => true}, {}, path_to_files) {|c,p|}

If you just want to download a single file:

Curl::Easy.download('http://www.yahoo.com/')

Here is a good resource: http://gist.github.com/405779


Benchmarks have been done comparing curb with other methods such as HTTPClient. The winner in almost all categories was HTTPClient. Plus, there are documented scenarios where curb does NOT work in multi-threaded code.

Like you, I've had this experience. I ran curl as a system command in 20+ concurrent threads, and it was 10x faster than running curb in 20+ concurrent threads. No matter what I tried, this was always the case.

I've since switched to HTTPClient, and the difference is huge. It now runs as fast as 20 concurrent curl system commands, and it uses less CPU as well.
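For anyone wanting to reproduce this kind of setup, the "N concurrent threads, one download each at a time" pattern might look roughly like the sketch below. It is only an illustration: each_concurrently is a made-up helper name, and the actual download call is left to the block so that any client (HTTPClient, a curl system command, and so on) can be plugged in.

```ruby
# Hypothetical helper: run the block once per job on a fixed-size pool of
# worker threads. Returns the block's results in the same order as +jobs+.
def each_concurrently(jobs, concurrency: 20, &work)
  queue = Queue.new
  jobs.each_with_index { |job, index| queue << [job, index] }

  results = Array.new(jobs.size)
  workers = Array.new([concurrency, jobs.size].min) do
    Thread.new do
      loop do
        job, index = begin
          queue.pop(true) # non-blocking pop; raises ThreadError when empty
        rescue ThreadError
          break           # queue drained, this worker is done
        end
        results[index] = work.call(job)
      end
    end
  end

  workers.each(&:join)
  results
end
```

For example, each_concurrently(batch_urls) { |info| system('curl', '-s', '-o', info[:file], info[:url]) } would shell out to command-line curl with at most 20 downloads in flight, assuming batch_urls holds hashes with :url and :file keys as in the earlier answer.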