Saturday, 15 May 2010

How to GET a URL with User-Agent and timeout through some Proxy in Ruby?



How do I GET a URL if the request needs to go through a proxy, has to have a timeout of at most n seconds, and must send a User-Agent?

require 'nokogiri'
require 'net/http'
require 'rexml/document'

def get_with_max_wait(param, proxy, timeout)
  url = "http://example.com/?p=#{param}"
  uri = URI.parse(url)
  proxy_uri = URI.parse(proxy)
  http = Net::HTTP.new(uri.host, 80, proxy_uri.host, proxy_uri.port)
  http.open_timeout = timeout
  http.read_timeout = timeout
  response = http.get(url)
  doc = Nokogiri.parse(response.body)
  doc.css(".css .goes .here")[0].content.strip
end

The code above gets the URL through a proxy with a timeout, but it's missing the User-Agent. How do I do the same with a User-Agent?

You should use open-uri and set the User-Agent as a parameter of the open function.

Below is an example that stores the user agent in a variable and passes it as a parameter to the open function.

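require 'rubygems'
require 'nokogiri'
require 'open-uri'

user_agent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_0) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.854.0 Safari/535.2"
url = "http://www.somedomain.com/somepage/"

# open-uri treats symbol keys as options and string keys as HTTP headers,
# so :proxy and :read_timeout are symbols while 'User-Agent' is a string.
# On Ruby older than 2.5, call plain open(...) instead of URI.open(...).
@doc = Nokogiri::HTML(URI.open(url,
                               :proxy => 'http://(ip_address):(port)',
                               'User-Agent' => user_agent,
                               :read_timeout => 10),
                      nil, "UTF-8")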

open-uri also has an option for setting the read timeout, as used above.

You can review the open-uri documentation at the link below.

open-uri documentation
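
If you would rather keep the Net::HTTP approach from the question, a request header can carry the User-Agent there too. The following is only a minimal sketch: the helper names build_proxied_get and get_with_user_agent are made up for illustration, and the URL, proxy address, and agent string are whatever you pass in.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (not from the question): builds a Net::HTTP client
# that goes through the given proxy, plus a GET request carrying a
# User-Agent header. Nothing is sent until http.request is called.
def build_proxied_get(url, proxy, timeout, user_agent)
  uri = URI.parse(url)
  proxy_uri = URI.parse(proxy)

  http = Net::HTTP.new(uri.host, uri.port, proxy_uri.host, proxy_uri.port)
  http.open_timeout = timeout  # seconds allowed for opening the connection
  http.read_timeout = timeout  # seconds allowed for each read from the socket

  request = Net::HTTP::Get.new(uri.request_uri)
  request['User-Agent'] = user_agent

  [http, request]
end

# Hypothetical wrapper: performs the request and returns the response object.
def get_with_user_agent(url, proxy, timeout, user_agent)
  http, request = build_proxied_get(url, proxy, timeout, user_agent)
  http.request(request)
end
```

Calling get_with_user_agent(url, proxy, 10, user_agent).body would then give you the page body to hand to Nokogiri, as in the question. Note that the two timeouts apply separately to connecting and to each read, so the total wait can exceed n seconds.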

ruby-on-rails ruby url proxy get
