How to GET a URL with User-Agent and timeout through some Proxy in Ruby?
How do I get a URL if the request needs to go through a proxy, has to have a timeout of at most n seconds, and must send a User-Agent?
    require 'nokogiri'
    require 'net/http'
    require 'rexml/document'

    def get_with_max_wait(param, proxy, timeout)
      url = "http://example.com/?p=#{param}"
      uri = URI.parse(url)
      proxy_uri = URI.parse(proxy)
      # The third and fourth arguments route the connection through the proxy.
      http = Net::HTTP.new(uri.host, uri.port, proxy_uri.host, proxy_uri.port)
      # Both timeouts are in seconds.
      http.open_timeout = timeout
      http.read_timeout = timeout
      response = http.get(url)
      doc = Nokogiri.parse(response.body)
      doc.css(".css .goes .here")[0].content.strip
    end
The code above gets a URL through a proxy with a timeout, but it's missing the User-Agent. How do I do the same with a User-Agent?
You should use open-uri and pass the User-Agent as an option to its open method.
Below is an illustration that stores the user agent in a variable and passes it as a parameter to open.
    require 'rubygems'
    require 'nokogiri'
    require 'open-uri'

    user_agent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_0) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.854.0 Safari/535.2"
    url = "http://www.somedomain.com/somepage/"

    # String keys are sent as HTTP headers; symbol keys are open-uri options.
    # URI.open is used because Kernel#open no longer opens URLs on Ruby 3.0+.
    @doc = Nokogiri::HTML(URI.open(url,
      :proxy => 'http://(ip_address):(port)',
      'User-Agent' => user_agent,
      :read_timeout => 10
    ), nil, "UTF-8")
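If you would rather keep the Net::HTTP approach from the question, the User-Agent can be set as a header on the request object. The following is only a minimal sketch: the helper names `build_proxied_http` and `build_get_request` are made up for illustration, and the proxy address and user agent string in the usage comment are placeholders, not values from the question.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: a Net::HTTP object that connects through the proxy
# and gives up after `timeout` seconds while connecting or reading.
def build_proxied_http(url, proxy, timeout)
  uri = URI.parse(url)
  proxy_uri = URI.parse(proxy)
  http = Net::HTTP.new(uri.host, uri.port, proxy_uri.host, proxy_uri.port)
  http.open_timeout = timeout
  http.read_timeout = timeout
  http
end

# Hypothetical helper: a GET request carrying the given User-Agent header.
def build_get_request(url, user_agent)
  request = Net::HTTP::Get.new(URI.parse(url).request_uri)
  request['User-Agent'] = user_agent
  request
end

# Usage (performs a real network call, so it is left commented out):
#   http = build_proxied_http("http://example.com/?p=1", "http://proxy.example.com:8080", 5)
#   response = http.request(build_get_request("http://example.com/?p=1", "MyAgent/1.0"))
#   puts response.body
```

Building the request separately from the connection keeps the header handling in one place, so more headers can be added later without touching the proxy or timeout setup.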
There is an alternative way to set the read timeout in OpenURI.
You can review the OpenURI documentation at the link below:
OpenURI documentation
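However the timeout is set, an expired timeout surfaces as an exception (`Net::OpenTimeout` while connecting, `Net::ReadTimeout` while reading), so the caller probably wants to rescue it. Here is a small sketch of that; `fetch_with_fallback` is a hypothetical helper name, and returning a fallback value is just one possible policy.

```ruby
require 'net/http'

# Hypothetical helper: runs the block and returns its value, or returns
# `fallback` if the request timed out while connecting or reading.
def fetch_with_fallback(fallback = nil)
  yield
rescue Net::OpenTimeout, Net::ReadTimeout => e
  warn "request timed out: #{e.class}"
  fallback
end

# Usage (performs a real network call, so it is left commented out):
#   require 'open-uri'
#   html = fetch_with_fallback("") { URI.open(url, :read_timeout => 10).read }
```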
ruby-on-rails ruby url proxy get