Web services: Google website scraping getting blocked after a few requests
We are developing a simple application that makes a call to one of Google's services (reverse image search, http://www.google.com/insidesearch/features/images/searchbyimage.html). It uploads an image URL or an image and gets back the entity name for that image. Essentially, we fetch the results page (as HTML) that Google returns and scrape the results using a simple parser.
We hosted it on Google App Engine and found that after a while Google blocked our app (identified by its IP) and sent back a message saying it was preventing bots from sending requests to its websites. Below is the message we found in the web server's logs:
"This page appears when Google automatically detects requests coming from your computer network which appear to be in violation of the Terms of Service (http://www.google.com/policies/terms/). The block will expire shortly after those requests stop. In the meantime, solving the above CAPTCHA will let you continue to use our services. This traffic may have been sent by malicious software, a browser plug-in, or a script that sends automated requests. If you share your network connection, ask your administrator for help; a different computer using the same IP address may be responsible. Learn more (http://support.google.com/websearch/answer/86640). Sometimes you may be asked to solve the CAPTCHA if you are using advanced terms that robots are known to use, or sending requests quickly."
I wanted to check whether there is a way to solve this, or some workaround. Since Google doesn't expose an API for reverse image search, I don't see any other way to get the info we want (other than creating an HTTP request and scraping the response).
Any leads would be helpful.
If you are in violation of the terms of service, that's that. A "workaround" would be inappropriate.
This service does the same thing and has an API you can legitimately use: http://services.tineye.com/tineyeapi
What is the TinEye API? TinEye is a reverse image search engine. You can submit an image to TinEye to find out where it came from, how it is being used, or whether modified versions of the image exist. TinEye uses image recognition to perform its searches. The TinEye API lets you search the multi-billion-image TinEye index automatically.
web-services google-app-engine google-apps