Tuesday, 15 March 2011

java - ElasticSearch - How to query one result from 30 million documents quickly




Now the situation is: I want to search Elasticsearch 3 * 1000000 (3 million) times in a short time. My test setup is one ES cluster with a 4-core CPU and 16 GB of memory, and the run takes 8 hours. The query I use is:

    xxx/type/_search
    {
      "query": {
        "match": {
          "poiname": {
            "query": "xxxxx",
            "operator": "or"
          }
        }
      }
    }

I send the query to Elasticsearch with a Java HTTP request from Hadoop:

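    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.OutputStreamWriter;
    import java.net.HttpURLConnection;
    import java.net.URL;

    URL url = new URL(searchUrl);
    HttpURLConnection con = (HttpURLConnection) url.openConnection();
    // newer Elasticsearch versions reject JSON bodies sent without this header
    con.setRequestProperty("Content-Type", "application/json");
    con.setDoOutput(true);   // we send a request body (this makes it a POST)
    con.setDoInput(true);
    // write the query JSON as the request body
    OutputStreamWriter wr = new OutputStreamWriter(con.getOutputStream(), "UTF-8");
    String query = getQueryJson(field, value);
    wr.write(query);
    wr.flush();
    wr.close();
    // read the whole response body if the request succeeded
    StringBuilder sb = new StringBuilder();
    int httpResult = con.getResponseCode();
    if (httpResult == HttpURLConnection.HTTP_OK) {
        BufferedReader br = new BufferedReader(new InputStreamReader(con.getInputStream(), "UTF-8"));
        String line;
        while ((line = br.readLine()) != null) {
            sb.append(line).append("\n");
        }
        br.close();
    }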
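The helper getQueryJson(field, value) is not shown in the question. As a hedged sketch, here is one hypothetical version that builds the match query above and also sets "size": 1, so Elasticsearch returns only the top hit instead of its default of ten. The "_source": ["name_id"] filter is an assumption that name_id is the only field needed. This is an illustration, not the author's helper.

```java
class QueryJson {

    // Hypothetical stand-in for the getQueryJson(field, value) helper used above.
    // "size": 1 asks for only the best hit; "_source" limits the returned
    // fields to name_id (an assumed field of interest).
    static String getQueryJson(String field, String value) {
        return "{\"size\":1,"
                + "\"_source\":[\"name_id\"],"
                + "\"query\":{\"match\":{\"" + escape(field) + "\":{"
                + "\"query\":\"" + escape(value) + "\",\"operator\":\"or\"}}}}";
    }

    // Minimal JSON string escaping: backslashes first, then double quotes.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        System.out.println(getQueryJson("poiname", "xxxxx"));
    }
}
```

Trimming the response this way mainly reduces what Elasticsearch has to serialize and what the client has to read per request; it does not change how the match itself is scored.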

In fact, we only need one result in the response. How can I improve this?
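Reading "3 1000000" as 3 million queries, the 8-hour run works out to the following throughput, which is the figure any change has to beat:

```latex
\frac{3\,000\,000\ \text{queries}}{8 \times 3600\ \text{s}}
  = \frac{3\,000\,000\ \text{queries}}{28\,800\ \text{s}}
  \approx 104\ \text{queries/s}
```

Equivalently, if the requests run one at a time, each takes about 28,800 s / 3,000,000 ≈ 9.6 ms on average, including the HTTP round trip.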

===================update===============================

About the task:

A document looks like this:

    {"doc_name": "an foo eoo", "name_id": 123456, "other field": "value"}

We query "ann foo eoo" to get its name_id from ES; we do not need the rest of the hits.

We query Elasticsearch with 3 * 1000000 (3 million) different doc_name values.
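Since every lookup is currently a separate HTTP round trip, one option worth trying is batching many doc_name lookups into a single request with the Multi Search API (_msearch). The sketch below only builds the newline-delimited request body: one header line plus one query line per search, with a trailing newline. It assumes the index and type are given in the URL (for example POST xxx/type/_msearch), which is why each header is the empty object {}. The queries in main are illustrative, and the best batch size is something to measure rather than assume.

```java
import java.util.Arrays;
import java.util.List;

class MultiSearchBody {

    // Builds an _msearch body: for each search, a header line ("{}" because the
    // index/type are assumed to be in the URL) followed by the query on one line.
    // Every line, including the last, must end with "\n".
    static String build(List<String> queryBodies) {
        StringBuilder sb = new StringBuilder();
        for (String body : queryBodies) {
            sb.append("{}\n");
            sb.append(body).append('\n'); // each body must be single-line JSON
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String q1 = "{\"size\":1,\"query\":{\"match\":{\"doc_name\":\"an foo eoo\"}}}";
        String q2 = "{\"size\":1,\"query\":{\"match\":{\"doc_name\":\"ann foo eoo\"}}}";
        System.out.print(build(Arrays.asList(q1, q2)));
    }
}
```

The _msearch response holds a responses array in the same order as the searches in the body, so each entry can be matched back to the doc_name that produced it.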

Actually, we only need a matching result and do not care what its score is. I attach a terms query below; its minimum_match depends on the number of terms in poiname.

(P.S. minimum_match = Math.ceil(number of terms in poiname / 2), so 3 terms give a minimum_match of 2.)

    GET xxx/type/_search
    {
      "query": {
        "terms": {
          "poiname": ["an", "foo", "eoo"],
          "minimum_match": 2
        }
      }
    }
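A hedged sketch of building that terms query per doc_name in Java. It computes minimum_match as ceil(terms / 2) with the integer expression (n + 1) / 2. Note that in Java, Math.ceil(n) / 2 gives 1.5 for n = 3, and Math.ceil(n / 2) gives 1.0 because n / 2 is integer division, so neither yields the intended 2. Splitting on whitespace is an assumption: terms queries are not analyzed, so the tokens must match how poiname was indexed. "size": 1 is an addition so only the top hit comes back. Also, minimum_match on a terms query may not be accepted by newer Elasticsearch releases; check the terms query documentation for the version in use.

```java
import java.util.ArrayList;
import java.util.List;

class TermsQueryBuilder {

    // Builds {"size":1,"query":{"terms":{field:[...],"minimum_match":m}}}
    // for one doc_name. Tokens are split on whitespace (an assumption about
    // how the field was indexed).
    static String build(String field, String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                // quote each term, escaping backslashes and double quotes
                terms.add("\"" + t.replace("\\", "\\\\").replace("\"", "\\\"") + "\"");
            }
        }
        // ceil(n / 2) in integer arithmetic: 3 -> 2, 4 -> 2, 5 -> 3
        int minimumMatch = (terms.size() + 1) / 2;
        return "{\"size\":1,\"query\":{\"terms\":{\""
                + field + "\":[" + String.join(",", terms) + "],"
                + "\"minimum_match\":" + minimumMatch + "}}}";
    }

    public static void main(String[] args) {
        System.out.println(build("poiname", "ann foo eoo"));
    }
}
```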

