Breeding: lucene - Elastic search analyzers -

Saturday, 15 March 2014

Lucene - Elasticsearch analyzers

I want to specify per-field analyzers in Elasticsearch. Some fields need the keyword analyzer, while one needs a custom number analyzer that removes non-digit characters (see number_analyzer in the code below).

The request that creates the index is:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "number_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"],
          "char_filter": ["number_filter"]
        }
      },
      "char_filter": {
        "number_filter": {
          "type": "pattern_replace",
          "pattern": "[\\d]+",
          "replacement": ""
        }
      }
    }
  }
}

The mapping for the fields is:

{
  "properties": {
    "field1": {
      "type": "string",
      "store": "yes",
      "index": "analyzed",
      "analyzer": "number_analyzer"
    },
    "field2": {
      "type": "string",
      "store": "yes",
      "index": "not_analyzed",
      "analyzer": "keyword"
    },
    "field3": {
      "type": "string",
      "store": "true",
      "index": "not_analyzed"
    },
    "field4": {
      "type": "string",
      "store": "yes",
      "index": "analyzed"
    },
    "field5": {
      "type": "string",
      "store": "yes",
      "index": "analyzed",
      "analyzer": "number_analyzer"
    }
  }
}

When I insert the following document into the index:

{
  "field1": "464533ab",
  "field2": "euro",
  "field3": "this title",
  "field4": "deed_type",
  "field5": "test3"
}

I notice that the letters in field1 are not removed (my goal is to keep only 464533), and I am able to get results for the query field4:deed_type, although I shouldn't (I think the standard analyzer removes the special character and lowercases the text, so I'd expect field4:deed_type to work only with the keyword analyzer).

Is there an error in the way the analyzers are specified in the code above?

Generally, the same analysis rules that are applied at index time are also applied at query time. So when you search for:

field4:"deed_type"

that query is analyzed as well, and becomes:

field4:"deed type"
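The effect of analyzing both sides can be sketched with a toy analyzer in Python. The toy_analyze function below is a hypothetical stand-in, not Lucene's standard analyzer: it lowercases the text and splits on anything that is not an ASCII letter or digit, which is assumed here only to mirror the behavior described above. Real tokenization rules depend on the analyzer and the Lucene version.

```python
import re

def toy_analyze(text):
    """Hypothetical stand-in for an analyzer: lowercase, then split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]

# Index time: the field value is analyzed into terms.
indexed_terms = toy_analyze("deed_type")

# Query time: the query string goes through the same analyzer.
query_terms = toy_analyze("deed_type")

# The query matches because both sides produce the same term sequence.
matches = indexed_terms == query_terms
print(indexed_terms, matches)
```

Because the same function runs on both sides, the query finds the document even though neither side contains the literal text deed_type any more.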

Similarly, analysis does not affect the stored version of the field, which I believe is what you are referring to with field1. The stored version of the field is the version that is retrieved from the index in a search result. If the letters are removed in analysis, that is reflected only in how you are able to search the data. If you want to alter the stored representation of the field, that should be done in pre-processing, before the data reaches Lucene analysis. Analyzers are not the tool for that.

Your number_filter is wrong, though; you have it backwards. It should be:
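As a rough illustration of such pre-processing, here is a minimal Python sketch that cleans a document before it is sent to the index. The helper names strip_non_digits and preprocess, and the choice of field1 as the field to clean, are assumptions made for this example; the actual indexing call is left out.

```python
import re

def strip_non_digits(value):
    """Keep only the ASCII digits of a string."""
    return re.sub(r"[^0-9]+", "", value)

def preprocess(doc, numeric_fields=("field1",)):
    """Return a copy of doc with the chosen fields reduced to digits before indexing."""
    cleaned = dict(doc)  # shallow copy, so the caller's document is left untouched
    for name in numeric_fields:
        if name in cleaned:
            cleaned[name] = strip_non_digits(cleaned[name])
    return cleaned

doc = {"field1": "464533ab", "field2": "euro"}
cleaned = preprocess(doc)
print(cleaned)
```

With this approach the value that gets stored, and later returned in search results, is already the digits-only form.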

"number_filter" : { "type": "pattern_replace", "pattern": "[^\\d]+", "replacement": "" }

[\\d]+ matches digits, so your filter removes the digits themselves. From your description, you want to remove everything except digits, which is [^\\d]+.
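The difference between the two patterns can be checked outside Elasticsearch. The sketch below uses Python's re module as a stand-in for the pattern_replace char filter, which uses Java regular expressions; for this simple ASCII input the two regex dialects are assumed to behave the same way.

```python
import re

value = "464533ab"

# Original pattern: matches runs of digits, so the digits are what gets removed.
original = re.sub(r"[\d]+", "", value)

# Corrected pattern: matches runs of non-digits, so only the digits survive.
corrected = re.sub(r"[^\d]+", "", value)

print(original)
print(corrected)
```

The original pattern leaves only the letters, while the negated character class leaves only the digits.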

lucene elasticsearch
