Tuesday, 15 June 2010

NEST - Elasticsearch ngram analyzer/tokenizer not working?




It seems the ngram tokenizer isn't working, or perhaps my understanding or use of it isn't correct.

My tokenizer uses a min_gram of 3 and a max_gram of 5. I'm looking for the term 'madonna' in my documents under artists.name. I can find the term with other techniques (using the simple analyzer and related), but not with the ngram analyzer.

What I'm trying to accomplish with the ngram analyzer is to find names while accounting for misspellings.
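To make the goal concrete, here is a minimal sketch in plain C# (no NEST or Elasticsearch involved) of what a 3-to-5 character ngram split produces and why it can tolerate a misspelling. The helper name NGrams and the misspelled query "madona" are assumptions made up for illustration. The sketch also ignores details of the real tokenizer such as token_chars and the lowercase filter, so treat it as an approximation of the idea rather than of Elasticsearch's exact output.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of NEST or Elasticsearch): returns every
// substring of `text` whose length is between minGram and maxGram inclusive.
static List<string> NGrams(string text, int minGram, int maxGram)
{
    var grams = new List<string>();
    for (int start = 0; start < text.Length; start++)
    {
        for (int len = minGram; len <= maxGram && start + len <= text.Length; len++)
        {
            grams.Add(text.Substring(start, len));
        }
    }
    return grams;
}

// Grams that would be indexed for the stored name.
var indexedGrams = new HashSet<string>(NGrams("madonna", 3, 5));

// Grams of a misspelled search term (assumed example), reduced to the
// ones that also occur in the indexed name.
var sharedGrams = new HashSet<string>(NGrams("madona", 3, 5));
sharedGrams.IntersectWith(indexedGrams);

Console.WriteLine(indexedGrams.Count); // 12
Console.WriteLine(sharedGrams.Count);  // 6 (mad, mado, madon, ado, adon, don)
```

Because a match query analyzes the search text with the field's analyzer and, by default, matches on any of the resulting terms, those shared grams are what would let a misspelled name still hit the document once the ngram analyzer is actually applied to the field.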

Please see a shortened version of my settings, mappings, and query below, and if you have any ideas, please let me know - it's driving me nuts!

settings...

{
  "myindex": {
    "settings": {
      "index": {
        "analysis": {
          "analyzer": {
            "ngramanalyzer": {
              "type": "custom",
              "filter": [ "lowercase" ],
              "tokenizer": "ngramtokenizer"
            }
          },
          "tokenizer": {
            "ngramtokenizer": {
              "type": "ngram",
              "min_gram": "3",
              "max_gram": "5"
            }
          }
        },
        "number_of_shards": "5",
        "number_of_replicas": "1",
        "version": { "created": "1020199" },
        "uuid": "60ggsr6treadtitkanuagg"
      }
    }
  }
}

mappings ...

{
  "myindex": {
    "mappings": {
      "mytype": {
        "properties": {
          "artists.name": {
            "type": "string",
            "analyzer": "simple",
            "fields": {
              "ngram": { "type": "string", "analyzer": "ngramanalyzer" },
              "raw": { "type": "string", "index": "not_analyzed" }
            }
          }
        }
      }
    }
  }
}

query ...

{"query": {"match": {"artists.name.ngram": "madonna"}}}

document ...

{
  "_index": "myindex",
  "_type": "mytype",
  "_id": "602537592951",
  "_version": 1,
  "found": true,
  "_source": {
    "artists": [
      { "name": "madonna", "id": "p 64565" }
    ]
  }
}

Edit: incidentally, this query works (without the ngram):

{"query": {"match": {"artists.name": "madonna"}}}

This involves a nested object. Apparently I'm not applying the ngram analyzer to the nested object properly.

Any ideas?

OK - I figured it out. I hope this helps someone, because it drove me crazy.

Here's what the mapping turned out like:

{
  "myindex": {
    "mappings": {
      "mytype": {
        "properties": {
          "artists": {
            "properties": {
              "id": { "type": "string" },
              "name": {
                "type": "string",
                "analyzer": "ngramanalyzer",
                "fields": {
                  "raw": { "type": "string", "index": "not_analyzed" }
                }
              }
            }
          }
        }
      }
    }
  }
}

And here's how I did it using NEST syntax...

First, I had a sub type (class) called Person, which has a name and an id and looks like this (POCO)...

using System;
using Nest;

[Serializable]
public class Person
{
    public string Name { get; set; }

    [ElasticProperty(Analyzer = "fullterm", Index = FieldIndexOption.not_analyzed)]
    public string Id { get; set; }
}

And the mapping went like this...

.AddMapping<MyIndex>(m => m
    .MapFromAttributes()
    .Properties(props => props
        // Map "artists" as an object whose properties come from Person
        .Object<Person>(x => x
            .Name("artists")
            .Properties(pp => pp
                .MultiField(mf => mf
                    .Name(s => s.Name)
                    .Fields(f => f
                        // Default sub-field: analyzed with the ngram analyzer
                        .String(s => s.Name(o => o.Name).Analyzer("ngramanalyzer"))
                        // "raw" sub-field: left unanalyzed
                        .String(s => s.Name(o => o.Name.Suffix("raw")).Index(FieldIndexOption.not_analyzed))
                    )
                )
            )
        )
    )
)

Note: the Object<Person> call here indicates that 'artists' is an object beneath the type.

Thanks, me!!!

elasticsearch nest
