Friday, 15 January 2010

algorithm - Finding keywords in a set of small texts




I have a set of 2000 texts. My goal is to find keywords across these texts in order to understand their subject, or the words and expressions they have in common.

I have some ideas for algorithms that score words and identify when they occur together.

I have read other related questions here, but I'm looking for more information on the subject. Any ideas are welcome. Thank you very much!

--

I have already removed the stopwords. After removing them I still have more than 7000 words remaining; my question is how to score these words, and at what point I can consider removing them from the list of keywords. Also, how do I find key expressions, that is, words that occur together?

You may want to refer to a classical text on information retrieval. Most of the algorithms use a stop list to remove commonly occurring words such as "for" and "the", and then extract the base or root word (reducing "seeing", "seen", "see", "sees" to the base word "see"). The remaining words form the keywords of the document and are weighted by things like term frequency (how many times a word occurs in the document) and inverse document frequency (how unique a word is in describing the content). You can use the weighted keywords as a document representation and use them for retrieval.
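As a rough illustration of the weighting described above, here is a minimal sketch in Python. The stop list, the function names and the exact formula (term count divided by text length, times the natural log of total texts over texts containing the word) are assumptions chosen for the example; other TF-IDF variants exist, and the root-word step is left out to keep it short.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to"}


def tokenize(text):
    """Lowercase the text, keep runs of letters, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def tf_idf(texts):
    """Return one {word: score} dict per text (one common TF-IDF variant)."""
    docs = [tokenize(t) for t in texts]
    n_docs = len(docs)
    # Document frequency: in how many texts does each word appear?
    df = Counter(w for doc in docs for w in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            w: (c / len(doc)) * math.log(n_docs / df[w])
            for w, c in counts.items()
        })
    return scores


def top_keywords(texts, k=3):
    """Return the k highest-scoring words of each text."""
    return [sorted(s, key=s.get, reverse=True)[:k] for s in tf_idf(texts)]


if __name__ == "__main__":
    sample = [
        "the cat sat on the mat",
        "the dog sat on the log",
        "the cat chased the dog",
    ]
    for text, words in zip(sample, top_keywords(sample, k=1)):
        print(text, "->", words)
```

A word that appears in every text gets a score of zero under this formula, which suggests one possible cut-off for the 7000 remaining words: drop those whose best score across all texts stays below a threshold you pick by inspection.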

algorithm keyword information-retrieval
