Machine learning - document similarity for documents that use synonyms
I have a bunch of documents, some of which are re-creations of other documents: the text is jumbled and some words are replaced with synonyms. Below is one such example, the same passage in four versions:
Article 1 (original): Caught John Snow in town making purchases at the Kingslanding hardware store to repair a broken tractor. Snow has farmed soybeans his entire life, as did his father and forefathers. Asked him about life on the farm.
Article 2 (duplicate): obtained john snow in city in purchases create rising of hardware @ kingslanding repair broken motor tractor. snow have soya broad beans finish life have been treated, such father , fathers. asked him concerning life on agriculture company.
Article 3 (duplicate): took above john snow in city made purchases in warehouse of hardware of kingslanding repair broken tractor. snow has cultivated soybeans whole life, father , parents. asked him life in farm.
Article 4 (duplicate): caught myself compared john snow downtown making of purchases kingslanding store of material repair broken tractor. snow cultivated soya life whole, his/her father , fathers. questioned life farm.
I want a document-similarity method that ends up tagging all of these documents as one group. Suggestions, along with examples or tutorials, are appreciated.
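One common baseline, not something proposed in the original thread, is bag-of-words cosine similarity: it ignores word order, so jumbled text still matches, and a synonym table can map replaced words back to a single form. The sketch below uses only the Python standard library; the SYNONYMS table, the group() helper, and the 0.5 threshold are hypothetical choices for illustration, and a real system would need a much larger synonym resource.

```python
import math
import re
from collections import Counter

# Hypothetical synonym table: maps a few variants seen in the example
# duplicates onto one canonical token. Real data needs a proper thesaurus.
SYNONYMS = {
    "city": "town", "soya": "soybeans", "cultivated": "farmed",
    "obtained": "caught", "took": "caught", "agriculture": "farm",
}

def tokens(text):
    """Lowercase, split into alphabetic words, and map synonyms to canonical form."""
    words = re.findall(r"[a-z]+", text.lower())
    return [SYNONYMS.get(w, w) for w in words]

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters (raw term counts)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def group(docs, threshold=0.5):
    """Greedy grouping: add each document to the first group whose first
    member is at least `threshold` similar; otherwise start a new group."""
    bags = [Counter(tokens(d)) for d in docs]
    groups = []
    for i, bag in enumerate(bags):
        for g in groups:
            if cosine(bags[g[0]], bag) >= threshold:
                g.append(i)
                break
        else:
            groups.append([i])
    return groups
```

Passing the four articles to group() returns lists of document indices; the threshold has to be tuned on real data. With scikit-learn, TfidfVectorizer combined with sklearn.metrics.pairwise.cosine_similarity gives a TF-IDF-weighted version of the same idea.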
This looks like a textbook case for locality-sensitive hashing. Check out this thread.
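A minimal sketch of MinHash with LSH banding, assuming word unigrams as the shingle set and CRC32 with a seed prefix as a stand-in hash family (both are assumptions for illustration; libraries such as datasketch provide MinHash and MinHashLSH classes for real use). Documents whose signatures agree on every row of at least one band become candidate pairs; with b bands of r rows, a pair with Jaccard similarity s becomes a candidate with probability 1 - (1 - s^r)^b. Because the shingles are raw words, synonym substitution lowers the measured overlap, so normalizing synonyms before shingling may be needed.

```python
import re
import zlib
from collections import defaultdict

# Assumed LSH parameters: 5 bands x 4 rows = 20 hash functions.
BANDS, ROWS = 5, 4
NUM_HASHES = BANDS * ROWS

def shingles(text):
    """Set of lowercase words; a set ignores order, so jumbled text still overlaps."""
    return set(re.findall(r"[a-z]+", text.lower()))

def minhash(items):
    """MinHash signature: for each seeded hash function, the minimum hash over the set.
    CRC32 with a seed prefix is an assumed stand-in for a proper hash family."""
    return [
        min((zlib.crc32(f"{seed}:{item}".encode()) for item in items), default=0xFFFFFFFF)
        for seed in range(NUM_HASHES)
    ]

def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions estimates the Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

def lsh_candidates(docs):
    """Return index pairs (i, j), i < j, that share at least one band bucket."""
    buckets = defaultdict(set)
    for i, doc in enumerate(docs):
        sig = minhash(shingles(doc))
        for b in range(BANDS):
            # A bucket key is the band number plus that band's ROWS signature values.
            key = (b, tuple(sig[b * ROWS:(b + 1) * ROWS]))
            buckets[key].add(i)
    pairs = set()
    for members in buckets.values():
        ordered = sorted(members)
        for x, i in enumerate(ordered):
            for j in ordered[x + 1:]:
                pairs.add((i, j))
    return pairs
```

Candidate pairs from lsh_candidates() can then be checked with estimated_jaccard() (or an exact comparison) and merged into groups; the band and row counts trade recall against the number of false candidates.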
machine-learning nlp scikit-learn stanford-nlp information-retrieval