Thursday, 15 March 2012

lda - MALLET Ranking of Words in a topic




I am relatively new to MALLET and need to know: - Are the words in each topic that MALLET produces rank-ordered in some way? - If so, what is the ordering (i.e., is the 1st word in a topic list the one with the highest distribution across the corpus)?

Thanks!

They are ranked based on the probabilities from training, i.e. the first word is the most probable to appear in that topic, the 2nd is less probable, the 3rd less still, and so on. These probabilities are not directly related to term frequencies, although words with the highest TF-IDF weights are certainly more likely to rank high. Also, Gibbs sampling has a lot to do with how words are ranked in topics - due to the randomness in sampling you can get quite different probabilities for words within topics. Try, for example, saving the model and then retraining using the --input-model option - the topics will look very much alike, but not the same.
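To make the ranking idea concrete, here is a minimal sketch of how per-topic word probabilities could be derived from assignment counts and then sorted. The counts, the `BETA` value, and the function name `rank_topic_words` are all made up for illustration; this is one common smoothed estimate, not code taken from MALLET.

```python
# Hypothetical counts of how often each word was assigned to ONE topic
# after training. These numbers are invented, not real MALLET output.
topic_word_counts = {"model": 42, "topic": 35, "corpus": 12, "sampling": 7, "the": 4}

BETA = 0.01  # assumed symmetric smoothing prior on topic-word distributions


def rank_topic_words(counts, vocab_size, beta=BETA):
    """Return (word, probability) pairs, most probable first."""
    # Smoothed estimate: (count + beta) / (total count + beta * vocabulary size)
    total = sum(counts.values()) + beta * vocab_size
    probs = {word: (count + beta) / total for word, count in counts.items()}
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)


ranked = rank_topic_words(topic_word_counts, vocab_size=len(topic_word_counts))
for word, prob in ranked:
    print(f"{word}\t{prob:.4f}")
```

Because the counts come out of a random sampling run, a second run would give somewhat different counts and therefore a somewhat different ordering, which is the variation described above.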

That said, if you need to see the actual weights of terms in the corpus, unrelated to LDA, you can use something like NLTK in Python to check frequency distributions, and something like sklearn for TF-IDF to get more meaningful weight distributions.
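As a rough illustration of what those corpus-level weights look like, the sketch below computes raw term frequencies and a plain TF-IDF score using only the Python standard library, as a stand-in for NLTK and sklearn. The three documents are invented, and the IDF here is the simple log(N / df) form, which differs from sklearn's default smoothed variant, so the numbers will not match a library's output exactly.

```python
import math
from collections import Counter

# Toy corpus, invented for illustration only.
docs = [
    "topic models group words into topics",
    "gibbs sampling assigns words to topics",
    "mallet trains topic models with gibbs sampling",
]
tokenized = [doc.split() for doc in docs]

# Corpus-wide term frequencies (a frequency distribution over all tokens).
corpus_freq = Counter(tok for doc in tokenized for tok in doc)

# Document frequency: in how many documents each term occurs.
doc_freq = Counter(tok for doc in tokenized for tok in set(doc))


def tfidf(doc_tokens, doc_freq, n_docs):
    """Plain TF-IDF: relative term frequency times log(N / df)."""
    tf = Counter(doc_tokens)
    return {
        term: (count / len(doc_tokens)) * math.log(n_docs / doc_freq[term])
        for term, count in tf.items()
    }


weights = [tfidf(doc, doc_freq, len(tokenized)) for doc in tokenized]

print(corpus_freq.most_common(3))
print(sorted(weights[2].items(), key=lambda item: item[1], reverse=True)[:3])
```

Comparing these weights with the top words of a trained topic is a quick way to see that the two rankings overlap but are not the same thing.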

lda topic-modeling mallet
