lda - MALLET ranking of words in a topic
I'm relatively new to MALLET and need to know:
- Are the words in each topic that MALLET produces rank-ordered in some way?
- If so, what is the ordering? For example, does the 1st word in topic list 1 have the highest distribution across the corpus?
Thanks!
They are ranked by the probabilities learned during training, i.e. the first word is the most probable to appear in the topic, the 2nd is less probable, the 3rd less still, and so on. These probabilities are not directly related to term frequencies, although words with the highest tf-idf weights are certainly more probable. Gibbs sampling also has a lot to do with how words are ranked in topics: because of the randomness in sampling, you can get quite different probabilities for the words within a topic. Try, for example, saving the model (--output-model) and retraining using the --input-model option; the topics will be much alike, but not the same.
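To make the ranking and the run-to-run variation concrete, here is a toy collapsed Gibbs sampler for LDA in plain Python. It is not MALLET's implementation (MALLET uses an optimized sampler and can also optimize hyperparameters); the function names `gibbs_lda` and `ranked_words`, the four-document corpus, and the settings alpha=0.1, beta=0.01 are assumptions for illustration only. Words in a topic are ranked by the smoothed estimate p(w | t) = (n_tw + beta) / (n_t + V * beta). Because beta and the denominator are shared by every word in the topic, this gives the same order as ranking by the raw number of tokens of each word assigned to that topic.

```python
import random
from collections import defaultdict


def gibbs_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative, not MALLET's code)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    # Count tables: topic-word counts, document-topic counts, per-topic totals.
    n_tw = [defaultdict(int) for _ in range(num_topics)]
    n_dt = [[0] * num_topics for _ in docs]
    n_t = [0] * num_topics
    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            t = rng.randrange(num_topics)
            zd.append(t)
            n_tw[t][w] += 1
            n_dt[d][t] += 1
            n_t[t] += 1
        z.append(zd)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                # Remove the current token from the counts.
                n_tw[t][w] -= 1
                n_dt[d][t] -= 1
                n_t[t] -= 1
                # Full conditional p(z = k | all other assignments), unnormalized.
                weights = [
                    (n_dt[d][k] + alpha) * (n_tw[k][w] + beta) / (n_t[k] + V * beta)
                    for k in range(num_topics)
                ]
                # This random draw is why separate runs rank words differently.
                t = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = t
                n_tw[t][w] += 1
                n_dt[d][t] += 1
                n_t[t] += 1
    return n_tw, n_t, vocab


def ranked_words(n_tw, n_t, vocab, topic, beta=0.01, top_n=5):
    """Rank a topic's words by smoothed p(word | topic), most probable first."""
    V = len(vocab)
    probs = {w: (n_tw[topic][w] + beta) / (n_t[topic] + V * beta) for w in vocab}
    return sorted(probs.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


docs = [
    "cat dog pet cat vet".split(),
    "dog pet leash dog walk".split(),
    "stock market trade stock price".split(),
    "market price bond trade stock".split(),
]

for seed in (1, 2):
    n_tw, n_t, vocab = gibbs_lda(docs, num_topics=2, seed=seed)
    for k in range(2):
        print(seed, k, [w for w, _ in ranked_words(n_tw, n_t, vocab, topic=k)])
```

Running it with the two seeds may produce similar topics whose word order (and topic numbering) differs, which is the same effect described above for retraining in MALLET.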
That said, if you need to see the actual weights of terms in the corpus, independent of LDA, you can use NLTK in Python to check frequency distributions, or scikit-learn's tf-idf for more meaningful weight distributions.
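For the corpus-level view, the sketch below uses only the Python standard library in place of NLTK and scikit-learn: `collections.Counter` gives the same raw counts that `nltk.FreqDist` reports, and the tf-idf here is the plain raw count × ln(N / df). The helper names `term_frequencies` and `tfidf` and the sample documents are hypothetical. Note that scikit-learn's `TfidfVectorizer` by default uses a smoothed idf, ln((1 + N) / (1 + df)) + 1, and L2-normalizes each document's vector, so its numbers will differ from this unsmoothed version.

```python
import math
from collections import Counter


def term_frequencies(docs):
    """Corpus-wide raw counts (what nltk.FreqDist would report)."""
    return Counter(w for doc in docs for w in doc)


def tfidf(docs):
    """Per-document tf-idf: raw count * ln(N / df), no smoothing or normalization."""
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears.
    df = Counter(w for doc in docs for w in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({w: c * math.log(n_docs / df[w]) for w, c in tf.items()})
    return weights


docs = [
    "cat dog pet cat vet".split(),
    "dog pet leash dog walk".split(),
    "stock market trade stock price".split(),
    "market price bond trade stock".split(),
]

freq = term_frequencies(docs)
print(freq.most_common(3))
weights = tfidf(docs)
print(sorted(weights[0].items(), key=lambda kv: -kv[1])[:3])
```

Words that occur in every document get an idf of ln(1) = 0 here, which is why tf-idf weights can rank terms quite differently from raw frequencies.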
lda topic-modeling mallet