abstract:Probabilistic latent semantic analysis (PLSA), also known as probabilistic latent semantic indexing (PLSI, especially in information retrieval circles) is a statistical technique for the analysis of two-mode and co-occurrence data. In effect, one can derive a low dimensional representation of the observed variables in terms of their affinity to certain hidden variables, just as in latent semantic analysis.
In this paper, the defectslatentsemanticanalysis, probabilisticlatent semantic analysis using methods to construct thetext-the words ofco-occurrencematrix, usingthe emalgorithm to solve.