MA-sLDAc: Multi-Annotator Supervised LDA for classification
MA-sLDAc is a C++ implementation of the supervised topic model for classification in which labels are provided by multiple annotators with different levels of expertise, as proposed in:
- Rodrigues, F., Lourenço, M., Ribeiro, B., Pereira, F. Learning Supervised Topic Models for Classification and Regression from Crowds. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2017.
- Rodrigues, F., Lourenço, M., Ribeiro, B., Pereira, F. Learning Supervised Topic Models from Crowds. The Third AAAI Conference on Human Computation and Crowdsourcing (HCOMP), 2015.
The code is based on the supervised LDA (sLDA) implementation by Chong Wang and David Blei (http://www.cs.cmu.edu/~chongw/slda/). Three different variants of the proposed model are provided:
- MA-sLDAc (mle): This implementation uses maximum likelihood estimates for the topic distributions (beta) and the annotators' confusion matrices (pi);
- MA-sLDAc (smooth): This implementation places priors on beta and pi and performs approximate Bayesian inference;
- MA-sLDAc (svi): This implementation is similar to “MA-sLDAc (smooth)”, but uses stochastic variational inference (svi).
For simplicity, I recommend that first-time users start with “MA-sLDAc (mle)”, since this version has fewer parameters that need to be specified.
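To give a feel for the difference between the mle and smooth variants, here is a minimal sketch of estimating one annotator's confusion matrix (pi) from a table of counts. It is not code from the package: the function name `estimate_confusion` and the `pseudo_count` parameter are hypothetical, and it assumes the true classes are known. In the actual model the true classes are latent, and the smooth variant performs approximate Bayesian inference rather than this closed-form posterior mean.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical helper, for illustration only.
// counts[c][l] is the number of times the annotator answered l when the
// true class was c. Each row is normalised into a probability distribution.
// pseudo_count == 0 gives the maximum likelihood estimate;
// pseudo_count > 0 gives the posterior mean under a symmetric Dirichlet
// prior with that concentration on each row.
Matrix estimate_confusion(const Matrix& counts, double pseudo_count) {
    assert(pseudo_count >= 0.0);
    Matrix pi(counts.size());
    for (std::size_t c = 0; c < counts.size(); ++c) {
        const std::vector<double>& row = counts[c];
        double total = 0.0;
        for (double n : row) total += n + pseudo_count;
        pi[c].resize(row.size());
        for (std::size_t l = 0; l < row.size(); ++l) {
            // A row with no mass (only possible without a prior) falls
            // back to a uniform distribution.
            pi[c][l] = total > 0.0
                           ? (row[l] + pseudo_count) / total
                           : 1.0 / static_cast<double>(row.size());
        }
    }
    return pi;
}
```

With counts {{8, 2}, {0, 5}}, a pseudo-count of 0 assigns probability 0 to the annotator answering class 0 when the true class is 1, whereas a pseudo-count of 1 gives 1/7. The prior in the smooth variant is the kind of extra parameter that the mle variant does not require.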
Sample data using the 20newsgroups dataset is provided here. See the readme file for a quick example of how to run MA-sLDAc on this data.
Other datasets collected from Amazon Mechanical Turk are also provided below.
DOWNLOAD:
DATASETS:
- 20newsgroups (simulated annotators)
- Reuters (annotations from Amazon Mechanical Turk)
- LabelMe (annotations from Amazon Mechanical Turk)
CONTACT:
Please send questions and comments to rodr [at] dtu.dk