Source code for Multi-Annotator Supervised LDA for regression (MA-sLDAr) released

MA-sLDAr is a C++ implementation of the supervised topic models with responses/target variables provided by multiple annotators with different levels of expertise, as proposed in:

Rodrigues, F., Lourenço, M., Ribeiro, B., Pereira, F. Learning Supervised Topic Models for Classification and Regression from Crowds. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2017.

For more details click here.

Software

Source code for Multi-Annotator Supervised LDA for classification (MA-sLDAc) released

MA-sLDAc is a C++ implementation of the supervised topic models with labels provided by multiple annotators with different levels of expertise, as proposed in:

For more details click here.


Julia code for LogReg-Crowds released

LogReg-Crowds is a collection of Julia implementations of various approaches for learning a logistic regression model from multiple annotators and crowds, namely the works of:

Rodrigues, F., Pereira, F., and Ribeiro, B. Learning from multiple annotators: distinguishing good from random labelers. Pattern Recognition Letters, pp. 1428–1436, 2013.

Raykar, V., Yu, S., Zhao, L., Valadez, G., Florin, C., Bogoni, L., and Moy, L. Learning from Crowds. Journal of Machine Learning Research, pp. 1297–1322, 2010.

Dawid, A. P. and Skene, A. M. Maximum likelihood estimation of observer error-rates using the EM algorithm. Journal of the Royal Statistical Society. Series C, 28(1):20–28, 1979.

All implementations handle multi-class problems and do not require repeated labelling (i.e., annotators do not have to provide labels for the entire dataset). The code was written with interpretability in mind and is well commented, so it should be easy to use (see the file “demo.jl”). At the same time, the Julia language gives it great performance, especially when compared to other scientific languages such as MATLAB or Python/NumPy, without compromising its high-level nature and interpretability.
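To make the multi-class and partial-labelling points concrete, here is a minimal Python sketch of EM for the Dawid and Skene (1979) model only. It is not the LogReg-Crowds code and does not include the logistic regression component; the function name `dawid_skene`, the `(item, annotator, label)` triple format, the vote-fraction initialization and the `smoothing` constant are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def dawid_skene(triples, n_classes, n_iter=50, smoothing=0.01):
    """Illustrative EM for the Dawid-Skene model (hypothetical helper).

    triples: iterable of (item, annotator, label), label in range(n_classes).
    Annotators may label any subset of the items.
    Returns (posteriors per item, confusion matrix per annotator, class prior).
    """
    by_item = defaultdict(list)
    for item, annotator, label in triples:
        by_item[item].append((annotator, label))
    items = sorted(by_item)
    annotators = sorted({a for votes in by_item.values() for a, _ in votes})

    # Initialize each item's posterior with its fraction of votes per class.
    post = {}
    for i in items:
        counts = [0.0] * n_classes
        for _, label in by_item[i]:
            counts[label] += 1.0
        total = sum(counts)
        post[i] = [c / total for c in counts]

    for _ in range(n_iter):
        # M-step: class prior and per-annotator confusion matrices,
        # where conf[a][k][l] approximates P(annotator a says l | true class k).
        prior = [sum(post[i][k] for i in items) / len(items)
                 for k in range(n_classes)]
        conf = {a: [[smoothing] * n_classes for _ in range(n_classes)]
                for a in annotators}
        for i in items:
            for a, label in by_item[i]:
                for k in range(n_classes):
                    conf[a][k][label] += post[i][k]
        for a in annotators:
            for k in range(n_classes):
                row_sum = sum(conf[a][k])
                conf[a][k] = [v / row_sum for v in conf[a][k]]

        # E-step: posterior over the true class, using only the labels
        # that were actually provided for each item (computed in log space).
        for i in items:
            log_p = [math.log(prior[k] + 1e-12) for k in range(n_classes)]
            for a, label in by_item[i]:
                for k in range(n_classes):
                    log_p[k] += math.log(conf[a][k][label])
            top = max(log_p)
            weights = [math.exp(v - top) for v in log_p]
            norm = sum(weights)
            post[i] = [w / norm for w in weights]

    return post, conf, prior


# Toy data (made up): annotators "A" and "B" label every item,
# while "C" labels only three items and always answers 0.
truth = [0, 0, 0, 1, 1, 1]
triples = [(i, a, y) for i, y in enumerate(truth) for a in ("A", "B")]
triples += [(0, "C", 0), (3, "C", 0), (4, "C", 0)]
post, conf, prior = dawid_skene(triples, n_classes=2)
estimated = [max(range(2), key=lambda k: post[i][k]) for i in range(len(truth))]
```

In this toy setting `estimated` holds the most probable class per item and `conf` holds one estimated confusion matrix per annotator. Annotator "C" contributes labels for only part of the data, which is the situation described above.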

The tar.gz with the source code can be obtained here.


Source code for GPC-MA released

GPC-MA builds on top of the popular GPML Matlab toolkit for Gaussian processes by adding support for data from multiple annotators and crowds. This allows it to estimate the reliability of the different annotators and to find better estimates of the (unobserved) ground truth labels than standard GP classification or majority-voting-based approaches. See the original paper for further details:

Rodrigues, F. and Pereira, F.C. and Ribeiro, B., Gaussian Process Classification and Active Learning with Multiple Annotators, in Proceedings of the International Conference on Machine Learning (ICML), 2014.

The tar.gz with the source code can be obtained here.

The datasets (from Amazon’s Mechanical Turk) used in the paper are also available here for download.


Source code for CRF-MA released

CRF-MA extends the Java implementation of Conditional Random Fields (CRFs) available in the Mallet toolbox to handle multiple annotators. CRF-MA uses the Expectation-Maximization algorithm to jointly learn the CRF model parameters, the reliability of the annotators and the estimated ground truth. In terms of performance, the proposed method (CRF-MA) significantly outperforms typical approaches such as majority voting. See the original paper for further details:

Rodrigues, F. and Pereira, F.C. and Ribeiro, B., Sequence labeling with multiple annotators, Machine Learning, Springer, 2013.

Download here.


Source code for MA-LR released

MA-LR is a Python implementation of the multiple-annotator logistic regression model proposed in:

Rodrigues, F. and Pereira, F.C. and Ribeiro, B., Learning from Multiple Annotators: Distinguishing Good from Random Labelers, Pattern Recognition Letters, 2013.

Download here.
