Scaling up the learning-from-crowds GLAD algorithm using instance-difficulty clustering
González Rodrigo, Enrique
Aledo, Juan A.
The main goal of this article is to improve the results obtained by the GLAD algorithm on large datasets. This algorithm learns from instances labeled by multiple annotators, taking into account both the quality of the annotators and the difficulty of the instances. Despite the algorithm's many advantages, this study shows that GLAD does not scale well to a large number of instances, because it estimates one parameter per instance of the dataset. Clustering the instances is one way to reduce the number of parameters to be estimated, making the learning process more efficient. However, as the features of crowdsourced datasets are not usually available, classical clustering procedures cannot be applied directly. To solve this issue, we propose clustering the instances using vectors created by matrix factorization. Our analysis shows that this clustering process improves the results obtained by GLAD in both accuracy and execution time, especially in large data scenarios. We also compare this approach against other algorithms with a similar goal.
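
The idea of clustering instances without features can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the factorization or clustering method, so the SGD factorization, the plain k-means, the toy annotations, and all function names below are assumptions chosen for illustration. Only the logistic form of `glad_prob_correct` follows the standard GLAD model, in which the probability of a correct label depends on the product of an annotator ability and an instance (here, cluster) difficulty parameter.

```python
import math
import random

def factorize(triples, n_annotators, n_instances, rank=2,
              epochs=200, lr=0.05, reg=0.02, seed=0):
    """Assumed approach: SGD factorization of the sparse annotation matrix.
    triples holds (annotator, instance, label) with labels in {-1, +1}.
    Returns one latent vector per instance."""
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_annotators)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_instances)]
    for _ in range(epochs):
        for a, i, y in triples:
            err = y - sum(U[a][f] * V[i][f] for f in range(rank))
            for f in range(rank):
                u, v = U[a][f], V[i][f]
                U[a][f] += lr * (err * v - reg * u)
                V[i][f] += lr * (err * u - reg * v)
    return V

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means; returns a cluster index for every point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        for idx, p in enumerate(points):
            assign[idx] = min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])),
            )
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def glad_prob_correct(alpha, beta):
    """GLAD-style probability that an annotator with ability alpha labels
    correctly an instance whose (inverse) difficulty parameter is beta."""
    return 1.0 / (1.0 + math.exp(-alpha * beta))

# Hypothetical toy annotations: 4 annotators, 6 instances, labels in {-1, +1}.
triples = [
    (0, 0, 1), (1, 0, 1), (2, 0, 1),
    (0, 1, 1), (1, 1, -1), (3, 1, 1),
    (0, 2, -1), (2, 2, -1), (3, 2, -1),
    (1, 3, -1), (2, 3, 1), (3, 3, -1),
    (0, 4, 1), (1, 4, 1), (3, 4, -1),
    (0, 5, -1), (1, 5, -1), (2, 5, -1),
]
n_annotators, n_instances, k = 4, 6, 2

vectors = factorize(triples, n_annotators, n_instances)
clusters = kmeans(vectors, k)

# One difficulty parameter per cluster instead of one per instance.
beta_per_cluster = [1.0] * k  # placeholder initial values
beta_for_instance = [beta_per_cluster[c] for c in clusters]
print("parameters per instance:", n_instances, "-> per cluster:", k)
```

In this sketch the difficulty parameters shrink from one per instance to one per cluster, which is the source of the efficiency gain described above; the values in `beta_per_cluster` are placeholders that an EM procedure like GLAD's would then estimate.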