Contributions to Non-Standard Problems in Supervised Classification
Abstract
Non-standard machine learning problems have gained considerable traction in recent years. These problems arise from the need to train models on data that do not fit the usual pattern expected by classic machine learning models, on data with a non-standard target, or possibly both. In this thesis, we make contributions to two non-standard problems in the area of supervised classification.
First, we make contributions to the problem of learning from multiple annotators. The goal of this problem is to train a model from instances labelled by several annotators whose expertise may be unknown. Several applications of this kind of data have appeared in recent years, so we reviewed these applications, the problems they faced when using such data, and the general interest in the topic. We then focused on two of the problems identified in this review: the scalability of existing methods and the estimation of instance difficulty. For the first problem, we developed a software package with scalable implementations of the main algorithms in this area, whose performance improves upon that of all other available packages for the problem. For the second problem, we proposed a scalable algorithm for estimating instance difficulty that improves upon an existing algorithm for this task.
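To make the multiple-annotator setting concrete, the sketch below shows the simplest aggregation baseline, majority voting, together with two naive heuristics: each annotator's agreement with the consensus as a rough proxy for expertise, and each instance's disagreement rate as a rough proxy for difficulty. It is a minimal illustration that assumes a small in-memory dictionary of labels; the function names and toy data are hypothetical, and it is neither the thesis' scalable package nor its difficulty-estimation algorithm.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: instance id -> {annotator id: label given}.
EXAMPLE_ANNOTATIONS = {
    "x1": {"ann1": 0, "ann2": 0, "ann3": 1},
    "x2": {"ann1": 1, "ann2": 1, "ann3": 1},
    "x3": {"ann1": 0, "ann2": 1, "ann3": 1},
}


def majority_vote(annotations):
    """Return the most frequent label per instance (ties go to the smallest label)."""
    consensus = {}
    for instance, labels in annotations.items():
        counts = Counter(labels.values())
        best = max(counts.values())
        consensus[instance] = min(label for label, c in counts.items() if c == best)
    return consensus


def annotator_agreement(annotations, consensus):
    """Fraction of each annotator's labels that match the consensus label."""
    hits, totals = defaultdict(int), defaultdict(int)
    for instance, labels in annotations.items():
        for annotator, label in labels.items():
            totals[annotator] += 1
            hits[annotator] += int(label == consensus[instance])
    return {a: hits[a] / totals[a] for a in totals}


def instance_disagreement(annotations, consensus):
    """Fraction of annotators who disagree with the consensus on each instance."""
    return {
        instance: sum(label != consensus[instance] for label in labels.values()) / len(labels)
        for instance, labels in annotations.items()
    }


if __name__ == "__main__":
    consensus = majority_vote(EXAMPLE_ANNOTATIONS)
    print(consensus)
    print(annotator_agreement(EXAMPLE_ANNOTATIONS, consensus))
    print(instance_disagreement(EXAMPLE_ANNOTATIONS, consensus))
```

On the toy data, annotator ann2 agrees with the consensus on every instance, while x1 and x3 are the instances with the most disagreement. Majority voting treats all annotators as equally reliable, which is precisely the limitation that methods modelling annotator expertise aim to overcome.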
Second, we have worked on the problem of label ranking, whose goal is to learn a model that predicts, for each instance, a ranking over a fixed set of labels, and we make several proposals for it. We designed a family of methods based on mixtures of graphical models with a naive Bayes structure; these methods are competitive with the state of the art and are, to our knowledge, the first label ranking methods based on probabilistic graphical models. We also proposed new impurity measures that improve the efficiency and quality of decision-tree-based models, especially when learning from incomplete rankings. Lastly, we designed an oblique tree model that performs significantly better than the state of the art both when rankings are complete and when a moderate number of labels are missing.
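As an illustration of the ingredients involved in decision-tree-based label ranking, the following sketch computes the Kendall distance between two complete rankings, a Borda-count consensus ranking, and a simple node impurity defined as the mean Kendall distance to that consensus. This impurity is a generic, hypothetical measure chosen for illustration, not one of the measures proposed in the thesis, and it assumes every ranking is complete (a permutation of the same labels); incomplete rankings would need a different treatment.

```python
from itertools import combinations

# Hypothetical toy data: each ranking lists labels from most to least preferred.
EXAMPLE_RANKINGS = [("a", "b", "c"), ("a", "c", "b"), ("b", "a", "c")]


def kendall_distance(r1, r2):
    """Number of label pairs ordered differently by two complete rankings."""
    pos1 = {label: i for i, label in enumerate(r1)}
    pos2 = {label: i for i, label in enumerate(r2)}
    return sum(
        1
        for a, b in combinations(r1, 2)
        if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0
    )


def borda_consensus(rankings):
    """Aggregate complete rankings by Borda count (ties broken by label name)."""
    scores = {}
    for ranking in rankings:
        k = len(ranking)
        for i, label in enumerate(ranking):
            # The top label gets k - 1 points, the last one gets 0.
            scores[label] = scores.get(label, 0) + (k - 1 - i)
    return tuple(sorted(scores, key=lambda label: (-scores[label], label)))


def ranking_impurity(rankings):
    """Hypothetical node impurity: mean Kendall distance to the Borda consensus."""
    if not rankings:
        return 0.0
    center = borda_consensus(rankings)
    return sum(kendall_distance(r, center) for r in rankings) / len(rankings)


if __name__ == "__main__":
    print(borda_consensus(EXAMPLE_RANKINGS))
    print(ranking_impurity(EXAMPLE_RANKINGS))
```

For the three toy rankings, the Borda consensus is a ≻ b ≻ c and the impurity is 2/3, since two of the three rankings swap exactly one pair of labels relative to the consensus. A node whose rankings are all identical has impurity 0, so a split that yields more homogeneous children lowers this value.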
In summary, we have made valuable contributions to each of these non-standard problems, both in the form of new approaches and in the form of analyses and tools that foster research in each area.