D-Confidence: an active learning strategy to reduce label disclosure complexity in the presence of imbalanced class distributions
Date
2012
Authors
Alípio Jorge
Nuno Escudeiro
Abstract
In some classification tasks, such as those related to the automatic building and maintenance of text corpora, it is expensive to obtain labeled instances to train a classifier. In such circumstances it is common to have massive corpora in which only a few instances (typically a minority) are labeled while the others are not. Semi-supervised learning techniques try to leverage the intrinsic information in unlabeled instances to improve classification models. However, these techniques assume that the labeled instances cover all the classes to be learned, which might not be the case. Moreover, in the presence of an imbalanced class distribution, obtaining labeled instances from minority classes might be very costly, requiring extensive labeling, if queries are selected at random. Active learning allows asking an oracle to label new instances, which are selected according to specific criteria, with the aim of reducing the labeling effort. D-Confidence is an active learning approach that is effective in the presence