Challenges in Learning from Streaming Data Extended Abstract

João Gama

Files

P-009-RFG.pdf (78.84 KB)

Date

2015

Authors

João Gama

Abstract

Machine learning studies automatic methods for acquiring domain knowledge, with the goal of improving system performance as a result of experience. In the past two decades, machine learning research and practice have focused on batch learning, usually with small data sets. The rationale behind this practice is that examples are generated at random according to some stationary probability distribution. Most learners use a greedy, hill-climbing search in the space of models. They are prone to overfitting, local maxima, etc. Data are scarce and statistical estimates have high variance. A paradigmatic example is the TDIDT (top-down induction of decision trees) algorithm for learning decision trees [14]. As the tree grows, fewer and fewer examples are available to compute the sufficient statistics, so variance increases, leading to model instability. Moreover, the growing process re-uses the same data, exacerbating the overfitting problem. Regularization and pruning mechanisms are mandatory. © Springer International Publishing Switzerland 2015.
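
The decision-tree argument in the abstract can be illustrated with a small calculation. The sketch below is not taken from the paper; it assumes a perfectly balanced binary tree and i.i.d. examples, and the function names `examples_per_node` and `proportion_std_error` are hypothetical helpers introduced only for illustration. It shows how the standard error of a class-proportion estimate at a node grows as the number of examples reaching that node shrinks with depth.

```python
import math


def examples_per_node(n_total: int, depth: int, branching: int = 2) -> float:
    """Expected number of examples reaching one node at `depth`.

    Assumption: every split divides the data evenly among `branching` children.
    """
    return n_total / (branching ** depth)


def proportion_std_error(p: float, n: float) -> float:
    """Standard error of a class-proportion estimate from n i.i.d. examples."""
    return math.sqrt(p * (1.0 - p) / n)


if __name__ == "__main__":
    n_total = 1024  # illustrative training-set size, not a figure from the paper
    for depth in range(0, 9, 2):
        n = examples_per_node(n_total, depth)
        se = proportion_std_error(0.5, n)
        print(f"depth={depth} examples={n:.0f} std_error={se:.4f}")
```

Under these assumptions, each additional level halves the examples per node, so the standard error grows by a factor of about 1.41 per level. Real trees split unevenly, so this is only a rough picture of the effect.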

URI

http://repositorio.inesctec.pt/handle/123456789/5372
http://dx.doi.org/10.1007/978-3-319-09879-1_1

Collections

LIAAD - Indexed Articles in Conferences
