LIAAD - Other Publications
Browsing LIAAD - Other Publications by Issue Date
-
Dos Projectos às Regiões Digitais. Principais desafios. [From Projects to Digital Regions: Main Challenges] (2008) Oliveira Manuel; Maria Simões; Domingos Santos; Jan Wolf; Ricardo Campos
-
Ubiquitous Knowledge Discovery (2011) Michael May; João Gama
-
Data Stream Clustering: A Survey (2013) Silva, JA; Faria, ER; Barros, RC; Hruschka, ER; de Carvalho, ACPLF; João Gama
Data stream mining is an active research area that has recently emerged to discover knowledge from large amounts of continuously generated data. In this context, several data stream clustering algorithms have been proposed to perform unsupervised learning. Nevertheless, data stream clustering imposes several challenges to be addressed, such as dealing with nonstationary, unbounded data that arrive in an online fashion. The intrinsic nature of stream data requires the development of algorithms capable of performing fast and incremental processing of data objects, suitably addressing time and memory limitations. In this article, we present a survey of data stream clustering algorithms, providing a thorough discussion of the main design components of state-of-the-art algorithms. In addition, this work addresses the temporal aspects involved in data stream clustering, and presents an overview of the usually employed experimental methodologies. A number of references are provided that describe applications of data stream clustering in different domains, such as network intrusion detection, sensor networks, and stock market analysis. Information regarding software packages and data repositories is also provided to help researchers and practitioners. Finally, some important issues and open questions that can be the subject of future research are discussed.
-
A biased random key genetic algorithm for 2D and 3D bin packing problems (2013) José Fernando Gonçalves; Resende, MGC
In this paper we present a novel biased random-key genetic algorithm (BRKGA) for 2D and 3D bin packing problems. The approach uses a maximal-space representation to manage the free spaces in the bins. The proposed algorithm hybridizes a novel placement procedure with a genetic algorithm based on random keys. The BRKGA is used to evolve the order in which the boxes are packed into the bins and the parameters used by the placement procedure. Two new placement heuristics are used to determine the bin and the free maximal space where each box is placed. A novel fitness function that significantly improves solution quality is also developed. The new approach is extensively tested on 858 problem instances and compared with other approaches published in the literature. The computational results demonstrate that the new approach consistently equals or outperforms the other approaches, and the statistical analysis confirms that it is significantly better than all the other approaches.
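The abstract above says the BRKGA evolves the order in which boxes are packed. A minimal sketch of how a random-key chromosome is commonly decoded into such an order follows. The function name `decode_packing_order` and the sample keys are hypothetical, and the paper's chromosome also carries placement parameters that this sketch leaves out.

```python
def decode_packing_order(keys):
    """Decode a random-key vector into a packing order.

    Each box i owns one key in [0, 1). Sorting the box indices by
    ascending key gives the order in which boxes are handed to the
    placement procedure. This is the generic random-key decoding,
    not the paper's full decoder.
    """
    return sorted(range(len(keys)), key=lambda i: keys[i])


# Hypothetical chromosome for three boxes.
keys = [0.71, 0.12, 0.45]
order = decode_packing_order(keys)
print(order)  # [1, 2, 0]
```

Because any vector of keys decodes to a valid permutation, crossover and mutation can operate on the keys directly without a repair step.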
-
Real-time Augmented Reality shopping platform for studying consumer cognitive experiences (2013) Stoyanova, J; Goncalves, R; António Coelho; Pedro Brito
Augmented Reality (AR) is a technology that produces a synthesis between computer-generated data and the physical world of a viewer while establishing 3D registration and real-time interaction. Among the wide range of applications of AR, its use in advertising shopping experiences has recently been embraced by advertisers due to its novelty and engaging potential. Part of a wider research effort aimed at understanding the impact of AR on consumer psychology, this paper presents a demo platform application developed for a real-time shopping experience for shoes and attempts to define a baseline for subsequent marketing research in the field. In order to fully evaluate consumer experiences and compare them with the main AR platform, two other shopping applications were designed: a marker-based one and a static one. The platform will assist in exploring the antecedents of consumer purchase intention and in defining metrics for measuring shopping experiences with AR.
-
Concave minimum cost network flow problems solved with a colony of ants (2013) Monteiro, MSR; Dalila Fontes; Fontes, FACC
In this work we address the Single-Source Uncapacitated Minimum Cost Network Flow Problem with concave cost functions. This problem is NP-hard; therefore, we propose a hybrid heuristic to solve it. Our goal is not only to apply an ant colony optimization (ACO) algorithm to such a problem, but also to provide insight into how the parameters affect the performance of the algorithm. The performance of the ACO algorithm is improved by hybridizing it with a local search (LS) procedure. The core ACO procedure is used mainly to explore the search space, while the LS is incorporated to further exploit the best solutions found. The method we have developed has proven to be very efficient in solving both small and large problem instances. The problems we have used to test the algorithm were previously solved by other authors using other population-based heuristics. Our algorithm was able to improve upon some of their results in terms of solution quality, showing that the hybrid ACO (HACO) algorithm is a very good alternative approach to solving these problems. In addition, our algorithm is substantially faster at achieving these improved solutions. Furthermore, the magnitude of the reduction in computational requirements grows with problem size.
-
Predicting Taxi-Passenger Demand Using Streaming Data (2013) Luís Moreira Matias; João Gama; Michel Ferreira; João Mendes Moreira; Damas, L
Informed driving is increasingly becoming a key feature for increasing the sustainability of taxi companies. The sensors that are installed in each vehicle are providing new opportunities for automatically discovering knowledge, which, in return, delivers information for real-time decision making. Intelligent transportation systems for taxi dispatching and for finding time-saving routes are already exploring these sensing data. This paper introduces a novel methodology for predicting the spatial distribution of taxi-passengers for a short-term time horizon using streaming data. First, the information was aggregated into a histogram time series. Then, three time-series forecasting techniques were combined to produce a prediction. Experimental tests were conducted using the online data that are transmitted by 441 vehicles of a fleet running in the city of Porto, Portugal. The results demonstrated that the proposed framework can provide effective insight into the spatiotemporal distribution of taxi-passenger demand for a 30-min horizon.
-
Random rules from data streams (2013) Ezilda Duarte Almeida; Kosina, P; João Gama
Existing works suggest that random inputs and random features produce good results in classification. In this paper we study the problem of generating random rule sets from data streams. One of the most interpretable and flexible models for data stream mining prediction tasks is the Very Fast Decision Rules learner (VFDR). In this work we extend the VFDR algorithm using random rules from data streams. The proposed algorithm generates several sets of rules. Each rule set is associated with a set of Natt attributes. The proposed algorithm maintains all properties required when learning from stationary data streams: online and any-time classification, processing each example once. Copyright 2013 ACM.
-
A comparison of metaheuristic procedures to schedule jobs in a permutation flow shop to minimise total earliness and tardiness (2013) Schaller, J; Jorge Valente
This paper considers the problem of scheduling jobs in a permutation flow shop with the objective of minimising total earliness and tardiness. A genetic algorithm is proposed for the problem. This procedure and five other procedures were tested on problem sets that varied in terms of number of jobs, machines and the tightness and range of due dates. It was found that the genetic algorithm procedure was consistently effective in generating good solutions relative to the other procedures.
-
Data stream mining: The bounded rationality (2013) João Gama
Developments in information and communication technologies are dramatically changing data collection and processing methods. Data mining is now moving to the era of bounded rationality. In this work we discuss the implications of the resource constraints imposed by the data stream computational model on the design of learning algorithms. We analyze the behavior of stream mining algorithms and present future research directions, including ubiquitous stream mining and self-adaption models.
-
Dynamics of human decisions (2013) Renato Araújo Soeiro; Mousa, A; Oliveira, TR; Alberto Pinto
-
Special issue on "Cutting and Packing" (2013) António Miguel Gomes; José Fernando Gonçalves; Alvarez Valdes, R; de Carvalho, V
-
WIPS: The WiSARD indoor positioning system (2013) Cardoso, DO; João Gama; De Gregorio, M; Franca, FMG; Giordano, M; Lima, PMV
In this paper, we present a WiSARD-based system that addresses the problem of Indoor Positioning (IP) by taking advantage of pervasively available infrastructure (WiFi Access Points, APs). The goal is to develop a system for positioning users in indoor environments such as museums, malls, factories, and offshore platforms. Based on the fingerprint approach, we show how the proposed weightless neural system provides very good results in terms of performance and positioning resolution. Both the approach to the problem and the system are presented through two correlated experiments.
-
On Predicting the Taxi-Passenger Demand: A Real-Time Approach (2013) Luís Moreira Matias; João Gama; Michel Ferreira; João Mendes Moreira; Damas, L
Informed driving is becoming a key feature to increase the sustainability of taxi companies. Some recent works are exploring the data broadcast by each vehicle to provide live information for decision making. In this paper, we propose a method to employ a learning model based on historical GPS data in a real-time environment. Our goal is to predict the spatiotemporal distribution of taxi-passenger demand over a short time horizon. We did so by using learning concepts originally proposed for a well-known online algorithm: the perceptron [1]. The results were promising: we achieved satisfactory performance in producing the next prediction using a small amount of resources.
-
Evaluation methodology for multiclass novelty detection algorithms (2013) Faria, ER; Goncalves, IJCR; João Gama; Carvalho, ACPLF
Novelty detection is a useful ability for learning systems, especially in data stream scenarios, where new concepts can appear, known concepts can disappear and concepts can evolve over time. There are several studies in the literature investigating the use of machine learning classification techniques for novelty detection in data streams. However, there is no consensus regarding how to evaluate the performance of these techniques, particularly for multiclass problems. In this study, we propose a new evaluation approach for multiclass novelty detection problems in data streams. This approach is able to deal with: i) multiclass problems, ii) a confusion matrix with a column representing the unknown examples, iii) a confusion matrix that grows over time, iv) unsupervised learning, which generates novelties without an association with the problem classes, and v) representation of the evaluation measures over time. We evaluate the performance of the proposed approach using known novelty detection algorithms on artificial and real data sets. © 2013 IEEE.
-
On evaluating stream learning algorithms (2013) João Gama; Raquel Sebastião; Pedro Pereira Rodrigues
Most streaming decision models evolve continuously over time, run in resource-aware environments, and detect and react to changes in the environment generating data. One important issue, not yet convincingly addressed, is the design of experimental work to evaluate and compare decision models that evolve over time. This paper proposes a general framework for assessing predictive stream learning algorithms. We defend the use of prequential error with forgetting mechanisms to provide reliable error estimators. We prove that, in stationary data and for consistent learning algorithms, the holdout estimator, the prequential error and the prequential error estimated over a sliding window or using fading factors, all converge to the Bayes error. The use of prequential error with forgetting mechanisms proves advantageous in assessing performance and in comparing stream learning algorithms. It is also worthwhile to use the proposed methods for hypothesis testing and for change detection. In a set of experiments in drift scenarios, we evaluate the ability of a standard change detection algorithm to detect change using three prequential error estimators. These experiments show that the use of forgetting mechanisms (sliding windows or fading factors) is required for fast and efficient change detection. In comparison to sliding windows, fading factors are faster and memoryless, both important requirements for streaming applications. Overall, this paper is a contribution to a discussion on best practice for performance assessment when learning is a continuous process, and the decision models are dynamic and evolve over time.
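To illustrate the fading-factor estimator mentioned in the abstract above, here is a minimal sketch of a prequential error computed with a fading factor. The function name, the default `alpha` and the sample losses are assumptions chosen for illustration, not values taken from the paper.

```python
def prequential_fading_error(losses, alpha=0.995):
    """Return the fading-factor prequential error after each example.

    losses: per-example losses (e.g. 0/1 errors) in arrival order,
            each computed before the model learns from that example.
    alpha:  fading factor in (0, 1]; alpha = 1 gives the plain
            running mean, smaller values forget old errors faster.
    """
    faded_sum = 0.0    # S_i = loss_i + alpha * S_{i-1}
    faded_count = 0.0  # N_i = 1 + alpha * N_{i-1}
    estimates = []
    for loss in losses:
        faded_sum = loss + alpha * faded_sum
        faded_count = 1.0 + alpha * faded_count
        estimates.append(faded_sum / faded_count)
    return estimates


# With alpha = 1 the estimator reduces to the cumulative mean.
print(prequential_fading_error([1, 0, 1, 0], alpha=1.0))
```

Only two running totals are stored, which is why the estimator needs no window of past examples.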
-
Adaptive model rules from data streams (2013) Ezilda Duarte Almeida; Carlos Ferreira; João Gama
Decision rules are one of the most expressive languages for machine learning. In this paper we present Adaptive Model Rules (AMRules), the first streaming rule learning algorithm for regression problems. In AMRules the antecedent of a rule is a conjunction of conditions on the attribute values, and the consequent is a linear combination of attribute values. Each rule uses a Page-Hinkley test to detect changes in the process generating data and react to changes by pruning the rule set. In the experimental section we report the results of AMRules on benchmark regression problems, and compare the performance of our system with other streaming regression algorithms. © 2013 Springer-Verlag.
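The Page-Hinkley test named in the abstract above can be sketched as follows. This is the generic textbook form of the test for detecting an increase in the mean of a monitored signal. The class name, the parameter defaults and the synthetic stream are assumptions, and the sketch does not reproduce how AMRules wires the test into each rule.

```python
class PageHinkley:
    """Generic Page-Hinkley test for an upward shift in the mean."""

    def __init__(self, delta=0.005, threshold=50.0):
        self.delta = delta          # tolerated magnitude of change
        self.threshold = threshold  # alarm threshold (lambda)
        self.n = 0                  # number of observations seen
        self.mean = 0.0             # running mean of the signal
        self.cum = 0.0              # cumulative deviation m_T
        self.min_cum = 0.0          # smallest m_t so far (starts at m_0 = 0)

    def update(self, x):
        """Feed one observation; return True if a change is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return self.cum - self.min_cum > self.threshold


# Synthetic stream: the monitored error jumps from 0 to 10 at index 100.
detector = PageHinkley()
stream = [0.0] * 100 + [10.0] * 100
first_alarm = next(
    (i for i, x in enumerate(stream) if detector.update(x)), None
)
print(first_alarm)
```

A larger `threshold` lowers the false-alarm rate at the cost of a longer detection delay.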
-
SMOTE for regression (2013) Luís Torgo; Rita Paula Ribeiro; Pfahringer, B; Paula Oliveira Branco
Several real-world prediction problems involve forecasting rare values of a target variable. When this variable is nominal we have a problem of class imbalance that has already been studied thoroughly within machine learning. For regression tasks, where the target variable is continuous, few works exist addressing this type of problem. Still, important application areas involve forecasting rare extreme values of a continuous target variable. This paper describes a contribution to this type of task. Namely, we propose to address such tasks with sampling approaches. These approaches change the distribution of the given training data set to reduce the imbalance between the rare target cases and the most frequent ones. We present a modification of the well-known SMOTE algorithm that allows its use on these regression tasks. In an extensive set of experiments we provide empirical evidence for the superiority of our proposals for these particular regression tasks. The proposed SmoteR method can be used with any existing regression algorithm, turning it into a general tool for addressing problems of forecasting rare extreme values of a continuous target variable. © 2013 Springer-Verlag.
-
Rule Induction for Sentence Reduction (2013) João Cordeiro; Dias, G; Pavel Brazdil
Sentence Reduction has recently received great attention from the Automatic Text Summarization research community. Sentence Reduction consists of eliminating sentence components such as words, part-of-speech tag sequences or chunks without severely degrading the information contained in the sentence or its grammatical correctness. In this paper, we present an unsupervised scalable methodology for learning sentence reduction rules. Paraphrases are first discovered within a collection of automatically crawled Web News Stories and then textually aligned in order to extract interchangeable text fragment candidates, in particular reduction cases. As only positive examples exist, Inductive Logic Programming (ILP) provides an interesting learning paradigm for the extraction of sentence reduction rules. As a consequence, reduction cases are transformed into first-order logic clauses to supply a massive set of suitable learning instances, and an ILP learning environment is defined within the context of the Aleph framework. Experiments show good results in terms of irrelevancy elimination, syntactical correctness and reduction rate in a real-world environment, as opposed to other methodologies proposed so far.
-
Contextual Anomalies in Medical Data (2013) Vasco, D; Pedro Pereira Rodrigues; João Gama
Anomalies in data can cause many problems in data analysis processes. Thus, it is necessary to improve data quality by detecting and eliminating errors and inconsistencies in the data, a process known as data cleaning [1]. Since detection and correction of anomalies require detailed domain knowledge, the involvement of experts in the field is essential to the success of the data cleaning process. However, considering the size of the data to be processed, this process should be as automatic as possible so as to minimize the time spent [1]. © 2013 IEEE.