Multiscale parameter tuning of a semantic relatedness algorithm

Date
2014
Authors
José Paulo Leal
Teresa Almeida Costa
Abstract
The research presented in this paper builds on previous work that led to the definition of a family of semantic relatedness algorithms that compute the proximity between a pair of concept labels given as input. The algorithms depend on a semantic graph, provided as RDF data, and on a particular set of weights assigned to the properties of RDF statements (types of arcs in the RDF graph). The current research objective is to automatically tune the weights for a given graph in order to increase the quality of the proximity. The quality of a semantic relatedness method is usually measured against a benchmark data set: the results produced by the method are compared with those of the benchmark using Spearman's rank coefficient. The methodology presented here works the other way round and uses this coefficient to tune the proximity weights. The tuning process is controlled by a genetic algorithm that uses Spearman's rank coefficient as the fitness function. The genetic algorithm has its own set of parameters, which also need to be tuned. Bootstrapping, a statistical method for generating samples, is used in this methodology to enable a large number of repetitions of the genetic algorithm, exploring the results of alternative parameter settings. This approach raises several technical challenges due to its computational complexity, and this paper provides details on the techniques used to speed up the process. The proposed approach was validated with WordNet 2.0 and the WordSim-353 data set. Several ranges of parameter values were tested, and the results obtained are better than those of state-of-the-art methods for computing semantic relatedness using WordNet 2.0, with the advantage of not requiring any domain knowledge of the ontological graph. © José Paulo Leal and Teresa Costa.
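The tuning loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `proximity` function is a hypothetical stand-in (a weighted sum of per-property path counts) for the graph-based algorithm, and the names `tune`, `bootstrap_sample`, `pop_size`, `generations`, `mutation_rate` and `max_weight` are assumptions introduced only for illustration. The sketch shows Spearman's rank coefficient used as the fitness function of a simple genetic algorithm, and a bootstrap resample of the benchmark for repeated runs.

```python
import random

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman's rank coefficient: Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def proximity(weights, path_counts):
    # Hypothetical stand-in for the graph-based proximity:
    # a weighted sum of per-property path counts for one concept pair.
    return sum(w * c for w, c in zip(weights, path_counts))

def fitness(weights, pairs, benchmark):
    """Fitness of a weight vector: rank agreement with the benchmark."""
    scores = [proximity(weights, p) for p in pairs]
    return spearman(scores, benchmark)

def tune(pairs, benchmark, n_weights, pop_size=20, generations=30,
         mutation_rate=0.2, max_weight=10, rng=None):
    """Simple elitist genetic algorithm over integer weight vectors.

    Assumes n_weights >= 2 and pop_size >= 4.
    """
    rng = rng or random.Random(0)
    pop = [[rng.randint(0, max_weight) for _ in range(n_weights)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda w: fitness(w, pairs, benchmark), reverse=True)
        parents = pop[: pop_size // 2]          # keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_weights)   # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < mutation_rate:    # mutate one weight
                child[rng.randrange(n_weights)] = rng.randint(0, max_weight)
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda w: fitness(w, pairs, benchmark))

def bootstrap_sample(pairs, benchmark, rng):
    """Resample the benchmark with replacement, keeping pairs aligned."""
    idx = [rng.randrange(len(pairs)) for _ in pairs]
    return [pairs[i] for i in idx], [benchmark[i] for i in idx]
```

Under these assumptions, alternative settings of the genetic algorithm's own parameters could be compared by calling `tune` on many samples drawn with `bootstrap_sample` and looking at the distribution of the resulting fitness values, rather than relying on a single run.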