Challenges in computing semantic relatedness for large semantic graphs

Teresa Almeida Costa; José Paulo Leal

Date

2014

Authors

Teresa Almeida Costa

José Paulo Leal

Abstract

The research presented in this paper is part of ongoing work to define semantic relatedness measures for any given semantic graph. These measures are based on a previously defined family of proximity algorithms that compute the semantic relatedness between pairs of concepts and are parametrized by a semantic graph and a set of weighted properties. The distinctive feature of the proximity algorithms is that they consider all paths connecting two concepts in the semantic graph. These parameters must be tuned to maximize the quality of the semantic measure against a benchmark data set. The process of tuning the weight assignment was developed in previous work and relies on a genetic algorithm. The weight tuning process, using all the properties in the semantic graph, was validated using WordNet 2.0 and the WordSim-353 data set. The quality of the obtained semantic measure is better than those reported in the literature. However, this approach did not produce equally good results on larger semantic graphs such as WordNet 3.0, DBPedia and Freebase, in part because of the size of these graphs. The current approach is to select a sub-graph of the original semantic graph, small enough to enable processing and large enough to include all the relevant paths. This paper provides an overview of the ongoing work and presents a strategy to overcome the challenges raised by large semantic graphs.

Copyright 2014 ACM

Keywords

Semantic similarity; Linked data; Freebase; DBPedia; WordNet
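
The abstract describes a proximity measure that considers all paths between two concepts and is parametrized by a graph and a set of weighted properties. The following Python sketch illustrates that general idea only. The scoring rule (each path contributes the product of its property weights), the `max_len` bound, and the names `build_adjacency` and `proximity` are assumptions made for illustration, not the authors' algorithm, and the toy triples and weights are invented.

```python
from collections import defaultdict

def build_adjacency(triples):
    """Index labelled edges as node -> [(neighbour, property)], in both directions."""
    adj = defaultdict(list)
    for subject, prop, obj in triples:
        adj[subject].append((obj, prop))
        adj[obj].append((subject, prop))
    return adj

def proximity(adj, weights, source, target, max_len=3):
    """Sum a score over every simple path from source to target.

    Assumed scoring rule (hypothetical): a path contributes the product of
    the weights of the properties on its edges. Paths longer than max_len
    edges are ignored, which keeps the search bounded.
    """
    total = 0.0
    # Each stack entry: (current node, score so far, visited nodes, path length)
    stack = [(source, 1.0, frozenset([source]), 0)]
    while stack:
        node, score, visited, length = stack.pop()
        if node == target and length > 0:
            total += score
            continue
        if length == max_len:
            continue
        for neighbour, prop in adj[node]:
            if neighbour not in visited:
                stack.append((
                    neighbour,
                    score * weights.get(prop, 0.0),  # unknown properties weigh 0
                    visited | {neighbour},
                    length + 1,
                ))
    return total

# Toy graph and weights, invented for this example.
triples = [
    ("cat", "hypernym", "feline"),
    ("tiger", "hypernym", "feline"),
    ("cat", "similar", "tiger"),
    ("feline", "hypernym", "animal"),
    ("dog", "hypernym", "animal"),
]
weights = {"hypernym": 0.5, "similar": 0.8}
adj = build_adjacency(triples)

print(proximity(adj, weights, "cat", "tiger"))
print(proximity(adj, weights, "cat", "dog"))
```

In this sketch the `weights` dictionary plays the role of the tunable parameters mentioned in the abstract. A tuning procedure such as a genetic algorithm would search over those values and score each candidate against a benchmark. Enumerating all paths grows quickly with graph size, which is consistent with the abstract's motivation for restricting the computation to a sub-graph.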