Feature extraction for the author name disambiguation problem in a bibliographic database

dc.contributor.author Jorge Miguel Silva en
dc.contributor.author Fernando Silva en
dc.date.accessioned 2017-12-21T19:42:28Z
dc.date.available 2017-12-21T19:42:28Z
dc.date.issued 2017 en
dc.description.abstract Author name disambiguation in bibliographic databases has been, and still is, a challenging research task because of the high uncertainty involved in matching a publication's author to a specific researcher. Common approaches either resort to clustering to group an author's publications, or use a binary classifier to decide whether a given publication was written by a specific author. Both approaches benefit from authors publishing similar works (e.g. in the same subject areas and venues), from the author's previous publication history (the more extensive, the better), and from validated publication-author associations for model creation. However, when such an algorithm is confronted with dissimilar works by the same author, or with an author who has no publication history, it often makes wrong identifications. In this paper, we describe a feature extraction method that aims to avoid these problems. Instead of characterizing an author in general, it selectively uses features that associate the author with a certain publication. We build a Random Forest model to assess the quality of our set of features. Its goal is to predict whether a given author is the true author of a certain publication. We use a bibliographic database named Authenticus, with more than 250,000 validated author-publication associations, to test model quality. Our model achieved a top result of 95.37% accuracy in predicting matches and 91.92% in a real test scenario. Furthermore, in the latter case the model correctly predicted 61.86% of the cases where authors had no previous publication history. Copyright 2017 ACM. en
dc.identifier.uri http://repositorio.inesctec.pt/handle/123456789/4708
dc.identifier.uri http://dx.doi.org/10.1145/3019612.3019663 en
dc.language eng en
dc.relation 5124 en
dc.relation 6650 en
dc.rights info:eu-repo/semantics/openAccess en
dc.title Feature extraction for the author name disambiguation problem in a bibliographic database en
dc.type conferenceObject en
dc.type Publication en
Files
Original bundle
Name: P-00M-X02.pdf
Size: 663.49 KB
Format: Adobe Portable Document Format