dc.contributor.author |
Borges, Eduardo Nunes |
|
dc.contributor.author |
Becker, Karin |
|
dc.contributor.author |
Heuser, Carlos |
|
dc.contributor.author |
Galante, Renata |
|
dc.date.accessioned |
2012-01-07T22:43:02Z |
|
dc.date.available |
2012-01-07T22:43:02Z |
|
dc.date.issued |
2011 |
|
dc.identifier.citation |
BORGES, Eduardo et al. A classification-based approach for bibliographic metadata deduplication. In: IADIS INTERNATIONAL CONFERENCE WWW/INTERNET, 2011, Rio de Janeiro. Electronic proceedings... Rio de Janeiro: IADIS, 2011. Available at: <http://www.eduardo.c3.furg.br/arquivos/download/www-internet2011.pdf>. Accessed on: 24 Dec. 2011. |
pt_BR |
dc.identifier.uri |
http://repositorio.furg.br/handle/1/1701 |
|
dc.description.abstract |
Digital libraries of scientific articles describe them using a set of metadata, including bibliographic references. These
references can be represented in several formats and styles. Considerable content variations can occur in some metadata
fields, such as title, author names and publication venue. Moreover, it is quite common to find references that omit some
metadata fields, such as page numbers. Duplicate entries affect the quality of digital library services, since they need to
be appropriately identified and treated. This paper presents a comparative analysis of different data classification
algorithms used to identify duplicate bibliographic metadata records. We investigated the discovered patterns by
comparing the rules and the decision tree with the heuristics adopted in a previous work. Our experiments show that the
combination of previously proposed specific-purpose similarity functions with classification algorithms represents an
improvement of up to 12% when compared to the experiments using our original approach. |
pt_BR |
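The abstract describes combining field-level similarity functions with a classification algorithm to decide whether two records are duplicates. As a rough illustration only, the Python sketch below turns each pair of records into a vector of per-field similarity scores and fits a decision stump (a one-level decision tree) to labeled pairs. The field names, the use of `difflib.SequenceMatcher` as the similarity measure, the zero score for omitted fields, the sample records, and the helper names (`field_similarity`, `pair_features`, `train_stump`) are all hypothetical; they are not the authors' specific-purpose similarity functions or the classifiers evaluated in the paper.

```python
from difflib import SequenceMatcher

# Hypothetical field set; the paper's exact features are not given in this record.
FIELDS = ("title", "authors", "venue")


def field_similarity(a: str, b: str) -> float:
    """Generic string similarity in [0, 1].

    Stand-in only: difflib's ratio is NOT one of the authors'
    specific-purpose similarity functions.
    """
    if not a or not b:
        return 0.0  # assumption: an omitted field contributes no evidence
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pair_features(r1: dict, r2: dict) -> list:
    """Map a pair of metadata records to a vector of per-field similarities."""
    return [field_similarity(r1.get(f, ""), r2.get(f, "")) for f in FIELDS]


def train_stump(X: list, y: list):
    """Fit a decision stump on labeled pairs.

    Tries every feature and every midpoint between consecutive distinct
    feature values, keeping the split with the fewest training errors.
    """
    best_errors, best_j, best_t = len(y) + 1, 0, 0.5
    for j in range(len(X[0])):
        values = sorted({x[j] for x in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            errors = sum((x[j] >= t) != label for x, label in zip(X, y))
            if errors < best_errors:
                best_errors, best_j, best_t = errors, j, t

    def predict(x: list) -> bool:
        return x[best_j] >= best_t  # True means "duplicate"

    return predict


# Hypothetical labeled pairs: (record_a, record_b, is_duplicate)
pairs = [
    ({"title": "Metadata deduplication with classifiers", "authors": "Silva, A.; Souza, B."},
     {"title": "METADATA DEDUPLICATION WITH CLASSIFIERS", "authors": "A. Silva, B. Souza",
      "venue": "WWW/Internet"}, True),
    ({"title": "Similarity functions for references", "authors": "Lima, C."},
     {"title": "similarity functions for references", "authors": "C. Lima"}, True),
    ({"title": "Metadata deduplication with classifiers", "authors": "Silva, A."},
     {"title": "Query processing in XML databases", "authors": "Costa, D."}, False),
    ({"title": "Similarity functions for references", "authors": "Lima, C."},
     {"title": "Similarity joins for relational data", "authors": "Lima, C."}, False),
]

X = [pair_features(a, b) for a, b, _ in pairs]
y = [label for _, _, label in pairs]
is_duplicate = train_stump(X, y)
print([is_duplicate(x) for x in X])  # → [True, True, False, False]
```

A full decision tree or rule learner would combine splits on several fields rather than a single one; the stump only keeps the example small and free of third-party dependencies.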
dc.language.iso |
eng |
pt_BR |
dc.rights |
open access |
pt_BR |
dc.subject |
Deduplication |
pt_BR |
dc.subject |
Bibliographic metadata |
pt_BR |
dc.subject |
Classification |
pt_BR |
dc.subject |
Machine learning |
pt_BR |
dc.title |
A classification-based approach for bibliographic metadata deduplication |
pt_BR |
dc.type |
conferenceObject |
pt_BR |