An automatic approach for duplicate bibliographic metadata identification using classification

Borges, Eduardo Nunes; Becker, Karin; Heuser, Carlos; Galante, Renata

dc.contributor.author	Borges, Eduardo Nunes
dc.contributor.author	Becker, Karin
dc.contributor.author	Heuser, Carlos
dc.contributor.author	Galante, Renata
dc.date.accessioned	2012-01-07T22:47:43Z
dc.date.available	2012-01-07T22:47:43Z
dc.date.issued	2011
dc.identifier.citation	BORGES, Eduardo et al. An automatic approach for duplicate bibliographic metadata identification using classification. In: INTERNATIONAL CONFERENCE OF THE CHILEAN COMPUTER SCIENCE SOCIETY, 30., 2011, Curicó. Electronic proceedings... Curicó, 2011. Available at: <http://jcc2011.utalca.cl/actas/SCCC/jcc2011_submission_47.pdf>. Accessed: 24 Dec. 2011.	pt_BR
dc.identifier.uri	http://repositorio.furg.br/handle/1/1702
dc.description.abstract	References are the main descriptive metadata used by digital libraries of scientific articles. These references can be represented in several formats and styles, and considerable content variations can also occur in some metadata fields such as title, author names and publication venue. Duplicate records affect the quality of digital library services, so they need to be appropriately identified and treated. This paper presents an approach to identifying duplicated bibliographic metadata. We extend our previous work so that, instead of setting thresholds based on the scores returned by similarity functions, we use the scores to train classification algorithms that automatically identify duplicated references. The experiments show that the classifiers improve the quality of results by up to 11% compared to our unsupervised heuristic-based approach.	pt_BR
dc.language.iso	eng	pt_BR
dc.rights	open access	pt_BR
dc.subject	Classification algorithms	pt_BR
dc.subject	Information representation	pt_BR
dc.subject	Information management	pt_BR
dc.title	An automatic approach for duplicate bibliographic metadata identification using classification	pt_BR
dc.type	conferenceObject	pt_BR
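
The idea summarized in the abstract, feeding per-field similarity scores to a trained classifier instead of comparing them against hand-set thresholds, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the similarity functions or the classification algorithms, so `difflib.SequenceMatcher` and a nearest-centroid classifier are stand-ins, and the helper names and toy records are hypothetical.

```python
from difflib import SequenceMatcher

# Assumption: the fields the abstract mentions as varying between duplicates.
FIELDS = ("title", "authors", "venue")


def similarity_vector(a, b):
    """Return one similarity score in [0, 1] per metadata field."""
    return [
        SequenceMatcher(None, a[f].lower(), b[f].lower()).ratio()
        for f in FIELDS
    ]


def train_centroids(vectors, labels):
    """Average the score vectors of each class (True means duplicate)."""
    centroids = {}
    for label in set(labels):
        rows = [v for v, l in zip(vectors, labels) if l == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, vector):
    """Assign the label whose centroid is closest (squared Euclidean)."""
    def dist(c):
        return sum((x - y) ** 2 for x, y in zip(vector, c))
    return min(centroids, key=lambda label: dist(centroids[label]))


# Invented toy records, for illustration only.
a = {"title": "Duplicate metadata identification using classification",
     "authors": "Borges, E.; Becker, K.", "venue": "SCCC"}
b = {"title": "Duplicate metadata identification using classification.",
     "authors": "E. Borges and K. Becker", "venue": "SCCC 2011"}
c = {"title": "A survey of graph databases",
     "authors": "Doe, J.", "venue": "Toy Workshop"}

# Labeled pairs: the scores are the features, the label is the target.
pairs = [(a, b, True), (a, c, False), (b, c, False)]
X = [similarity_vector(p, q) for p, q, _ in pairs]
y = [label for _, _, label in pairs]
model = train_centroids(X, y)

print(predict(model, similarity_vector(a, b)))
```

The point of the design is that the decision boundary is learned from labeled pairs, so no per-field threshold has to be tuned by hand; any classifier that accepts numeric feature vectors could replace the nearest-centroid stand-in.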

Files in this item

An Automatic Approach for Duplicate Bibliographic.pdf

Size: 79.70Kb Format: PDF

This item appears in the following Collection(s)

C3 - Trabalhos apresentados em eventos
