dc.contributor.author |
Borges, Eduardo Nunes |
|
dc.contributor.author |
Becker, Karin |
|
dc.contributor.author |
Heuser, Carlos |
|
dc.contributor.author |
Galante, Renata |
|
dc.date.accessioned |
2012-01-07T22:47:43Z |
|
dc.date.available |
2012-01-07T22:47:43Z |
|
dc.date.issued |
2011 |
|
dc.identifier.citation |
BORGES, Eduardo et al. An automatic approach for duplicate bibliographic metadata identification using classification. In: INTERNATIONAL CONFERENCE OF THE CHILEAN COMPUTER SCIENCE SOCIETY, 30., 2011, Curicó. Electronic proceedings... Curicó, 2011. Available at: <http://jcc2011.utalca.cl/actas/SCCC/jcc2011_submission_47.pdf>. Accessed: 24 Dec. 2011. |
pt_BR |
dc.identifier.uri |
http://repositorio.furg.br/handle/1/1702 |
|
dc.description.abstract |
References are the main descriptive metadata used by digital libraries of scientific articles. These references can be represented in several formats and styles, and considerable content variations can also occur in some metadata fields, such as title, author names and publication venue. Duplicate records affect the quality of digital library services, so they need to be appropriately identified and treated. This paper presents an approach to identifying duplicated bibliographic metadata. We extend our previous work so that, instead of setting thresholds based on the scores returned by similarity functions, we use the scores to train classification algorithms which automatically identify duplicated references. The experiments show that the classifiers increase the quality of results by up to 11% when compared to our unsupervised heuristic-based approach. |
pt_BR |
dc.language.iso |
eng |
pt_BR |
dc.rights |
open access |
pt_BR |
dc.subject |
Classification algorithms |
pt_BR |
dc.subject |
Information representation |
pt_BR |
dc.subject |
Information management |
pt_BR |
dc.title |
An automatic approach for duplicate bibliographic metadata identification using classification |
pt_BR |
dc.type |
conferenceObject |
pt_BR |