Syntactic similarity of web documents.

Bibliographic details
Main author: Pereira Junior, Álvaro Rodrigues
Publication date: 2003
Other authors: Ziviani, Nivio
Document type: Conference paper
Language: eng
Source title: Repositório Institucional da UFOP
dARK ID: ark:/61566/0013000004b3x
Full text: http://www.repositorio.ufop.br/handle/123456789/1682
Abstract: This paper presents and compares two methods for evaluating the syntactic similarity between documents. The first method uses a Patricia tree constructed from the original document; similarity is computed by searching for the text of each candidate document in the tree. The second method uses the concept of shingles to obtain a similarity measure for each document pair: every shingle from the original document is inserted into a hash table, in which the shingles of each candidate document are then searched. Given an original document and a set of candidates, the two methods find the documents that have some similarity relationship with the original. Experimental results were obtained using a system that generates plagiarized documents from 900 documents collected from the Web. Considering the arithmetic average of the absolute differences between the expected and obtained similarity, the shingle-based algorithm obtained 4.13% and the Patricia-tree algorithm 7.50%.
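The shingle-based method and the evaluation measure named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the shingle size, the tokenization, or the exact similarity formula, so the word-level shingles of width 4, the lowercase whitespace tokenization, the normalization by the number of candidate shingles, and the function names `shingles`, `similarity`, and `mean_abs_diff` are all assumptions made for illustration.

```python
def shingles(text, w=4):
    """Return the set of contiguous w-word sequences in text.

    Assumption: lowercase whitespace tokenization and w=4; the
    abstract does not state the paper's actual choices.
    """
    words = text.lower().split()
    if len(words) < w:
        # A text shorter than w words yields one short shingle (or none).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def similarity(original, candidate, w=4):
    """Fraction of the candidate's shingles that occur in the original.

    The original's shingles go into a hash table (a Python set) and
    each candidate shingle is looked up in it. Dividing by the number
    of candidate shingles is an assumed normalization.
    """
    table = shingles(original, w)
    cand = shingles(candidate, w)
    if not cand:
        return 0.0
    hits = sum(1 for s in cand if s in table)
    return hits / len(cand)


def mean_abs_diff(expected, obtained):
    """Arithmetic average of |expected - obtained| over paired values."""
    if len(expected) != len(obtained) or not expected:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(e - o) for e, o in zip(expected, obtained)) / len(expected)


if __name__ == "__main__":
    original = "the quick brown fox jumps over the lazy dog"
    candidate = "the quick brown fox jumps over a sleeping cat"
    print(similarity(original, candidate))
```

In this toy example, three of the candidate's six shingles also occur in the original, so the computed similarity is 0.5. A lower `mean_abs_diff` between expected and obtained similarities indicates a more accurate method, which is how the 4.13% and 7.50% figures in the abstract should be read.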
id UFOP_0a00d0aacbe9c80991e10dc9ff14aea9
oai_identifier_str oai:repositorio.ufop.br:123456789/1682
network_acronym_str UFOP
network_name_str Repositório Institucional da UFOP
repository_id_str 3233
author Pereira Junior, Álvaro Rodrigues
author_facet Pereira Junior, Álvaro Rodrigues
Ziviani, Nivio
author_role author
author2 Ziviani, Nivio
author2_role author
dc.contributor.author.fl_str_mv Pereira Junior, Álvaro Rodrigues
Ziviani, Nivio
publishDate 2003
dc.date.none.fl_str_mv 2003
2012-10-18T20:49:54Z
2012-10-18T20:49:54Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/conferenceObject
format conferenceObject
status_str publishedVersion
dc.identifier.uri.fl_str_mv PEREIRA JUNIOR, A. R.; ZIVIANI, N. Syntactic similarity of web documents. In: Latin American Web Congress, 1., 2003, Santiago. Proceedings... Santiago: Latin American Web Congress, 2003. v. 1, p. 194-200. Available at: <http://www.cwr.cl/la-web/2003/stamped/23_pereira_a.pdf>. Accessed: 18 Oct. 2012.
http://www.repositorio.ufop.br/handle/123456789/1682
dc.identifier.dark.fl_str_mv ark:/61566/0013000004b3x
url http://www.repositorio.ufop.br/handle/123456789/1682
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFOP
instname:Universidade Federal de Ouro Preto (UFOP)
instacron:UFOP
instname_str Universidade Federal de Ouro Preto (UFOP)
instacron_str UFOP
institution UFOP
reponame_str Repositório Institucional da UFOP
collection Repositório Institucional da UFOP
repository.name.fl_str_mv Repositório Institucional da UFOP - Universidade Federal de Ouro Preto (UFOP)
repository.mail.fl_str_mv repositorio@ufop.edu.br
_version_ 1817705756002091008