Genealogical trees on the web : a search engine user perspective.
Main author: | Yates, Ricardo Baeza |
---|---|
Publication date: | 2008 |
Other authors: | Pereira Junior, Álvaro Rodrigues; Ziviani, Nivio |
Document type: | Conference paper |
Language: | eng |
Source title: | Repositório Institucional da UFOP |
Full text: | http://www.repositorio.ufop.br/handle/123456789/1676 |
Resumo: | This paper presents an extensive study about the evolution of textual content on the Web, which shows how some new pages are created from scratch while others are created using already existing content. We show that a significant fraction of the Web is a byproduct of the latter case. We introduce the concept of Web genealogical tree, in which every page in a Web snapshot is classified into a component. We study in detail these components, characterizing the copies and identifying the relation between a source of content and a search engine, by comparing page relevance measures, documents returned by real queries performed in the past, and click-through data. We observe that sources of copies are more frequently returned by queries and more clicked than other documents. |
id | UFOP_44edf9d606f25acfb1e742c78fb84aaa |
---|---|
oai_identifier_str | oai:localhost:123456789/1676 |
network_acronym_str | UFOP |
network_name_str | Repositório Institucional da UFOP |
repository_id_str | 3233 |
dc.title.pt_BR.fl_str_mv | Genealogical trees on the web : a search engine user perspective. |
title | Genealogical trees on the web : a search engine user perspective. |
spellingShingle | Genealogical trees on the web : a search engine user perspective.; Yates, Ricardo Baeza; Web; Text; Content evolution; Search engine; Web mining |
title_short | Genealogical trees on the web : a search engine user perspective. |
title_full | Genealogical trees on the web : a search engine user perspective. |
title_fullStr | Genealogical trees on the web : a search engine user perspective. |
title_full_unstemmed | Genealogical trees on the web : a search engine user perspective. |
title_sort | Genealogical trees on the web : a search engine user perspective. |
author | Yates, Ricardo Baeza |
author_facet | Yates, Ricardo Baeza; Pereira Junior, Álvaro Rodrigues; Ziviani, Nivio |
author_role | author |
author2 | Pereira Junior, Álvaro Rodrigues; Ziviani, Nivio |
author2_role | author; author |
dc.contributor.author.fl_str_mv | Yates, Ricardo Baeza; Pereira Junior, Álvaro Rodrigues; Ziviani, Nivio |
dc.subject.por.fl_str_mv | Web; Text; Content evolution; Search engine; Web mining |
topic | Web; Text; Content evolution; Search engine; Web mining |
description | This paper presents an extensive study about the evolution of textual content on the Web, which shows how some new pages are created from scratch while others are created using already existing content. We show that a significant fraction of the Web is a byproduct of the latter case. We introduce the concept of Web genealogical tree, in which every page in a Web snapshot is classified into a component. We study in detail these components, characterizing the copies and identifying the relation between a source of content and a search engine, by comparing page relevance measures, documents returned by real queries performed in the past, and click-through data. We observe that sources of copies are more frequently returned by queries and more clicked than other documents. |
publishDate | 2008 |
dc.date.issued.fl_str_mv | 2008 |
dc.date.accessioned.fl_str_mv | 2012-10-18T19:01:55Z |
dc.date.available.fl_str_mv | 2012-10-18T19:01:55Z |
dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv | info:eu-repo/semantics/conferenceObject |
format | conferenceObject |
status_str | publishedVersion |
dc.identifier.citation.fl_str_mv | YATES, R. B.; PEREIRA JÚNIOR, A. R.; ZIVIANI, N. Genealogical trees on the web : a search engine user perspective. In: 17th International World Wide Web Conference, 17., 2008, Beijing. Proceedings... Beijing: International World Wide Web Conference, 2008. Available at: <http://homepages.dcc.ufmg.br/~nivio/papers/www08.pdf>. Accessed: 18 Oct. 2012. |
dc.identifier.uri.fl_str_mv | http://www.repositorio.ufop.br/handle/123456789/1676 |
identifier_str_mv | YATES, R. B.; PEREIRA JÚNIOR, A. R.; ZIVIANI, N. Genealogical trees on the web : a search engine user perspective. In: 17th International World Wide Web Conference, 17., 2008, Beijing. Proceedings... Beijing: International World Wide Web Conference, 2008. Available at: <http://homepages.dcc.ufmg.br/~nivio/papers/www08.pdf>. Accessed: 18 Oct. 2012. |
url | http://www.repositorio.ufop.br/handle/123456789/1676 |
dc.language.iso.fl_str_mv | eng |
language | eng |
dc.rights.driver.fl_str_mv | info:eu-repo/semantics/openAccess |
eu_rights_str_mv | openAccess |
dc.source.none.fl_str_mv | reponame:Repositório Institucional da UFOP; instname:Universidade Federal de Ouro Preto (UFOP); instacron:UFOP |
instname_str | Universidade Federal de Ouro Preto (UFOP) |
instacron_str | UFOP |
institution | UFOP |
reponame_str | Repositório Institucional da UFOP |
collection | Repositório Institucional da UFOP |
bitstream.url.fl_str_mv | http://www.repositorio.ufop.br/bitstream/123456789/1676/5/license.txt http://www.repositorio.ufop.br/bitstream/123456789/1676/1/EVENTO_GenealogicalTreesWeb.pdf |
bitstream.checksum.fl_str_mv | 8a4605be74aa9ea9d79846c1fba20a33 7e7a64089821751ef4d8f527c3b123d2 |
bitstream.checksumAlgorithm.fl_str_mv | MD5 MD5 |
repository.name.fl_str_mv | Repositório Institucional da UFOP - Universidade Federal de Ouro Preto (UFOP) |
repository.mail.fl_str_mv | repositorio@ufop.edu.br |
_version_ | 1801685717388099584 |
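
The record lists an MD5 checksum for each bitstream, so a downloaded copy can be checked against it. The following is a minimal sketch, not part of the repository's own tooling: the helper names `md5_hex` and `matches_record` are hypothetical, the files are assumed to have been downloaded already under their original names, and the pairing of checksums with files assumes that the values in `bitstream.checksum.fl_str_mv` follow the same order as the URLs in `bitstream.url.fl_str_mv`.

```python
import hashlib
from pathlib import Path

# Checksums copied from the bitstream.checksum.fl_str_mv field of this record.
# Assumption: they are listed in the same order as the bitstream URLs.
EXPECTED_MD5 = {
    "license.txt": "8a4605be74aa9ea9d79846c1fba20a33",
    "EVENTO_GenealogicalTreesWeb.pdf": "7e7a64089821751ef4d8f527c3b123d2",
}


def md5_hex(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path):
    """Compare a local file's MD5 digest with the value listed in the record.

    Raises KeyError when the file name is not one of the listed bitstreams.
    """
    name = Path(path).name
    if name not in EXPECTED_MD5:
        raise KeyError(f"no checksum listed for {name!r}")
    return md5_hex(path) == EXPECTED_MD5[name]
```

A call such as `matches_record("EVENTO_GenealogicalTreesWeb.pdf")` would return `True` only if the local file is byte-for-byte identical to the deposited one. MD5 is adequate for detecting accidental corruption during transfer, which appears to be its purpose here, but it should not be relied on as protection against deliberate tampering.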