The evolution of web content and search engines.
Main author: | Yates, Ricardo Baeza |
---|---|
Publication date: | 2006 |
Other authors: | Pereira Junior, Álvaro Rodrigues; Ziviani, Nivio |
Document type: | Conference paper |
Language: | eng |
Source title: | Repositório Institucional da UFOP |
Full text: | http://www.repositorio.ufop.br/handle/123456789/1677 |
Abstract: | The Web grows at a fast pace and little is known about how new content is generated. The objective of this paper is to study the dynamics of content evolution in the Web, giving answers to questions like: How much new content has evolved from old Web content? How much of the Web content is biased by the ranking algorithms of search engines? We used four snapshots of the Chilean Web containing documents of all the Chilean primary domains, crawled in four distinct periods of time. If a page in a newer snapshot contains content from a page in an older snapshot, we say that the source is a parent of the new page. Our hypothesis is that when pages have parents, in a portion of pages there was a query that related the parents and made possible the creation of the new page. Thus, part of the Web content is biased by the ranking function of search engines. We also define a genealogical tree for the Web, where many pages are new and do not have parents and others have one or more parents. We present the Chilean Web genealogical tree and study its components. To the best of our knowledge this is the first paper that studies how old content is used to create new content, relating a search engine ranking algorithm with the creation of new pages. |
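
The parent relation described in the abstract can be illustrated with a minimal sketch. The record does not say how shared content is detected, so everything below is an assumption made for illustration: pages are compared by overlapping word shingles, and the names `shingles`, `find_parents`, `k`, and `min_shared` are hypothetical, not taken from the paper.

```python
from typing import Dict, List, Set, Tuple


def shingles(text: str, k: int = 5) -> Set[Tuple[str, ...]]:
    """Return the set of k-word shingles of a text (lowercased, split on whitespace)."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def find_parents(old_snapshot: Dict[str, str],
                 new_snapshot: Dict[str, str],
                 k: int = 5,
                 min_shared: int = 1) -> Dict[str, List[str]]:
    """Map each page in the newer snapshot to its candidate parents.

    Assumption: an old page counts as a parent when it shares at least
    `min_shared` k-word shingles with the new page.
    """
    old_sets = {url: shingles(text, k) for url, text in old_snapshot.items()}
    parents: Dict[str, List[str]] = {}
    for new_url, new_text in new_snapshot.items():
        new_set = shingles(new_text, k)
        parents[new_url] = sorted(
            old_url for old_url, old_set in old_sets.items()
            if len(new_set & old_set) >= min_shared
        )
    return parents


if __name__ == "__main__":
    # Toy snapshots with made-up URLs and text.
    old = {
        "a.cl": "the quick brown fox jumps over the lazy dog",
        "b.cl": "web search engines rank pages by relevance to the query",
    }
    new = {
        "c.cl": "a story where the quick brown fox jumps over the fence",
        "d.cl": "completely original text with no overlap at all here",
    }
    print(find_parents(old, new))
```

In this toy input, `c.cl` reuses a passage from `a.cl` and so gets one parent, while `d.cl` has none, which corresponds to a page that is new and has no parents in the genealogical tree.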
id |
UFOP_8a6e37486b550e5405d026f47ef8098b |
---|---|
oai_identifier_str |
oai:localhost:123456789/1677 |
network_acronym_str |
UFOP |
network_name_str |
Repositório Institucional da UFOP |
repository_id_str |
3233 |
dc.title.pt_BR.fl_str_mv |
The evolution of web content and search engines. |
title |
The evolution of web content and search engines. |
spellingShingle |
The evolution of web content and search engines. Yates, Ricardo Baeza |
title_short |
The evolution of web content and search engines. |
title_full |
The evolution of web content and search engines. |
title_fullStr |
The evolution of web content and search engines. |
title_full_unstemmed |
The evolution of web content and search engines. |
title_sort |
The evolution of web content and search engines. |
author |
Yates, Ricardo Baeza |
author_facet |
Yates, Ricardo Baeza; Pereira Junior, Álvaro Rodrigues; Ziviani, Nivio |
author_role |
author |
author2 |
Pereira Junior, Álvaro Rodrigues; Ziviani, Nivio |
author2_role |
author author |
dc.contributor.author.fl_str_mv |
Yates, Ricardo Baeza; Pereira Junior, Álvaro Rodrigues; Ziviani, Nivio |
description |
The Web grows at a fast pace and little is known about how new content is generated. The objective of this paper is to study the dynamics of content evolution in the Web, giving answers to questions like: How much new content has evolved from old Web content? How much of the Web content is biased by the ranking algorithms of search engines? We used four snapshots of the Chilean Web containing documents of all the Chilean primary domains, crawled in four distinct periods of time. If a page in a newer snapshot contains content from a page in an older snapshot, we say that the source is a parent of the new page. Our hypothesis is that when pages have parents, in a portion of pages there was a query that related the parents and made possible the creation of the new page. Thus, part of the Web content is biased by the ranking function of search engines. We also define a genealogical tree for the Web, where many pages are new and do not have parents and others have one or more parents. We present the Chilean Web genealogical tree and study its components. To the best of our knowledge this is the first paper that studies how old content is used to create new content, relating a search engine ranking algorithm with the creation of new pages. |
publishDate |
2006 |
dc.date.issued.fl_str_mv |
2006 |
dc.date.accessioned.fl_str_mv |
2012-10-18T19:15:45Z |
dc.date.available.fl_str_mv |
2012-10-18T19:15:45Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/conferenceObject |
format |
conferenceObject |
status_str |
publishedVersion |
dc.identifier.citation.fl_str_mv |
YATES, R. B.; PEREIRA JUNIOR, A. R.; ZIVIANI, N. The evolution of web content and search engines. In: 8th ACM Workshop on Web Mining and Web Usage Analysis, 8., 2006, Philadelphia. Proceedings... Philadelphia: ACM Workshop on Web Mining and Web Usage Analysis, 2006. v. 1. Available at: <http://webmining.spd.louisville.edu/webkdd06/papers/paper-7-The%20evolution%20of%20Web-Alvaro-Baeza-Final.pdf>. Accessed: 18 Oct. 2012. |
dc.identifier.uri.fl_str_mv |
http://www.repositorio.ufop.br/handle/123456789/1677 |
identifier_str_mv |
YATES, R. B.; PEREIRA JUNIOR, A. R.; ZIVIANI, N. The evolution of web content and search engines. In: 8th ACM Workshop on Web Mining and Web Usage Analysis, 8., 2006, Philadelphia. Proceedings... Philadelphia: ACM Workshop on Web Mining and Web Usage Analysis, 2006. v. 1. Available at: <http://webmining.spd.louisville.edu/webkdd06/papers/paper-7-The%20evolution%20of%20Web-Alvaro-Baeza-Final.pdf>. Accessed: 18 Oct. 2012. |
url |
http://www.repositorio.ufop.br/handle/123456789/1677 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.source.none.fl_str_mv |
reponame:Repositório Institucional da UFOP instname:Universidade Federal de Ouro Preto (UFOP) instacron:UFOP |
instname_str |
Universidade Federal de Ouro Preto (UFOP) |
instacron_str |
UFOP |
institution |
UFOP |
reponame_str |
Repositório Institucional da UFOP |
collection |
Repositório Institucional da UFOP |
bitstream.url.fl_str_mv |
http://www.repositorio.ufop.br/bitstream/123456789/1677/5/license.txt http://www.repositorio.ufop.br/bitstream/123456789/1677/1/EVENTO_EvolutionContentSearch.pdf |
bitstream.checksum.fl_str_mv |
8a4605be74aa9ea9d79846c1fba20a33 ca7a7d3971c7e53ba69f45cbf54cd160 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 |
repository.name.fl_str_mv |
Repositório Institucional da UFOP - Universidade Federal de Ouro Preto (UFOP) |
repository.mail.fl_str_mv |
repositorio@ufop.edu.br |
_version_ |
1801685731843768320 |