Searching dynamic Web pages with semi-structured contents
Main author: | Filipe Silva |
---|---|
Publication date: | 2003 |
Other authors: | Armando Oliveira, Lígia M. Ribeiro, Gabriel David |
Document type: | Book |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | https://repositorio-aberto.up.pt/handle/10216/621 |
Abstract: | At present, information systems (IS) in higher education are usually supported by databases (DB) and accessed through a Web interface. This is the case with SiFEUP, the IS of the Engineering Faculty of the University of Porto (FEUP). The typical SiFEUP user sees the system as a collection of Web pages and is unaware that most of them do not exist as actual HTML files stored on a server but correspond to HTML code generated on the fly by a designated program that accesses the DB and brings the most up-to-date information to the user's desktop. Typical search engines either do not index dynamically generated Web pages or index only those explicitly referenced in a static page, without following the links the dynamic page may contain. In this paper we describe the development of a search facility for SiFEUP, how the limitations on indexing dynamic Web pages were circumvented, and an evaluation of the results obtained. The solution involves a locally developed crawler, the Oracle Text full-text indexer, and meta-information automatically drawn from the DB or manually added to improve the relevance factor calculation. |
id | RCAP_2b4bcdb34cf966a05a3f08731fd16f6f |
---|---|
oai_identifier_str | oai:repositorio-aberto.up.pt:10216/621 |
network_acronym_str | RCAP |
network_name_str | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str | 7160 |
dc.title.none.fl_str_mv | Searching dynamic Web pages with semi-structured contents |
title | Searching dynamic Web pages with semi-structured contents |
author | Filipe Silva |
author_facet | Filipe Silva; Armando Oliveira; Lígia M. Ribeiro; Gabriel David |
author_role | author |
author2 | Armando Oliveira; Lígia M. Ribeiro; Gabriel David |
author2_role | author; author; author |
dc.contributor.author.fl_str_mv | Filipe Silva; Armando Oliveira; Lígia M. Ribeiro; Gabriel David |
dc.subject.por.fl_str_mv | Engenharia electrotécnica, electrónica e informática; Electrical engineering, Electronic engineering, Information engineering |
topic | Engenharia electrotécnica, electrónica e informática; Electrical engineering, Electronic engineering, Information engineering |
description | At present, information systems (IS) in higher education are usually supported by databases (DB) and accessed through a Web interface. This is the case with SiFEUP, the IS of the Engineering Faculty of the University of Porto (FEUP). The typical SiFEUP user sees the system as a collection of Web pages and is unaware that most of them do not exist as actual HTML files stored on a server but correspond to HTML code generated on the fly by a designated program that accesses the DB and brings the most up-to-date information to the user's desktop. Typical search engines either do not index dynamically generated Web pages or index only those explicitly referenced in a static page, without following the links the dynamic page may contain. In this paper we describe the development of a search facility for SiFEUP, how the limitations on indexing dynamic Web pages were circumvented, and an evaluation of the results obtained. The solution involves a locally developed crawler, the Oracle Text full-text indexer, and meta-information automatically drawn from the DB or manually added to improve the relevance factor calculation. |
publishDate | 2003 |
dc.date.none.fl_str_mv | 2003; 2003-01-01T00:00:00Z |
dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv | info:eu-repo/semantics/book |
format | book |
status_str | publishedVersion |
dc.identifier.uri.fl_str_mv | https://repositorio-aberto.up.pt/handle/10216/621 |
url | https://repositorio-aberto.up.pt/handle/10216/621 |
dc.language.iso.fl_str_mv | eng |
language | eng |
dc.rights.driver.fl_str_mv | info:eu-repo/semantics/openAccess |
eu_rights_str_mv | openAccess |
dc.format.none.fl_str_mv | application/msword |
dc.source.none.fl_str_mv | reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |
instname_str | Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str | RCAAP |
institution | RCAAP |
reponame_str | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv | |
_version_ | 1799136030506352640 |