Automated integration of transport timetable information
Main author: | Westerberg, João Baptista Monteiro |
---|---|
Publication date: | 2020 |
Document type: | Dissertation |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/10400.22/16830 |
Abstract: | The ever-growing Web contains a large amount of data, which becomes useful when applications refine it and use it to improve their users’ lives. However, using the available data is not easy, since most of the information is not represented in machine-friendly formats. Instead, it is published in formats suited to human readers, so machines need additional effort to interpret, extract, and integrate it while keeping information from different sources consistent. This project presents a solution that combines ontology-based integration with extraction by web robots to automate the updating of public transport schedule information. An existing application receives that information and uses it to calculate efficient routes for commuters. The proposed solution can extract information from multiple online sources, in both PDF and HTML, and transform it into different formats. The system provides a web service for the export of these formats by a route optimization system. This document details the design and construction of the integration system. It describes the alternatives considered and the choices that led to the resulting application. Lastly, it evaluates the solution by performing extraction from several sources relevant to the project’s domain. |
id |
RCAP_a67c4ca1f8f6a8119548613eb4de259c |
---|---|
oai_identifier_str |
oai:recipp.ipp.pt:10400.22/16830 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
dc.title.none.fl_str_mv |
Automated integration of transport timetable information Integração automatizada de informação de horários de transportes |
title |
Automated integration of transport timetable information |
spellingShingle |
Automated integration of transport timetable information Westerberg, João Baptista Monteiro Information Retrieval Web crawling Information Integration Ontology PDF extraction |
title_short |
Automated integration of transport timetable information |
title_full |
Automated integration of transport timetable information |
title_fullStr |
Automated integration of transport timetable information |
title_full_unstemmed |
Automated integration of transport timetable information |
title_sort |
Automated integration of transport timetable information |
author |
Westerberg, João Baptista Monteiro |
author_facet |
Westerberg, João Baptista Monteiro |
author_role |
author |
dc.contributor.none.fl_str_mv |
Pereira, António Jorge Santos; Repositório Científico do Instituto Politécnico do Porto |
dc.contributor.author.fl_str_mv |
Westerberg, João Baptista Monteiro |
dc.subject.por.fl_str_mv |
Information Retrieval; Web crawling; Information Integration; Ontology; PDF extraction |
topic |
Information Retrieval; Web crawling; Information Integration; Ontology; PDF extraction |
description |
The ever-growing Web contains a large amount of data, which becomes useful when applications refine it and use it to improve their users’ lives. However, using the available data is not easy, since most of the information is not represented in machine-friendly formats. Instead, it is published in formats suited to human readers, so machines need additional effort to interpret, extract, and integrate it while keeping information from different sources consistent. This project presents a solution that combines ontology-based integration with extraction by web robots to automate the updating of public transport schedule information. An existing application receives that information and uses it to calculate efficient routes for commuters. The proposed solution can extract information from multiple online sources, in both PDF and HTML, and transform it into different formats. The system provides a web service for the export of these formats by a route optimization system. This document details the design and construction of the integration system. It describes the alternatives considered and the choices that led to the resulting application. Lastly, it evaluates the solution by performing extraction from several sources relevant to the project’s domain. |
publishDate |
2020 |
dc.date.none.fl_str_mv |
2020 2020-01-01T00:00:00Z 2021-02-02T14:27:03Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10400.22/16830 TID:202550087 |
url |
http://hdl.handle.net/10400.22/16830 |
identifier_str_mv |
TID:202550087 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1817554521937674240 |