Biomedical information extraction for matching patients to clinical trials
Main author: | Araújo, Gonçalo Carmo de |
---|---|
Publication date: | 2018 |
Document type: | Dissertation |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
Full text: | http://hdl.handle.net/10362/61552 |
Abstract: | Digital medical information has grown remarkably in recent decades, driven by an unprecedented number of medical writers, which has led to a revolution in what and how much information is available to health professionals. The problem with this wave of information is that precisely selecting the relevant results returned by medical information repositories is exhausting and time-consuming for physicians. This is one of the biggest challenges physicians face in the new digital era: how to reduce the time spent finding the best-matching document for a patient (e.g. intervention articles, clinical trials, prescriptions). Precision Medicine (PM) 2017 is the Text REtrieval Conference (TREC) track that focuses on this type of challenge exclusively for oncology. Using a dataset with a large number of clinical trials, this track is a good real-life example of how information retrieval solutions can be used to solve these types of problems, and a good starting point for applying information extraction and retrieval methods in a very complex domain. The purpose of this thesis is to improve a system designed by the NovaSearch team for the TREC PM 2017 Clinical Trials task, which ranked among the top 5 systems of 2017. The NovaSearch team also participated in the 2018 track and achieved a 15% increase in precision compared with 2017. Multiple IR techniques were used for information extraction and data processing, including rank fusion, query expansion (e.g. pseudo-relevance feedback, MeSH term expansion) and experiments with Learning to Rank (LETOR) algorithms. Our goal is to retrieve the best possible set of trials for a given patient, using precise document filters to exclude unwanted clinical trials. This work can open doors for searching and interpreting the criteria used to include or exclude trials, helping physicians even with the more complex and difficult information retrieval tasks. |
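
As a rough illustration of two techniques the abstract names, rank fusion and document filters, the sketch below fuses two ranked lists of trials and then drops trials whose age range excludes the patient. This record does not say which fusion method or filters the thesis uses; reciprocal rank fusion (RRF), the constant `k=60`, the age-range filter, and all trial identifiers and values here are assumptions chosen for illustration only.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) in every list it appears in;
    the scores are summed across lists. k=60 is a commonly used default,
    not a value taken from the thesis.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def filter_by_age(ranked_ids, age_ranges, patient_age):
    """Keep only trials whose (min_age, max_age) range admits the patient."""
    return [
        doc_id
        for doc_id in ranked_ids
        if age_ranges[doc_id][0] <= patient_age <= age_ranges[doc_id][1]
    ]


# Hypothetical runs, e.g. one from a baseline query and one from an expanded query.
baseline_run = ["trial_a", "trial_b", "trial_c"]
expanded_run = ["trial_b", "trial_d", "trial_a"]

# Hypothetical eligibility ranges (min_age, max_age) per trial.
age_ranges = {
    "trial_a": (18, 65),
    "trial_b": (0, 17),
    "trial_c": (18, 99),
    "trial_d": (40, 99),
}

fused = reciprocal_rank_fusion([baseline_run, expanded_run])
eligible = filter_by_age(fused, age_ranges, patient_age=52)
print(fused)
print(eligible)
```

Filtering after fusion keeps the ranking logic independent of the eligibility rules, so either part can be replaced without touching the other.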
id |
RCAP_1e02334a993465cd02763308686db078 |
---|---|
oai_identifier_str |
oai:run.unl.pt:10362/61552 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
repository_id_str |
7160 |
dc.title.none.fl_str_mv |
Biomedical information extraction for matching patients to clinical trials |
author |
Araújo, Gonçalo Carmo de |
author_facet |
Araújo, Gonçalo Carmo de |
author_role |
author |
dc.contributor.none.fl_str_mv |
Magalhães, João; Mourão, André; RUN |
dc.contributor.author.fl_str_mv |
Araújo, Gonçalo Carmo de |
dc.subject.por.fl_str_mv |
Medical Text Retrieval; Query expansion; Information Retrieval; Rank Fusion; Information Extraction; Domínio/Área Científica::Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática |
publishDate |
2018 |
dc.date.none.fl_str_mv |
2018-12; 2018; 2018-12-01T00:00:00Z; 2019-02-25T11:00:58Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10362/61552 |
url |
http://hdl.handle.net/10362/61552 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
|
_version_ |
1799137958256705536 |