Never Ending Language metaLearning: model management for CMU's ReadTheWeb project

Bibliographic details
Main author: Tiago Miguel Martins Vieira
Publication date: 2015
Document type: Dissertation
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: https://hdl.handle.net/10216/110259
Abstract: The main goal of CMU's ReadTheWeb project is to build a new kind of machine learning system that continuously reads the web, 24 hours per day, 7 days per week. This system is called the Never-Ending Language Learner (NELL). While this goal is not unprecedented, NELL stands out for its ability to improve the way it learns over time; that is, each day it learns to read the web better than it did the day before. To succeed in such an arduous quest, NELL combines several subsystem components that implement complementary knowledge extraction methods. NELL can apply different extraction methods to the same task. However, the performance of the components that use these methods, that is, the quality of the extracted knowledge, changes over time. To maximize the performance of the system as a whole, it becomes necessary to choose the best component for a task at any given time. Given the amount of data and the number of algorithms involved, traditional testing and selection methods are not viable. A preliminary approach that uses metalearning to address this issue was proposed by Santos. In this project, we extend that work. Our approach seeks to relate the innate (meta)features of the data to the performance of the algorithms. The first step is to gather different sets of data used in NELL and test the performance of the above-mentioned subsystem components on them. The results are used to build a metalearning system that can select the best algorithm for future sets of data. If proven successful, this system can then be integrated into NELL's framework to improve its learning capability.
id RCAP_7494c74e307e6be273f0599b86715b68
oai_identifier_str oai:repositorio-aberto.up.pt:10216/110259
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
dc.title.none.fl_str_mv Never Ending Language metaLearning: model management for CMU's ReadTheWeb project
author Tiago Miguel Martins Vieira
author_facet Tiago Miguel Martins Vieira
author_role author
dc.contributor.author.fl_str_mv Tiago Miguel Martins Vieira
dc.subject.por.fl_str_mv Engenharia electrotécnica, electrónica e informática
Electrical engineering, Electronic engineering, Information engineering
topic Engenharia electrotécnica, electrónica e informática
Electrical engineering, Electronic engineering, Information engineering
publishDate 2015
dc.date.none.fl_str_mv 2015-07-20
2015-07-20T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://hdl.handle.net/10216/110259
TID:201322269
url https://hdl.handle.net/10216/110259
identifier_str_mv TID:201322269
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_ 1799135943991492609