Construction of Geometries Based on Automatic Text Interpretation

Bibliographic details
Main author: Bernardo, Miguel Sequeira de Oliveira
Publication date: 2021
Document type: Dissertation
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10362/145303
Abstract: When dealing with expanding systems, such as the universe or the economy, the statistical error caused by their constant expansion is too high to allow an understanding of the system's behaviour. Hence the need arises to transform an expanding system into one that is more straightforward to work with. To address this problem, a geometric word space was constructed based on automatic text interpretation. News articles and economic reports about the European Union were collected, cleaned with Python scripts and used to train a Word2Vec model. The trained model created multi-dimensional word spaces for three periods of the last decades: a pre-2000 period, a 2000-2008 period and a post-2008 period. Interpretation of these word spaces revealed a change in the behaviour of the country names. All the European Union member states were getting closer to each other until 2008, but after that year the trend broke abruptly and every country drifted apart. This behaviour can be linked to the 2008 financial crisis, though more research is needed to confirm it and to find other correlations connecting the behaviour of the word spaces with the real world. An improvement in the quantity and quality of the corpus will certainly improve the accuracy of the word space and enable a better understanding of the word spaces' behaviour.
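The comparison of country names across word spaces described in the abstract could be sketched roughly as follows. This is only an illustration under stated assumptions: the record does not say which distance measure the thesis uses, so cosine distance is assumed here, and the two small spaces (`SPACE_TIGHT`, `SPACE_SPREAD`) hold made-up placeholder vectors, not vectors or results from the thesis. In practice the vectors would come from the Word2Vec model trained on each period.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def mean_pairwise_distance(space):
    """Average cosine distance over all pairs of words in one word space.

    `space` maps a word (here, a country name) to its embedding vector.
    A smaller value means the words sit closer together in that space.
    """
    pairs = list(combinations(sorted(space), 2))
    total = sum(cosine_distance(space[a], space[b]) for a, b in pairs)
    return total / len(pairs)


# Placeholder 2-D vectors, invented for illustration only (an assumption,
# not thesis data). Real Word2Vec vectors have many more dimensions.
SPACE_TIGHT = {
    "portugal": [1.0, 0.0],
    "germany": [2.0, 0.0],
    "france": [3.0, 0.0],
}
SPACE_SPREAD = {
    "portugal": [1.0, 0.0],
    "germany": [0.0, 1.0],
    "france": [-1.0, 0.0],
}

if __name__ == "__main__":
    for name, space in [("tight", SPACE_TIGHT), ("spread", SPACE_SPREAD)]:
        print(name, mean_pairwise_distance(space))
```

Computing this single number per period is one simple way to compare whether a group of words has moved closer together or further apart between two separately trained spaces. Because it relies only on distances within each space, it does not require the spaces to share a coordinate system.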
id RCAP_675a265d9bdffd6ae255338839080a74
oai_identifier_str oai:run.unl.pt:10362/145303
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
title Construction of Geometries Based on Automatic Text Interpretation
author Bernardo, Miguel Sequeira de Oliveira
author_facet Bernardo, Miguel Sequeira de Oliveira
author_role author
dc.contributor.none.fl_str_mv Catarino, Isabel
Cruz, João
RUN
dc.contributor.author.fl_str_mv Bernardo, Miguel Sequeira de Oliveira
dc.subject.por.fl_str_mv Text interpretation
Word space geometry
Word2Vec
Data Science
Economy
Python
Domain/Scientific Area::Engineering and Technology::Other Engineering and Technologies
publishDate 2021
dc.date.none.fl_str_mv 2021-01
2021-01-01T00:00:00Z
2022-11-08T19:07:41Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10362/145303
url http://hdl.handle.net/10362/145303
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
_version_ 1799138112321880064