Construction of Geometries Based on Automatic Text Interpretation
Main author: | Bernardo, Miguel Sequeira de Oliveira |
---|---|
Publication date: | 2021 |
Document type: | Dissertation |
Language: | eng |
Source: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
Full text: | http://hdl.handle.net/10362/145303 |
Abstract: | When dealing with expanding systems, such as the universe or the economy, the constant expansion makes the statistical error too high to allow an understanding of the system's behaviour. Hence the need to transform an expanding system into one that is more straightforward to work with. To address this problem, a geometric word space was constructed through automatic text interpretation. News articles and economic reports about the European Union were collected, cleaned with Python scripts, and used to train a Word2Vec model. The trained model produced multi-dimensional word spaces for three periods of recent decades: pre-2000, 2000-2008, and post-2008. Interpreting these word spaces revealed a change in the behaviour of the country names. The names of all European Union member states were moving closer to each other until 2008, but after that year this trend broke off abruptly and every country drifted apart. This behaviour may be linked to the 2008 financial crisis, though more research is needed to confirm it and, ideally, to find other correlations between the behaviour of the word spaces and the real world. Improving the quantity and quality of the corpus is expected to improve the accuracy of the word space and enable a better understanding of the word spaces' behaviour. |
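
The abstract outlines a pipeline: clean the collected texts with Python, build a separate word space for each period, and compare how close the country-name vectors are within each space. The sketch below illustrates only the steps around the model training, under stated assumptions; it is not the thesis's code. The cleaning rule (lower-casing and keeping runs of letters), the placement of the boundary year 2008 in the middle window, and the helper names `period_of`, `clean`, `split_corpus` and `mean_pairwise_distance` are all hypothetical. The Word2Vec training is omitted: with a library such as gensim, for example, each per-period list of token lists could be passed as `sentences` to `gensim.models.Word2Vec`, and the country vectors read from the trained model's `wv` attribute.

```python
import math
import re
from itertools import combinations

# Hypothetical period labels mirroring the abstract's three windows.
PERIODS = ("pre-2000", "2000-2008", "post-2008")


def period_of(year: int) -> str:
    """Map a document's year to a period (assumption: 2008 belongs to the middle window)."""
    if year < 2000:
        return "pre-2000"
    if year <= 2008:
        return "2000-2008"
    return "post-2008"


def clean(text: str) -> list[str]:
    """Hypothetical cleaning rule: lower-case and keep only runs of letters."""
    return re.findall(r"[^\W\d_]+", text.lower())


def split_corpus(docs: list[tuple[int, str]]) -> dict[str, list[list[str]]]:
    """Group (year, text) pairs into one list of token lists per period."""
    corpus: dict[str, list[list[str]]] = {p: [] for p in PERIODS}
    for year, text in docs:
        corpus[period_of(year)].append(clean(text))
    return corpus


def cosine_distance(u: list[float], v: list[float]) -> float:
    """1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def mean_pairwise_distance(vectors: dict[str, list[float]]) -> float:
    """Average cosine distance over all pairs of words in one word space."""
    pairs = list(combinations(vectors.values(), 2))
    if not pairs:
        raise ValueError("need at least two vectors")
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy 3-D vectors invented for illustration; they are not results from the thesis.
    toy_spaces = {
        "2000-2008": {"portugal": [1.0, 0.1, 0.0], "germany": [0.9, 0.2, 0.0], "greece": [1.0, 0.0, 0.1]},
        "post-2008": {"portugal": [1.0, 0.0, 0.0], "germany": [0.0, 1.0, 0.0], "greece": [0.0, 0.0, 1.0]},
    }
    for period, vecs in toy_spaces.items():
        print(period, round(mean_pairwise_distance(vecs), 3))
```

Mean pairwise cosine distance is used here because it compares vectors only within a single model, so it can be computed for each period without first aligning the three separately trained spaces. A falling value from one period to the next would correspond to country names moving closer together, and a rising value to their drifting apart; the toy vectors say nothing about the thesis's actual results.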
id | RCAP_675a265d9bdffd6ae255338839080a74 |
---|---|
oai_identifier_str | oai:run.unl.pt:10362/145303 |
network_acronym_str | RCAP |
network_name_str | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
repository_id_str | 7160 |
Abstract (Portuguese version) | When working with expanding systems, such as the universe or the economy, the statistical error resulting from their expansion is too high to allow the system's behaviour to be understood. Hence the need arose to transform an expanding system into one that is easier to understand. To solve this problem, a geometric space made up of words was created through automatic text interpretation. News and reports on the economic situation of the European Union were collected and, using scripts written in Python, cleaned and used to train a Word2Vec model. The trained Word2Vec model created three multidimensional word spaces covering different periods: one built with data from before 2000, another with data from 2000 to 2008, and a last one with data from after 2008. After interpreting the created spaces, a major change was evident in the behaviour of the objects representing the country names. The names of all European Union member states were moving closer together until 2008; after that year, however, this behaviour came to an abrupt halt and all the countries moved apart. This behaviour may be linked to the 2008 financial crisis, but more research is needed to confirm it and to find further correlations between the created space and the real world. An increase in the quantity and quality of the text collection will certainly improve the precision with which the space is built and contribute to a better understanding of the behaviour of the created spaces. |
dc.title.none.fl_str_mv | Construction of Geometries Based on Automatic Text Interpretation |
author | Bernardo, Miguel Sequeira de Oliveira |
author_role | author |
dc.contributor.none.fl_str_mv | Catarino, Isabel; Cruz, João; RUN |
dc.subject.por.fl_str_mv | Text interpretation; Word space geometry; Word2Vec; Data Science; Economy; Python; Scientific Domain/Area::Engineering and Technology::Other Engineering and Technologies |
publishDate | 2021 |
dc.date.none.fl_str_mv | 2021-01; 2021-01-01T00:00:00Z; 2022-11-08T19:07:41Z |
dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv | info:eu-repo/semantics/masterThesis |
format | masterThesis |
status_str | publishedVersion |
dc.identifier.uri.fl_str_mv | http://hdl.handle.net/10362/145303 |
dc.language.iso.fl_str_mv | eng |
dc.rights.driver.fl_str_mv | info:eu-repo/semantics/openAccess |
eu_rights_str_mv | openAccess |
dc.format.none.fl_str_mv | application/pdf |
dc.source.none.fl_str_mv | reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP |
instname_str | Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str | RCAAP |
reponame_str | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) |
repository.name.fl_str_mv | Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |