Concept maps mining for text summarization

Bibliographic details
Main author: Aguiar, Camila Zacché de
Publication date: 2017
Document type: Master's dissertation
Language: English
Source title: Repositório Institucional da Universidade Federal do Espírito Santo (riUfes)
Full text: http://repositorio.ufes.br/handle/10/9846
Abstract: Concept maps are graphical tools for representing and constructing knowledge. Concepts and relationships form the basis for learning, and concept maps have therefore been used extensively in education, in many situations and for many purposes, one of which is the representation of written text. Even a complex, grammatically difficult text can be represented by a concept map containing only the concepts and relationships that convey what the text expresses in a more complicated way. However, building a concept map manually takes considerable time and effort to identify and structure knowledge, especially when the map must represent the concepts expressed in a text rather than those of the map author's own cognitive structure. Several technological approaches have therefore been proposed to facilitate the construction of concept maps from texts. This dissertation proposes a new approach to automatically building concept maps as summaries of scientific texts. The goal of summarization is to produce a concept map that represents the text in condensed form while preserving its most important characteristics. Summarization can make texts easier to understand for students who are trying to cope with the cognitive overload caused by the growing amount of available textual information, a growth that can also hinder the construction of knowledge. We therefore hypothesized that summarizing a text as a concept map may help readers assimilate its knowledge, as well as reduce its complexity and the time needed to process it. In this context, we conducted a literature review of approaches to the automatic construction of concept maps from texts published between 1994 and 2016. From it, we built a categorization to better identify and analyze the features and characteristics of these approaches.
We also sought to identify the limitations of related work and to gather its best features in order to propose our approach. In addition, we present a Concept Map Mining process organized along four dimensions: Data Source Description, Domain Definition, Elements Identification, and Map Visualization. To develop a computational architecture that automatically builds concept maps as summaries of academic texts, this research produced CMBuilder, a public online tool for the automatic construction of concept maps from texts, as well as ExtroutNLP, a public Java API containing libraries for information extraction and public services. To reach this objective, we used methods from natural language processing and information retrieval. The main task is to extract propositions of the form (concept, relation, concept) from the text. To that end, the research introduces a pipeline comprising: grammar rules and depth-first search to extract concepts and the relations between them; preposition mapping, anaphora resolution, and exploitation of named entities to label concepts; concept ranking based on frequency and map topology; and summarization of propositions based on graph topology. The approach also proposes using supervised learning techniques for clustering and classification, combined with a thesaurus, to define the domain of the text and build a conceptual vocabulary for that domain. Finally, an objective evaluation of the accuracy of the ExtroutNLP library yields a precision of 0.65 on the corpus.
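The ranking and summarization steps of the pipeline can be illustrated with a small sketch. It is not CMBuilder's implementation: the scoring formula (a weighted mix of how often a concept occurs in the extracted propositions and how many distinct neighbours it has in the map), the weight `alpha`, the cut-off `k`, and the names `Proposition`, `rank_concepts` and `summarize` are hypothetical choices made for illustration. Extraction itself (grammar rules and depth-first search) is assumed to have already produced the triples.

```python
from collections import Counter
from typing import NamedTuple


class Proposition(NamedTuple):
    """A (concept, relation, concept) triple extracted from a text."""
    source: str
    relation: str
    target: str


def rank_concepts(props: list[Proposition], alpha: float = 0.5) -> list[tuple[str, float]]:
    """Rank concepts by a weighted mix of frequency and degree (hypothetical scoring).

    score(c) = alpha * freq(c) / max_freq + (1 - alpha) * degree(c) / max_degree,
    where freq counts occurrences of c across the propositions and degree counts
    its distinct neighbours in the concept map.
    """
    if not props:
        return []
    freq = Counter()
    neighbours: dict[str, set[str]] = {}
    for p in props:
        freq[p.source] += 1
        freq[p.target] += 1
        neighbours.setdefault(p.source, set()).add(p.target)
        neighbours.setdefault(p.target, set()).add(p.source)
    max_freq = max(freq.values())
    max_degree = max(len(n) for n in neighbours.values())
    scores = {
        c: alpha * freq[c] / max_freq + (1 - alpha) * len(neighbours[c]) / max_degree
        for c in freq
    }
    # Highest score first; ties are broken alphabetically so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def summarize(props: list[Proposition], k: int) -> list[Proposition]:
    """Keep only propositions whose two concepts are both among the top-k concepts."""
    top = {concept for concept, _ in rank_concepts(props)[:k]}
    return [p for p in props if p.source in top and p.target in top]


if __name__ == "__main__":
    props = [
        Proposition("concept map", "represents", "knowledge"),
        Proposition("concept map", "contains", "concept"),
        Proposition("concept map", "contains", "relation"),
        Proposition("concept", "is linked by", "relation"),
        Proposition("knowledge", "is expressed in", "text"),
    ]
    for p in summarize(props, k=3):
        print(p)
```

With these five toy propositions and k=3, the top concepts are "concept map", "concept" and "knowledge", so only the first two propositions survive. Keeping a proposition only when both of its endpoints survive the cut is one simple way to make the summary depend on graph topology; other plausible variants would prune by edge weight or keep a connected subgraph.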
Furthermore, a qualitative evaluation of the concept maps built by CMBuilder yields precision/recall of 0.75/0.45 for concepts and 0.57/0.23 for relationships in English, and 0.68/0.38 for concepts and 0.41/0.19 for relationships in Portuguese. An experiment examining whether the concept maps summarized by CMBuilder influence understanding of the subject of a text achieved 60% correct answers for maps extracted from short texts with multiple-choice questions and 77% correct answers for maps extracted from long texts with open-ended (discursive) questions.
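Precision and recall here compare the concepts (or relationships) in a generated map with those in a reference map. The sketch below assumes simple set matching on already-normalized labels, which may differ from the matching criteria used in the dissertation, and combines each reported pair into an F1 score, the harmonic mean F1 = 2PR / (P + R). The names `precision_recall`, `f1` and `REPORTED` are hypothetical.

```python
def precision_recall(extracted: set[str], reference: set[str]) -> tuple[float, float]:
    """Set-based precision and recall of extracted items against a reference set."""
    hits = len(extracted & reference)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs as reported in the abstract.
REPORTED = {
    ("English", "concepts"): (0.75, 0.45),
    ("English", "relationships"): (0.57, 0.23),
    ("Portuguese", "concepts"): (0.68, 0.38),
    ("Portuguese", "relationships"): (0.41, 0.19),
}

if __name__ == "__main__":
    for (lang, kind), (p, r) in REPORTED.items():
        print(f"{lang:<10} {kind:<13} P={p:.2f} R={r:.2f} F1={f1(p, r):.3f}")
```

Applied to the aggregate figures above, this gives an F1 of roughly 0.56 for concepts and 0.33 for relationships in English, and roughly 0.49 and 0.26 in Portuguese. These values are derived here from the reported precision/recall pairs and are not figures reported in the dissertation.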
id UFES_78e5a5b16b8bd9fed05beca3f406af64
oai_identifier_str oai:repositorio.ufes.br:10/9846
network_acronym_str UFES
network_name_str Repositório Institucional da Universidade Federal do Espírito Santo (riUfes)
repository_id_str 2108
dc.title.none.fl_str_mv Concept maps mining for text summarization
author Aguiar, Camila Zacché de
author_role author
dc.contributor.advisor-co1.fl_str_mv Zouaq, Amal
dc.contributor.advisor1.fl_str_mv Cury, Davidson
dc.contributor.author.fl_str_mv Aguiar, Camila Zacché de
dc.contributor.referee1.fl_str_mv Oliveira, Elias Silva de
dc.contributor.referee2.fl_str_mv Villavicencio, Aline
dc.contributor.referee3.fl_str_mv Menezes, Crediné Silva de
dc.subject.eng.fl_str_mv Information retrieval
Summarization
Knowledge representation
Natural language processing
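dc.subject.por.fl_str_mv Mapas conceituais [Concept maps]
Mineração de mapas conceituais [Concept map mining]
Processamento de linguagem natural [Natural language processing]
Concept map mining
Sumarização de textos [Text summarization]
Representação do conhecimento [Knowledge representation]
dc.subject.cnpq.fl_str_mv Ciência da Computação [Computer Science]
dc.subject.br-rjbn.none.fl_str_mv Informática na educação [Computing in education]
Processamento de linguagem natural (Computação) [Natural language processing (Computing)]
Recuperação da informação [Information retrieval]
Exploração de dados (Computação) [Data mining (Computing)]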
dc.subject.udc.none.fl_str_mv 004
publishDate 2017
dc.date.issued.fl_str_mv 2017-03-31
dc.date.accessioned.fl_str_mv 2018-08-02T00:03:48Z
dc.date.available.fl_str_mv 2018-08-01
2018-08-02T00:03:48Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.citation.fl_str_mv AGUIAR, Camila Zacché de. Concept maps mining for text summarization. 2017. 149 leaves. Master's dissertation (Mestrado em Informática), Universidade Federal do Espírito Santo, Centro Tecnológico, Vitória, 2017.
dc.identifier.uri.fl_str_mv http://repositorio.ufes.br/handle/10/9846
dc.language.iso.fl_str_mv eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv Text
dc.publisher.none.fl_str_mv Universidade Federal do Espírito Santo
Mestrado em Informática
dc.publisher.program.fl_str_mv Programa de Pós-Graduação em Informática
dc.publisher.initials.fl_str_mv UFES
dc.publisher.country.fl_str_mv BR
dc.publisher.department.fl_str_mv Centro Tecnológico
dc.source.none.fl_str_mv reponame:Repositório Institucional da Universidade Federal do Espírito Santo (riUfes)
instname:Universidade Federal do Espírito Santo (UFES)
instacron:UFES
bitstream.url.fl_str_mv http://repositorio.ufes.br/bitstreams/75caa235-9e3d-4250-ab40-7c9e29e6735a/download
bitstream.checksum.fl_str_mv 0c96c6b2cce9c15ea234627fad78ac9a
bitstream.checksumAlgorithm.fl_str_mv MD5
repository.name.fl_str_mv Repositório Institucional da Universidade Federal do Espírito Santo (riUfes) - Universidade Federal do Espírito Santo (UFES)
repository.mail.fl_str_mv
_version_ 1813022560880689152