Concept maps mining for text summarization

Aguiar, Camila Zacché de

Bibliographic details
Main author:	Aguiar, Camila Zacché de
Publication date:	2017
Document type:	Master's dissertation
Language:	eng
Source title:	Repositório Institucional da Universidade Federal do Espírito Santo (riUfes)
Full text:	http://repositorio.ufes.br/handle/10/9846
Abstract:	Concept maps are graphical tools for representing and constructing knowledge. Because concepts and relationships form the basis of learning, concept maps have been used extensively in education for many purposes, one of which is representing written text. Even a complex, grammatically difficult text can be represented by a concept map that contains only the concepts and relationships expressing what the text states in a more complicated way. However, building a concept map by hand takes considerable time and effort to identify and structure knowledge, especially when the map must represent the concepts expressed in a text rather than the concepts of the author's own cognitive structure. Several technological approaches have therefore been proposed to facilitate the construction of concept maps from texts. This dissertation proposes a new approach that automatically builds concept maps as summaries of scientific texts. The summarization aims to produce a concept map that represents the text in condensed form while preserving its most important characteristics. Summarization can make texts easier to understand for students coping with the cognitive overload caused by the growing amount of available textual information, an increase that can itself harm the construction of knowledge. We therefore hypothesized that summarizing a text as a concept map may help readers assimilate its knowledge while reducing its complexity and the time needed to process it. In this context, we reviewed the literature published between 1994 and 2016 on approaches to the automatic construction of concept maps from texts. From this review, we built a categorization to better identify and analyze the features of these technological approaches.
We also sought to identify the limitations and gather the best features of related work in order to propose our approach. In addition, we present a process for Concept Map Mining organized along four dimensions: Data Source Description, Domain Definition, Elements Identification, and Map Visualization. To develop a computational architecture that automatically builds concept maps as summaries of academic texts, this research produced the public tool CMBuilder, an online tool for the automatic construction of concept maps from texts, and a public Java API called ExtroutNLP, which contains libraries for information extraction and public services. To reach this objective, we used methods from natural language processing and information retrieval. The main task is to extract propositions of the form (concept, relation, concept) from the text. To that end, the research introduces a pipeline comprising: grammar rules and depth-first search to extract concepts and the relations between them from text; preposition mapping, anaphora resolution, and exploitation of named entities for concept labeling; concept ranking based on frequency and map topology; and summarization of propositions based on graph topology. The approach also proposes the use of supervised learning techniques of clustering and classification, combined with a thesaurus, to define the domain of the text and build a conceptual vocabulary for that domain. Finally, an objective analysis validating the accuracy of the ExtroutNLP library yields a precision of 0.65 on the corpus.
A qualitative analysis of the quality of the concept maps built by CMBuilder yields precision/recall of 0.75/0.45 for concepts and 0.57/0.23 for relationships in English, and 0.68/0.38 for concepts and 0.41/0.19 for relationships in Portuguese. In addition, an experiment testing whether the concept map summarized by CMBuilder influences understanding of the subject addressed in a text yields 60% correct answers for maps extracted from short texts with multiple-choice questions and 77% correct answers for maps extracted from long texts with open-ended questions.
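The abstract names the ingredients of the pipeline (propositions of the form (concept, relation, concept), concept ranking by frequency and map topology, topology-based summarization, precision/recall evaluation) but does not give formulas. The following is only a minimal Python sketch under stated assumptions, not the CMBuilder or ExtroutNLP implementation: the score "mention frequency plus number of distinct neighbours", the top-k cutoff, and all function names are hypothetical choices made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

# A proposition is a (concept, relation, concept) triple, as in the abstract.
Proposition = Tuple[str, str, str]


def rank_concepts(props: Iterable[Proposition]) -> Dict[str, int]:
    """Score each concept by frequency plus degree (ASSUMED scoring rule)."""
    freq: Counter = Counter()
    neighbours: Dict[str, Set[str]] = {}
    for src, _relation, dst in props:
        # Frequency: how often the concept appears in any proposition.
        freq[src] += 1
        freq[dst] += 1
        # Topology: distinct concepts it is directly linked to.
        neighbours.setdefault(src, set()).add(dst)
        neighbours.setdefault(dst, set()).add(src)
    return {c: freq[c] + len(neighbours[c]) for c in freq}


def summarize(props: List[Proposition], top_k: int) -> List[Proposition]:
    """Keep propositions whose two concepts are both among the top_k ranked."""
    scores = rank_concepts(props)
    # Sort by descending score, then by name so ties break deterministically.
    keep = set(sorted(scores, key=lambda c: (-scores[c], c))[:top_k])
    return [p for p in props if p[0] in keep and p[2] in keep]


def precision_recall(extracted: Iterable[str], gold: Iterable[str]) -> Tuple[float, float]:
    """Precision and recall of extracted items against a reference (gold) set."""
    extracted_set, gold_set = set(extracted), set(gold)
    hits = len(extracted_set & gold_set)
    precision = hits / len(extracted_set) if extracted_set else 0.0
    recall = hits / len(gold_set) if gold_set else 0.0
    return precision, recall


# Toy input written for this sketch; it is not data from the dissertation.
example_props: List[Proposition] = [
    ("concept map", "represents", "knowledge"),
    ("concept map", "contains", "concept"),
    ("concept map", "contains", "relationship"),
    ("relationship", "links", "concept"),
    ("text", "expresses", "knowledge"),
]
print(summarize(example_props, top_k=3))
```

Restricting the summary to propositions whose endpoints are both highly ranked keeps the reduced map connected around its central concepts; other cutoffs or weightings would be equally plausible readings of "frequency and map topology".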

Item metadata

id	UFES_78e5a5b16b8bd9fed05beca3f406af64
oai_identifier_str	oai:repositorio.ufes.br:10/9846
network_acronym_str	UFES
network_name_str	Repositório Institucional da Universidade Federal do Espírito Santo (riUfes)
repository_id_str	2108
dc.title.none.fl_str_mv	Concept maps mning for text summarization
title	Concept maps mning for text summarization
spellingShingle	Concept maps mning for text summarization Aguiar, Camila Zacché de Information retrieval Summarization Knowledge representation Natural language processing Mapas conceituais Mineração de mapas conceituais Processamento de linguagem natural Concept map mining Sumarização de textos Representação do conhecimento Ciência da Computação Informática na educação Processamento de linguagem natural (Computação) Recuperação da informação Exploração de dados (Computação) 004
title_short	Concept maps mning for text summarization
title_full	Concept maps mning for text summarization
title_fullStr	Concept maps mning for text summarization
title_full_unstemmed	Concept maps mning for text summarization
title_sort	Concept maps mning for text summarization
author	Aguiar, Camila Zacché de
author_facet	Aguiar, Camila Zacché de
author_role	author
dc.contributor.advisor-co1.fl_str_mv	Zouaq, Amal
dc.contributor.advisor1.fl_str_mv	Cury, Davidson
dc.contributor.author.fl_str_mv	Aguiar, Camila Zacché de
dc.contributor.referee1.fl_str_mv	Oliveira, Elias Silva de
dc.contributor.referee2.fl_str_mv	Villavicencio, Aline
dc.contributor.referee3.fl_str_mv	Menezes, Crediné Silva de
contributor_str_mv	Zouaq, Amal Cury, Davidson Oliveira, Elias Silva de Villavicencio, Aline Menezes, Crediné Silva de
dc.subject.eng.fl_str_mv	Information retrieval Summarization Knowledge representation Natural language processing
topic	Information retrieval Summarization Knowledge representation Natural language processing Mapas conceituais Mineração de mapas conceituais Processamento de linguagem natural Concept map mining Sumarização de textos Representação do conhecimento Ciência da Computação Informática na educação Processamento de linguagem natural (Computação) Recuperação da informação Exploração de dados (Computação) 004
dc.subject.por.fl_str_mv	Mapas conceituais Mineração de mapas conceituais Processamento de linguagem natural Concept map mining Sumarização de textos Representação do conhecimento
dc.subject.cnpq.fl_str_mv	Ciência da Computação
dc.subject.br-rjbn.none.fl_str_mv	Informática na educação Processamento de linguagem natural (Computação) Recuperação da informação Exploração de dados (Computação)
dc.subject.udc.none.fl_str_mv	004
publishDate	2017
dc.date.issued.fl_str_mv	2017-03-31
dc.date.accessioned.fl_str_mv	2018-08-02T00:03:48Z
dc.date.available.fl_str_mv	2018-08-01 2018-08-02T00:03:48Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.citation.fl_str_mv	AGUIAR, Camila Zacché de. Concept maps mning for text summarization. 2017. 149 f. Dissertação (Mestrado em Informática) - Universidade Federal do Espírito Santo, Centro Tecnológico, Vitória, 2017.
dc.identifier.uri.fl_str_mv	http://repositorio.ufes.br/handle/10/9846
identifier_str_mv	AGUIAR, Camila Zacché de. Concept maps mning for text summarization. 2017. 149 f. Dissertação (Mestrado em Informática) - Universidade Federal do Espírito Santo, Centro Tecnológico, Vitória, 2017.
url	http://repositorio.ufes.br/handle/10/9846
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	Text
dc.publisher.none.fl_str_mv	Universidade Federal do Espírito Santo Mestrado em Informática
dc.publisher.program.fl_str_mv	Programa de Pós-Graduação em Informática
dc.publisher.initials.fl_str_mv	UFES
dc.publisher.country.fl_str_mv	BR
dc.publisher.department.fl_str_mv	Centro Tecnológico
publisher.none.fl_str_mv	Universidade Federal do Espírito Santo Mestrado em Informática
dc.source.none.fl_str_mv	reponame:Repositório Institucional da Universidade Federal do Espírito Santo (riUfes) instname:Universidade Federal do Espírito Santo (UFES) instacron:UFES
instname_str	Universidade Federal do Espírito Santo (UFES)
instacron_str	UFES
institution	UFES
reponame_str	Repositório Institucional da Universidade Federal do Espírito Santo (riUfes)
collection	Repositório Institucional da Universidade Federal do Espírito Santo (riUfes)
bitstream.url.fl_str_mv	http://repositorio.ufes.br/bitstreams/75caa235-9e3d-4250-ab40-7c9e29e6735a/download
bitstream.checksum.fl_str_mv	0c96c6b2cce9c15ea234627fad78ac9a
bitstream.checksumAlgorithm.fl_str_mv	MD5
repository.name.fl_str_mv	Repositório Institucional da Universidade Federal do Espírito Santo (riUfes) - Universidade Federal do Espírito Santo (UFES)
_version_	1813022560880689152
