Ambiente para geração e manutenção semiautomática de tesauros

Manoel Palhares Moreira

Bibliographic details
Main author:	Manoel Palhares Moreira
Publication date:	2005
Document type:	Thesis
Language:	Portuguese (por)
Source title:	Repositório Institucional da UFMG
Full text:	http://hdl.handle.net/1843/VALA-6KHJPX
Abstract:	Thesauri are among the means of representing information used by Information Retrieval Systems; they constitute a consolidated indexing language employed by professionals who organize information. The flexibility to establish new relations between terms, together with hierarchies and cross-references, gives the instrument a wide range of uses, from indexing to the effective retrieval of documents. Building and maintaining a thesaurus are intellectual activities with specific procedures, among them knowledge of the documents produced in the subject area, understanding of the terms used, and the construction of concepts to explain those terms. Professionals in this field are expected to take a flexible attitude toward the changes and innovations that arise in the area, in the language itself, and in the usage of terms. This study aimed to build a methodology and an environment for the semi-automatic generation and maintenance of thesauri, using natural language and drawing on Computer Science technologies and the theoretical foundations of Information Science; more specifically, on the ordering concepts of literary warrant, user warrant, and structural warrant, to which it adds the proposal of a warrant derived from the structure of the text itself. The environment made it possible to verify how current the thesaurus is and to assess its representative potential. The hypothesis was that keywords collected from scientific articles could be applied in this process, since they represent both literary and user warrant, being a privileged instrument for disseminating scientific knowledge. Statistical calculations were made involving the frequency and standardized score of keyword occurrences in the title, abstract, body text, and reference list of the articles. Scientific texts from the electronic journals DataGramaZero and Ciência da Informação were used for testing. The absence of an up-to-date thesaurus in the area led to the construction of a Thesaurus in Information Science (TIS), based on existing thesauri in Portuguese (IBICT), English (ASIS), and Spanish (CINDOC and DOCUTES). The environment pointed to the need to update terms, classified by degree of relevance, taking into account the evolution of the area.

Item metadata

id	UFMG_d37eed9cc250fc66cd457ce70ea6ec94
oai_identifier_str	oai:repositorio.ufmg.br:1843/VALA-6KHJPX
network_acronym_str	UFMG
network_name_str	Repositório Institucional da UFMG
repository_id_str
dc.title.none.fl_str_mv	Ambiente para geração e manutenção semiautomática de tesauros
author	Manoel Palhares Moreira
author_facet	Manoel Palhares Moreira
author_role	author
dc.contributor.none.fl_str_mv	Maria Aparecida Moura; Beatriz Valadares Cendon; Eduardo Jose Wense Dias; Gercina Angela Borem de Oliveira Lima; Renato Rocha Souza; Ligia Maria Arruda Café
dc.contributor.author.fl_str_mv	Manoel Palhares Moreira
dc.subject.por.fl_str_mv	Tesauro; Mineração de palavras; Ciência da informação; Organização da informação; Linguagem de indexação; Tesauros; Indexação automatica; Sistemas de recuperação da informação Tecnologia; Ciência da informação; Tecnologia da informação
topic	Tesauro; Mineração de palavras; Ciência da informação; Organização da informação; Linguagem de indexação; Tesauros; Indexação automatica; Sistemas de recuperação da informação Tecnologia; Ciência da informação; Tecnologia da informação
publishDate	2005
dc.date.none.fl_str_mv	2005-12-20 2019-08-13T20:08:40Z 2019-08-13T20:08:40Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/doctoralThesis
format	doctoralThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/1843/VALA-6KHJPX
url	http://hdl.handle.net/1843/VALA-6KHJPX
dc.language.iso.fl_str_mv	por
language	por
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Universidade Federal de Minas Gerais UFMG
publisher.none.fl_str_mv	Universidade Federal de Minas Gerais UFMG
dc.source.none.fl_str_mv	reponame:Repositório Institucional da UFMG instname:Universidade Federal de Minas Gerais (UFMG) instacron:UFMG
instname_str	Universidade Federal de Minas Gerais (UFMG)
instacron_str	UFMG
institution	UFMG
reponame_str	Repositório Institucional da UFMG
collection	Repositório Institucional da UFMG
repository.name.fl_str_mv	Repositório Institucional da UFMG - Universidade Federal de Minas Gerais (UFMG)
repository.mail.fl_str_mv	repositorio@ufmg.br
_version_	1816829849944195072
