Information retrieval system using Multiwords Expressions (MWE) as descriptors
Main author: | Silva, Edson Marchetti da |
---|---|
Publication date: | 2012 |
Other authors: | Souza, Renato Rocha |
Document type: | Article |
Language: | eng |
Source title: | Journal of Information Systems and Technology Management (Online) |
Full text: | https://www.revistas.usp.br/jistem/article/view/45596 |
Abstract: | This paper proposes an alternative method for retrieving documents in which Multiword Expressions (MWE) extracted from a document base are used as descriptors in the searches of an Information Retrieval System (IRS). Unlike methods that treat the text as a set of words (a bag of words), the proposed method takes the physical structure of the document into account when extracting MWE. The set of pre-processed terms obtained with an exhaustive algorithmic technique proposed by the authors is compared with the results obtained for thirteen different statistical association measures generated by the Ngram Statistics Package (NSP) software. A corpus of documents in digital format was set up to perform this experiment. |
id |
USP-33_932f04095b9cedda2a27c8bc86bf1254 |
---|---|
oai_identifier_str |
oai:revistas.usp.br:article/45596 |
network_acronym_str |
USP-33 |
network_name_str |
Journal of Information Systems and Technology Management (Online) |
repository_id_str |
|
dc.title.none.fl_str_mv |
Information retrieval system using Multiwords Expressions (MWE) as descriptors |
author |
Silva, Edson Marchetti da |
author_facet |
Silva, Edson Marchetti da; Souza, Renato Rocha |
author_role |
author |
author2 |
Souza, Renato Rocha |
author2_role |
author |
dc.contributor.author.fl_str_mv |
Silva, Edson Marchetti da; Souza, Renato Rocha |
dc.subject.por.fl_str_mv |
Extraction of Expressions Multiwords; Measures of Association Statistics; Compared Search; Information Retrieval System; the Document Structure |
topic |
Extraction of Expressions Multiwords; Measures of Association Statistics; Compared Search; Information Retrieval System; the Document Structure |
description |
This paper proposes an alternative method for retrieving documents in which Multiword Expressions (MWE) extracted from a document base are used as descriptors in the searches of an Information Retrieval System (IRS). Unlike methods that treat the text as a set of words (a bag of words), the proposed method takes the physical structure of the document into account when extracting MWE. The set of pre-processed terms obtained with an exhaustive algorithmic technique proposed by the authors is compared with the results obtained for thirteen different statistical association measures generated by the Ngram Statistics Package (NSP) software. A corpus of documents in digital format was set up to perform this experiment. |
publishDate |
2012 |
dc.date.none.fl_str_mv |
2012-08-01 |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article; info:eu-repo/semantics/publishedVersion |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
https://www.revistas.usp.br/jistem/article/view/45596; 10.4301/S1807-17752012000200002 |
url |
https://www.revistas.usp.br/jistem/article/view/45596 |
identifier_str_mv |
10.4301/S1807-17752012000200002 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
https://www.revistas.usp.br/jistem/article/view/45596/49195 |
dc.rights.driver.fl_str_mv |
Copyright (c) 2018 JISTEM - Journal of Information Systems and Technology Management (Online); info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
Copyright (c) 2018 JISTEM - Journal of Information Systems and Technology Management (Online) |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
TECSI - FEA - Universidade de São Paulo. Faculdade de Economia, Administração, Contabilidade e Atuária |
publisher.none.fl_str_mv |
TECSI - FEA - Universidade de São Paulo. Faculdade de Economia, Administração, Contabilidade e Atuária |
dc.source.none.fl_str_mv |
Journal of Information Systems and Technology Management, Vol. 9 No. 2 (2012), 213-234; ISSN 1807-1775; reponame:Journal of Information Systems and Technology Management (Online); instname:Universidade de São Paulo (USP); instacron:USP |
instname_str |
Universidade de São Paulo (USP) |
instacron_str |
USP |
institution |
USP |
reponame_str |
Journal of Information Systems and Technology Management (Online) |
collection |
Journal of Information Systems and Technology Management (Online) |
repository.name.fl_str_mv |
Journal of Information Systems and Technology Management (Online) - Universidade de São Paulo (USP) |
repository.mail.fl_str_mv |
jistem@usp.br |
_version_ |
1800222952454619136 |