Towards an automated classification of spreadsheets

Bibliographic details
Main author: Mendes, Jorge Cunha
Publication date: 2016
Other authors: Do, Kha N.; Saraiva, João
Language: English
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Full text: http://hdl.handle.net/1822/70215
Abstract: Many spreadsheets in the wild have neither documentation nor a categorization associated with them. This makes it difficult to apply spreadsheet research that targets specific spreadsheet domains, such as financial or database spreadsheets. In this paper, we introduce a methodology to automatically classify spreadsheets into different domains. We exploit existing data mining classification algorithms using spreadsheet-specific features. The algorithms were trained and validated with cross-validation on the EUSES corpus, achieving up to 89% accuracy. The best algorithm was then applied to the larger Enron corpus, both to gain insight into that corpus and to demonstrate the usefulness of this work.
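The abstract does not list the features or algorithms the authors used. The following is only a minimal sketch of the general workflow it describes (one feature vector per spreadsheet, a classifier, and k-fold cross-validation to estimate accuracy), under clearly stated assumptions: the three features (fractions of formula, numeric, and text cells), the k-nearest-neighbour classifier, the `knn_predict`/`cross_validate` helpers, and the toy data are all hypothetical stand-ins, not the paper's method or the EUSES/Enron corpora.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical feature vector per spreadsheet (an assumption, NOT the paper's feature set):
# (fraction of formula cells, fraction of numeric cells, fraction of text cells)
Features = Tuple[float, float, float]
Example = Tuple[Features, str]  # (features, domain label)


def knn_predict(train: Sequence[Example], x: Features, k: int = 3) -> str:
    """Predict a domain for x by majority vote among its k nearest training examples."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def cross_validate(data: Sequence[Example], folds: int = 5, k: int = 3, seed: int = 0) -> float:
    """Mean k-NN accuracy over `folds`-fold cross-validation on a shuffled copy of data."""
    if len(data) < folds:
        raise ValueError("need at least one example per fold")
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    accuracies = []
    for f in range(folds):
        # Example i is held out in fold i % folds; all other examples train the model.
        test = [ex for i, ex in enumerate(shuffled) if i % folds == f]
        train = [ex for i, ex in enumerate(shuffled) if i % folds != f]
        correct = sum(knn_predict(train, x, k) == label for x, label in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / folds


# Synthetic toy data for illustration only (not taken from EUSES or Enron).
DATA: List[Example] = [
    ((0.40, 0.50, 0.10), "financial"),
    ((0.45, 0.45, 0.10), "financial"),
    ((0.50, 0.40, 0.10), "financial"),
    ((0.35, 0.55, 0.10), "financial"),
    ((0.42, 0.48, 0.10), "financial"),
    ((0.00, 0.30, 0.70), "database"),
    ((0.05, 0.25, 0.70), "database"),
    ((0.00, 0.40, 0.60), "database"),
    ((0.02, 0.35, 0.63), "database"),
    ((0.00, 0.20, 0.80), "database"),
]

if __name__ == "__main__":
    print(cross_validate(DATA))  # prints 1.0 on this deliberately well-separated toy data
```

k-NN is used here only because it fits in a few lines of the standard library; the cross-validation loop does not depend on which classifier is plugged in, so a different model would only replace `knn_predict`.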
Record ID: RCAP_7ae8b8001fa5fd1b5411ee7e81cc4ef7
OAI identifier: oai:repositorium.sdum.uminho.pt:1822/70215
Network: RCAP, Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos)
Repository ID: 7160
Contributor: Universidade do Minho
Keywords: Spreadsheets; Data mining; Classification; Science & Technology
Date issued: 2016-01
Type: conference paper
Version: published version (info:eu-repo/semantics/publishedVersion)
Citation: Mendes J., Do K.N., Saraiva J. (2016) Towards an Automated Classification of Spreadsheets. In: Milazzo P., Varró D., Wimmer M. (eds) Software Technologies: Applications and Foundations. STAF 2016. Lecture Notes in Computer Science, vol 9946. Springer, Cham. https://doi.org/10.1007/978-3-319-50230-4_26
DOI: 10.1007/978-3-319-50230-4_26
ISBN: 978-3-319-50229-8; 978-3-319-50230-4
ISSN: 0302-9743
Publisher page: https://link.springer.com/chapter/10.1007/978-3-319-50230-4_26
Publisher: Springer
Access rights: open access (info:eu-repo/semantics/openAccess)
Format: application/pdf
Aggregator: Repositório Científico de Acesso Aberto de Portugal (Repositórios Científicos), RCAAP
Institution: Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
Repository contact: mluisa.alvim@gmail.com