Towards an automated classification of spreadsheets
Main author: | Mendes, Jorge Cunha |
---|---|
Publication date: | 2016 |
Other authors: | Do, Kha N.; Saraiva, João |
Language: | eng |
Source title: | Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
Full text: | http://hdl.handle.net/1822/70215 |
Abstract: | Many spreadsheets in the wild have neither documentation nor categorization associated with them. This makes it difficult to apply spreadsheet research that targets specific spreadsheet domains such as financial or database. We introduce in this paper a methodology to automatically classify spreadsheets into different domains. We exploit existing data mining classification algorithms using spreadsheet-specific features. The algorithms were trained and validated with cross-validation using the EUSES corpus, with an accuracy of up to 89%. The best algorithm was applied to the larger Enron corpus in order to gain some insight from it and to demonstrate the usefulness of this work. |
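
The abstract describes training classifiers on spreadsheet-specific features and validating them with cross-validation. The sketch below illustrates that general workflow only; it is not the authors' implementation. The three features (formula, numeric, and text cell ratios), the toy rows, and the nearest-centroid classifier are all assumptions chosen for illustration, since the record does not name the actual features or algorithms. Only the two domain labels, financial and database, come from the abstract.

```python
import random
from collections import defaultdict

# Hypothetical per-spreadsheet features: [formula_ratio, numeric_ratio, text_ratio].
# These rows are made-up toy data, not values from the EUSES or Enron corpora.
ROWS = [
    [0.60, 0.30, 0.10], [0.55, 0.35, 0.10], [0.65, 0.25, 0.10],
    [0.58, 0.32, 0.10], [0.62, 0.28, 0.10],
    [0.02, 0.40, 0.58], [0.05, 0.35, 0.60], [0.00, 0.45, 0.55],
    [0.03, 0.42, 0.55], [0.04, 0.38, 0.58],
]
LABELS = ["financial"] * 5 + ["database"] * 5


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_centroids(rows, labels):
    """Return the mean feature vector of each class (a stand-in classifier)."""
    sums = {}
    counts = defaultdict(int)
    for row, label in zip(rows, labels):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        for j, value in enumerate(row):
            sums[label][j] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def predict(centroids, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


def cross_validate(rows, labels, k=5, seed=0):
    """Train on k-1 folds, test on the held-out fold, and return overall accuracy."""
    correct = 0
    for fold in k_fold_indices(len(rows), k, seed):
        held_out = set(fold)
        train_rows = [r for i, r in enumerate(rows) if i not in held_out]
        train_labels = [l for i, l in enumerate(labels) if i not in held_out]
        centroids = train_centroids(train_rows, train_labels)
        correct += sum(predict(centroids, rows[i]) == labels[i] for i in fold)
    return correct / len(rows)


if __name__ == "__main__":
    print("cross-validated accuracy:", cross_validate(ROWS, LABELS, k=5))
```

Each spreadsheet is held out exactly once, so the reported accuracy is measured only on examples the classifier did not see during training. The toy classes are deliberately well separated; real spreadsheet features would overlap far more.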
id |
RCAP_7ae8b8001fa5fd1b5411ee7e81cc4ef7 |
---|---|
oai_identifier_str |
oai:repositorium.sdum.uminho.pt:1822/70215 |
network_acronym_str |
RCAP |
network_name_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository_id_str |
7160 |
spelling |
Towards an automated classification of spreadsheets; Spreadsheets; Data mining; Classification; Science & Technology; Many spreadsheets in the wild have neither documentation nor categorization associated with them. This makes it difficult to apply spreadsheet research that targets specific spreadsheet domains such as financial or database. We introduce in this paper a methodology to automatically classify spreadsheets into different domains. We exploit existing data mining classification algorithms using spreadsheet-specific features. The algorithms were trained and validated with cross-validation using the EUSES corpus, with an accuracy of up to 89%. The best algorithm was applied to the larger Enron corpus in order to gain some insight from it and to demonstrate the usefulness of this work.; Springer; Universidade do Minho; Mendes, Jorge Cunha; Do, Kha N.; Saraiva, João; 2016-01; 2016-01-01T00:00:00Z; conference paper; info:eu-repo/semantics/publishedVersion; application/pdf; http://hdl.handle.net/1822/70215; eng; Mendes J., Do K.N., Saraiva J. (2016) Towards an Automated Classification of Spreadsheets. In: Milazzo P., Varró D., Wimmer M. (eds) Software Technologies: Applications and Foundations. STAF 2016. Lecture Notes in Computer Science, vol 9946. Springer, Cham. https://doi.org/10.1007/978-3-319-50230-4_26; 978-3-319-50229-8; 0302-9743; 10.1007/978-3-319-50230-4_26; 978-3-319-50230-4; https://link.springer.com/chapter/10.1007/978-3-319-50230-4_26; info:eu-repo/semantics/openAccess; reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP; 2024-05-11T04:41:02Z; oai:repositorium.sdum.uminho.pt:1822/70215; Portal Agregador; ONG; https://www.rcaap.pt/oai/openaire; mluisa.alvim@gmail.com; opendoar:7160; 2024-05-11T04:41:02; Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; false |
dc.title.none.fl_str_mv |
Towards an automated classification of spreadsheets |
title |
Towards an automated classification of spreadsheets |
spellingShingle |
Towards an automated classification of spreadsheets; Mendes, Jorge Cunha; Spreadsheets; Data mining; Classification; Science & Technology |
title_short |
Towards an automated classification of spreadsheets |
title_full |
Towards an automated classification of spreadsheets |
title_fullStr |
Towards an automated classification of spreadsheets |
title_full_unstemmed |
Towards an automated classification of spreadsheets |
title_sort |
Towards an automated classification of spreadsheets |
author |
Mendes, Jorge Cunha |
author_facet |
Mendes, Jorge Cunha Do, Kha N. Saraiva, João |
author_role |
author |
author2 |
Do, Kha N. Saraiva, João |
author2_role |
author author |
dc.contributor.none.fl_str_mv |
Universidade do Minho |
dc.contributor.author.fl_str_mv |
Mendes, Jorge Cunha Do, Kha N. Saraiva, João |
dc.subject.por.fl_str_mv |
Spreadsheets Data mining Classification Science & Technology |
topic |
Spreadsheets Data mining Classification Science & Technology |
description |
Many spreadsheets in the wild have neither documentation nor categorization associated with them. This makes it difficult to apply spreadsheet research that targets specific spreadsheet domains such as financial or database. We introduce in this paper a methodology to automatically classify spreadsheets into different domains. We exploit existing data mining classification algorithms using spreadsheet-specific features. The algorithms were trained and validated with cross-validation using the EUSES corpus, with an accuracy of up to 89%. The best algorithm was applied to the larger Enron corpus in order to gain some insight from it and to demonstrate the usefulness of this work. |
publishDate |
2016 |
dc.date.none.fl_str_mv |
2016-01 2016-01-01T00:00:00Z |
dc.type.driver.fl_str_mv |
conference paper |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/1822/70215 |
url |
http://hdl.handle.net/1822/70215 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
Mendes J., Do K.N., Saraiva J. (2016) Towards an Automated Classification of Spreadsheets. In: Milazzo P., Varró D., Wimmer M. (eds) Software Technologies: Applications and Foundations. STAF 2016. Lecture Notes in Computer Science, vol 9946. Springer, Cham. https://doi.org/10.1007/978-3-319-50230-4_26; 978-3-319-50229-8; 0302-9743; 10.1007/978-3-319-50230-4_26; 978-3-319-50230-4; https://link.springer.com/chapter/10.1007/978-3-319-50230-4_26 |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Springer |
publisher.none.fl_str_mv |
Springer |
dc.source.none.fl_str_mv |
reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP |
instname_str |
Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
collection |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) |
repository.name.fl_str_mv |
Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação |
repository.mail.fl_str_mv |
mluisa.alvim@gmail.com |
_version_ |
1817544385998356480 |