ML datasets as synthetic cognitive experience records

Detalhes bibliográficos
Autor(a) principal: M. T. Andrade
Data de Publicação: 2018
Outros Autores: H. Castro
Tipo de documento: Artigo
Idioma: eng
Título da fonte: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Texto Completo: https://hdl.handle.net/10216/125077
Resumo: Machine Learning (ML), presently the major research area within Artificial Intelligence, aims at developing tools that can learn, approximately on their own, from data. ML tools learn, through a training phase, to perform some association between some input data and some output evaluation of it. When the input data is audio or visual media (i.e. akin to sensory information) and the output corresponds to some interpretation of it, the process may be described as Synthetic Cognition (SC). Presently ML (or SC) research is heterogeneous, comprising a broad set of disconnected initiatives which develop no systematic efforts for cooperation or integration of their achievements, and no standards exist to facilitate that. The training datasets (base sensory data and targeted interpretation), which are very labour intensive to produce, are also built employing ad-hoc structures and (metadata) formats, have very narrow expressive objectives and thus enable no true interoperability or standardisation. Our work contributes to overcome this fragility by putting forward: a specification for a standard ML dataset repository, describing how it internally stores the different components of datasets, and how it interfaces with external services; and a tool for the comprehensive structuring of ML datasets, defining them as Synthetic Cognitive Experience (SCE) records, which interweave the base audio-visual sensory data with multilevel interpretative information. A standardised structure to express the different components of the datasets and their interrelations will promote re-usability, resulting on the availability of a very large pool of datasets for a myriad of application domains. Our work thus contributes to: the universal interpretability and reusability of ML datasets; greatly easing the acquisition and sharing of training and testing datasets within the ML research community; facilitating the comparison of results from different ML tools; accelerating the overall research process. (c) MIR Labs.
id RCAP_57c9d513016b5e2f8fccef8d2bdb57a6
oai_identifier_str oai:repositorio-aberto.up.pt:10216/125077
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
spelling ML datasets as synthetic cognitive experience recordsMachine Learning (ML), presently the major research area within Artificial Intelligence, aims at developing tools that can learn, approximately on their own, from data. ML tools learn, through a training phase, to perform some association between some input data and some output evaluation of it. When the input data is audio or visual media (i.e. akin to sensory information) and the output corresponds to some interpretation of it, the process may be described as Synthetic Cognition (SC). Presently ML (or SC) research is heterogeneous, comprising a broad set of disconnected initiatives which develop no systematic efforts for cooperation or integration of their achievements, and no standards exist to facilitate that. The training datasets (base sensory data and targeted interpretation), which are very labour intensive to produce, are also built employing ad-hoc structures and (metadata) formats, have very narrow expressive objectives and thus enable no true interoperability or standardisation. Our work contributes to overcome this fragility by putting forward: a specification for a standard ML dataset repository, describing how it internally stores the different components of datasets, and how it interfaces with external services; and a tool for the comprehensive structuring of ML datasets, defining them as Synthetic Cognitive Experience (SCE) records, which interweave the base audio-visual sensory data with multilevel interpretative information. A standardised structure to express the different components of the datasets and their interrelations will promote re-usability, resulting on the availability of a very large pool of datasets for a myriad of application domains. Our work thus contributes to: the universal interpretability and reusability of ML datasets; greatly easing the acquisition and sharing of training and testing datasets within the ML research community; facilitating the comparison of results from different ML tools; accelerating the overall research process. (c) MIR Labs.20182018-01-01T00:00:00Zinfo:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/articleapplication/pdfhttps://hdl.handle.net/10216/125077engM. T. AndradeH. Castroinfo:eu-repo/semantics/openAccessreponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãoinstacron:RCAAP2023-11-29T13:37:27Zoai:repositorio-aberto.up.pt:10216/125077Portal AgregadorONGhttps://www.rcaap.pt/oai/openaireopendoar:71602024-03-19T23:44:08.271033Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informaçãofalse
dc.title.none.fl_str_mv ML datasets as synthetic cognitive experience records
title ML datasets as synthetic cognitive experience records
spellingShingle ML datasets as synthetic cognitive experience records
M. T. Andrade
title_short ML datasets as synthetic cognitive experience records
title_full ML datasets as synthetic cognitive experience records
title_fullStr ML datasets as synthetic cognitive experience records
title_full_unstemmed ML datasets as synthetic cognitive experience records
title_sort ML datasets as synthetic cognitive experience records
author M. T. Andrade
author_facet M. T. Andrade
H. Castro
author_role author
author2 H. Castro
author2_role author
dc.contributor.author.fl_str_mv M. T. Andrade
H. Castro
description Machine Learning (ML), presently the major research area within Artificial Intelligence, aims at developing tools that can learn, approximately on their own, from data. ML tools learn, through a training phase, to perform some association between some input data and some output evaluation of it. When the input data is audio or visual media (i.e. akin to sensory information) and the output corresponds to some interpretation of it, the process may be described as Synthetic Cognition (SC). Presently ML (or SC) research is heterogeneous, comprising a broad set of disconnected initiatives which develop no systematic efforts for cooperation or integration of their achievements, and no standards exist to facilitate that. The training datasets (base sensory data and targeted interpretation), which are very labour intensive to produce, are also built employing ad-hoc structures and (metadata) formats, have very narrow expressive objectives and thus enable no true interoperability or standardisation. Our work contributes to overcome this fragility by putting forward: a specification for a standard ML dataset repository, describing how it internally stores the different components of datasets, and how it interfaces with external services; and a tool for the comprehensive structuring of ML datasets, defining them as Synthetic Cognitive Experience (SCE) records, which interweave the base audio-visual sensory data with multilevel interpretative information. A standardised structure to express the different components of the datasets and their interrelations will promote re-usability, resulting on the availability of a very large pool of datasets for a myriad of application domains. Our work thus contributes to: the universal interpretability and reusability of ML datasets; greatly easing the acquisition and sharing of training and testing datasets within the ML research community; facilitating the comparison of results from different ML tools; accelerating the overall research process. (c) MIR Labs.
publishDate 2018
dc.date.none.fl_str_mv 2018
2018-01-01T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://hdl.handle.net/10216/125077
url https://hdl.handle.net/10216/125077
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
instname_str Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str RCAAP
institution RCAAP
reponame_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_ 1799135757031440384