ML datasets as synthetic cognitive experience records

M. T. Andrade; H. Castro

Bibliographic details
Main author:	M. T. Andrade
Publication date:	2018
Other authors:	H. Castro
Document type:	Article
Language:	eng
Source title:	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text:	https://hdl.handle.net/10216/125077
Abstract:	Machine Learning (ML), presently the major research area within Artificial Intelligence, aims to develop tools that can learn from data largely on their own. Through a training phase, ML tools learn to associate input data with an output evaluation of it. When the input data is audio or visual media (i.e., akin to sensory information) and the output corresponds to some interpretation of it, the process may be described as Synthetic Cognition (SC). Presently, ML (or SC) research is heterogeneous: it comprises a broad set of disconnected initiatives that make no systematic effort to cooperate or to integrate their achievements, and no standards exist to facilitate this. The training datasets (base sensory data and targeted interpretation), which are very labour-intensive to produce, are likewise built with ad hoc structures and (metadata) formats and have very narrow expressive objectives, so they enable no true interoperability or standardisation. Our work helps to overcome this fragility by putting forward: a specification for a standard ML dataset repository, describing how it internally stores the different components of datasets and how it interfaces with external services; and a tool for the comprehensive structuring of ML datasets, defining them as Synthetic Cognitive Experience (SCE) records, which interweave the base audio-visual sensory data with multilevel interpretative information. A standardised structure to express the different components of the datasets and their interrelations will promote reusability, resulting in the availability of a very large pool of datasets for a myriad of application domains.
Our work thus contributes to the universal interpretability and reusability of ML datasets; greatly eases the acquisition and sharing of training and testing datasets within the ML research community; facilitates the comparison of results from different ML tools; and accelerates the overall research process. (c) MIR Labs.

Item metadata

id	RCAP_57c9d513016b5e2f8fccef8d2bdb57a6
oai_identifier_str	oai:repositorio-aberto.up.pt:10216/125077
network_acronym_str	RCAP
network_name_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str	7160
spelling	ML datasets as synthetic cognitive experience records; 2018; 2018-01-01T00:00:00Z; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/article; application/pdf; https://hdl.handle.net/10216/125077; eng; M. T. Andrade; H. Castro; info:eu-repo/semantics/openAccess; reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos); instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; instacron:RCAAP; 2023-11-29T13:37:27Z; oai:repositorio-aberto.up.pt:10216/125077; Portal Agregador; ONG; https://www.rcaap.pt/oai/openaire; opendoar:7160; 2024-03-19T23:44:08.271033; Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação; false
dc.title.none.fl_str_mv	ML datasets as synthetic cognitive experience records
title	ML datasets as synthetic cognitive experience records
spellingShingle	ML datasets as synthetic cognitive experience records M. T. Andrade
title_short	ML datasets as synthetic cognitive experience records
title_full	ML datasets as synthetic cognitive experience records
title_fullStr	ML datasets as synthetic cognitive experience records
title_full_unstemmed	ML datasets as synthetic cognitive experience records
title_sort	ML datasets as synthetic cognitive experience records
author	M. T. Andrade
author_facet	M. T. Andrade H. Castro
author_role	author
author2	H. Castro
author2_role	author
dc.contributor.author.fl_str_mv	M. T. Andrade H. Castro
publishDate	2018
dc.date.none.fl_str_mv	2018 2018-01-01T00:00:00Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	https://hdl.handle.net/10216/125077
url	https://hdl.handle.net/10216/125077
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.source.none.fl_str_mv	reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação instacron:RCAAP
instname_str	Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
collection	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository.name.fl_str_mv	Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
repository.mail.fl_str_mv
_version_	1799135757031440384
