On the automated learning of air pollution prediction models from data collected by mobile sensor networks

Bibliographic details
Main author: Mariano, P.
Publication date: 2021
Other authors: Almeida, S. M., Santana, P.
Document type: Article
Language: eng
Source title: Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
Full text: http://hdl.handle.net/10071/24971
Abstract: This paper addresses the automated learning of air pollution prediction models trained on data gathered by a set of mobile low-cost sensors. Concretely, fast-to-compute machine learning methods (Decision Trees and Support Vector Machines) were used to build regression models that predict air pollution levels for a given location. The models were trained on data collected by the OpenSense project, in particular the number of particles, particle diameter, and lung deposited surface area (LDSA). We examined two different sets of attributes: one based on a geographical description of the location under analysis (e.g. distribution of households and roads), and another based on a time series of past air pollution observations in that location. Overall, we found that past measurements lead to better pollution predictions. The best R² score, 0.751, was obtained with the model that predicts LDSA and was trained on the data set with time series attributes, while the worst R², 0.009, was obtained with the geographical data set when predicting the number of particles. The performance of the best model is on par with similar air pollution systems. Moreover, it can be used in a production system that requires frequent updates.
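To make the two ideas in the abstract concrete, the following is a minimal sketch (not the authors' code) of how time series attributes can be built from past observations and how an R² score is computed. The function names `lagged_rows` and `r2_score`, the window length, and the sample readings are all hypothetical; the predictor used here is a naive "repeat the last observation" baseline, swapped in for the Decision Tree and Support Vector Machine regressors the paper actually uses.

```python
from statistics import mean


def lagged_rows(series, n_lags):
    """Build (features, target) rows: the n_lags previous values predict the next one."""
    return [(series[i - n_lags:i], series[i]) for i in range(n_lags, len(series))]


def r2_score(observed, predicted):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    avg = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - avg) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical pollution readings at one location (illustrative values only).
readings = [12.0, 14.0, 13.0, 17.0, 16.0, 18.0, 21.0, 19.0]

rows = lagged_rows(readings, n_lags=3)
targets = [target for _, target in rows]

# Naive baseline (an assumption, not the paper's model): predict the most recent lag.
baseline = [features[-1] for features, _ in rows]

print(f"baseline R^2: {r2_score(targets, baseline):.3f}")
```

An R² of 1 means the predictions match the observations exactly, while an R² near 0 (such as the 0.009 reported for the geographical data set) means the model does no better than always predicting the mean of the observed values.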
id RCAP_e8dd884f9416a780d0c6fabe2cd7704f
oai_identifier_str oai:repositorio.iscte-iul.pt:10071/24971
network_acronym_str RCAP
network_name_str Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
repository_id_str 7160
topic Air pollution
Decision tree
Land-use
Machine learning
Support vector machine
Time-series
publishDate 2021
dc.date.none.fl_str_mv 2021-01-01T00:00:00Z
2021
2022-08-28T00:00:00Z
2022-04-01T16:12:01Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10071/24971
url http://hdl.handle.net/10071/24971
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 1556-7036
10.1080/15567036.2021.1968076
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Taylor and Francis
dc.source.none.fl_str_mv reponame:Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos)
instname:Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação
instacron:RCAAP
repository.name.fl_str_mv Repositório Científico de Acesso Aberto de Portugal (Repositórios Cientìficos) - Agência para a Sociedade do Conhecimento (UMIC) - FCT - Sociedade da Informação