Digital methods and the memory accessed by APIs: Development tool for extracting data from journalistic portais with the WayBack Machine

Dos Santos, Marcio Carneiro

Bibliographic details
Main author:	Dos Santos, Marcio Carneiro
Publication date:	2015
Document type:	Article
Language:	Portuguese (por)
Source title:	Revista Observatório
Full text:	https://sistemas.uft.edu.br/periodicos/index.php/observatorio/article/view/1549
Abstract:	We explore the automation of data collection from web pages using custom code written in the Python programming language, which relies on specific HTML (Hypertext Markup Language) syntax to locate and extract elements of interest such as links, text, and images. Automated data collection, also known as scraping, is an increasingly common resource in journalism. Using the digital repository at www.web.archive.org, also known as the Wayback Machine, we develop a proof of concept of an algorithm that can retrieve, list, and offer basic analysis tools for data collected from the successive versions of news portals over time.
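
The extraction step described in the abstract can be illustrated with a minimal sketch. It is not the author's code: the names `snapshot_url`, `ElementCollector`, and `extract_elements` are hypothetical, the sample HTML is invented for illustration, and only the Python standard library is assumed. The snapshot address follows the publicly visible Wayback Machine pattern `https://web.archive.org/web/<timestamp>/<url>`; no network request is made here.

```python
from html.parser import HTMLParser

# Publicly visible prefix of Wayback Machine snapshot addresses.
WAYBACK_PREFIX = "https://web.archive.org/web"


def snapshot_url(timestamp, page_url):
    """Build the address of an archived copy (timestamp as YYYYMMDDhhmmss)."""
    return f"{WAYBACK_PREFIX}/{timestamp}/{page_url}"


class ElementCollector(HTMLParser):
    """Hypothetical helper: collects link targets, image sources, and visible text."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.images = []
        self.text = []
        self._skip = 0  # depth inside <script>/<style>, whose content is not page text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        stripped = data.strip()
        if stripped and not self._skip:
            self.text.append(stripped)


def extract_elements(html):
    """Return the links, images, and text fragments found in an HTML string."""
    collector = ElementCollector()
    collector.feed(html)
    collector.close()
    return {
        "links": collector.links,
        "images": collector.images,
        "text": collector.text,
    }


# Invented sample standing in for one archived version of a news portal.
SAMPLE = (
    "<html><body><h1>Headline</h1>"
    '<a href="/news/1">Story</a><img src="/img/a.jpg">'
    "<script>var x = 1;</script></body></html>"
)
result = extract_elements(SAMPLE)
```

In a real collection run, the HTML string would come from fetching each address produced by `snapshot_url` for a series of timestamps, and the per-snapshot results could then be compared over time.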

Item metadata

id	UFT-7_c367b1fa7d96804340431ef4832a8806
oai_identifier_str	oai:ojs.revista.uft.edu.br:article/1549
network_acronym_str	UFT-7
network_name_str	Revista Observatório
repository_id_str
dc.title.none.fl_str_mv	Digital methods and the memory accessed by APIs: Development tool for extracting data from journalistic portais with the WayBack Machine; Métodos digitales y memória visitada por APIs: Herramienta de desarrollo para extraer datos de los portales periodísticos por la Wayback Machine; Métodos digitais e a memória acessada por APIs: Desenvolvimento de ferramenta para extração de dados de portais jornalísticos a partir da WayBack Machine
title	Digital methods and the memory accessed by APIs: Development tool for extracting data from journalistic portais with the WayBack Machine
title_short	Digital methods and the memory accessed by APIs: Development tool for extracting data from journalistic portais with the WayBack Machine
title_full	Digital methods and the memory accessed by APIs: Development tool for extracting data from journalistic portais with the WayBack Machine
title_fullStr	Digital methods and the memory accessed by APIs: Development tool for extracting data from journalistic portais with the WayBack Machine
title_full_unstemmed	Digital methods and the memory accessed by APIs: Development tool for extracting data from journalistic portais with the WayBack Machine
title_sort	Digital methods and the memory accessed by APIs: Development tool for extracting data from journalistic portais with the WayBack Machine
author	Dos Santos, Marcio Carneiro
author_facet	Dos Santos, Marcio Carneiro
author_role	author
dc.contributor.author.fl_str_mv	Dos Santos, Marcio Carneiro
dc.subject.por.fl_str_mv	Scraping; Python; Digital Journalism; HTML; Memory; Raspar datos; Periodismo digital; Memoria; Raspagem de dados; Jornalismo Digital; Memória
topic	Scraping; Python; Digital Journalism; HTML; Memory; Raspar datos; Periodismo digital; Memoria; Raspagem de dados; Jornalismo Digital; Memória
description	We explore the automation of data collection from web pages using custom code written in the Python programming language, which relies on specific HTML (Hypertext Markup Language) syntax to locate and extract elements of interest such as links, text, and images. Automated data collection, also known as scraping, is an increasingly common resource in journalism. Using the digital repository at www.web.archive.org, also known as the Wayback Machine, we develop a proof of concept of an algorithm that can retrieve, list, and offer basic analysis tools for data collected from the successive versions of news portals over time.
publishDate	2015
dc.date.none.fl_str_mv	2015-12-08
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion Texto
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	https://sistemas.uft.edu.br/periodicos/index.php/observatorio/article/view/1549 10.20873/uft.2447-4266.2015v1n2p23
url	https://sistemas.uft.edu.br/periodicos/index.php/observatorio/article/view/1549
identifier_str_mv	10.20873/uft.2447-4266.2015v1n2p23
dc.language.iso.fl_str_mv	por
language	por
dc.relation.none.fl_str_mv	https://sistemas.uft.edu.br/periodicos/index.php/observatorio/article/view/1549/8496 https://sistemas.uft.edu.br/periodicos/index.php/observatorio/article/view/1549/10684 https://sistemas.uft.edu.br/periodicos/index.php/observatorio/article/view/1549/10685 https://sistemas.uft.edu.br/periodicos/index.php/observatorio/article/view/1549/10686 https://sistemas.uft.edu.br/periodicos/index.php/observatorio/article/view/1549/10687 https://sistemas.uft.edu.br/periodicos/index.php/observatorio/article/view/1549/10688
dc.rights.driver.fl_str_mv	Copyright (c) 2015 Revista Observatorio info:eu-repo/semantics/openAccess
rights_invalid_str_mv	Copyright (c) 2015 Revista Observatorio
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf application/epub+zip application/octet-stream application/zip application/x-gzip application/zip
dc.coverage.none.fl_str_mv	20th/21st century
dc.publisher.none.fl_str_mv	Universidade Federal do Tocantins - UFT
publisher.none.fl_str_mv	Universidade Federal do Tocantins - UFT
dc.source.none.fl_str_mv	Observatory Journal; Vol. 1 No. 2 (2015): Vol. 1 N. 2 (2015) Tema Livre / Free Theme / Tema Libre Maio-Agosto 2015; 23-41 Observatorio Magazine; Vol. 1 Núm. 2 (2015): Vol. 1 N. 2 (2015) Tema Livre / Free Theme / Tema Libre Maio-Agosto 2015; 23-41 Observatoire Journal; Vol. 1 No. 2 (2015): Vol. 1 N. 2 (2015) Tema Livre / Free Theme / Tema Libre Maio-Agosto 2015; 23-41 Revista Observatório ; v. 1 n. 2 (2015): Vol. 1 N. 2 (2015) Tema Livre / Free Theme / Tema Libre Maio-Agosto 2015; 23-41 2447-4266 10.20873/uft.2447-4266.2015v1n2 reponame:Revista Observatório instname:Universidade Federal do Tocantins (UFT) instacron:UFT
instname_str	Universidade Federal do Tocantins (UFT)
instacron_str	UFT
institution	UFT
reponame_str	Revista Observatório
collection	Revista Observatório
repository.name.fl_str_mv	Revista Observatório - Universidade Federal do Tocantins (UFT)
repository.mail.fl_str_mv
_version_	1798313242781548544
