Data collection for aggregation of digital repositories: Entities linked to the Special Secretariat of Culture of Brazil
Keywords: Data science, Information science, Digital repositories, Cultural institutions, Reuse
This article aims to clarify the use of data science techniques in information science by reporting on data collection from the digital repositories of entities linked to the Special Secretariat of Culture of Brazil. This process is part of a research project promoted by the Foundation of Research Support of the State of São Paulo and carried out at the University of Brasília by the Network Intelligence Laboratory. The quantitative descriptive methodology follows the extract, transform, and load (ETL) process used to build data warehouses, with Python scripts developed to collect the data. The results report how many scripts were needed and how many storage files were generated, and describe the data collected. They indicate a greater use of web scraping, which made the collection process more difficult. The article thus shows how far the analyzed Brazilian cultural institutions currently are from enabling the aggregation of their digital repositories. At the same time, it shows how data science strategies allow information science professionals to overcome existing technical barriers and promote data analysis and reuse.
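As a rough illustration of the extract, transform, and load steps mentioned in the abstract, the following minimal sketch scrapes item links from an HTML listing and writes them to CSV. It is not the authors' script: the sample HTML, the `ItemLinkParser` class, the field names, and the `https://example.org` base URL are all hypothetical, and a real collection script would first download each page from the repository site.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical listing page; a real script would fetch this from a repository.
SAMPLE_HTML = """
<ul>
  <li class="item"><a href="/item/1">Retrato de mulher </a></li>
  <li class="item"><a href="/item/2">  Paisagem   com rio</a></li>
</ul>
"""


class ItemLinkParser(HTMLParser):
    """Extract step: collect the href and text of every anchor that has an href."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open anchor.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.records.append({"url": self._href, "title": "".join(self._text)})
            self._href = None


def extract(html):
    parser = ItemLinkParser()
    parser.feed(html)
    return parser.records


def transform(records, base_url):
    """Transform step: make URLs absolute and normalize whitespace in titles."""
    return [
        {"url": base_url + r["url"], "title": " ".join(r["title"].split())}
        for r in records
    ]


def load(records, handle):
    """Load step: write the records to a CSV storage file."""
    writer = csv.DictWriter(handle, fieldnames=["url", "title"])
    writer.writeheader()
    writer.writerows(records)


rows = transform(extract(SAMPLE_HTML), "https://example.org")
buffer = io.StringIO()  # stands in for an open CSV file
load(rows, buffer)
```

Scraping of this kind depends on each site's markup, which is one reason it tends to be more laborious than harvesting through a standard interface such as an API or OAI-PMH.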
Copyright (c) 2022 Luis Felipe Rosa de Oliveira, Dalton Lopes Martins
This is an open access article distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0) which permits copying and redistributing the material in any medium or format, adapting, transforming and building upon the material as long as the license terms are followed.