Dataset extraction from advertising marketplaces: experiences with Facebook, Olx, and Mercadolivre


  • Eduardo Diniz, Universidade Estadual de Montes Claros; Universidade Federal de Santa Catarina, Brazil
  • Gustavo Medeiros de Araújo, Universidade Federal de Santa Catarina, Brazil



datasets, Python, e-commerce, marketplaces, advertisements


Given the growing volume of information produced in digital marketplaces, the increasing adoption of these services by internet users in Brazil and worldwide, and the scarcity of related work, this research experiments with extracting datasets from marketplace advertisements. After an analysis of the main e-commerce platforms in Brazil, three marketplaces were chosen: Mercado Livre, Facebook, and OLX. The scrapers were written in Python using the Scrapy, Beautiful Soup, and Selenium WebDriver libraries. After the web structure of the ad results pages was analyzed, scripts were created to extract the main variables of each ad within a category common to all three marketplaces. The results show that scrapers can extract datasets from advertisements on these platforms in different formats. Such data has potential for exploration in various areas of data science.
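
As a rough illustration of the extraction step the abstract describes (reading ad variables from a results page and exporting them in more than one format), the sketch below parses a small inline HTML sample. It is not the authors' code: the markup, the class names (`ad-card`, `ad-title`, `ad-price`, `ad-location`), and the helper names are all hypothetical, and Python's standard-library `html.parser` is swapped in for Scrapy, Beautiful Soup, and Selenium so that the example runs without extra dependencies or network access.

```python
import csv
import io
import json
from html.parser import HTMLParser

# Hypothetical markup: real marketplaces use different, frequently changing
# class names, so everything below is an assumption for illustration only.
SAMPLE_HTML = """
<ul>
  <li class="ad-card">
    <span class="ad-title">Used bicycle</span>
    <span class="ad-price">R$ 450</span>
    <span class="ad-location">Montes Claros</span>
  </li>
  <li class="ad-card">
    <span class="ad-title">Mountain bike</span>
    <span class="ad-price">R$ 1.200</span>
    <span class="ad-location">Florianópolis</span>
  </li>
</ul>
"""

# Maps the assumed CSS classes to the column names of the output dataset.
FIELDS = {"ad-title": "title", "ad-price": "price", "ad-location": "location"}


class AdParser(HTMLParser):
    """Collects one dict per <li> element carrying the assumed ad-card class."""

    def __init__(self):
        super().__init__()
        self.ads = []
        self._current = None  # dict for the ad being read, if any
        self._field = None    # field whose text is expected next

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "ad-card" in classes:
            self._current = {}
        elif self._current is not None:
            for css_class, name in FIELDS.items():
                if css_class in classes:
                    self._field = name

    def handle_data(self, data):
        # Store text only when a known field tag has just been opened.
        if self._current is not None and self._field:
            self._current[self._field] = data.strip()
            self._field = None

    def handle_endtag(self, tag):
        # In this sample each ad is an <li>, so its end tag closes the record.
        if tag == "li" and self._current is not None:
            self.ads.append(self._current)
            self._current = None


def extract_ads(html):
    """Return a list of ad records found in the given HTML string."""
    parser = AdParser()
    parser.feed(html)
    return parser.ads


def to_csv(ads):
    """Serialize ad records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(FIELDS.values()))
    writer.writeheader()
    writer.writerows(ads)
    return buffer.getvalue()


ads = extract_ads(SAMPLE_HTML)
print(json.dumps(ads, ensure_ascii=False, indent=2))  # JSON export
print(to_csv(ads))                                    # CSV export
```

On a live site the HTML would come from an HTTP response or a browser-driven session rather than an inline string, and any such collection would need to respect each platform's terms of service.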







How to Cite

Diniz, E., & Medeiros de Araújo, G. (2022). Dataset extraction from advertising marketplaces: experiences with Facebook, Olx, and Mercadolivre. Advanced Notes in Information Science, 2, 63-73.