PulseAugur

Developer releases free dataset of 2M+ job postings from company sites

A developer has released a free dataset of more than 2 million job postings scraped directly from over 100,000 company career sites. The dataset is updated daily and aims to give a cleaner, more current view of the job market than listings aggregated by individual job boards. The data is available in Parquet format and includes core fields such as job title, company name, and location.
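As a rough illustration of the kind of analysis the core fields allow, here is a minimal Python sketch that counts postings matching a title keyword per location. The field names `title`, `company`, and `location` and the helper `top_locations` are assumptions made for this example; the summary does not give the dataset's actual column names. The toy rows stand in for real data, which could be loaded from a Parquet file with a reader such as `pandas.read_parquet`.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Mapping, Optional


def top_locations(
    rows: Iterable[Mapping[str, Optional[str]]],
    keyword: str,
    n: int = 3,
) -> list[tuple[str, int]]:
    """Count postings whose title contains `keyword`, grouped by location.

    The keys "title" and "location" are hypothetical column names.
    Rows with a missing location are counted under "unknown".
    """
    needle = keyword.lower()
    counts = Counter(
        (row.get("location") or "unknown").strip()
        for row in rows
        if needle in (row.get("title") or "").lower()
    )
    return counts.most_common(n)


# Toy rows standing in for the real dataset (invented for illustration).
sample_rows = [
    {"title": "Machine Learning Engineer", "company": "Acme", "location": "Berlin"},
    {"title": "Senior Machine Learning Scientist", "company": "Globex", "location": "Berlin"},
    {"title": "Data Analyst", "company": "Initech", "location": "Austin"},
    {"title": "machine learning intern", "company": "Initech", "location": None},
]

print(top_locations(sample_rows, "machine learning"))
```

The function works on any iterable of dict-like rows, so it does not depend on a particular Parquet reader.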

IMPACT Provides a large, clean dataset for analyzing AI and tech job market trends.

RANK_REASON The cluster describes the creation and release of a novel dataset for research purposes.

Read on r/MachineLearning →

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. r/MachineLearning TIER_1 English(EN) · /u/Invicto_50

    I scraped over 2 million job postings across 100,000+ company career sites into a unified, daily-updated dataset. [P]

    Over the past few months, I've been working on a high-scale scraping pipeline to aggregate listings directly from company job boards and applicant tracking systems. Mapping over 100,000 distinct companies to their career pages turned out to be a …