Methodology

Last updated: December 2024

📁 Archive Notice: This tracker will be archived on December 31, 2025. Data collection will stop on that date and no further updates will be made.

Overview

This tracker collects federal job listing data from the official USAJobs APIs maintained by the U.S. Office of Personnel Management. Data is collected daily and stored in a public repository for transparency and reproducibility.

Repository: github.com/abigailhaddad/usajobs_historical

Data Sources

We collect from two complementary APIs:

Historical Jobs API

Endpoint: https://data.usajobs.gov/api/historicjoa
Documentation: developer.usajobs.gov/API-Reference/GET-api-historicjoa
Authentication: None required
What it returns: Past job postings, whether still open or already closed
Our approach: Query one day at a time using StartPositionOpenDate and EndPositionOpenDate set to the same date
Code: scripts/collect_data.py
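
Here is a minimal Python sketch of the one-day windows. It builds one request URL per calendar day. The helper names (historic_url_for_day, historic_urls) are hypothetical, and the YYYY-MM-DD date format is an assumption. The sketch does not send requests or handle pagination, which the real collection script may need.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# Endpoint and parameter names come from this page; the ISO (YYYY-MM-DD)
# date format is an assumption to confirm against the API documentation.
HISTORIC_ENDPOINT = "https://data.usajobs.gov/api/historicjoa"


def historic_url_for_day(day: date) -> str:
    """Hypothetical helper: URL for postings that opened on a single day."""
    params = {
        # Start and end set to the same date gives a one-day window.
        "StartPositionOpenDate": day.isoformat(),
        "EndPositionOpenDate": day.isoformat(),
    }
    return f"{HISTORIC_ENDPOINT}?{urlencode(params)}"


def historic_urls(start: date, end: date) -> list[str]:
    """Hypothetical helper: one URL per calendar day, start to end inclusive."""
    n_days = (end - start).days + 1
    return [historic_url_for_day(start + timedelta(days=i)) for i in range(n_days)]
```

For example, historic_urls(date(2024, 12, 1), date(2024, 12, 3)) yields three URLs, one for each day.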

Current Jobs API

Endpoint: https://data.usajobs.gov/api/Search
Documentation: developer.usajobs.gov/API-Reference/GET-api-Search
Authentication: Requires API key (free from USAJobs)
What it returns: Jobs currently open for applications
Our approach: Query each occupational series separately (there are ~350) to stay under the API's 10,000-result limit per query
Code: scripts/collect_current_data.py
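
Below is a minimal sketch of the per-series strategy. It only builds request URLs and headers and does not send anything. The helper names are hypothetical. The JobCategoryCode, ResultsPerPage, and Page parameters and the Host, User-Agent, and Authorization-Key headers reflect our reading of the Search API documentation, so check them there before use.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://data.usajobs.gov/api/Search"


def search_request(series: str, api_key: str, email: str,
                   page: int = 1) -> tuple[str, dict[str, str]]:
    """Hypothetical helper: (url, headers) for one page of one occupational series.

    Parameter and header names are assumptions based on the public docs.
    """
    params = {
        "JobCategoryCode": series,  # occupational series, e.g. "2210"
        "ResultsPerPage": 500,      # assumed per-page maximum
        "Page": page,
    }
    headers = {
        "Host": "data.usajobs.gov",
        "User-Agent": email,        # the API identifies callers by email
        "Authorization-Key": api_key,
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}", headers


def requests_for_all_series(series_codes, api_key, email):
    """First-page request for each series; each stays well under the result cap."""
    return [search_request(code, api_key, email) for code in series_codes]
```

Later pages of a series would be requested by incrementing page until no more results come back.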

Collection Schedule

Data is collected daily via GitHub Actions. Each run:

  1. Queries the historical API for the current day plus the previous 2 days, so each posting date is requested on three consecutive runs
  2. Queries the current API to capture all currently open positions

Because of this 3-day overlap, a job missed on one run can still be picked up on either of the next two. The current API provides a second, independent source, so each job has multiple opportunities to be captured.

Code: update/update_all.py
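
The date arithmetic behind the 3-day overlap can be sketched in Python as follows. The function names are hypothetical and are not taken from update/update_all.py.

```python
from datetime import date, timedelta

OVERLAP_DAYS = 3  # the current day plus the previous 2 days


def dates_for_run(run_day: date, overlap: int = OVERLAP_DAYS) -> list[date]:
    """Posting dates requested from the historical API on a given run, oldest first."""
    return [run_day - timedelta(days=offset) for offset in range(overlap - 1, -1, -1)]


def runs_covering(target: date, overlap: int = OVERLAP_DAYS) -> list[date]:
    """Run days whose window includes `target`, i.e. each date is queried `overlap` times."""
    return [target + timedelta(days=offset) for offset in range(overlap)]
```

For a run on December 10, dates_for_run returns December 8, 9, and 10. December 8 is requested again on the runs of December 9 and 10. Using timedelta keeps month and year boundaries correct.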

Deduplication

Each job posting has a unique identifier (usajobsControlNumber), which we use to deduplicate records across daily runs and across both APIs.

Because both APIs capture the same jobs at different points in their lifecycle, there's built-in redundancy: if we miss a job from one API on a given day, we'll likely catch it from the other. Deduplicating by control number ensures that a job seen by both APIs, or on several runs, is counted only once.
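
Below is a minimal sketch of deduplicating by control number. When both APIs return the same job, this sketch keeps the first record seen. That choice is an assumption; the actual pipeline may prefer one API's record or merge fields. Converting the key to a string guards against one source returning numbers and the other strings, which is also an assumption.

```python
from typing import Iterable


def deduplicate(*sources: Iterable[dict]) -> dict[str, dict]:
    """Hypothetical helper: merge job records keyed by usajobsControlNumber.

    Assumption: the first record seen for a control number wins.
    """
    merged: dict[str, dict] = {}
    for source in sources:
        for record in source:
            # Normalize the key so 1002 and "1002" collide as the same job.
            key = str(record["usajobsControlNumber"])
            merged.setdefault(key, record)
    return merged
```

Passing the historical records first, for example deduplicate(historical, current), means the historical copy is kept whenever a job appears in both.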

Appointment Types

Appointment type indicates whether a position is permanent, temporary, an internship, etc.

How We Get Appointment Types

The Current Jobs API returns numeric codes (e.g., 15328). We translate these to human-readable names by querying the official codelist:

Codelist endpoint: https://data.usajobs.gov/api/codelist/positionofferingtypes
Code: See fetch_position_offering_types() in scripts/collect_current_data.py

The Historical Jobs API returns appointment types as text strings directly.
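
The sketch below turns the codelist into a code-to-name lookup and normalizes values from both APIs. It assumes the response has the shape {"CodeList": [{"ValidValue": [{"Code": ..., "Value": ...}]}]}. Confirm this against a live response. The names load_codelist, parse_codelist, and appointment_type_name are hypothetical and are not the functions in scripts/collect_current_data.py.

```python
import json
from urllib.request import urlopen

CODELIST_URL = "https://data.usajobs.gov/api/codelist/positionofferingtypes"


def load_codelist(url: str = CODELIST_URL) -> dict:
    """Download and decode a codelist (requires network access)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def parse_codelist(payload: dict) -> dict[str, str]:
    """Build {code: name}; the payload shape here is an assumption."""
    mapping: dict[str, str] = {}
    for code_list in payload.get("CodeList", []):
        for entry in code_list.get("ValidValue", []):
            mapping[str(entry["Code"])] = entry["Value"]
    return mapping


def appointment_type_name(raw, mapping: dict[str, str]) -> str:
    """Translate a Current Jobs API code; pass historical text values through unchanged."""
    return mapping.get(str(raw), str(raw))
```

With a mapping built from the codelist, appointment_type_name(15328, mapping) gives the text name, while a historical value such as "Temporary" is returned as is.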

Appointment Type Values

Code    Name              Description
15317   Permanent         Ongoing position with no set end date
15318   Temporary         Short-term position, usually under 1 year
15319   Term              Time-limited position, typically 1-4 years
15322   Seasonal          Recurring position tied to a particular season
15327   Multiple          Posting includes multiple appointment types
15328   Internships       Student internship positions
15326   Recent Graduates  Pathways Recent Graduates program
15522   Intermittent      Positions with irregular, as-needed schedules

To count internships, we filter for jobs where appointmentType equals "Internships" (or code 15328 in raw Current Jobs API data).
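
In code, the internship filter could look like this minimal sketch. It assumes each record stores its appointment type under an appointmentType key, holding either the translated name or the raw numeric code. The helper names are hypothetical.

```python
INTERNSHIP_NAME = "Internships"
INTERNSHIP_CODE = "15328"  # raw Current Jobs API code


def is_internship(appointment_type) -> bool:
    """True for the translated name or the raw code (int or str)."""
    return str(appointment_type) in {INTERNSHIP_NAME, INTERNSHIP_CODE}


def count_internships(jobs) -> int:
    """Count records whose appointmentType marks them as internships."""
    return sum(1 for job in jobs if is_internship(job.get("appointmentType")))
```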

Filling Data Gaps

Job postings include both an agency name (hiringAgencyName) and a department name (hiringDepartmentName), but occasionally one of these fields is missing. When this happens, we fill in the missing value using a mapping between agencies and departments.

These mappings are generated automatically by analyzing which agency/department pairs appear together most frequently across all job postings.

Code: tracking/generate_agency_mappings.py
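
Here is a minimal sketch of building and applying these mappings with collections.Counter. The helper names are hypothetical, and ties are broken by Counter.most_common ordering. Filling in an agency from its department is an assumption. A department usually contains many agencies, so that direction is less reliable than the reverse. The actual logic in tracking/generate_agency_mappings.py may differ.

```python
from collections import Counter, defaultdict


def most_frequent_pairs(jobs):
    """Map each agency to its most common department, and each department to its most common agency."""
    pair_counts = Counter(
        (job["hiringAgencyName"], job["hiringDepartmentName"])
        for job in jobs
        if job.get("hiringAgencyName") and job.get("hiringDepartmentName")
    )
    by_agency = defaultdict(Counter)
    by_department = defaultdict(Counter)
    for (agency, department), n in pair_counts.items():
        by_agency[agency][department] += n
        by_department[department][agency] += n
    agency_to_dept = {a: c.most_common(1)[0][0] for a, c in by_agency.items()}
    dept_to_agency = {d: c.most_common(1)[0][0] for d, c in by_department.items()}
    return agency_to_dept, dept_to_agency


def fill_gaps(job, agency_to_dept, dept_to_agency):
    """Return a copy of `job` with a missing agency or department filled in when a mapping exists."""
    filled = dict(job)
    agency = filled.get("hiringAgencyName")
    department = filled.get("hiringDepartmentName")
    if agency and not department and agency in agency_to_dept:
        filled["hiringDepartmentName"] = agency_to_dept[agency]
    elif department and not agency and department in dept_to_agency:
        filled["hiringAgencyName"] = dept_to_agency[department]
    return filled
```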

Data Storage

All data is stored in Apache Parquet format, grouped by year based on positionOpenDate. The complete dataset is available in the repository under data/.
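
Grouping records by the year of positionOpenDate can be sketched with the standard library as below. It assumes dates arrive as ISO-style strings, for example "2024-03-15" or "2024-03-15T00:00:00". Each group could then be written to its own Parquet file, for instance with pandas' DataFrame.to_parquet. The repository's file naming is not described here, so the sketch stops at grouping.

```python
from collections import defaultdict
from datetime import date


def year_of(position_open_date: str) -> int:
    """Year from an ISO-style date string; only the first 10 characters are parsed."""
    return date.fromisoformat(position_open_date[:10]).year


def group_by_open_year(jobs) -> dict[int, list[dict]]:
    """Hypothetical helper: bucket job records by the year they opened."""
    groups: dict[int, list[dict]] = defaultdict(list)
    for job in jobs:
        groups[year_of(job["positionOpenDate"])].append(job)
    return dict(groups)
```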

Reproducing This Analysis

```bash
# Clone the repository and move into it
git clone https://github.com/abigailhaddad/usajobs_historical
cd usajobs_historical

# Install dependencies
pip install -r requirements.txt

# Collection scripts: see the README for details
# Generate the summary:
python tracking/generate_jobs_summary_dynamic.py
```

Contact

Questions about methodology? Open an issue on GitHub.