About PhD Demographics
Tracking the outputs of American academia
This site exists because taxpayer-funded universities should graduate Americans.
It tracks who is actually receiving PhDs from US universities, who is funding their research,
and what happens to them after graduation — using exclusively official government data
and public university repositories.
What's in here
We track the full pipeline from international graduate enrollment to permanent residency,
one university and one department at a time:
- ~430,000 thesis and dissertation records from 101 US universities (2016–2024), scraped from each school's institutional repository — every name links back to the original record for verification.
- IPEDS completions data for US doctoral degrees (1984–2024), broken out by citizenship, race/ethnicity, and field.
- Federal immigration data (OPT, STEM-OPT, CPT, H-1B, employment-based green cards, and PERM) from ICE/SEVP, USCIS, and DOL.
- R&D funding from NSF HERD and NIH RePORTER, joined to PhD output by university and department.
- Country-of-origin views for India, China, Canada, the UK, the EU, and Mexico.
Everything is queryable via the Data Query (SQL) page, available as JSON, and documented in Data Sources & Methodology.
Who built it
Made by Andy Barr.
The project is independent, self-hosted, and not affiliated with any institution
or political organization. Source data comes from government open data and public university
repositories; code is hand-written; AI tools are used as collaborators, not authors.
How to help
This is a one-person research project, and there are many gaps to fill. The most
useful things you can send:
- Links to department Students, Faculty, or Dissertations directories we don't have yet (the Data Quality page shows where coverage is thin)
- Corrections — wrong department, wrong year, missing names
- Datasets, federal sources, or scrapers we should add
- Bug reports and ideas for new analyses
Privacy & data philosophy
This site does not run third-party analytics, ad networks, or fingerprinting scripts.
It logs visitor IPs in its own server logs for traffic analysis. The Tip Line is
rate-limited and stores nothing — messages go directly to the owner's inbox.
All published data is already publicly available; we just made it searchable.