Frequently Asked Questions

Short answers to common questions about data sources, methods, and how to use VRscores. For the full write‑up, see the Methodology page.

Bulk dataset downloads (full panels) are available on the Download Data page.

Getting started with VRscores

What are VRscores?
VRscores summarize the partisan composition of an organization’s workforce using both two-party (Democratic vs. Republican) and overall shares. The measures are derived by linking voter registrations (L2) to employment profiles (Revelio Labs). The approach is documented in “VRscores: A Voter Registration-Based Approach for Measuring Workforce Politics”, which we encourage you to cite if you use the data.
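As a rough illustration of the difference between the two-party and overall shares mentioned above, here is a minimal sketch. The function name, the dictionary keys, and the counts are hypothetical and are not the dataset's actual column names; consult the data dictionary for those.

```python
def partisan_shares(dem: int, rep: int, other: int) -> dict:
    """Return overall and two-party shares from matched-worker counts.

    Key names are illustrative only, not the published column names.
    """
    total = dem + rep + other
    if total == 0:
        raise ValueError("no matched workers")
    two_party = dem + rep
    return {
        # Overall shares use every matched worker in the denominator.
        "dem_share_overall": dem / total,
        "rep_share_overall": rep / total,
        "other_share_overall": other / total,
        # The two-party share drops "other" from the denominator.
        "dem_share_two_party": dem / two_party if two_party else None,
    }


shares = partisan_shares(dem=60, rep=20, other=20)
print(shares["dem_share_overall"])    # 0.6
print(shares["dem_share_two_party"])  # 0.75
```

The same workforce therefore looks more Democratic on the two-party measure (0.75) than on the overall measure (0.6), because unaffiliated and minor-party workers are excluded from the two-party denominator.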
Do VRscores measure ideology or partisanship?
VRscores capture partisan identity (party registration or imputed lean), not a full left–right ideology score. We recommend describing results as workforce partisanship rather than ideological placement.
Where do the data come from?
We combine two sources: L2’s national voter file (processed Nov 2024) and Revelio Labs’ employment profiles (captured Apr 2025). Matching is limited to metropolitan statistical areas (MSAs). Details and citations appear in the working paper cited above and on the Revelio Labs and L2 sections of the methodology page.
How do you match people to employers?
We use an ensemble of two approaches: probabilistic record linkage (Fellegi–Sunter via Splink) and an LLM‑assisted semantic matcher (fuzzylink). We keep one link per person and then aggregate to organizations. The working paper and Methodology page walk through both steps and validation checks in detail.
What’s the scale of the matched dataset?
The 2012–2024 employer (VRID) panel includes 6.26 million employer-year observations covering roughly 534,000 unique employers in metropolitan areas. Across those years we match 24.5 million unique workers to the voter file.
What fields are in the final dataset?
Each dataset ships with a data dictionary on the Download Data page. The employer-year panel lists identifiers (VRID, parent VRID when applicable), modal geographies, matched worker counts, raw and imputed partisan shares, political diversity metrics, and confidence tiers. Parallel codebooks document the occupation-year, industry-year, and metro-year panels with their respective identifiers and aggregates.
Should I use the registered or imputed partisan measures?
It depends on your question. The registration-based metrics usually carry less noise because they rely on official (or L2-modeled) party registration, so they are the best read on partisan affiliation. The imputed metrics extend coverage by inferring a lean for voters who register as unaffiliated or with minor parties, at the cost of noisier estimates. In practice the two series are quite similar (only about 20% of workers have imputed partisanship), so most results are robust to either choice.

Coverage & methodology

Why do some organizations show extreme shares (near 0 or 1)?
Small cohorts are inherently noisy. The public data exclude organizations with fewer than 5 matched workers to reduce spurious extremes and protect privacy.
Is your data a balanced panel?
Not exactly. Each year we only include employers for which we can match at least five workers in that year. Smaller organizations can fall out of the panel in years when we cannot identify enough employees, while larger employers typically appear in every release.
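If you need to know which employers drop out of the panel in some years, a check along the following lines may help. This is a sketch on made-up rows; the tuple layout and employer IDs are hypothetical, and only the five-worker threshold comes from the answer above.

```python
MIN_MATCHED_WORKERS = 5  # per-year inclusion threshold described above

# Hypothetical rows: (employer_id, year, matched_workers)
rows = [
    ("A", 2022, 12), ("A", 2023, 4), ("A", 2024, 9),
    ("B", 2022, 300), ("B", 2023, 310), ("B", 2024, 295),
]

# Keep only employer-years that clear the threshold.
panel = [r for r in rows if r[2] >= MIN_MATCHED_WORKERS]

# Record which years each employer actually appears in.
years_present = {}
for employer, year, _ in panel:
    years_present.setdefault(employer, []).append(year)

# Report the years each employer is missing from the panel.
all_years = sorted({year for _, year, _ in rows})
gaps = {e: [y for y in all_years if y not in ys] for e, ys in years_present.items()}
print(gaps)  # {'A': [2023], 'B': []}
```

Employer A falls below the threshold in 2023 and so has a gap, while the larger employer B appears in every year.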
Do VRscores include workers who are not registered to vote or who are non-citizens?
No. We match voter registrations to employment profiles, so people who are ineligible to vote, have never registered, or do not maintain public LinkedIn-style profiles are absent. A similar limitation applies to donations, because only U.S. citizens and lawful permanent residents can donate to federal campaigns. When a sector relies heavily on non-citizen labor you should treat coverage as partial.
Do you cover rural (non‑MSA) areas?
Matching is limited to metropolitan statistical areas (MSAs). Workers who live or work exclusively outside an MSA are typically not matched, so rural coverage is limited.
Are public‑sector employers included?
Yes. Government agencies, the military, and public universities appear alongside private employers. When benchmarking, keep sector differences in mind.
How do you handle people with multiple jobs in a year?
For industry, occupation, and MSA cuts, we deduplicate to one active position per person per year. In the employer‑year panel, a person with overlapping jobs can appear in multiple company‑years; counts are unique workers within each employer‑year.
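The two counting rules above can be sketched as follows. The record layout is hypothetical, and the tie-break used to pick one position per person per year (most months active) is an assumption made for illustration; the FAQ does not say which position is retained.

```python
from collections import defaultdict

# Hypothetical position records: (person_id, employer_id, year, months_active)
positions = [
    ("p1", "A", 2024, 12),
    ("p1", "B", 2024, 3),   # overlapping second job at another employer
    ("p2", "A", 2024, 6),
    ("p2", "A", 2024, 6),   # two stints at the same employer
]

# Employer-year panel: count unique workers within each employer-year,
# so p1 counts once for A and once for B, and p2 counts once for A.
workers = defaultdict(set)
for person, employer, year, _ in positions:
    workers[(employer, year)].add(person)
employer_year_counts = {key: len(people) for key, people in workers.items()}

# Industry/occupation/MSA cuts: keep one position per person per year.
# ASSUMPTION: keep the position with the most months active.
one_per_person_year = {}
for pos in positions:
    key = (pos[0], pos[2])
    if key not in one_per_person_year or pos[3] > one_per_person_year[key][3]:
        one_per_person_year[key] = pos

print(employer_year_counts)      # {('A', 2024): 2, ('B', 2024): 1}
print(len(one_per_person_year))  # 2
```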
How does geography work for remote roles?
Links require the Revelio Labs job‑location MSA and the L2 voter residence to fall within the same MSA. Fully remote or cross‑MSA arrangements may be harder to match.
Do you impute party for independents and third‑party registrants?
We report two sets of measures: registration‑based (from the voter file) and imputed. Registration‑based shares use official party registration where states provide it, or modeled party from L2 where they do not. For voters registered as independents or with third parties, we impute a lean using primary participation history and Bayesian models that blend precinct vote returns with demographics. Technical details and validation tests live in the cited working paper and the Methodology page.
What inputs feed the imputation model?
The imputation procedure combines three ingredients: (1) each voter’s recent partisan primary participation, (2) precinct-level general election returns that anchor the local partisan baseline, and (3) demographic covariates (age, gender, race/ethnicity) used as priors. We implement the Bayesian steps described in the VRscores working paper, which also reports validation checks comparing imputed labels to survey benchmarks. If both primary history and demographic cues are missing, we leave the voter as “other/unknown.”
How representative are VRscores compared with donation-based measures?
VRscores reflect tens of millions of matched workers across more than half a million employers, so they capture employees at many levels of a company, not just the small slice who make political donations. Donation datasets lean heavily toward senior, wealthier, coastal workers. If you want a read on the overall workforce, VRscores are a better fit; if you care about politically active elites, donation data can still be useful. Extensive benchmarking appears in the VRscores working paper and on the benchmarking page, where we compare coverage, representativeness, and firm-level correlations against donation-based measures.
How do you treat independents or unaffiliated registrations?
The dataset ships with both raw and imputed columns. The raw series reports Democratic, Republican, and “other” shares exactly as they appear in the voter file. The imputed series assigns lean for unaffiliated voters using their primary history, local voting patterns, and demographics. Use whichever version matches your tolerance for noise—most teams keep both to compare.
Are VRscores stable across years?
Yes. The typical within-firm two-party standard deviation is under ten percentage points over 2012–2024, so VRscores are fairly stable over time.
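To run a comparable stability check on your own extract, something like the sketch below could work. The shares are invented, and two choices are assumptions rather than the authors' definitions: "typical" is read as the median across firms, and the sample standard deviation is used.

```python
from statistics import median, stdev

# Hypothetical two-party Democratic shares by firm, one value per year.
shares_by_firm = {
    "A": [0.60, 0.62, 0.58, 0.61],
    "B": [0.30, 0.35, 0.33, 0.31],
}

# Within-firm standard deviation across years (sample SD, an assumption).
within_firm_sd = {firm: stdev(vals) for firm, vals in shares_by_firm.items()}

# "Typical" firm taken as the median across firms (also an assumption).
typical_sd = median(within_firm_sd.values())

# Flag firms whose year-to-year variation exceeds ten percentage points.
unstable = [firm for firm, sd in within_firm_sd.items() if sd >= 0.10]
```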
How do you handle mergers and acquisitions?
We rely on Revelio Labs’ entity hierarchy and maintain our own VRID mapping, including parent-child relationships as coded in early 2025, to standardize employers. That mapping captures many mergers and acquisitions by rolling newly combined entities under a shared parent VRID while retaining their child names and IDs. We supplement the hierarchy with linked identifiers (e.g., GVKEY), but researchers studying specific deals should still review those events case by case and choose the representation that best fits their question.
How are confidence tiers assigned?
Employer records carry a high confidence tag when we can link at least 200 workers and the underlying demographic columns (gender, race/ethnicity, and age bucket) are at least 80% complete after matching. Records with 25–199 matched workers or with larger pockets of missing demographic detail fall into the medium tier. Entries backed by fewer than 25 workers are surfaced as low confidence so you know coverage is sparse.
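The tier rules above can be sketched as a small function. The function and parameter names are hypothetical, and the sketch assumes `demo_completeness` is the lowest completeness rate across the gender, race/ethnicity, and age-bucket columns; the exact way completeness is measured is documented by the authors, not here.

```python
def confidence_tier(matched_workers: int, demo_completeness: float) -> str:
    """Assign a confidence tier from the thresholds described in the FAQ.

    demo_completeness: ASSUMED to be the minimum completeness (0 to 1)
    across the gender, race/ethnicity, and age-bucket columns.
    """
    if matched_workers < 25:
        return "low"
    if matched_workers >= 200 and demo_completeness >= 0.80:
        return "high"
    # 25-199 workers, or 200+ workers with sparser demographic detail.
    return "medium"


print(confidence_tier(250, 0.90))  # high
print(confidence_tier(250, 0.50))  # medium
print(confidence_tier(100, 0.95))  # medium
print(confidence_tier(10, 1.00))   # low
```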
How should I interpret the ‘political diversity’ metrics?
We report diversity as one minus the sum of squared partisan shares (higher means more mixed composition) and an effective‑parties count (inverse Herfindahl). These summarize how concentrated the partisan mix is within an employer, occupation, industry, or MSA.
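Both metrics follow directly from the sum of squared shares. The sketch below is illustrative; the function name is hypothetical, and which set of shares the published columns use (for example Democratic, Republican, and other) is an assumption to confirm in the data dictionary.

```python
def political_diversity(shares):
    """Return (diversity, effective_parties) for shares that sum to 1."""
    if abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    hhi = sum(s * s for s in shares)  # Herfindahl concentration index
    return 1 - hhi, 1 / hhi


print(political_diversity([0.5, 0.5]))  # (0.5, 2.0)
print(political_diversity([1.0, 0.0]))  # (0.0, 1.0)
```

An evenly split two-group workforce scores 0.5 on diversity with two effective parties, while a fully one-sided workforce scores 0 with one effective party.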
What about privacy? Do you expose individual records?
No. We release aggregated data at the organization, industry, occupation, and MSA levels and apply cohort minimums to protect individual identities. We never expose raw profile text or individual voter records. All statistics come from public sources: Revelio Labs ingests LinkedIn profiles that people have made public, and voter registration files are public records in the United States.
Why can’t I find a specific organization?
A few common reasons:
  • Below threshold: organizations with fewer than 5 matched workers are hidden.
  • Coverage: we match within MSAs and require valid employer and location; records missing either are excluded.
  • Source footprint: some organizations or sectors have limited online profile coverage in Revelio Labs.
  • Workforce mix: VRscores rely on LinkedIn-style profiles, so representation skews toward white-collar roles and may miss employers whose staff have fewer online profiles.

Explorer & downloads

Do explorer downloads reflect the filters I set?
Yes. CSV and JSON exports taken from the download menu reapply the same filters, scope, and partition (e.g., MSA, NAICS, confidence) that you see on screen.
What’s the default confidence filter in the explorer?
By default, the explorer includes all tiers (confidence=any). You can narrow to high, medium, or low to match your use case.
Will you add historical years or refresh the data?
The panel currently spans 2012–2024. We plan to refresh after each presidential election year. Updates and any additional historical backfills will be listed in the site changelog.
Can researchers get bulk data?
Yes. Visit the Download Data page to access the latest datasets, including employer, industry, occupation, and MSA panels plus documentation.
Is there an API for programmatic access?
Yes. The site exposes a lightweight shard reader at /api/slice. Example: /api/slice?year=2024&level=2&code=51,52&confidence=any&limit=500. Parameters: year (2012–2024), level (2‑ or 4‑digit NAICS), code (one NAICS code or a comma‑separated list), confidence (any|high|medium|low), limit (≤2000), and optional q (name contains).
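A small helper for building request URLs from the parameters listed above might look like this. It is a sketch: the helper name is hypothetical, the base host is assumed to be politicsatwork.org, and the response format is not described here, so the code only constructs and validates the URL rather than fetching or parsing anything.

```python
from urllib.parse import urlencode

# ASSUMPTION: the API is served from the main site host.
BASE_URL = "https://politicsatwork.org"
CONFIDENCE_VALUES = {"any", "high", "medium", "low"}


def slice_url(year, level, codes, confidence="any", limit=500, q=None):
    """Build an /api/slice URL, validating against the documented ranges."""
    if not 2012 <= year <= 2024:
        raise ValueError("year must be between 2012 and 2024")
    if level not in (2, 4):
        raise ValueError("level must be 2 or 4 (NAICS digits)")
    if confidence not in CONFIDENCE_VALUES:
        raise ValueError("confidence must be any, high, medium, or low")
    if limit > 2000:
        raise ValueError("limit must be at most 2000")
    params = {
        "year": year,
        "level": level,
        "code": ",".join(str(c) for c in codes),
        "confidence": confidence,
        "limit": limit,
    }
    if q:
        params["q"] = q  # optional "name contains" filter
    # safe="," keeps the comma-separated code list unescaped.
    return f"{BASE_URL}/api/slice?{urlencode(params, safe=',')}"


url = slice_url(2024, 2, [51, 52])
print(url)
```

With the arguments shown, the query string matches the example in the answer above: year=2024&level=2&code=51,52&confidence=any&limit=500.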
Can I combine VRscores with campaign-donation data?
Absolutely. VRscores capture workforce partisanship, while campaign-finance files capture the subset of employees (often executives) who give money to political campaigns. The two views only moderately correlate, so comparing them can surface interesting gaps—for example, a company with Democratic-leaning staff but a Republican-leaning donor class. We summarize the comparison in the VRscores working paper. Keep in mind that donations often reflect strategic giving by elites, so they are not always a clean read on personal identity or ideology.

Research, citation & support

How should I cite VRscores?
Please cite both of the following and include a link to politicsatwork.org:
  1. Kagan, Max; Frake, Justin; Hurst, Reuben (2025). “VRscores: A New Measure and Dataset of Workforce Politics Using Voter Registrations.” SSRN Working Paper No. 5104795. https://ssrn.com/abstract=5104795
  2. Frake, Justin; Hurst, Reuben; Kagan, Max (2025). “Political Segregation in the US Workplace.” SSRN Working Paper No. 4639165. https://ssrn.com/abstract=4639165 or http://dx.doi.org/10.2139/ssrn.4639165
Are there licensing restrictions?

The public datasets are available at no cost for educational, research, and journalistic uses. Commercial, governmental, or political applications are not permitted without the express permission of the authors.

Short notice: Use is subject to VRscores Terms of Use — noncommercial only; no governmental/quasi-governmental or political activity. Terms of Use (v1.1)

Prohibited uses

  • Political targeting. You may not use the data, or allow others to use the data, to identify, monitor, penalize, blacklist, or otherwise disadvantage any company, affiliate, or person based on real or inferred political composition or activity.
  • Government and contractors. Without the authors’ prior written consent, the data may not be used by or on behalf of any governmental or quasi‑governmental entity (including law‑enforcement, national‑security, immigration, regulatory, or procurement agencies) or their contractors for surveillance, investigations, enforcement actions, regulatory or procurement decision‑making, or other actions that could result in adverse treatment of an organization.
  • Discrimination. Any use that would violate anti‑discrimination, civil‑rights, or similar protections is strictly prohibited.
  • Re‑identification. You may not attempt to re‑identify individuals or infer sensitive attributes.

If you believe you are legally compelled to disclose or use the data in a restricted context, you agree to provide prompt notice to the authors so they may seek appropriate protections.

Attribution required: Any use of the datasets or derivatives (including analysis, visualizations, apps, or services) must cite VRscores and include links to the site and both papers above. Example: “VRscores (Frake, Hurst, Kagan, 2025), https://politicsatwork.org; SSRN 5104795; SSRN 4639165”.

For permissions beyond the above, contact admin@politicsatwork.org to discuss a specialized license.

I found a data error. What should I do?
VRscores rely on employment data from Revelio Labs and voter registration data from L2. Our occupation and industry mappings inherit from those inputs, so occasional misclassifications or stale records are possible. We remove non-employer identities whenever we can (for example: unemployed, retired, self-employed, student), but some errors may remain. If something looks off, please tell us — send an email to admin@politicsatwork.org with details about the issue.
How do journalists contact you?
For media inquiries and interview requests, email press@politicsatwork.org. For general questions about the site or dataset permissions, contact admin@politicsatwork.org.