Conditions of data release

Data from CIVS polls is released for research purposes only.

Privacy protections

Released data is fully anonymized, with all identifying information other than ballot rankings removed, including the title of the poll and the names of the choices. In addition, ballots are shuffled in two ways: the order of the ballots is randomized, and so is the order of the choices. Further, a small fraction of ballots (around 10%) are randomly dropped. Only polls with at least three voters are included. With more than 20,000 polls in the dataset, it is expected to be difficult to reliably identify most past polls and infeasible to determine the content of ballots that voters cast.
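
The shuffling and dropping steps described above can be illustrated with a minimal sketch. This is not the code CIVS uses; the function name `anonymize_ballots`, its parameters, and the choice to apply one shared permutation of the choices to every ballot in a poll are assumptions made for illustration only.

```python
import random


def anonymize_ballots(ballots, drop_fraction=0.10, rng=None):
    """Illustrative only: permute choices, drop some ballots, shuffle the rest.

    Assumptions (not confirmed by the release notes): every ballot has one
    rank per choice, and the same choice permutation is applied to all
    ballots of a poll so that rankings stay comparable.
    """
    rng = rng or random.Random()
    if not ballots:
        return []
    num_choices = len(ballots[0])

    # One random permutation of the choice positions, shared by all ballots.
    perm = list(range(num_choices))
    rng.shuffle(perm)

    # Keep each ballot with probability (1 - drop_fraction), reordering its ranks.
    kept = [
        [ballot[i] for i in perm]
        for ballot in ballots
        if rng.random() >= drop_fraction
    ]

    # Randomize the order of the surviving ballots.
    rng.shuffle(kept)
    return kept
```

Passing a seeded `random.Random` as `rng` makes the sketch reproducible for experiments; an actual anonymization pipeline would not want a predictable seed.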

Format

Data is released as a compressed file in JSON format. The contents are an array of poll descriptions. The following attributes may be reported for each poll:

election_id
A pseudonymized identifier for the poll
mode
Either "proportional" or "nonproportional". Most polls are nonproportional.
num_choices
The number of choices/alternatives/candidates being ranked
test
Either "yes" or "no", depending on whether this was a test poll. Depending on how they were used, test polls may contain synthetic data.
ballots
An array of ballots. Each ballot is an array containing the ranks of the choices, from 1 up to the number of choices. The rank may also be "?" if the voter did not express an opinion about the rank of that choice.
num_ballots
The length of the ballots array
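
As a rough illustration of the format above, the following sketch parses an array of poll descriptions and checks it for internal consistency. The function names `parse_ballot` and `load_polls`, the `include_tests` parameter, and the sample data are hypothetical. The sketch also assumes that `num_choices` and `ballots` are present for every poll, that each ballot holds exactly one rank per choice, and that ranks are stored as numbers or numeric strings; the release notes do not state these details.

```python
import json

UNRANKED = "?"  # marker the format uses when a voter gave no opinion on a choice


def parse_ballot(ballot, num_choices):
    """Return the ballot's ranks as ints, with None for unranked choices."""
    if len(ballot) != num_choices:
        raise ValueError(f"expected {num_choices} ranks, got {len(ballot)}")
    ranks = []
    for rank in ballot:
        if rank == UNRANKED:
            ranks.append(None)
            continue
        rank = int(rank)  # assumption: numbers or numeric strings
        if not 1 <= rank <= num_choices:
            raise ValueError(f"rank {rank} outside 1..{num_choices}")
        ranks.append(rank)
    return ranks


def load_polls(text, include_tests=False):
    """Parse a JSON array of poll descriptions, skipping test polls by default."""
    polls = []
    for poll in json.loads(text):
        if poll.get("test") == "yes" and not include_tests:
            continue
        ballots = [parse_ballot(b, poll["num_choices"]) for b in poll["ballots"]]
        # num_ballots is documented as the length of the ballots array.
        if "num_ballots" in poll and poll["num_ballots"] != len(ballots):
            raise ValueError(f"poll {poll.get('election_id')}: num_ballots mismatch")
        polls.append({**poll, "ballots": ballots})
    return polls


# Made-up sample data in the documented shape (not taken from the real dataset).
sample = json.dumps([
    {"election_id": "a1", "mode": "nonproportional", "num_choices": 3,
     "test": "no", "ballots": [[1, 2, 3], [2, "?", 1], [1, 1, 3]],
     "num_ballots": 3},
    {"election_id": "b2", "mode": "proportional", "num_choices": 2,
     "test": "yes", "ballots": [[1, 2], [2, 1], [1, 2]], "num_ballots": 3},
])
polls = load_polls(sample)
```

Converting "?" to `None` keeps unranked choices distinct from any numeric rank, so later analysis can decide explicitly how to treat them. Equal ranks within one ballot are left as they are.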

Releases

Date | Link | Number of polls | SHA-256 hash
2024-12-15 | Zip file | 22477 | 9f0d5aeec38ec84f031c1a6d07172d1d8a99f62ca1c0b5c24c5c65b3aee84d98
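
A downloaded release can be checked against its published hash with a short script such as the one below. This is a sketch: it assumes the SHA-256 hash listed above was computed over the Zip file itself, and the local filename `civs_polls.zip` in the usage comment is a placeholder, not the actual name of the release file.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's digest with a published hex string, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage (the filename is a placeholder for wherever the download was saved):
# expected = "9f0d5aeec38ec84f031c1a6d07172d1d8a99f62ca1c0b5c24c5c65b3aee84d98"
# print(matches_published_hash("civs_polls.zip", expected))
```

Reading in chunks keeps memory use constant regardless of the size of the archive.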