CMS 2008-2010 Data Entrepreneurs’ Synthetic Public Use File (DE-SynPUF)
The DE-SynPUF was created to provide a realistic set of claims data in the public domain while giving the highest degree of protection to Medicare beneficiaries’ protected health information. The purposes of the DE-SynPUF are to:
- allow data entrepreneurs to develop and create software and applications that may eventually be applied to actual CMS claims data;
- train researchers on the use and complexity of conducting analyses with CMS claims data prior to initiating the process to obtain access to actual CMS data; and,
- support safe data mining innovations that may reveal unanticipated knowledge gains while preserving beneficiary privacy.
The files have been designed so that programs and procedures created on the DE-SynPUF will function on CMS Limited Data Sets. The data structure of the Medicare DE-SynPUF is very similar to that of the CMS Limited Data Sets, but with fewer variables. The DE-SynPUF also provides a robust set of metadata on the CMS claims data that has not previously been available in the public domain. Because the file was created through synthetic processes, the DE-SynPUF has very limited inferential research value for drawing conclusions about Medicare beneficiaries. It does, however, give timelier and less expensive access to a realistic Medicare claims data file, with the aim of spurring the innovation needed to achieve better care for beneficiaries and improved health of the population.
The DE-SynPUF contains five types of data – Beneficiary Summary, Inpatient Claims, Outpatient Claims, Carrier Claims, and Prescription Drug Events.
DE-SynPUF | Unit of record | Number of Records 2008 | Number of Records 2009 | Number of Records 2010 |
---|---|---|---|---|
Beneficiary Summary | beneficiary | 2,326,856 | 2,291,320 | 2,255,098 |
Inpatient Claims | claim | 547,800 | 504,941 | 280,081 |
Outpatient Claims | claim | 5,673,808 | 6,519,340 | 3,633,839 |
Carrier Claims | claim | 34,276,324 | 37,304,993 | 23,282,135 |
Prescription Drug Events (PDE) | event | 39,927,827 | 43,379,293 | 27,778,849 |
Note: Claim counts for 2010 are lower due to attrition from death, and some effects of disclosure treatment.
Accessing the files
Due to file size limitations, each data type in the CMS Linkable 2008-2010 Medicare DE-SynPUF is released in 20 separate samples (each is essentially a 0.25% sample). All claims for a particular beneficiary are in samples with the same number (i.e., all beneficiaries in sample 1 have all their claims in the sample 1 files). This design allows DE-SynPUF users who do not need the entire synthetic population to read in only as many samples as they need.
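As a rough illustration of the sample design, the sketch below builds the list of files to read for a chosen subset of samples and computes the fraction covered by that subset. The file-naming pattern and the data type labels are hypothetical placeholders, not the actual download names, and the coverage calculation assumes the samples do not overlap.

```python
from fractions import Fraction

N_SAMPLES = 20                      # number of samples per data type (from the text)
SAMPLE_FRACTION = Fraction(1, 400)  # each sample is roughly 0.25% (from the text)

# Hypothetical naming pattern: check the real file names on the download page.
PATTERN = "{data_type}_Sample_{n}.csv"


def sample_files(data_types, sample_numbers):
    """Return the file names to read for the chosen sample numbers.

    Files are grouped by sample number, so every data type for a given
    sample (and therefore for the same beneficiaries) is listed together.
    """
    for n in sample_numbers:
        if not 1 <= n <= N_SAMPLES:
            raise ValueError(f"sample number must be 1..{N_SAMPLES}, got {n}")
    return [
        PATTERN.format(data_type=d, n=n)
        for n in sample_numbers
        for d in data_types
    ]


def coverage(sample_numbers):
    """Fraction covered by the chosen samples, assuming samples are disjoint."""
    return len(set(sample_numbers)) * SAMPLE_FRACTION


files = sample_files(["Beneficiary_Summary", "Inpatient_Claims"], [1, 2])
```

Reading samples 1 and 2 for two data types yields four files, and reading all 20 samples corresponds to 20 × 0.25% = 5% under the stated assumption.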
Each CMS Linkable 2008-2010 Medicare DE-SynPUF file includes a unique cryptographic identifier, DESYNPUF_ID, that identifies beneficiaries. DE-SynPUF users can link the files using this beneficiary code, DESYNPUF_ID, as the linking key. DESYNPUF_ID was created specifically for the DE-SynPUF; it carries no information about the patient or any patient records and is provided solely for reference and data processing purposes.
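The linking step can be sketched as follows, using only the Python standard library and two tiny made-up extracts. Only the DESYNPUF_ID column name comes from the text above; the other column names (BIRTH_YEAR, CLAIM_ID, PAYMENT), the helper names, and all values are illustrative assumptions.

```python
import csv
import io
from collections import defaultdict

# Made-up extracts standing in for a Beneficiary Summary file and an
# Inpatient Claims file from the same sample number.
BENEFICIARY_CSV = """DESYNPUF_ID,BIRTH_YEAR
A1,1940
B2,1935
"""
INPATIENT_CSV = """DESYNPUF_ID,CLAIM_ID,PAYMENT
A1,c-1,4000
A1,c-2,1500
B2,c-3,700
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def link_claims(beneficiaries, claims, key="DESYNPUF_ID"):
    """Attach each beneficiary's claims by matching on the linking key."""
    claims_by_id = defaultdict(list)
    for claim in claims:
        claims_by_id[claim[key]].append(claim)
    return {
        b[key]: {"beneficiary": b, "claims": claims_by_id.get(b[key], [])}
        for b in beneficiaries
    }


linked = link_claims(read_rows(BENEFICIARY_CSV), read_rows(INPATIENT_CSV))
total_a1 = sum(int(c["PAYMENT"]) for c in linked["A1"]["claims"])
```

Because all claims for a beneficiary sit in samples with the same number, a join like this only needs files from matching samples.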
Downloads
- DE 1.0 Data Users Document (PDF)