The Synthetic Derivative (SD) is a rich, multi-source repository of data collected from VUMC’s clinical records and de-identified for use in research.
Type of Data
Electronic Health Records (EHR)
Years Available
1980s to present, most robust starting 2001
Description
The SD is a de-identified database created using electronic scrubbing techniques that remove identifiers while maintaining semantic integrity. Identifiers such as names and dates are replaced or shifted in a consistent but anonymized manner. The database includes over 3.9 million records and is structured according to the OMOP common data model, with some custom tables. Because it contains no HIPAA identifiers, research that uses the SD qualifies as non-human subjects research.
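To illustrate what shifting dates "in a consistent but anonymized manner" can mean in practice, here is a minimal Python sketch. It is not the SD's actual algorithm, which this entry does not specify. It assumes a fixed, patient-specific offset derived from a secret, and the names patient_offset and shift_dates, the 365-day window, and the backward direction of the shift are all hypothetical.

```python
import hashlib
from datetime import date, timedelta


def patient_offset(patient_key: str, secret: str, max_days: int = 365) -> timedelta:
    """Derive a fixed backward shift of 1..max_days days for one patient.

    Hypothetical scheme: hashing the key with a secret gives the same
    offset every time for the same patient, without storing a lookup table.
    """
    digest = hashlib.sha256(f"{secret}:{patient_key}".encode()).digest()
    days = int.from_bytes(digest[:4], "big") % max_days + 1
    return timedelta(days=-days)


def shift_dates(patient_key: str, dates: list[date], secret: str) -> list[date]:
    """Apply the patient's single offset to every date in their record."""
    offset = patient_offset(patient_key, secret)
    return [d + offset for d in dates]


# Two visits, 14 days apart, for one (made-up) patient.
visits = [date(2015, 3, 1), date(2015, 3, 15)]
shifted = shift_dates("patient-001", visits, secret="demo-secret")

# The true dates are hidden, but the interval between events is unchanged.
interval_preserved = (shifted[1] - shifted[0]) == (visits[1] - visits[0])
```

Under this assumption, analyses that depend on the time between events (for example, days from diagnosis to treatment) remain valid, while analyses that depend on exact calendar dates do not.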
Strengths
Contains over 3.9 million de-identified records.
Compliant with HIPAA Safe Harbor standards.
Integrated with BioVU genomics data.
Connected to ImageVU and MicroVU.
Can be accessed using a self-service web tool, SD Discover, at no cost to users.
Limitations
Data from the 1980s-2000 may be incomplete or inconsistent due to evolving documentation practices and system changes.
Dates are systematically shifted, preventing exact date recovery.
Not all medical record data is included, though new elements are regularly added.
Available for research purposes only.
Availability
SD data can be accessed through:
SD Discover: A free, self-service user interface for cohort selection and export of selected data elements.
IDASC custom programming: A billable service for requests with complex phenotype criteria or data requirements. Note: Funding is required to cover hourly programming and project management costs.
Databricks access: A workspace available for investigators who have the expertise to write their own SQL queries. Funding is required to pay for workspace setup and compute costs.
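For investigators weighing the Databricks option, the following Python sketch shows the kind of SQL cohort query that an OMOP-structured database supports. It runs against a tiny in-memory SQLite database with made-up rows, not against the SD. The table and column names follow the OMOP common data model, but the SD's actual schema (including its custom tables) may differ, and the concept IDs are used only as illustrative values.

```python
import sqlite3

# Toy stand-in for two OMOP tables; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    year_of_birth INTEGER
);
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER PRIMARY KEY,
    person_id INTEGER,
    condition_concept_id INTEGER,
    condition_start_date TEXT
);
INSERT INTO person VALUES (1, 1950), (2, 1985), (3, 1962);
INSERT INTO condition_occurrence VALUES
    (10, 1, 201826, '2012-04-02'),
    (11, 1, 201826, '2014-09-17'),
    (12, 2, 320128, '2016-01-05'),
    (13, 3, 201826, '2019-06-30');
""")

# Cohort: everyone with at least one occurrence of the target condition,
# together with the earliest (date-shifted) diagnosis date on record.
COHORT_SQL = """
SELECT p.person_id,
       p.year_of_birth,
       MIN(co.condition_start_date) AS first_dx_date
FROM person AS p
JOIN condition_occurrence AS co
  ON co.person_id = p.person_id
WHERE co.condition_concept_id = ?
GROUP BY p.person_id, p.year_of_birth
ORDER BY p.person_id
"""

target_concept_id = 201826  # illustrative concept ID, not an SD recommendation
cohort = conn.execute(COHORT_SQL, (target_concept_id,)).fetchall()
```

The query text is ordinary SQL; in a Databricks workspace the connection method and parameter syntax would differ from the sqlite3 calls used here.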