Scottish Longitudinal Study
Development & Support Unit
What are synthetic data?
Synthetic data are microdata records created to improve accessibility whilst preventing disclosure of confidential information. They are produced by fitting statistical models to the original data and generating the synthetic data from these models – thus no records in the synthetic data correspond to real individuals.
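The fit-then-generate idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the "original" records are made up, the two-variable model (a normal distribution for the first variable and a linear regression for the second) is a toy, and the function names `fit_model` and `synthesise` are hypothetical. It does not describe the models or software actually used to synthesise the data.

```python
import random
import statistics


def fit_model(records):
    """Fit a toy model: x ~ Normal(mu, sigma), y | x ~ Normal(a + b*x, s)."""
    xs = [r[0] for r in records]
    ys = [r[1] for r in records]
    mu, sigma = statistics.mean(xs), statistics.stdev(xs)
    my = statistics.mean(ys)
    # Ordinary least-squares slope and intercept of y on x
    sxx = sum((x - mu) ** 2 for x in xs)
    b = sum((x - mu) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mu
    # Residual standard deviation around the fitted line
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    s = (sum(e * e for e in resid) / (len(xs) - 2)) ** 0.5
    return {"mu": mu, "sigma": sigma, "a": a, "b": b, "s": s}


def synthesise(model, n, rng):
    """Draw n new records from the fitted model; no original record is copied."""
    out = []
    for _ in range(n):
        x = rng.gauss(model["mu"], model["sigma"])
        y = rng.gauss(model["a"] + model["b"] * x, model["s"])
        out.append((x, y))
    return out


rng = random.Random(0)
# Made-up stand-in for confidential microdata: (age, income) pairs
original = [
    (age, 1000 + 50 * age + rng.gauss(0, 200))
    for age in (rng.uniform(20, 60) for _ in range(500))
]
model = fit_model(original)
synthetic = synthesise(model, 500, rng)
```

Because every synthetic record is drawn from the fitted model rather than copied, the synthetic set reproduces the overall relationship between the two variables without containing any of the original rows.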
The synthetic data set looks and behaves much like the real data, although the synthesising models may not capture every relationship present in the real data. Analyses of the synthetic data will usually lead to the same conclusions as analyses of the real data, but this must be checked by running the final analyses on the real data.
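As a rough illustration of that check, the sketch below runs the same analysis (a least-squares slope) on two data sets and compares the estimates. The data are made up, the helper names `slope` and `same_conclusion` are hypothetical, and the tolerance is an arbitrary assumption; in practice the comparison depends on the analysis in question.

```python
import random


def slope(records):
    """Ordinary least-squares slope of y on x for (x, y) pairs."""
    n = len(records)
    mx = sum(x for x, _ in records) / n
    my = sum(y for _, y in records) / n
    sxy = sum((x - mx) * (y - my) for x, y in records)
    sxx = sum((x - mx) ** 2 for x, _ in records)
    return sxy / sxx


def same_conclusion(real, synthetic, tolerance):
    """True if the two slope estimates differ by no more than `tolerance`."""
    return abs(slope(real) - slope(synthetic)) <= tolerance


# Made-up stand-ins for the real and synthetic data sets
rng = random.Random(1)
real = [(x, 2.0 * x + rng.gauss(0, 1)) for x in range(100)]
synth = [(x, 2.0 * x + rng.gauss(0, 1)) for x in range(100)]
agrees = same_conclusion(real, synth, tolerance=0.1)
```

If the estimates disagree by more than is acceptable for the analysis, the results from the real data are the ones to report.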
To be absolutely clear, there are no records in the synthetic data set that are real. No real people can be identified from these data.