Scottish Longitudinal Study
Development & Support Unit
How reliable are results from synthetic data?
The synthetic spine data was produced with less emphasis on the validity of the data and more on providing a freely available resource that can be used as a training dataset. Transitions will be accurate when aggregated by age (group), but not necessarily when aggregated by any other variable.
- We encourage researchers to use this dataset as a training or preparation resource only.
In contrast, the bespoke synthetic datasets aim to preserve, as far as possible, the relationships between variables that exist in the real data. The SLS synthetic dataset will look and behave broadly like the real data. However, there is no guarantee that results obtained from the SLS synthetic data are valid.
- The results produced from analysis of the synthetic dataset must not be published.
- Any analysis for publication must be run on the real dataset within the SLS Safe Setting, and all outputs from it must be disclosure checked as standard.
All researchers using synthetic data are expected to liaise with their SLS Support Officer about how the results from the real and synthetic data compare, so that this can be reported to our development team. This feedback will inform future development of the product.
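As an illustration only, the comparison between real and synthetic results could be summarised along the following lines. This is a minimal sketch, not an SLS-prescribed method: the function name `compare_estimates`, the term names and all numeric values are hypothetical placeholders. It assumes the same model has been run on both datasets, and that the real-data estimates have already been disclosure checked and released from the Safe Setting.

```python
def compare_estimates(real, synthetic):
    """Compare two sets of model estimates, term by term.

    `real` and `synthetic` map a term name to its estimate.
    Only terms present in both sets are compared.
    """
    rows = []
    for term in sorted(real.keys() & synthetic.keys()):
        r, s = real[term], synthetic[term]
        abs_diff = s - r
        # The relative difference is undefined when the real estimate is zero.
        rel_diff = abs_diff / abs(r) if r != 0 else float("nan")
        rows.append({
            "term": term,
            "real": r,
            "synthetic": s,
            "abs_diff": abs_diff,
            "rel_diff": rel_diff,
            "same_sign": (r >= 0) == (s >= 0),
        })
    return rows


# Hypothetical placeholder values, not results from any SLS dataset.
real_estimates = {"age_group": 0.50, "sex": -0.20}
synthetic_estimates = {"age_group": 0.40, "sex": -0.25}

for row in compare_estimates(real_estimates, synthetic_estimates):
    print(
        f"{row['term']}: real={row['real']:.2f} "
        f"synthetic={row['synthetic']:.2f} "
        f"rel_diff={row['rel_diff']:+.0%} "
        f"same_sign={row['same_sign']}"
    )
```

A per-term summary of this kind (size of the difference and whether the direction of the effect agrees) is one possible form for the feedback passed to a Support Officer. The synthetic-data figures in it remain unsuitable for publication.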