Scottish Longitudinal Study
Development & Support Unit
Current Projects
Project Title:
Synthetic Data Estimation for UK Longitudinal Studies
Project Number:
2013_012
Researchers:
Adam Dennett (CASA, UCL)
Belinda Wu (CASA, UCL)
Nicola Shelton (Dept of Epidemiology, UCL)
Beata Nowok (University of St Andrews)
Start Date:
19 August 2013
Summary:
The aim of the study is to create synthetic datasets that resemble the UK Longitudinal Studies (LSs), so that researchers can experiment and test ideas before applying to use the real LSs. We aim to generate transition probabilities from 1991 to 2001 across a range of commonly used LS variables. These probabilities will then be applied to a baseline microdata population simulated using the Census Samples of Anonymised Records (SARs), in order to maintain the distribution of the real LS variables. Initial work for this project will be carried out using the England and Wales LS; upon completion, we will extend the work to the Scottish and Northern Irish LSs.
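As an illustration only, the idea of estimating transition probabilities and applying them to a baseline population might be sketched as follows. This is a minimal sketch, not the project's method: the single variable (housing tenure), the linked records, the function names and the baseline population are all hypothetical, and a real application would condition on many variables at once.

```python
import random
from collections import Counter, defaultdict


def estimate_transitions(pairs):
    """Estimate P(state in 2001 | state in 1991) from linked (1991, 2001) pairs."""
    counts = defaultdict(Counter)
    for start, end in pairs:
        counts[start][end] += 1
    return {
        start: {end: n / sum(ends.values()) for end, n in ends.items()}
        for start, ends in counts.items()
    }


def simulate(baseline, transitions, rng):
    """Draw a 2001 state for every record in the 1991 baseline population."""
    result = []
    for state in baseline:
        ends = list(transitions[state])
        weights = [transitions[state][e] for e in ends]
        result.append(rng.choices(ends, weights=weights, k=1)[0])
    return result


# Hypothetical linked records: housing tenure in 1991 and in 2001.
observed = [
    ("rent", "rent"), ("rent", "own"), ("rent", "own"), ("rent", "rent"),
    ("own", "own"), ("own", "own"), ("own", "own"), ("own", "rent"),
]
transitions = estimate_transitions(observed)

# Hypothetical baseline population (in the project, simulated from the SARs).
baseline_1991 = ["rent"] * 500 + ["own"] * 500
synthetic_2001 = simulate(baseline_1991, transitions, random.Random(0))
```

Each synthetic record keeps its 1991 state and gains a simulated 2001 state, so the synthetic data carry the estimated transition structure without containing any real individual.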
We will generate synthetic SLS-like microdata using widely used microsimulation techniques. In Canada, Statistics Canada uses a number of microsimulation models to generate synthetic population microdata, including synthetic longitudinal data in the LifePaths model (Statistics Canada 2012). Microsimulation has been in use for many years, with its origins dating back to the work of Orcutt (1957). It has long been applied in fields such as economics, and more recently in demography for generating synthetic populations of individuals (Falkingham and Lessof 1992; van Imhoff and Post 1998) and in geography for estimating these populations for small areas (Ballas et al. 2005a; Ballas et al. 2005b; Harland et al. 2012; Smith et al. 2009; Smith et al. 2011). Geographical (spatial) microsimulation generally proceeds by taking a microdata set that is typically attribute rich but lacks geographical detail, and re-weighting the sample for the desired spatial units according to the distribution of variables shared with another, more spatially detailed dataset.
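The re-weighting step described above can be illustrated with iterative proportional fitting, one common deterministic approach in spatial microsimulation. The sketch below is a minimal example under stated assumptions: the function name, the four sample records and the small-area target counts are hypothetical, every target category is assumed to appear in the sample, and the target margins are assumed to sum to the same total.

```python
def ipf_reweight(records, targets, iterations=50):
    """Re-weight sample records so weighted totals match each target margin.

    records: list of dicts of categorical attributes (the microdata sample).
    targets: {variable: {category: target count}} for one small area.
    """
    weights = [1.0] * len(records)
    for _ in range(iterations):
        for var, margin in targets.items():
            # Current weighted total for each category of this variable.
            totals = {cat: 0.0 for cat in margin}
            for rec, w in zip(records, weights):
                totals[rec[var]] += w
            # Scale each weight so the category totals match the margin.
            weights = [
                w * margin[rec[var]] / totals[rec[var]]
                for rec, w in zip(records, weights)
            ]
    return weights


# Hypothetical attribute-rich sample with no geography.
sample = [
    {"age": "young", "sex": "f"},
    {"age": "young", "sex": "m"},
    {"age": "old", "sex": "f"},
    {"age": "old", "sex": "m"},
]
# Hypothetical census counts for one small area.
area_targets = {
    "age": {"young": 60, "old": 40},
    "sex": {"f": 55, "m": 45},
}
area_weights = ipf_reweight(sample, area_targets)
```

Repeating the procedure for each small area gives one set of weights per area, so the same sample can stand in for the population of every area.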
Whilst microsimulation has been used to generate synthetic longitudinal data outside the UK, in places such as Canada (as already mentioned) and New Zealand (Davis 2010), the technique has not, to date, been used for this purpose in the UK. This project will contribute to the microsimulation literature in this area and, more importantly, will produce a synthetic dataset for use by all researchers interested in analysis using census-based longitudinal studies such as the SLS, NILS and ONS LS.
References:
- Statistics Canada (2012), 'The LifePaths Microsimulation Model - An Overview', Statistics Canada. Available online.
- Orcutt, G.H. (1957), 'A new type of socio-economic system', Review of Economics and Statistics, 39 (2), 116-23.
- Falkingham, J. and Lessof, C. (1992), 'Playing God or LIFEMOD – the construction of a dynamic microsimulation model', in Sutherland, H. (ed.), Microsimulation Models of Public Policy Analysis: New Frontiers; London: Suntory-Toyota International Centre for Economics and Related Disciplines, LSE, 5-32.
- Ballas, D., Rossiter, D., Thomas, B., Clarke, G., and Dorling, D. (2005a), Geography matters: Simulating the local impacts of national social policies, Joseph Rowntree Foundation.
- Ballas, D., Clarke, G., Dorling, D., Eyre, H., Thomas, B., and Rossiter, D. (2005b), 'SimBritain: a spatial microsimulation approach to population dynamics', Population, Space and Place, 11 (1), 13-34.
- Harland, K., Heppenstall, A., Smith, D., and Birkin, M. (2012), 'Creating Realistic Synthetic Populations at Varying Spatial Scales: A Comparative Critique of Population Synthesis Techniques', Journal of Artificial Societies and Social Simulation, 15 (1), 1.
- Smith, D.M., Clarke, G.P., and Harland, K. (2009), 'Improving the synthetic data generation process in spatial microsimulation models', Environment and Planning A, 41 (5), 1251-68.
- Smith, D.M., Pearce, J.R., and Harland, K. (2011), 'Can a deterministic spatial microsimulation model provide reliable small-area estimates of health behaviours? An example of smoking prevalence in New Zealand', Health & Place, 17 (2), 618-24.
- Davis, P. (2010), 'Developing a simulation tool for policymakers', The Modelling the Early Life Course Project (MEL-C), COMPASS annual research colloquium. Available online.
Related Outputs (viewable on CALLS Hub):
- Synthetic data estimation for the UK Longitudinal Studies: an introduction to the SYLLS project
- Synthetic Data Estimation for the UK Longitudinal Studies (SYLLS project): An introduction to the Multiple Imputation approach
- How to generate synthetic data using the ‘synthpop’ package
- Synthetic data estimation for the UK longitudinal studies – SYLLS
- Generating synthetic microdata to widen access to sensitive data sets: method, software and empirical examples
- Synthetic data estimation for the UK longitudinal studies – SYLLS
- Synthpop: Generating synthetic versions of sensitive microdata for statistical disclosure control. R package version 1.0-0
- Simplifying synthesis with the synthpop package for R
- Generating synthetic microdata using the synthpop package
- Generating synthetic microdata to widen access to sensitive data sets
- Generating synthetic microdata to widen access to sensitive data sets
- How to generate synthetic data using the ‘synthpop’ package
- How to generate synthetic data using the ‘synthpop’ package
- Prospects for synthetic data in longitudinal social science
- Synthetic data estimation for the UK longitudinal studies
- A simplified approach to generating synthetic data for disclosure control
- Synthetic Data Estimation for the UK Longitudinal Studies
- Synthetic data for the UK longitudinal studies – SYLLS
- An Introduction to Analysing Longitudinal Study Data Using the SYLLS Synthetic Spine Dataset, Practical exercise using the spine data
- An Introduction to Longitudinal Analysis using the National Synthetic LS Spine, Practical session
- An Introduction to Longitudinal Analysis using the National Synthetic LS Spine, Practical workshop
- An Introduction to the Synthetic Longitudinal Studies
- ONS LS Synthetic Data Spine
- Using synthetic data to improve the accessibility of the SLS
- The R package synthpop for producing synthetic data
- Comments on four papers on synthetic data in Volume 32 Issue 1 of the Statistical Journal of the IAOS
- A synthetic Longitudinal Study dataset for England and Wales
- Utility Measures for Synthetic Data
- Synthetic data and better access
- The development of synthetic data sets to expand and transform use of disclosive data from the ONS Longitudinal Study
- Practical data synthesis for large samples
- Guidelines for Producing Useful Synthetic Data