Scottish Longitudinal Study
Development & Support Unit
Current Projects
Project Title:
Synthetic Data Estimation for UK Longitudinal Studies
Project Number:
2013_012
Researchers:
Adam Dennett (CASA, UCL)
Belinda Wu (CASA, UCL)
Nicola Shelton (Dept of Epidemiology, UCL)
Beata Nowok (University of St Andrews)
Start Date:
19 August 2013
Summary:
The aim of the study is to create synthetic datasets which will resemble the UK Longitudinal Studies to allow researchers to experiment and test ideas before applying to use the real LSs. We aim to generate transition probabilities from 1991 to 2001 across a range of commonly-used LS variables. Such probabilities will then be applied to a baseline microdata population simulated using the Census Samples of Anonymised Records (SARs), in order to maintain the distribution of the real LS variables. Initial work for this project will be carried out using the England and Wales LS, but upon completion of this work we will extend to the Scottish and Northern Irish LSs.
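The transition-probability step described above can be sketched in a few lines. This is a minimal illustration only, not the project's actual implementation: the function names, the single `tenure`-style variable, and the tiny made-up dataset are all assumptions introduced for the example. It estimates the probability of each 2001 state given the 1991 state from paired observations, then draws a 2001 state for every record in a baseline population.

```python
import random
from collections import Counter, defaultdict


def estimate_transitions(pairs):
    """Estimate P(state_2001 | state_1991) from (state_1991, state_2001) pairs."""
    counts = defaultdict(Counter)
    for origin, destination in pairs:
        counts[origin][destination] += 1
    return {
        origin: {dest: n / sum(dests.values()) for dest, n in dests.items()}
        for origin, dests in counts.items()
    }


def simulate(baseline, transitions, rng):
    """Draw one 2001 state for each 1991 state in the baseline population."""
    result = []
    for state in baseline:
        dests = list(transitions[state])
        weights = [transitions[state][d] for d in dests]
        result.append(rng.choices(dests, weights=weights, k=1)[0])
    return result


# Hypothetical paired observations (1991 state, 2001 state); not real LS data.
observed = [
    ("renting", "renting"),
    ("renting", "owning"),
    ("owning", "owning"),
    ("owning", "owning"),
]
probs = estimate_transitions(observed)

# Hypothetical baseline population, standing in for SARs-based microdata.
baseline_1991 = ["renting", "owning", "owning"]
synthetic_2001 = simulate(baseline_1991, probs, random.Random(0))
```

In practice the probabilities would be conditioned on several variables at once (for example age, sex and tenure together), but the principle of estimating from the real data and sampling for the simulated baseline is the same.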
We will generate synthetic SLS-like microdata using widely used microsimulation techniques. In Canada, Statistics Canada uses a number of microsimulation models to generate synthetic population microdata, including synthetic longitudinal data in the LifePaths model (Statistics Canada 2012). Microsimulation has been used for a number of years, with its origins dating back to the work of Orcutt (1957). It has long been applied in fields such as economics, and more recently in demography for generating synthetic populations of individuals (Falkingham and Lessof 1992; van Imhoff and Post 1998) and in geography for estimating these populations for small areas (Ballas et al. 2005a; Ballas et al. 2005b; Harland et al. 2012; Smith et al. 2009; Smith et al. 2011). Geographical/spatial microsimulation generally proceeds by taking a microdata set which is typically attribute rich but lacks geographical detail, and re-weighting the sample for the spatial units desired, according to the distribution of variables shared with another more spatially detailed dataset.
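The re-weighting idea in spatial microsimulation can be illustrated with iterative proportional fitting, one commonly used deterministic technique. The sketch below is an assumption-laden example and not a description of the method this project will adopt: the `reweight` function, the variable names and the target counts are all hypothetical. Each sample record starts with a weight of 1, and the weights are repeatedly scaled so that the weighted totals match the known category counts for one small area.

```python
def reweight(sample, constraints, iterations=20):
    """Fit per-record weights to area totals by iterative proportional fitting.

    sample: list of dicts holding categorical attributes for each record.
    constraints: {variable: {category: target_count}} for a single area.
    Assumes every target count is positive and every record's category
    appears in the targets for that variable.
    """
    weights = [1.0] * len(sample)
    for _ in range(iterations):
        for variable, targets in constraints.items():
            # Current weighted total for each category of this variable.
            totals = {category: 0.0 for category in targets}
            for record, w in zip(sample, weights):
                totals[record[variable]] += w
            # Scale each weight so the weighted totals match the targets.
            weights = [
                w * targets[record[variable]] / totals[record[variable]]
                for record, w in zip(sample, weights)
            ]
    return weights


# Hypothetical attribute-rich sample with no geography attached.
sample = [
    {"age": "young", "tenure": "rent"},
    {"age": "young", "tenure": "own"},
    {"age": "old", "tenure": "rent"},
    {"age": "old", "tenure": "own"},
]

# Hypothetical small-area totals taken from a spatially detailed source.
constraints = {
    "age": {"young": 60, "old": 40},
    "tenure": {"rent": 30, "own": 70},
}

weights = reweight(sample, constraints)
```

The resulting weights say how many people in the area each sample record represents; repeating the fit with each area's own totals gives a synthetic population for every spatial unit.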
Whilst microsimulation has been used to generate synthetic longitudinal data outside the UK, in places such as Canada (as already mentioned) and New Zealand (Davis 2010), the technique has not, to date, been used for this purpose in the UK. This project will contribute to the microsimulation literature in this area and, more importantly, will produce a synthetic dataset for use by all researchers interested in analysis using census-based longitudinal studies such as the SLS, NILS and ONS LS.
References:
- Statistics Canada (2012), 'The LifePaths Microsimulation Model - An Overview', Statistics Canada. available online
- Orcutt, G.H. (1957), 'A new type of socio-economic system', Review of Economics and Statistics, 39 (2), 116-23.
- Falkingham, J. and Lessof, C. (1992), 'Playing God or LIFEMOD – the construction of a dynamic microsimulation model', in Sutherland, H. (ed.), Microsimulation Models of Public Policy Analysis: New Frontiers; London: Suntory-Toyota International Centre for Economics and Related Disciplines, LSE, 5-32.
- Ballas, D., Rossiter, D., Thomas, B., Clarke, G., and Dorling, D. (2005a), Geography matters: Simulating the local impacts of national social policies: Joseph Rowntree Foundation.
- Ballas, D., Clarke, G., Dorling, D., Eyre, H., Thomas, B., and Rossiter, D. (2005b), 'SimBritain: a spatial microsimulation approach to population dynamics', Population, Space and Place, 11 (1), 13-34.
- Harland, K., Heppenstall, A., Smith, D., and Birkin, M. (2012), 'Creating Realistic Synthetic Populations at Varying Spatial Scales: A Comparative Critique of Population Synthesis Techniques', Journal of Artificial Societies and Social Simulation, 15 (1), 1.
- Smith, D.M., Clarke, G.P., and Harland, K. (2009), 'Improving the synthetic data generation process in spatial microsimulation models', Environment and Planning A, 41 (5), 1251-68.
- Smith, D.M., Pearce, J.R., and Harland, K. (2011), 'Can a deterministic spatial microsimulation model provide reliable small-area estimates of health behaviours? An example of smoking prevalence in New Zealand', Health & Place, 17 (2), 618-24.
- Davis, P. (2010), 'Developing a simulation tool for policymakers', The Modelling the Early Life Course Project (MEL-C), COMPASS annual research colloquium. available online
Related Outputs (viewable on CALLS Hub):
- Guidelines for Producing Useful Synthetic Data
- Generating synthetic microdata to widen access to sensitive data sets
- An Introduction to the Synthetic Longitudinal Studies
- Synthetic data estimation for the UK Longitudinal Studies: an introduction to the SYLLS project
- How to generate synthetic data using the ‘synthpop’ package
- ONS LS Synthetic Data Spine
- Synthetic Data Estimation for the UK Longitudinal Studies (SYLLS project): An introduction to the Multiple Imputation approach
- How to generate synthetic data using the ‘synthpop’ package
- Using synthetic data to improve the accessibility of the SLS
- How to generate synthetic data using the ‘synthpop’ package
- Prospects for synthetic data in longitudinal social science
- The R package synthpop for producing synthetic data
- Synthetic data estimation for the UK longitudinal studies – SYLLS
- Synthetic data estimation for the UK longitudinal studies
- Comments on four papers on synthetic data in Volume 32 Issue 1 of the Statistical Journal of the IAOS
- Generating synthetic microdata to widen access to sensitive data sets: method, software and empirical examples
- A simplified approach to generating synthetic data for disclosure control
- A synthetic Longitudinal Study dataset for England and Wales
- Synthetic data estimation for the UK longitudinal studies – SYLLS
- Synthetic Data Estimation for the UK Longitudinal Studies
- Utility Measures for Synthetic Data
- Synthpop: Generating synthetic versions of sensitive microdata for statistical disclosure control. R package version 1.0-0
- Synthetic data for the UK longitudinal studies – SYLLS
- Synthetic data and better access
- Simplifying synthesis with the synthpop package for R
- An Introduction to Analysing Longitudinal Study Data Using the SYLLS Synthetic Spine Dataset, Practical exercise using the spine data
- The development of synthetic data sets to expand and transform use of disclosive data from the ONS Longitudinal Study
- Generating synthetic microdata using the synthpop package
- An Introduction to Longitudinal Analysis using the National Synthetic LS Spine, Practical session
- Practical data synthesis for large samples
- Generating synthetic microdata to widen access to sensitive data sets
- An Introduction to Longitudinal Analysis using the National Synthetic LS Spine, Practical workshop