Scottish Longitudinal Study
Development & Support Unit

Current Projects

Project Title:

Synthetic Data Estimation for UK Longitudinal Studies

Project Number:

2013_012

Researchers:

Adam Dennett (CASA, UCL)
Belinda Wu (CASA, UCL)
Nicola Shelton (Dept of Epidemiology, UCL)
Beata Nowok (University of St Andrews)

Start Date:

19 August 2013

Summary:

The aim of the study is to create synthetic datasets that resemble the UK Longitudinal Studies (LSs), allowing researchers to experiment and test ideas before applying to use the real LSs. We aim to generate transition probabilities from 1991 to 2001 across a range of commonly used LS variables. These probabilities will then be applied to a baseline microdata population simulated using the Census Samples of Anonymised Records (SARs), in order to maintain the distribution of the real LS variables. Initial work will be carried out using the England and Wales LS; upon completion, we will extend the approach to the Scottish and Northern Irish LSs.
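As a rough illustration of the transition-probability idea, the sketch below estimates the probability of moving from a 1991 state to a 2001 state from linked records, then applies those probabilities to a simulated baseline population. It is a minimal sketch under stated assumptions, not the project's implementation: the function names, the single "tenure" variable and the toy records are all hypothetical, and a real application would condition on several variables at once.

```python
import random
from collections import Counter, defaultdict


def estimate_transitions(pairs):
    """Estimate P(state in 2001 | state in 1991) from linked (1991, 2001) pairs."""
    counts = defaultdict(Counter)
    for origin, destination in pairs:
        counts[origin][destination] += 1
    return {
        origin: {dest: n / sum(dests.values()) for dest, n in dests.items()}
        for origin, dests in counts.items()
    }


def apply_transitions(baseline, probs, rng):
    """Draw a 2001 state for each synthetic individual in the 1991 baseline."""
    result = []
    for state in baseline:
        destinations = list(probs[state])
        weights = [probs[state][d] for d in destinations]
        result.append(rng.choices(destinations, weights=weights, k=1)[0])
    return result


# Hypothetical linked records: housing tenure in 1991 and in 2001.
linked = [("rent", "rent"), ("rent", "own"), ("own", "own"), ("own", "own")]
probs = estimate_transitions(linked)

# Hypothetical baseline population (standing in for one simulated from the SARs).
baseline = ["rent"] * 500 + ["own"] * 500
rng = random.Random(0)  # fixed seed so the draw is reproducible
synthetic_2001 = apply_transitions(baseline, probs, rng)
```

Because each synthetic individual's 2001 state is drawn from the estimated probabilities, the synthetic population reproduces the observed transition pattern in aggregate without copying any real record.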

We will generate synthetic SLS-like microdata using widely used microsimulation techniques. In Canada, Statistics Canada uses a number of microsimulation models to generate synthetic population microdata, including synthetic longitudinal data in the LifePaths model (Statistics Canada 2012). Microsimulation has been in use for many years, with its origins dating back to the work of Orcutt (1957). It has long been applied in fields such as economics, and more recently in demography to generate synthetic populations of individuals (Falkingham and Lessof 1992; van Imhoff and Post 1998) and in geography to estimate these populations for small areas (Ballas et al. 2005a; Ballas et al. 2005b; Harland et al. 2012; Smith et al. 2009; Smith et al. 2011). Geographical (spatial) microsimulation generally proceeds by taking a microdata set that is typically attribute-rich but lacks geographical detail, and re-weighting the sample for the desired spatial units according to the distribution of variables shared with another, more spatially detailed dataset.
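The re-weighting step can be sketched with iterative proportional fitting (IPF), one commonly used re-weighting technique; the summary does not say which technique this project will use, so the choice here is an assumption. The sample records, variable names and area totals below are hypothetical.

```python
def ipf_weights(sample, constraints, iterations=50):
    """Re-weight sample records so weighted totals match an area's constraint totals.

    sample: list of dicts, one per microdata record.
    constraints: {variable: {category: target total for the area}}.
    """
    weights = [1.0] * len(sample)
    for _ in range(iterations):
        for variable, targets in constraints.items():
            # Current weighted total for each category of this variable.
            totals = {}
            for record, w in zip(sample, weights):
                category = record[variable]
                totals[category] = totals.get(category, 0.0) + w
            # Scale each record's weight towards the target for its category.
            weights = [
                w * targets[record[variable]] / totals[record[variable]]
                for record, w in zip(sample, weights)
            ]
    return weights


def weighted_total(sample, weights, variable, category):
    """Sum the weights of the records that fall in the given category."""
    return sum(w for r, w in zip(sample, weights) if r[variable] == category)


# Hypothetical attribute-rich sample with no geographical detail.
sample = [
    {"sex": "m", "age": "young"},
    {"sex": "m", "age": "old"},
    {"sex": "f", "age": "young"},
    {"sex": "f", "age": "old"},
]
# Hypothetical totals for one small area, taken from a spatially detailed source.
constraints = {
    "sex": {"m": 60, "f": 40},
    "age": {"young": 30, "old": 70},
}
weights = ipf_weights(sample, constraints)
```

Repeating the procedure for each small area yields one set of weights per area, so the same sample can stand in for many local populations.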

Whilst microsimulation has been used to generate synthetic longitudinal data outside the UK, in places such as Canada (as mentioned above) and New Zealand (Davis 2010), the technique has not, to date, been used for this purpose in the UK. This project will contribute to the microsimulation literature in this area and, more importantly, will produce a synthetic dataset for use by all researchers interested in analysis using census-based longitudinal studies such as the SLS, NILS and ONS LS.

References:

  1. Statistics Canada (2012), 'The LifePaths Microsimulation Model - An Overview', Statistics Canada. Available online.

  2. Orcutt, G.H. (1957), 'A new type of socio-economic system', Review of Economics and Statistics, 39 (2), 116-23.

  3. Falkingham, J. and Lessof, C. (1992), 'Playing God or LIFEMOD – the construction of a dynamic microsimulation model', in Sutherland, H. (ed.), Microsimulation Models of Public Policy Analysis: New Frontiers; London: Suntory-Toyota International Centre for Economics and Related Disciplines, LSE, 5-32.

  4. Ballas, D., Rossiter, D., Thomas, B., Clarke, G., and Dorling, D. (2005a), Geography matters: Simulating the local impacts of national social policies: Joseph Rowntree Foundation.

  5. Ballas, D., Clarke, G., Dorling, D., Eyre, H., Thomas, B., and Rossiter, D. (2005b), 'SimBritain: a spatial microsimulation approach to population dynamics', Population, Space and Place, 11 (1), 13-34.

  6. Harland, K., Heppenstall, A., Smith, D., and Birkin, M. (2012), 'Creating Realistic Synthetic Populations at Varying Spatial Scales: A Comparative Critique of Population Synthesis Techniques', Journal of Artificial Societies and Social Simulation, 15 (1), 1.

  7. Smith, D.M., Clarke, G.P., and Harland, K. (2009), 'Improving the synthetic data generation process in spatial microsimulation models', Environment and Planning A, 41 (5), 1251-68.

  8. Smith, D.M., Pearce, J.R., and Harland, K. (2011), 'Can a deterministic spatial microsimulation model provide reliable small-area estimates of health behaviours? An example of smoking prevalence in New Zealand', Health & Place, 17 (2), 618-24.

  9. Davis, P. (2010), 'Developing a simulation tool for policymakers. The Modelling the Early Life Course Project (MEL-C)', COMPASS annual research colloquium. Available online.
