Scottish Longitudinal Study
Development & Support Unit

How are synthetic data created?

The two synthetic products available from the SLS are produced in different ways:

1. Synthetic spine data

The SLS ‘spine’ dataset is generated from two sources: the 2011 Scotland’s Census Teaching File, available from the National Records of Scotland, and a series of 2001 to 2011 transition probabilities for key demographic variables, taken from the SLS.

The variables included are:

Age (10 year groups)
Marital Status
General Health
Religion
Approximated Social Grade

A series of algorithms first estimates the number of individuals in each age group undergoing each longitudinal state transition (e.g. Never married in 2001 to Married in 2011, or Good health in 2001 to Good health in 2011). These changes (or non-changes) are then allocated to the appropriate number of individuals in the Census dataset. The result is a new, plausible, SLS-like dataset that includes data from both 2001 (synthetic) and 2011 (real) for all individuals.
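The two steps described above (estimate how many individuals undergo each transition, then allocate those transitions to records) can be illustrated with a minimal Python sketch. This is not the SLS algorithm: the function name `allocate_2001_states`, the data, and the probabilities are all hypothetical, and the sketch assumes the transition probabilities have already been expressed as the share of each 2011 group that came from each 2001 state.

```python
import random
from collections import Counter


def allocate_2001_states(states_2011, origin_probs, seed=0):
    """Assign a synthetic 2001 state to every (real) 2011 record.

    states_2011  -- list of observed 2011 states, one per individual
    origin_probs -- hypothetical mapping {state_2011: {state_2001: share}},
                    where the shares for each 2011 state sum to 1
    """
    rng = random.Random(seed)
    result = [None] * len(states_2011)

    # Group record positions by their 2011 state.
    by_state = {}
    for i, state in enumerate(states_2011):
        by_state.setdefault(state, []).append(i)

    for state, idxs in by_state.items():
        n = len(idxs)
        # Step 1: turn shares into whole numbers of individuals,
        # giving leftover units to the largest fractional remainders.
        raw = {origin: share * n for origin, share in origin_probs[state].items()}
        counts = {origin: int(value) for origin, value in raw.items()}
        shortfall = n - sum(counts.values())
        by_remainder = sorted(raw, key=lambda o: raw[o] - counts[o], reverse=True)
        for origin in by_remainder[:shortfall]:
            counts[origin] += 1

        # Step 2: allocate each transition to randomly chosen records.
        rng.shuffle(idxs)
        pos = 0
        for origin, count in counts.items():
            for i in idxs[pos:pos + count]:
                result[i] = origin
            pos += count
    return result


# Made-up example: 8 people married in 2011, 4 never married in 2011.
states_2011 = ["Married"] * 8 + ["Never married"] * 4
origin_probs = {
    "Married": {"Married": 0.75, "Never married": 0.25},
    "Never married": {"Never married": 1.0},
}
states_2001 = allocate_2001_states(states_2011, origin_probs, seed=1)
transitions = Counter(zip(states_2001, states_2011))
print(transitions)
```

With these made-up shares, 6 of the 8 people married in 2011 are given a 2001 state of Married and 2 a 2001 state of Never married. Which individual records receive each transition depends on the random seed.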

For more detailed information see ‘A Synthetic Longitudinal Study for the United Kingdom’. The data can be accessed here.

2. Bespoke synthetic extracts

Bespoke synthetic extracts are produced using the R package synthpop in response to user requests.

Variables are synthesised one by one using sequential regression modelling. Each synthetic variable is modelled separately, taking into account its relationship to the other variables in the real dataset. As a result, analysis of the full synthetic dataset will usually give results very similar to those from the same analysis performed on the real data.
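The idea of synthesising variables one at a time can be sketched in a few lines of Python. This is an illustration only, not synthpop's implementation or the procedure used by SLS-DSU staff: the function name `synthesise` and the example data are hypothetical, conditional frequency tables stand in for the regression models, and each variable is assumed to be conditioned on the variables synthesised before it.

```python
import random


def synthesise(real_rows, order, seed=0):
    """Create synthetic rows by modelling one variable at a time.

    real_rows -- list of dicts holding the real (categorical) data
    order     -- variable names in the order they are synthesised
    """
    rng = random.Random(seed)

    # First variable: draw values from its observed distribution.
    first = order[0]
    synth = [{first: rng.choice(real_rows)[first]} for _ in real_rows]

    for k in range(1, len(order)):
        target, predictors = order[k], order[:k]

        # "Fit": for each combination of predictor values in the REAL data,
        # record the target values observed alongside it.
        table = {}
        for row in real_rows:
            key = tuple(row[p] for p in predictors)
            table.setdefault(key, []).append(row[target])

        # "Predict": draw a target value for each SYNTHETIC row, conditional
        # on the values already synthesised for that row.
        for row in synth:
            key = tuple(row[p] for p in predictors)
            row[target] = rng.choice(table[key])
    return synth


# Made-up real data in which health depends entirely on age group.
real = [
    {"age": "16-24", "health": "Good"},
    {"age": "65+", "health": "Poor"},
] * 5
synthetic = synthesise(real, order=["age", "health"], seed=42)
print(synthetic[:3])
```

Because health is drawn conditional on the synthetic age value, the relationship between the two variables in the real data carries over to the synthetic rows, even though no synthetic row is a copy of a particular real individual by design.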

The synthetic data are produced from the user’s extract by staff at the SLS-DSU. This can be a complex task, and users are expected to work with staff to facilitate it. See ‘How to access synthetic data’ for details.
