Sampling method
The TEDS twin DNA samples were collected using cheek swabs (until 2009) and later as saliva samples (from 2014). DNA collection in the earliest phases was administered by parents, and in later phases by the twins themselves. In all cases, the samples were collected at home and returned by post to TEDS.
Cheek swabs
In all DNA collections until 2009, cheek swabs were collected from twin pairs (not from unpaired twins) with the explicit consent of the parent. Each family was sent a pack containing the following materials:
- A letter inviting them to take part
- A consent form
- A sheet of instructions for sampling
- An information sheet
- Two sealed, labelled tubes containing cotton wool buds and a preserving fluid
- A padded return envelope for the tubes and consent form
Over the years of this study, there have been many slightly different versions of the letter, consent form, instruction sheet and information sheet. A recent version combined the letter, consent form and instruction sheet (pdf) in one document; a recent version of the information sheet (pdf) is also reproduced here.
Each tube in the pack was pre-labelled with the ID and name of one of the twins, to ensure that the returned samples could be properly identified.
The initial pack mailing was followed up with one or more written reminders for families that had not returned samples promptly. In some phases of the study, families were also phoned to encourage them to return their samples. See collection phases below for further details.
The contents of each pack (consent form and samples) were logged in the TEDS admin database on return to the TEDS office by post. If the samples were returned and the consent form was completed, then the samples were passed to the SGDP lab for extraction. If the samples were returned but the consent form had not been completed, the family was contacted again to obtain consent; no samples were extracted unless written consent had been obtained.
As a reward for returning the DNA samples, families of same-sex twin pairs were offered DNA zygosity tests (via the consent form). Requests for zygosity tests were passed to the lab along with the returned samples.
Saliva samples
In the final phase of DNA collection, from 2014 to 2015, saliva samples were collected from individual twins rather than from pairs, with the explicit consent of the twins themselves (because the twins were now aged over 16). The aim here was to maximise the twin sample for OEE genotyping, which is described in more detail below. Each twin was sent a pack containing the following:
- A letter with an invitation to take part (pdf)
- A consent form (pdf)
- A sheet of instructions for sampling (pdf)
- An information sheet (pdf)
- One proprietary salivary DNA pack including a tube, a funnel, a lid containing preserving fluid and printed instructions
- A padded return envelope for the tube and consent form
The saliva tube in the pack was pre-labelled with the ID and name of the twin, to ensure that every returned sample could be properly identified, and to avoid confusion in cases where both twins were sent tubes at the same address.
The initial pack mailing was followed up with one or more written and email reminders for twins who had not returned samples promptly. In certain prioritised cases (335 twins) callers were allocated to phone the twins to remind them to return their samples.
The contents of each pack (consent form and saliva sample) were logged in the TEDS admin database on return to the TEDS office by post. Every sample returned with a completed consent form was passed directly to the SGDP lab for extraction. If a sample was returned but the consent form had not been completed, the twin was contacted again to obtain consent; no sample was extracted without written consent.
As a reward for returning the saliva sample, each twin was offered a £15 electronic flexecode voucher, which could be redeemed at a variety of online retail outlets.
Collection phases
There have been five main phases of DNA collection in TEDS, and these are summarised in the table below. In all phases, families were excluded if they had withdrawn from TEDS, if they had known address problems, or if they were medical exclusions.
Phase | Years | Sample type | Selection criteria | Contact methods | Families contacted | Packs returned with consent | Refused | Packs returned but no consent | % returned with consent |
---|---|---|---|---|---|---|---|---|---|
1 | 1998 to 2003 | Cheek swabs | | Mail only (no phone calls). Up to 3 written reminders sent to families that did not respond. | 7680 | 5089 | 1312 | 101 | 66.3% |
2 | 2005 | Cheek swabs | New samples: | Families were initially phoned for verbal consent, before sending the pack. Up to 2 written reminders were sent to families who had given verbal consent but had not returned their packs. | 1801 | 943 | 172 | 5 | 52.4% |
2 | 2005 | Cheek swabs | Re-sampling: | Families were initially phoned for verbal consent, before sending the pack. Up to 2 written reminders were sent to families who had given verbal consent but had not returned their packs. | 469 | 368 | 14 | 3 | 78.5% |
3 | 2007 to 2008 | Cheek swabs | New samples: | Families were initially mailed the pack; those who did not respond promptly to the initial mailing were contacted by phone for verbal consent. In addition, up to 2 written reminders were sent to families who had not returned their packs. | 1234 | 436 | 410 | 2 | 35.3% |
3 | 2007 to 2008 | Cheek swabs | Re-sampling: | Families were initially mailed the pack; those who did not respond promptly to the initial mailing were contacted by phone for verbal consent. In addition, up to 2 written reminders were sent to families who had not returned their packs. | 1258 | 990 | 58 | 3 | 78.7% |
4 | 2008 to 2009 | Cheek swabs | Re-sampling only (no new samples), specifically for WTCCC study: see below for selection criteria. | Families were initially mailed the pack; they were phoned for verbal consent and reminders if they did not return their packs promptly. | 872 | 492 | 55 | 2 | 56.4% |
5 | 2014 to 2015 | Saliva | Both new samples and re-sampling, specifically for OEE genotyping: see below for selection criteria. | Mail only, with reminders, in most cases. Some prioritised twins were given phone reminders by callers. | 7275 [individual twins] | 2211 | 507 | 9 | 30.4% |
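The final column of the table appears to be the number of packs returned with consent divided by the number of families (or, in phase 5, individual twins) contacted. The short Python sketch below checks that reading against the published counts; the row labels and the function name are illustrative only and are not TEDS variable names.

```python
# Counts copied from the table above: (contacted, returned with consent).
# The labels are illustrative and are not TEDS variable names.
PHASES = {
    "1": (7680, 5089),
    "2 new": (1801, 943),
    "2 re-sampling": (469, 368),
    "3 new": (1234, 436),
    "3 re-sampling": (1258, 990),
    "4": (872, 492),
    "5": (7275, 2211),  # individual twins rather than families
}


def percent_returned(contacted: int, with_consent: int) -> float:
    """Percentage of contacted families (or twins) who returned a pack with consent."""
    return round(100 * with_consent / contacted, 1)


rates = {phase: percent_returned(*counts) for phase, counts in PHASES.items()}
for phase, rate in rates.items():
    print(f"Phase {phase}: {rate}%")
```

Under this assumption, each computed rate matches the percentage shown in the table.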
Genotyping
The WTCCC study
Selected TEDS twin DNA samples were included in WTCCC-2 (the Wellcome Trust Case Control Consortium phase 2), for a genetic study of reading and mathematics ability in 2009. For this study, TEDS submitted DNA samples, as well as phenotypic reading and maths data, to the Sanger Institute for a large sample of twins.
The genetic data extracted from the DNA samples were eventually returned to TEDS. The data from the WTCCC study have therefore enabled TEDS researchers to perform their own genome wide association studies (GWAS), and other genotypic analysis studies, relating to a wide range of different phenotypes.
Having provided the initial "discovery" sample to WTCCC, TEDS was also required to supply a replication sample to WTCCC. TEDS also prepared its own in-house replication sample. The selection criteria for these three twin samples are summarised in a table below.
For their discovery and replication samples, WTCCC specified certain minimum criteria for the mass, concentration and purity of each DNA sample. These formed part of the selection criteria when TEDS were preparing the twin samples. In the TEDS DNA collection study (see collection phases above), phase 3 was carried out prior to and in preparation for the WTCCC study, in order to maximise the number of twins available for selection; phase 4 was carried out during the preparation of the WTCCC discovery and replication samples, in order to re-sample DNA where existing samples had been found to fall below the thresholds set by WTCCC.
For all three twin samples (WTCCC discovery, WTCCC replication and in-house replication), the following exclusions were made:
- Medical exclusions
- Perinatal outliers
- Ethnic origin not known to be white
- English not known to be the language spoken at home
- Unknown twin sex
- Twin birth order records had been changed (leading to doubt over the identity of DNA samples)
Further exclusions and selection criteria are described in the table below.
Twin sample | WTCCC discovery sample | WTCCC replication sample | In-house replication sample |
---|---|---|---|
DNA sample criteria | | | |
Phenotypic data criteria | Each selected twin was required to have at least one of the following: | Each selected twin was required to have at least one of the following: | |
Twin criteria | Only one twin per pair. If both twins eligible then select the twin with more phenotypic data (more maths and reading web tests completed at age 12); if both twins have the same amount of data, select the twin having a larger mass of DNA. | | This sample was allowed to overlap with the WTCCC replication sample but not with the WTCCC discovery sample: |
Number of twins selected | 4440 | 2750 | 4923 |
In order to maximise the size of each of these twin samples, TEDS staff went through several cycles of collecting DNA samples from families, extracting and quantifying DNA in the lab, reprecipitating DNA samples to increase the concentration, and re-assessing the best twin from each pair. The selection criteria (shown in the table above) evolved over time according to practical considerations as well as the requirements of the WTCCC. The selection of families for phases 3 and 4 of DNA collection (especially for re-sampling) was gradually refined accordingly.
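The rule for choosing one twin per pair in the WTCCC discovery sample (more phenotypic data first, then larger DNA mass as a tie-breaker) can be sketched as a two-part sort key. This is a minimal illustration only: the `Twin` class, its field names and the `select_twin` function are hypothetical and do not correspond to TEDS variables or to the procedure actually used by TEDS staff.

```python
from dataclasses import dataclass


@dataclass
class Twin:
    # Hypothetical fields for illustration; not TEDS variable names.
    twin_id: str
    tests_completed: int  # maths and reading web tests completed at age 12
    dna_mass_ng: float    # mass of the twin's available DNA


def select_twin(a: Twin, b: Twin) -> Twin:
    """Choose one twin from a pair in which both twins are eligible.

    The twin with more phenotypic data wins; if both have the same
    amount of data, the twin with the larger mass of DNA wins.
    """
    return max(a, b, key=lambda t: (t.tests_completed, t.dna_mass_ng))
```

For example, `select_twin(Twin("A", 3, 500.0), Twin("B", 5, 300.0))` returns twin B, because the amount of phenotypic data takes precedence over DNA mass.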
As part of the WTCCC study, some twin pairs who had not completed the maths and reading web tests in wave 2 of the 12 year study were contacted for a follow-up wave of web data collection in 2008. This is described in more detail on the 12 Year Study page. Twins who participated in this wave of web testing were also re-sampled (if necessary) in phase 4 of the DNA collection.
The genotypic data were eventually returned to TEDS by the Sanger Institute, for a subset of the 4440 twins who had been included in the discovery sample. Some samples had failed quality control measures at Sanger, and after further quality control steps within TEDS, the final sample of genotypic data included 3152 individual twins. This sample is sometimes referred to as the "Affymetrix sample" because Affymetrix was the platform on which the samples were genotyped. This Affymetrix sample of 3152 twins included only unrelated twins (one per pair), from both MZ and DZ pairs.
The Affymetrix sample of genotypic data was later supplemented by a sample on the OEE platform - see below.
The OEE study
The 'OEE study' had the broad aim of maximising the size of the genotypic data sample for TEDS twins, building on the existing Affymetrix sample. DNA samples were genotyped in the SGDP labs on the OEE platform. This genotyping used existing (cheek swab) DNA samples where available, supplemented by newly-collected (salivary) DNA samples. The collection of new salivary DNA samples, as described above, took place in 2014-15. The OEE genotyping took place in several waves in 2015-16.
Over the two years of the study, and over successive waves of genotyping, the criteria for selecting twin DNA samples were modified by trial and error, as feedback from genotyping determined the minimum sample characteristics (concentration, mass, quality) that were likely to produce a successful result. The phenotypic criteria for selecting appropriate twins were gradually broadened: in the earliest waves, unrelated twins with recent data were prioritised; in later waves, twins with less recent data were also selected, along with DZ twin pairs. Rather than describing the selection criteria for each wave, the paragraphs below describe the final criteria that applied broadly to the entire sample.
Salivary DNA collection and the related genotyping took place in cycles, such that twins previously selected but with failed genotyping (for whatever reason) would be re-selected if a new DNA sample could be collected. Similarly, a twin previously rejected might be selected in a later wave after the selection criteria (phenotypic or DNA-related) had been relaxed. A similar approach was taken with the selection of a twin from a pair; if one twin had been prioritised but genotyping had failed, the other twin might subsequently be prioritised if suitable.
The OEE sample was designed to supplement the existing Affymetrix (WTCCC) sample. The 3152 twins already genotyped on Affymetrix, with satisfactory QC, were not genotyped again on OEE. The aim was to include the maximum number of unrelated twins from MZ pairs (one twin per pair), plus the maximum number of paired DZ twins, plus the maximum number of unpaired DZ twins where pairing proved infeasible. The only phenotypic data requirement was that 1st Contact data should be available, subject to certain exclusions.
Exclusions of twins from the sample were made as follows:
- Individual twins already genotyped on Affymetrix (WTCCC)
- No 1st Contact data available
- Twin ethnic origin was non-white, or unknown
- Medical exclusion (other than autism/ASD)
- Perinatal outliers (as defined in the 1st Contact dataset)
- Co-twin already genotyped (Affymetrix or OEE) in an MZ pair
After exclusion, twins were selected according to the availability of a suitable DNA sample. The final minimum DNA criteria were as follows:
- A new saliva sample was available (regardless of extracted DNA mass/volume/concentration)
- If only a cheek swab DNA sample was available, then it should have a volume of at least 8 µl, a concentration of at least 30 ng/µl, and a mass of at least 240 ng.
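The minimum criteria above can be expressed as a simple check. The sketch below assumes that DNA mass is the product of volume and concentration, which is consistent with the thresholds (8 × 30 = 240) but is not stated in the text; the function and constant names are hypothetical.

```python
# Final minimum thresholds for cheek swab DNA, taken from the list above.
MIN_VOLUME_UL = 8.0
MIN_CONCENTRATION_NG_PER_UL = 30.0
MIN_MASS_NG = 240.0


def meets_minimum_criteria(
    has_new_saliva_sample: bool,
    volume_ul: float = 0.0,
    concentration_ng_per_ul: float = 0.0,
) -> bool:
    """Return True if a twin has a DNA sample meeting the minimum criteria."""
    if has_new_saliva_sample:
        # A new saliva sample qualifies regardless of mass, volume or concentration.
        return True
    # Assumption: mass (ng) = volume (ul) x concentration (ng/ul).
    mass_ng = volume_ul * concentration_ng_per_ul
    return (
        volume_ul >= MIN_VOLUME_UL
        and concentration_ng_per_ul >= MIN_CONCENTRATION_NG_PER_UL
        and mass_ng >= MIN_MASS_NG
    )
```

If mass is indeed volume times concentration, any sample passing the volume and concentration thresholds also passes the mass threshold; the mass check is kept explicit here to mirror the written criteria.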
For many twins (especially those with the older cheek swab DNA), multiple samples and dilutions were stored in the lab; a process of selecting the 'best' available sample was needed. Similarly, in the case of MZ pairs, only one twin could be chosen and this was done on the basis of the twin with the 'best' DNA sample. Samples were prioritised in the following order:
- Use a salivary DNA sample if available
- For an MZ pair with salivary DNA samples, select the one with the higher mass
- If no saliva sample is available, preferentially select a cheek swab sample with 'ideal' characteristics: a volume of at least 8 µl, a concentration of at least 50 ng/µl, and a mass of at least 400 ng (over a sample with minimal characteristics as described above)
- Prioritise certain named types of plates of cheek swab DNA, known to be newer and higher-quality (often used for the WTCCC study) over certain older and more obscure types of plate.
- Select the cheek swab sample with higher mass
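One way to read the prioritisation order above is as a lexicographic sort key, in which each rule only breaks ties left by the rules before it. The following Python sketch illustrates that reading; the `Sample` class, the `preferred_plate` flag and the function names are hypothetical, and the real process may have involved manual judgement that is not captured here.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Sample:
    # Hypothetical fields for illustration; not TEDS variable names.
    sample_id: str
    is_saliva: bool
    volume_ul: float
    concentration_ng_per_ul: float
    mass_ng: float
    preferred_plate: bool = False  # stands in for the newer, higher-quality plate types


def is_ideal(s: Sample) -> bool:
    """'Ideal' cheek swab characteristics as given in the list above."""
    return s.volume_ul >= 8 and s.concentration_ng_per_ul >= 50 and s.mass_ng >= 400


def priority_key(s: Sample) -> tuple:
    # Saliva first; among saliva samples only mass matters, so the two
    # cheek swab criteria are set to True for saliva and drop out of the comparison.
    return (
        s.is_saliva,
        s.is_saliva or is_ideal(s),
        s.is_saliva or s.preferred_plate,
        s.mass_ng,
    )


def best_sample(samples: Iterable[Sample]) -> Sample:
    """Return the highest-priority sample among those available."""
    return max(samples, key=priority_key)
```

For an MZ pair, the same function could be applied to the pooled samples of both twins, so that the twin owning the winning sample is the one selected.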
The number of twins selected for genotyping is difficult to measure exactly. Some twins were selected repeatedly, because genotyping failed and was attempted again with different samples. Some twins were initially selected but rejected prior to genotyping (for example, because the DNA sample could not be found or was found to be inadequate). A few twins were included by mistake and then eliminated during QC checks. In all, in addition to the 3152 twins previously genotyped on Affymetrix (WTCCC study), around 4500 more unrelated twins and around 4000 DZ cotwins were selected for genotyping.
The genotypic sample
The TEDS twin genotypic data sample now includes both the Affymetrix and OEE data. The data from the two platforms were combined and subjected to common QC checks, after which the Affymetrix sample size dropped from 3152 to 3057 twins. The size of the combined sample can be described as follows:
- 10346 individual twins:
  - 3057 genotyped on Affymetrix
  - 7289 genotyped on OEE
- Counting pairwise:
  - 3320 DZ pairs in which both twins have been genotyped
  - 3706 pairs of any zygosity in which only one twin has been genotyped (2670 MZ, 1011 DZ, 25 unknown zygosity)
  - A total of 7026 pairs containing either one or two genotyped twins (hence it is possible to select 7026 unrelated genotyped twins)
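The individual and pairwise counts above are mutually consistent, as the following arithmetic check shows. The variable names are illustrative only.

```python
# Individual twins by genotyping platform.
affymetrix = 3057
oee = 7289
individual_twins = affymetrix + oee

# Pairs by how many twins in the pair were genotyped.
dz_pairs_both = 3320
one_twin_pairs = {"MZ": 2670, "DZ": 1011, "unknown zygosity": 25}
pairs_one_genotyped = sum(one_twin_pairs.values())
total_pairs = dz_pairs_both + pairs_one_genotyped

# Two twins from each fully genotyped DZ pair, one from every other pair.
twins_from_pairs = 2 * dz_pairs_both + pairs_one_genotyped

print(individual_twins)     # 10346
print(pairs_one_genotyped)  # 3706
print(total_pairs)          # 7026
print(twins_from_pairs)     # 10346
```

The count of twins derived from pairs equals the count of individual genotyped twins, so every genotyped twin falls into exactly one of the two pair categories.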
The availability of phenotypic data among the genotyped twins is highly variable. On the whole, those who were genotyped had of course provided DNA samples, which generally meant that they or their parents were responsive at least until the 4 year study and/or in more recent studies up until age 12. Of the 10346 individual genotyped twins, for example:
- 10337 have 1st Contact data
- 8164 have 4 year booklet data
- 8262 have 7 year parent booklet data
- 6935 have 12 year twin booklet data
- 7390 have 16 year GCSE/exam questionnaire data
- 5897 have TEDS21 twin phase 1 questionnaire data
- 5059 have TEDS26 twin questionnaire data
As usual, the availability of data from some other TEDS phenotypic studies is more limited because not all twin cohorts were included, or because data returns were lower.
In the polygenic score dataset (see polygenic scores), the scores for ungenotyped MZ twins have been copied from their genotyped cotwins. This copying assumes that the twins in an MZ pair have identical genotypes and hence identical polygenic scores. The purpose of this copying is to give more flexibility in analysis, especially in selection from pairs in which only one twin provided phenotypic data.
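The copying of scores to ungenotyped MZ cotwins can be illustrated with a minimal Python sketch. The record layout (`pair_id`, `twin_id`, `zygosity`, `score`) and the function name are hypothetical, chosen for illustration; they are not the variable names or the procedure used to build the TEDS polygenic score dataset.

```python
from typing import Optional


def fill_mz_cotwin_scores(twins: list[dict]) -> list[dict]:
    """Copy a polygenic score to an ungenotyped MZ twin from the genotyped cotwin.

    Each record is a dict with hypothetical keys: pair_id, twin_id,
    zygosity ("MZ" or "DZ") and score (None if the twin was not genotyped).
    DZ twins are never filled, because DZ cotwins do not share a genotype.
    """
    by_pair: dict = {}
    for t in twins:
        by_pair.setdefault(t["pair_id"], []).append(t)

    filled = []
    for t in twins:
        score: Optional[float] = t["score"]
        if score is None and t["zygosity"] == "MZ":
            donors = [
                c["score"]
                for c in by_pair[t["pair_id"]]
                if c is not t and c["score"] is not None
            ]
            if donors:
                score = donors[0]
        filled.append({**t, "score": score})
    return filled


example = [
    {"pair_id": 1, "twin_id": "1a", "zygosity": "MZ", "score": 0.42},
    {"pair_id": 1, "twin_id": "1b", "zygosity": "MZ", "score": None},
    {"pair_id": 2, "twin_id": "2a", "zygosity": "DZ", "score": -0.10},
    {"pair_id": 2, "twin_id": "2b", "zygosity": "DZ", "score": None},
]
result = fill_mz_cotwin_scores(example)
```

In this example, twin 1b receives the score of twin 1a, while twin 2b remains without a score because the pair is DZ.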
Data sharing
Please refer to the TEDS privacy policy and the TEDS data access policy, on the main TEDS web site, for detailed statements describing our policies for sharing data including DNA and genotypic data.
The twin DNA samples are not available for external sharing. They will only be used for new research by TEDS researchers within KCL.
Similarly, the raw genotypic data are not available for sharing outside KCL. The data access policy describes circumstances under which the genotypic data may be analysed (within KCL) as a part of a collaborative research project.
Polygenic scores are variables that are derived from the raw genotypic data for TEDS twins. These scores are easily shared and, unlike the raw genotypic data, they do not carry any risk of making participants identifiable. See the polygenic scores page for further details.