Data collection
Data collection was preceded by at least two pilot studies, including pilot versions of the booklets and telephone piloting of some cognitive tests. The pilot studies are not documented in this data dictionary, but are described briefly on the pilot studies page.
Data collection periods were linked to the school year (September to August) in which twins reached the age of 9 years. The reason for this was to ensure that all twins were in the same school year when the teacher data were collected. Hence, families were divided into school cohorts according to when their twins were born (as described above). This simplified administration of the study, because all families in a cohort could be contacted at once. Twin ages vary from roughly eight and a half to nearly ten years, at the time when the data were collected.
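The cohort rule described above can be sketched in a few lines. This is only an illustration and not the procedure TEDS actually used: it assumes that a school year runs from 1 September to 31 August, and the function names `school_year_start` and `school_year_label` are hypothetical.

```python
from datetime import date


def school_year_start(birth_date: date, target_age: int = 9) -> int:
    """Return the calendar year in which the relevant school year begins.

    Assumption: a school year runs from 1 September to 31 August.
    """
    # The birthday at target_age falls in the same month as the birth date,
    # so only the year and month are needed (this also avoids 29 February issues).
    birthday_year = birth_date.year + target_age
    # Birthdays from September to December fall in the school year starting
    # that calendar year; January to August birthdays fall in the school year
    # that started the previous September.
    return birthday_year if birth_date.month >= 9 else birthday_year - 1


def school_year_label(birth_date: date, target_age: int = 9) -> str:
    """Format the school year as, for example, '2003/04'."""
    start = school_year_start(birth_date, target_age)
    return f"{start}/{(start + 1) % 100:02d}"


# Illustrative birth dates only (not taken from the study records).
print(school_year_label(date(1994, 9, 1)))   # first day of a cohort
print(school_year_label(date(1995, 8, 31)))  # last day of the same cohort
```

Under this rule, birth dates within one cohort span a full twelve months, which is part of the reason for the spread of twin ages at the time of data collection.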
Initial contact with families was made by sending them an information pack together with a consent form. The consent form was printed on a postage-paid postcard (unfortunately no electronic copy of the consent form has been retained). Parents could opt out of the study if they did not want to take part. On the same consent form, parents were also asked to give consent for TEDS to contact the twins' teachers, and were asked to provide contact details for the teachers. Parents were sent up to three written reminders if they did not return their consent forms promptly.
Each family that returned written consent was then sent a pack of three booklets: a parent booklet (pdf), and a child booklet (pdf) for each twin. Up to four written reminders were sent to families that did not return their booklets promptly. In cohort 1, a telephone reminder was also used (after the written reminders) for families that had consented and received vouchers, but had not returned their booklets.
All of the large-scale mailings were handled externally by Group Sigma: the initial consent mailing to families, and the subsequent mailings of booklets both to families and to teachers. In addition, in cohort 1, Group Sigma handled the returns of the consents and booklets. The consent postcard, and the freepost envelopes for the booklets, were addressed to Group Sigma, who logged the returns as they received them, and sent regular updates to TEDS to enable us to keep track of progress. This sometimes caused problems in communications with families, because the TEDS records were not always up to date. Therefore, for cohort 2 the consents and family booklets were returned directly to TEDS instead; the booklets were subsequently sent to Group Sigma for scanning.
In cohort 1, vouchers (to the value of £5 for each twin) were sent to families along with the booklets, as an inducement to encourage the twins to complete their booklets. This procedure was reviewed at the end of the cohort, because nearly 400 families had failed to return their booklets even though they had received the vouchers. Hence in cohort 2, the vouchers were sent to the twins as a reward, only after they had completed and returned the booklets.
The teacher study included all families that had given consent and provided contact details for the twins' teachers and schools. Teacher booklets (pdf) were sent directly to teachers, and were not seen by the families themselves. Each teacher also received a return envelope addressed to Group Sigma. Up to three written reminders were sent to teachers who had not returned their questionnaires promptly.
The measures used in the parent, child and teacher booklets are described in detail on another page.
Data entry
General data entry issues (for all studies) are described on a separate page. Data from the 9 Year booklets were captured electronically by optical scanning, and the booklets had been laid out with this in mind. Scanning was handled externally by Group Sigma, a commercial company.
In cohort 1 (as explained above), the booklets were mailed directly to Group Sigma by families and by teachers. In cohort 2, the booklets were mailed first to TEDS, and then delivered to Group Sigma in batches. In both cohorts, the booklets were scanned in large batches. After scanning each batch of booklets, Group Sigma returned the data to TEDS in plain text files.
The parent booklet contained several free text fields, which could not be scanned; the verbatim text from these fields was instead entered manually by Group Sigma into Excel spreadsheets. The Excel files for each batch of booklets were returned to TEDS alongside the text files containing the scanned data.
All these raw data, from scanning and from verbatim text data entry, were cleaned and aggregated into the 9 Year study Access database. Any booklets returned too late for scanning were manually entered (in TEDS) directly into the same Access database. This Access database is now the master copy of all 9 Year booklet data used to construct the dataset. The only data not in the Access database are the story data (see below). Raw data files are described in more detail in 9 Year raw data files.
The "story" in the twin booklets consisted entirely of free text written by the twin. Before the booklet was scanned, the story page was removed, and the twin's ID and name were written on it. These story pages were later transcribed, independently of the data entry for the rest of the twin booklet. Transcription was carried out initially by Group Sigma staff, and later by staff employed by TEDS. Each story was entered manually into a plain text file. Each file was saved (with the .txt file extension) with a file name which was the same as the TwinID. At the time of data entry, some initial coding was also carried out, with the codes incorporated alongside the text within the file. Additional coding into scores was done manually for each story file, and the scores were entered into the Access database. The coding rules were complex and are not described further here. At a later date, the original paper copies of the handwritten stories were scanned into images, saved in pdf files, and the paper copies were destroyed. These pdf images were converted into pseudonymous form by altering the IDs (in the filename) and by removing other identifying information such as names.
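As an illustration of the file naming convention described above, the following sketch reads a folder of transcribed story files and keys each story by the ID taken from its file name. It is not the code used by TEDS: the function names, the UTF-8 encoding, and the `id_map` lookup from original to pseudonymous IDs are all assumptions made for the example.

```python
from pathlib import Path


def load_stories(folder: Path) -> dict[str, str]:
    """Map each .txt file name (without extension) to the story text.

    Assumption: the file name stem is the TwinID, and files are UTF-8 encoded.
    """
    stories = {}
    for path in sorted(folder.glob("*.txt")):
        stories[path.stem] = path.read_text(encoding="utf-8")
    return stories


def pseudonymous_name(path: Path, id_map: dict[str, str]) -> Path:
    """Return the same path with the ID in the file name replaced.

    id_map is a hypothetical lookup from original IDs to pseudonymous IDs;
    a missing ID raises KeyError rather than leaving a file unconverted.
    """
    return path.with_name(id_map[path.stem] + path.suffix)
```

Replacing the ID in the file name covers only one part of the conversion described above; removing names and other identifying information from the content itself is a separate step that this sketch does not attempt.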
There are annotated versions of the parent, twin and teacher booklets (pdfs). These documents show, in blue, the field names and value codes that have been used in the cleaned raw data and, in red, the variable names and value codes used in the dataset.