TEDS Data Dictionary

21 Year Study Data Files

Introduction

This page relates specifically to data collected in the 21 Year study. More general issues relating to the storage and organisation of TEDS data files are discussed on another page.

Raw Data files

These are currently stored in the \System\Rawdata\21yr\ folder, and the list below refers to files and sub-directories within this folder.

  • 21yr.accdb.
    This is the Access database file (2007 format) containing aggregated and cleaned TEDS21 questionnaire data that were collected by means of paper booklets (but not data collected electronically via the CMS or web backup). The booklet data were entered by two means, optical scanning and manual keying, and the data from both methods have been aggregated in this database. The database file also provides the programs for manual data entry of the paper questionnaires. In addition, it contains administrative data relating to the data collection (for example, dates when participants were contacted) and a recoding table for verbatim text responses that have been cleaned and recoded into numeric category variables (either existing or new variables as appropriate). The data structures within this database file are subject to change while these complex data are being entered and cleaned.
    This Access database is now treated as the master copy of the paper booklet and administrative data, and is the source of such data for the analysis dataset. The main data tables in the database are TwinPhase1Part1, TwinPhase1Part2 (containing data entered from the phase 1 twin paper booklets), TwinPhase2Part1, TwinPhase2Part2 (containing data entered from the phase 2 twin paper booklets), ParentPhase1 (containing data entered from the phase 1 parent booklets), TEDS21progress (containing administrative data relating to data collection), TwinPhase2MedicalRecoding (containing recoded data from verbatim text responses where given), covid1_health_11b_coding and covid2_health_11b_coding (containing recoded data from verbatim text responses in covid phases 1 and 2, where given). The TEDS21progress table currently resides in the TEDS admin database, but will be moved into this Access database at the end of the study.
  • \original files\ subdirectory, containing the rawest data files that were exported from the CMS and backup web servers during the various phases of the TEDS21 study; plus the raw files of paper booklet data returned by Group Sigma after electronic scanning. These files have different forms as described below. These files are processed in the first stages of building the dataset, whereby the files are imported into SPSS, and files from the same data collection (e.g. twin phase 1) are made compatible with each other; the resulting SPSS files are then saved into the \Export\ subdirectory (see below). The files in the \original files\ folder include:
    • \phase 1 parent\ subdirectory:
      contains raw data files exported from the servers for the TEDS 21 phase 1 parent questionnaire study.
      • CMS subdirectory:
        contains the final raw data file exported from the CMS server; this is a text file with fields delimited by the pipe symbol (|).
      • backup subdirectory:
        contains the final raw data files exported from the backup web server. There is a main data file and an admin data file; both were exported as tab-delimited text files. There is also an Excel version of the admin file.
      • scanned booklet subdirectory:
        contains raw data files of scanned data provided by Group Sigma; these are comma-delimited text files. There is also an SPSS script for checking the data and converting to Excel for incorporation in the Access database.
    • \phase 1 twin\ subdirectory:
      contains raw data files exported from the servers for the TEDS 21 phase 1 twin questionnaire study.
      • CMS subdirectory:
        contains the final raw data file exported from the CMS server; this is a text file with fields delimited by the pipe symbol (|).
      • backup subdirectory:
        contains the final raw data files exported from the backup web server. There is a main data file and an admin data file; both were exported as tab-delimited text files. There is also an Excel version of the admin file.
    • \phase 2 twin\ subdirectory:
      contains raw data files exported from the servers for the TEDS 21 phase 2 twin questionnaire study.
      • CMS subdirectory:
        contains the final raw data file exported from the CMS server; this is a text file with fields delimited by the pipe symbol (|).
      • backup subdirectory:
        contains the final raw data files exported from the backup web server. There is a main data file and an admin data file; both were exported as tab-delimited text files. There is also an Excel version of the admin file.
    • \g game\ subdirectory:
      contains the final raw data files exported from the web server for the G-game (Pathfinder) twin cognitive study.
      There is a main data file and an admin data file, both of which were exported from the server as tab-delimited text files. There is also an Excel version of the admin file.
    • \covid phase 1\ subdirectory:
      contains the final raw data files exported from the web server for the Covid-19 phase 1 twin questionnaire study.
      There is a main data file and an admin data file, both of which were exported from the server as tab-delimited text files. There is also an Excel version of the admin file.
    • \covid phase 2\ subdirectory:
      contains the final raw data files exported from the web server for the Covid-19 phase 2 twin questionnaire study.
      There is a main data file and an admin data file, both of which were exported from the server as tab-delimited text files. There is also an Excel version of the admin file.
    • \covid phase 3\ subdirectory:
      contains the final raw data files exported from the web server for the Covid-19 phase 3 twin questionnaire study.
      There is a main data file and an admin data file, both of which were exported from the server as tab-delimited text files. There is also an Excel version of the admin file.
    • \covid phase 4\ subdirectory:
      contains the final raw data files exported from the web server for the Covid-19 phase 4 twin questionnaire study.
      There is a main data file and an admin data file, both of which were exported from the server as tab-delimited text files. There is also an Excel version of the admin file.
    • \covid text coding\ subdirectory:
      contains results from numerically coding a text response question (Health section, question 11b) that was included in phases 1, 2 and 3 of the covid study. The coded data are in a set of Excel files, and there is an SPSS syntax file that aggregates the coded data ready for inclusion in the dataset.
  • \Export\ subdirectory, containing exported 21 Year raw data files. These are files of raw data that are the direct sources of data for building the analysis dataset. These include csv (text) files exported from the Access database described above; and SPSS data files containing raw data that have already been imported into SPSS from the electronic sources (CMS and backup web). The files are called:
    • TEDS21admin.csv (admin data from table TEDS21Progress in the Access database, including questionnaire return dates)
    • ParentPhase1.csv (TEDS21 phase 1 parent paper questionnaire data, exported from the Access database)
    • TwinPhase1Part1.csv, TwinPhase1Part2.csv (TEDS21 phase 1 twin paper questionnaire data, exported from the Access database)
    • TwinPhase2Part1.csv, TwinPhase2Part2.csv (TEDS21 phase 2 twin paper questionnaire data, exported from the Access database)
    • TwinPhase2Medical.csv (TEDS21 phase 2 twin recoded verbatim text responses, exported from the Access database)
    • phase1parentpaper.sav (TEDS21 phase 1 parent paper questionnaire data, imported from the raw csv files into SPSS)
    • phase1parentbackup.sav (TEDS21 phase 1 parent questionnaire data, collected via the backup web system, imported into SPSS)
    • phase1parentCMS.sav (TEDS21 phase 1 parent questionnaire data, collected via the CMS system, imported into SPSS)
    • phase1twinpaper.sav (TEDS21 phase 1 twin paper questionnaire data, imported from the raw csv files into SPSS)
    • phase1twinbackup.sav (TEDS21 phase 1 twin questionnaire data, collected via the backup web system, imported into SPSS)
    • phase1twinCMS.sav (TEDS21 phase 1 twin questionnaire data, collected via the CMS system, imported into SPSS)
    • phase2twinpaper.sav (TEDS21 phase 2 twin paper questionnaire data, imported from the raw csv files into SPSS)
    • phase2twinbackup.sav (TEDS21 phase 2 twin questionnaire data, collected via the backup web system, imported into SPSS)
    • phase2twinCMS.sav (TEDS21 phase 2 twin questionnaire data, collected via the CMS system, imported into SPSS)
    • GgameAndCovidUsernames.csv (csv file, exported from the TEDS admin database, containing a list of twin IDs linked to usernames for the G-game study and each phase of the Covid study)
    • ggame.sav (g-game twin data, collected via the web system, imported into SPSS)
    • covid.sav (Covid study twin data, collected via the web system, imported into SPSS; different study phases are merged into a single file)
    • covidq11codeswave1.sav (Covid phase 1 recoded verbatim text responses, aggregated from raw Excel files)
    • covidq11codeswave2.sav (Covid phase 2 recoded verbatim text responses, aggregated from raw Excel files)
    • covidq11codeswave3.sav (Covid phase 3 recoded verbatim text responses, aggregated from raw Excel files)
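
The raw files listed above use three different field delimiters: pipe (CMS exports), tab (backup web, g-game and covid exports) and comma (scanned booklet files and Access exports). In TEDS these files are imported by SPSS syntax; the Python sketch below is only an illustration of how such files might be read for a quick inspection outside SPSS. The field names (twin_id, item1, item2) and the sample values are hypothetical and do not come from the real files.

```python
import csv
import io


def read_delimited(text_stream, delimiter):
    """Read a delimited text export into a list of dicts keyed by the header row."""
    reader = csv.DictReader(text_stream, delimiter=delimiter)
    return [dict(row) for row in reader]


# Hypothetical miniature exports standing in for the real files.
# With a real file, use open(path, newline="") instead of io.StringIO.
cms_export = io.StringIO("twin_id|item1|item2\n1001|2|3\n1002|1|4\n")
backup_export = io.StringIO("twin_id\titem1\titem2\n1003\t0\t2\n")
scanned_export = io.StringIO("twin_id,item1,item2\n1004,3,1\n")

cms_rows = read_delimited(cms_export, "|")         # pipe-delimited (CMS)
backup_rows = read_delimited(backup_export, "\t")  # tab-delimited (backup web)
scanned_rows = read_delimited(scanned_export, ",")  # comma-delimited (scanned)

print(len(cms_rows), len(backup_rows), len(scanned_rows))
```

All values are read as text in this sketch; converting them to numeric types and harmonising variable names across sources is left to the dataset construction scripts.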

Data flow

The 21 Year study involved a number of independent data collections, from different participants (parents and twins) and in distinct phases (phase 1, phase 2, g-game, covid phases). In each TEDS 21 data collection (but not in the g-game or covid studies), participants had a choice of methods of providing data: paper booklets, web (backup system) or app/web (CMS system). In addition, TEDS 21 paper booklets were entered by two methods: optical scanning and manual keying. As a result, there are multiple raw data files for each TEDS 21 data collection, and these files went through a series of processing stages in order to build the dataset.

The essential stages, and the files involved at each stage, are set out below as a data flow in the form of an ordered list. These stages were essentially the same for each TEDS 21 data collection (parents and twins, phase 1 and phase 2), with the exception of recoding of verbatim text responses, which was only needed for a few variables in the TEDS 21 twin phase 2 questionnaire.

The initial stages for the g-game and covid studies were somewhat simpler, as each of these data collections was carried out exclusively via the web.

  1. TEDS 21 CMS data
    The CMS system integrated data collected via app (from mobile devices) and web (from desktops and laptops).
    1. During data collection, data from app and web were aggregated on the CMS server.
    2. At the end of data collection, data were exported from the server into a delimited text file. This text file is stored in the \original files\ subdirectory.
    3. During dataset construction, one of the first steps is to import this text file into SPSS and to make the variables compatible with those in files from the other sources. The resulting SPSS file is stored in the \Export\ subdirectory.
  2. TEDS 21 backup web data
    The backup web system is not integrated with the CMS system, hence it produces a distinct set of files.
    1. During data collection, web data were aggregated on the backup server.
    2. At the end of data collection, data were exported from the server into two delimited text files (main data and admin data). These text files are stored in the \original files\ subdirectory.
    3. During dataset construction, one of the first steps is to import the two text files into SPSS, merge them together, and to make the variables compatible with those in files from the other sources. The resulting SPSS file is stored in the \Export\ subdirectory.
  3. TEDS 21 paper booklet data
    1. Paper booklets were entered in two different ways, but the data were eventually aggregated into a single location (the Access database) as follows.
      1. Some booklets were entered manually, by staff in the TEDS office: these data were entered directly into the Access database described above. This included all the twin paper booklets (phase 1 and phase 2) and late returns of the parent paper booklets.
      2. Other booklets (most of the parent booklets) were entered by optical scanning, by an external commercial company (Group Sigma). The raw data from scanning were returned in delimited text files, each file containing data for a batch of booklets. These raw files have been retained and are stored in the \original files\ subdirectory.
      3. After some data cleaning, the data from the raw scanned text files were aggregated into the Access database, alongside the manually entered data.
    2. Before constructing the dataset, the aggregated paper booklet data in the Access database are exported into comma-delimited text files. These files are stored in the \Export\ subdirectory.
    3. During dataset construction, one of the first steps is to import these exported text files into SPSS, merge them together, and to make the variables compatible with those in files from the other sources. The resulting SPSS file is stored in the \Export\ subdirectory.
  4. At this stage, for each data collection in TEDS 21, there are 3 SPSS data files of raw data: one each from the CMS, backup web and paper booklets. The cases from these files can now be merged into a single file. This is part of the dataset construction process carried out by the scripts described on the 21 year processing page.
  5. Recoding of verbatim text responses in the TEDS 21 twin phase 2 questionnaire (Medical Conditions and Self-Harm measures). This recoding was carried out after the data (CMS, backup and paper) had been processed as outlined above.
    1. The verbatim text responses, and associated numeric variables, were copied from each version of the twin phase 2 questionnaire (CMS, backup, paper) into a table of the Access database.
    2. The verbatim responses were then manually recoded into numeric categorical variables. These numeric categories included copies of existing variables (e.g. the Autism and ADHD responses) as well as a few new variables (e.g. 'other' methods of self-harm).
    3. Before constructing the dataset, the numeric variables from this table are exported into a comma-delimited text file. This file is stored in the \Export\ subdirectory.
    4. During dataset construction, this file is imported into SPSS before being merged with the other files of twin data. The cleaned numeric variables for medical categories in this file replace equivalent variables in the raw data.
  6. G-game data
    These were collected exclusively via the web. During data collection, data were aggregated on the web server; at the end of data collection, the data were downloaded from the web server in two tab-delimited text files (admin data and main data file) and then deleted from the server. These text files are stored in the \original files\ subdirectory. During dataset construction, these two files are imported into SPSS, merged together, and merged with a further file linking the logins to twin IDs; the resulting file is stored in the \Export\ subdirectory.
  7. Covid study data
    These were collected exclusively via the web. There were separate data collections for each phase of the study, in which the questionnaire was repeated with slight variations. During data collection for each phase, data were aggregated on the web server; at the end of data collection, the data were downloaded from the web server in two tab-delimited text files (admin data and main data file) and then deleted from the server. These text files are stored in the \original files\ subdirectory. During dataset construction, for each phase, these two files are imported into SPSS, merged together, and merged with a further file linking the study logins to twin IDs. The files for different study phases are then merged (using the twin ID) into a single file, and the resulting file is stored in the \Export\ subdirectory.
  8. After the preliminaries described above, the various files can be merged together to create one large dataset, containing data for all the 21 Year Study data collections (TEDS 21, G-game, Covid). Details are described further in the 21 year processing page.
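
Two of the stages in the data flow above can be sketched in a few lines: combining the cases from the CMS, backup web and paper files for one data collection (stage 4), and replacing raw variables with cleaned values from the recoding table (stage 5). The real processing is done by the SPSS scripts; this Python sketch is a simplified illustration only. The variable names (twin_id, autism, source), the sample values and the two function names are hypothetical.

```python
def stack_sources(sources):
    """Combine cases from several sources into one list, tagging each row with its source."""
    combined = []
    for source_name, rows in sources.items():
        for row in rows:
            tagged = dict(row)              # copy, so the input rows are not modified
            tagged["source"] = source_name  # hypothetical marker variable
            combined.append(tagged)
    return combined


def apply_recodes(rows, recodes, key="twin_id"):
    """Overwrite raw variables with cleaned values from a recoding table, matched on key."""
    recode_by_id = {r[key]: r for r in recodes}
    for row in rows:
        cleaned = recode_by_id.get(row[key])
        if cleaned is not None:
            for name, value in cleaned.items():
                if name != key:
                    row[name] = value  # cleaned value replaces the raw value
    return rows


# Hypothetical miniature data: one case per source, one recoded case.
cms = [{"twin_id": "1001", "autism": "0"}]
backup = [{"twin_id": "1002", "autism": "0"}]
paper = [{"twin_id": "1003", "autism": "0"}]
recodes = [{"twin_id": "1002", "autism": "1"}]

merged = stack_sources({"CMS": cms, "backup": backup, "paper": paper})
merged = apply_recodes(merged, recodes)
print(merged)
```

The sketch assumes that the variables in the three source files have already been made compatible, as described in the earlier stages, and that each case appears in only one source.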

Dataset files

These files are currently stored in the \System\Datasets\21yr\ folder. The following list refers to items within this folder.

  • Udb9456_full.sav - the SPSS version of the full 21 Year dataset, including every variable
  • \working files\ - this subdirectory contains various intermediate files, saved during the process of converting the raw data into the dataset. These files include working datasets u2merge, u3clean, u4derive, u5label, u6double (all .sav files), saved at the end of scripts 2 to 6. The last of these files (u6double) is identical, except for the name, to the full dataset mentioned above.

Syntax files (scripts)

These files are currently stored in the \System\Scripts\21yr\ folder.
Note that these are SPSS syntax files. The names of the scripts are U1a_import_CMS, U1b_import_backup, U1c_import_paper, U1d_import_ggame, U1e_import_covid, U2_merge, U3_clean, U4_derive, U5_label, U6_double (all .sps files). The processing carried out by these scripts is described on another page.
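
The naming pattern linking scripts 2 to 6 to the working files listed under Dataset files can be expressed as a small lookup. The Python sketch below is only a convenience illustration: it assumes the scripts run in the order listed above, and the helper name working_file_for is hypothetical.

```python
# Script names as listed above (all .sps files), in the assumed run order.
SCRIPTS = [
    "U1a_import_CMS", "U1b_import_backup", "U1c_import_paper",
    "U1d_import_ggame", "U1e_import_covid",
    "U2_merge", "U3_clean", "U4_derive", "U5_label", "U6_double",
]


def working_file_for(script_name):
    """Return the working .sav file saved at the end of a script, or None for import scripts."""
    prefix, _, suffix = script_name.partition("_")
    stage = prefix[1:]  # e.g. "2" from "U2", "1a" from "U1a"
    if stage.isdigit() and 2 <= int(stage) <= 6:
        return f"u{stage}{suffix}.sav"
    return None


for name in SCRIPTS:
    print(name, "->", working_file_for(name))
```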