Network file storage
The primary copies of all the TEDS data files are securely stored within the KCL network, according to KCL guidelines for storage of confidential data. For reasons of confidentiality and security, details of the network locations are not given here. The phenotypic data files contain identifiable personal data relating to TEDS participants. Access to the file storage is therefore restricted to TEDS admin staff who require access for administering TEDS studies and processing the data. The file storage is specific to the TEDS study and is not accessible to other KCL staff.
Within the centralised TEDS file storage outlined above, all phenotypic data (raw data, scripts, datasets) are stored within a folder named \SYSTEM\.
Backup copies of the phenotypic data files, along with archived files that are no longer in current use, are stored separately on the TEDS archive server, located on the KCL network. This storage is devoted to TEDS data and is not shared; it is accessible only by TEDS admin staff. Files stored here include old versions of datasets, datasets merged and prepared for specific projects, and old scripts. This storage is also used for various backup files, including the genotypic data.
Organisation of data files
The TEDS phenotypic data files are stored within the \SYSTEM\ folder as mentioned above. The files are organised into three folders, as follows:
- \Rawdata\
This folder contains the raw data collected during each of the main TEDS studies. Wherever possible, the un-cleaned raw data files, in their original formats, have been saved for future reference. However, in all cases, copies of the cleaned raw data are also saved in formats (typically .csv text files) that can be used more directly to build the analysis datasets.
- \Datasets\
This folder contains the datasets that are directly used for analysis. There are also sub-directories containing some of the working files that are produced during the creation of the analysis datasets. These are generally SPSS data files.
- \Scripts\
This folder contains the syntax files (scripts) that are used to convert the cleaned raw data into the analysis dataset for each study.
Each of these three folders contains a set of sub-directories, one for each TEDS study: \1c\ (1st Contact), \2yr\ (2 Year), and so on.
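The layout described above can be expressed as a small path helper. This is an illustrative sketch only: the function name `study_folders` is hypothetical, the root is written as a relative `SYSTEM` folder because the real network location is deliberately not given here, and the folder names `Rawdata`, `Datasets` and `Scripts` are taken from the example paths used elsewhere on this page.

```python
from pathlib import PureWindowsPath

# Top-level folders under \SYSTEM\, as named in the example paths on this page.
TOP_LEVEL_FOLDERS = ("Rawdata", "Datasets", "Scripts")


def study_folders(study, root="SYSTEM"):
    """Return the three per-study folders for a study code such as '7yr'.

    `root` is a placeholder: the real network location is not documented.
    """
    base = PureWindowsPath(root)
    return {name: base / name / study for name in TOP_LEVEL_FOLDERS}


folders = study_folders("7yr")
for name, path in folders.items():
    print(f"{name}: {path}")
```

`PureWindowsPath` is used so that the backslash-separated paths are formed in the same way on any operating system, without touching the file system.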
The processes used to convert the original raw phenotypic data into the final analysis datasets are as follows, in most cases:
1. Code and enter the original data to create electronic files of uncleaned raw data.
2. Clean the raw data, using copies of the data files so as to leave the original raw data unchanged.
3. Aggregate and store the cleaned raw data in an Access database file.
4. Before making a new version of a dataset, export cleaned raw data from the Access database into csv text files.
5. Use the scripts to import the csv data files into SPSS, and then to carry out further processing to build the analysis dataset.
All the steps above are documented (both in this data dictionary and elsewhere). Steps 4 and 5 are easily repeatable, and are designed so that each dataset can easily be re-made at appropriate times (for example, after addition of new raw data, after further cleaning of the raw data, or after changes to the scripts). The earlier steps, 1 to 3, are repeatable in theory, although repeating them might be extremely time-consuming and difficult in practice. Not all of the original records of the raw data have been retained, particularly from the earliest TEDS studies.
Raw data files
The raw data folder for each study (e.g. \SYSTEM\Rawdata\7yr\ for the 7 Year study) generally contains the following files and folders. For more details for a given study, follow the link at the top left of this page.
- An Access database file (e.g. 7yr.accdb for the 7 Year study).
This database contains the cleaned and aggregated raw data from the study, usually divided into separate tables for the different components of the data. These components typically include the various booklets and questionnaires, and a record of admin data such as return dates (but not web test data). The Access database also contains queries and macros so that the data can conveniently be exported for building the analysis dataset.
- \Export\ sub-directory.
This folder contains csv files of cleaned raw data, which have been exported from the Access database. The scripts convert these files into the analysis dataset.
- \original data files\ or related folders.
These vary between studies. In web studies, these folders will contain the raw "analysis" files (delimited text files) exported from the web server. In earlier studies, these folders may contain original raw data files produced during scanning or manual data entry of paper booklets. In most cases, the original data files are stored in zip archives in order to economize on storage space. In some cases, along with the original source files, there are intermediate working files that were used during initial processing and cleaning of the data.
The original source files have various formats. In most cases, these are either Excel workbooks (sometimes containing several sheets), or plain text files (typically with comma-delimited variables). In the case of Excel files, usually a copy has been made and saved as a csv file, for maximum future software compatibility. The original source files of twin web test data (age 10 upwards) are "analysis files" generated on the web server; these are delimited plain-text files (typically comma-delimited).
The exported files of cleaned raw data, in the \Export\ sub-directory, are mostly plain text files with comma-separated values, saved as .csv files. This is a standard file type, which can be opened by a wide variety of programs (including SAS, SPSS, Excel, and text editors such as Notepad); furthermore, there is no restriction on the number of columns or rows of data in such a file (as there would be in, for example, an Excel spreadsheet). The main exceptions to this rule are the files of twin web test data (collected from age 10 onwards), where the aggregated data for each web test are saved in an SPSS data file, with .sav file extension.
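As an illustration of how portable the exported csv files are, the sketch below reads comma-separated data with Python's standard `csv` module. The column names (`id`, `item1`, `item2`) and the values are invented placeholders, not real TEDS variables or data, and the helper name `read_export` is hypothetical.

```python
import csv
import io

# Invented stand-in for an exported file; real files are in the \Export\
# sub-directory and their column names will differ.
sample_export = io.StringIO(
    "id,item1,item2\n"
    "1,0,2\n"
    "2,1,\n"
)


def read_export(handle):
    """Read comma-separated rows into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))


rows = read_export(sample_export)
print(len(rows), "rows; columns:", list(rows[0].keys()))
```

Every value is read as text, and an empty field is returned as an empty string, so any conversion to numbers or missing-value codes would need to be done explicitly. To read a file on disk, an open file handle (for example from `open(path, newline="")`) could be passed in place of the `StringIO` object.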
The dataset folder for each study (e.g. \SYSTEM\Datasets\7yr\ for the 7 Year study) generally contains the following items. For more detail, follow the links at the top left of this page for each study.
- The current version of the analysis dataset for general use.
- The \working files\ sub-directory, containing various intermediate files created (by the scripts) during the processing of data to make the analysis dataset.
The analysis datasets are all constructed using SPSS scripts, and saved as SPSS data files, with the .sav file extension. The intermediate files in the \working files\ sub-directory are also SPSS data files.
The scripts folder for each study (e.g. \SYSTEM\Scripts\7yr\ for the 7 Year study) contains a set of syntax files used to make the dataset.
Generally, the processing of the data involves very many lines of syntax, which is why the syntax has been split into several script files. The scripts must be run in the correct order, and usually the script file names contain numbering to make the order clear. Where possible, each script carries out a logically related set of processing steps, e.g. importing and merging data, labelling variables, creating scales, etc.
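Because the run order is encoded in the file names, it can be recovered programmatically. The sketch below is a hypothetical helper, not part of the TEDS scripts: it assumes that each file name begins with its sequence number, and the example names are invented. Sorting on the number rather than on the text matters, because a plain alphabetical sort would place `10_...` before `2_...`.

```python
import re


def run_order(script_names):
    """Sort script file names by the number at the start of each name."""
    def key(name):
        match = re.match(r"\d+", name)
        # Names without a leading number sort last, then alphabetically.
        number = int(match.group()) if match else 0
        return (match is None, number, name.lower())

    return sorted(script_names, key=key)


# Hypothetical names; the real scripts are in the Scripts folder for each study.
names = ["10_scales.sps", "2_label.sps", "1_import.sps"]
print(run_order(names))
```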
The scripts are SPSS syntax files (with the .sps file extension). SPSS scripts are in fact plain text files, so they can be opened and read using a text editor such as Notepad. However, they must be saved with the correct file extension in order to be recognised by the SPSS software.