Coding used during data collection
For a description of how the data were collected, coded and entered during the 7 Year Study, see the main 7 Year Study page. Although data collection and data entry procedures changed many times over the course of the study, the data items themselves were kept the same. Where the item coding was changed at the point of data entry, the data from all cohorts were recoded or restructured to ensure consistency.
The twin telephone interviews consisted largely of cognitive and reading tests, in which twin responses had to be coded as scores. Where possible, interviewers coded the responses as they were given on the telephone, so that numeric scores or codes were recorded rather than verbatim records of what the twin had said. Where this was difficult or impossible, interviewers recorded the verbatim responses during the interview, and the text was coded into scores afterwards. The twin telephone interview script (pdf) shows the scoring rules for most of the tests; the rules for the Similarities and Vocabulary tests (pdfs) are shown separately because these tests were generally coded afterwards. For the TOWRE tests, scoring simply involved counting the number of correct responses made during the allowed time. The coding used in the cleaned raw twin data, for all tests, is summarized in the twin item coding (pdf).
Parent data were collected by telephone interview in some cases and by paper booklet in others. The questionnaire items were identical for both methods. Most items had multiple-choice responses, captured using tick-boxes in the booklets; the coding used for these items is shown in the annotated parent booklet (pdf). Other items required free responses, which were initially recorded verbatim; these text responses have since been coded into numeric categories, as shown in the same document. See also "Coding parent occupations" below.
Teacher data were collected using paper questionnaires. Item data were captured using tick-boxes, and these were converted to numerical codes as shown in the teacher raw data coding (pdf).
Coding used at data entry
Data entry methods varied over the course of the study, as described in the main 7 Year Study page. A more general description of data entry methods is given on another page. The 7 Year Study is unusual for the wide variety of data entry methods used:
- CATI (by NOP staff) for twin and parent telephone interviews in cohort 1
- Scanning (by Group Sigma) for twin and parent data from cohort 2 onwards
- Manual data entry (by NOP staff) for teacher questionnaires in cohort 1
- Scanning (by NOP) for teacher questionnaires in cohort 2
- Scanning (by Group Sigma) for teacher questionnaires from cohort 3 onwards
For scanned data, value coding (for categorical responses) was built into the initial scan setup. For manual data entry, data entry staff were given rules for coding each item as it was entered; these rules were either displayed on screen, or printed in documents.
In general, the field names and value coding initially used by NOP were translated directly into the item variables now used in the cleaned raw data. However, the optical scanning carried out by Group Sigma required a different approach. Firstly, the scanned data items were identified not by field names but by their position in the resulting data file. Secondly, some items required a change in value coding for the purposes of the scan. Thirdly, missing or not applicable responses were denoted simply by blanks in the data, rather than by specific value codes such as -99 or -77. The files of scanned data returned by Group Sigma were therefore processed (using SPSS scripts) to recode values and add variable names, for consistency with the rest of the raw data.
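The post-processing of scanned files described above can be sketched as follows. This is a minimal Python illustration, not the SPSS scripts actually used; the column layout, variable names, recode map and the choice of -99 as the code for blanks are all hypothetical assumptions made for the example.

```python
# Hypothetical sketch: turn one positional scanned record into named,
# recoded variables, replacing blanks with an explicit missing code.
# The layout, names, recodes and MISSING value are illustrative only.

MISSING = -99  # assumed code for a blank (missing) scanned response

# Hypothetical layout: (variable name, start column, end column),
# 0-based with the end column exclusive, as in Python slicing.
LAYOUT = [
    ("item1", 0, 1),
    ("item2", 1, 2),
    ("item3", 2, 4),
]

# Hypothetical recodes from scan values to raw-data values, per variable.
RECODES = {
    "item2": {1: 0, 2: 1},  # e.g. scan coded No/Yes as 1/2, raw data uses 0/1
}


def process_record(line: str) -> dict[str, int]:
    """Identify items by position, recode them, and fill blanks."""
    record = {}
    for name, start, end in LAYOUT:
        field = line[start:end].strip()
        if field == "":
            # A blank position in the scan means no usable response.
            record[name] = MISSING
            continue
        value = int(field)
        # Apply the variable's recode if one exists; otherwise keep the value.
        record[name] = RECODES.get(name, {}).get(value, value)
    return record


if __name__ == "__main__":
    print(process_record("12 7"))  # {'item1': 1, 'item2': 1, 'item3': 7}
    print(process_record("1 "))    # {'item1': 1, 'item2': -99, 'item3': -99}
```

Keeping the layout and recodes as data tables, rather than hard-coding them in the function, mirrors the idea that each scan setup defined its own positions and value coding.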
The original raw data, entered by the various methods described above, have since been cleaned and aggregated. The cleaned raw data are now stored in an Access database, from which files are exported to make the dataset. The numerous original raw data files from NOP and Group Sigma have not been retained. See the 7 Year Study data files and data processing pages for more information.
The coding of the cleaned raw data is documented in the annotated documents: parent booklet, twin score sheet and teacher questionnaire (pdfs). All three documents show item variable names and value coding, both in the cleaned raw data (blue font) and in the dataset (red font).
Coding parent occupations
The parent interview/booklet includes questions about the occupations of the respondent parent and his/her partner. These data were collected with the aim of deriving occupational codes using the SOC classifications, as had been done in the 1st Contact data. However, this coding was not done at the time of data collection or data entry in the 7 Year Study. The SOC coding of the 7 Year data was done in 2004/05, and the resulting codes are now incorporated into the analysis dataset.
The SOC coding was carried out by TEDS staff. Each parent and partner occupation was assigned to one of the nine major SOC groups, as shown on the first page of the Standard Occupational Classification 2000 (SOC2000) guide (pdf). The sub-categories listed on subsequent pages were not coded, but they proved very useful because they give examples of occupations falling into each of the major groups.
During coding, any occupations not clearly falling into one of the nine major groups were set aside for discussion at a meeting of the coders. Any occupations that still could not be coded after discussion were given missing values.
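The two-stage workflow above (first-pass assignment to a major group, then discussion of set-aside cases, with unresolved cases left missing) can be sketched as below. The coding was actually done by hand; this Python sketch only illustrates the logic. The example job titles and their groups in `FIRST_PASS` are illustrative choices, and using `None` for "missing" is an assumption, since the dataset's missing-value code is not stated here.

```python
# Sketch of the SOC coding workflow. The real coding was manual; the
# FIRST_PASS lookup entries are illustrative, and None stands in for
# whatever missing-value code the dataset actually uses.

SOC2000_MAJOR_GROUPS = {
    1: "Managers and Senior Officials",
    2: "Professional Occupations",
    3: "Associate Professional and Technical Occupations",
    4: "Administrative and Secretarial Occupations",
    5: "Skilled Trades Occupations",
    6: "Personal Service Occupations",
    7: "Sales and Customer Service Occupations",
    8: "Process, Plant and Machine Operatives",
    9: "Elementary Occupations",
}

MISSING = None  # placeholder for the missing-value code

# Illustrative first-pass decisions: normalised job title -> major group.
FIRST_PASS = {"secondary school teacher": 2, "electrician": 5, "cleaner": 9}


def first_pass(occupations):
    """Code clear cases; set aside anything not clearly in one group."""
    coded, set_aside = {}, []
    for occ in occupations:
        group = FIRST_PASS.get(occ.strip().lower())
        if group is None:
            set_aside.append(occ)
        else:
            coded[occ] = group
    return coded, set_aside


def after_meeting(coded, set_aside, meeting_decisions):
    """Apply decisions agreed at the coders' meeting; the rest stay missing."""
    final = dict(coded)
    for occ in set_aside:
        group = meeting_decisions.get(occ, MISSING)
        if group is not MISSING and group not in SOC2000_MAJOR_GROUPS:
            raise ValueError(f"{group!r} is not a SOC2000 major group")
        final[occ] = group
    return final
```

For example, `first_pass(["Electrician", "Astronaut"])` codes "Electrician" as group 5 and sets "Astronaut" aside; if the meeting reaches no decision on it, `after_meeting` leaves it missing.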
Coding parent educational qualifications
The parent data include questions about the highest educational qualifications of the respondent parent and his/her partner. The main response options were "no qualifications" and 6 categories listed in increasing order of educational level. An "other qualifications" option was also provided; for this option, a description was requested and recorded as verbatim text.
After the 7 Year Study, when all data had been entered, it was found that a large number of "other" responses had been recorded. In order to classify these responses more precisely, the verbatim text descriptions were examined with the aim of recoding "other" to one of the main response options. This recoding was carried out by TEDS staff in 2005.
In order to achieve consistency in coding, a qualifications coding guide (pdf) was devised by mutual agreement between the coders, and with reference to published information where available. Problematic cases were brought to meetings between the coders for discussion.
Wherever a recoding decision was made, the "other" value code (0) was replaced by the relevant specific value code (2 to 7) in the same variable. This change was made in the raw data, and has been carried through into the analysis dataset.
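The in-place recode described above can be sketched in Python as follows. It assumes that each respondent has a qualification code plus the verbatim "other" text, and that the coders' decisions (from the qualifications coding guide) are available as a lookup from normalised text to a specific code. The function name and the lookup are hypothetical; the example texts in the usage note are placeholders and do not reflect the real coding guide.

```python
# Hypothetical sketch: replace the "other" code (0) with a specific code
# (2 to 7) in the same variable wherever a recoding decision exists.

OTHER = 0
SPECIFIC_CODES = range(2, 8)  # the specific value codes 2 to 7


def apply_recodes(values, texts, decisions):
    """Return recoded values.

    values:    qualification codes, one per respondent
    texts:     verbatim "other" descriptions (None where absent)
    decisions: normalised text -> specific code, from the coding guide
    """
    out = []
    for value, text in zip(values, texts):
        if value == OTHER and text is not None:
            new = decisions.get(text.strip().lower())
            if new is not None:
                if new not in SPECIFIC_CODES:
                    raise ValueError(f"invalid recode target {new!r}")
                value = new  # overwrite "other" in the same variable
        # Responses without a decision keep the "other" code.
        out.append(value)
    return out
```

For example, `apply_recodes([0, 3, 0], ["Qualification A", None, "Qualification B"], {"qualification a": 4})` returns `[4, 3, 0]`: the first "other" response is recoded, the specific response is untouched, and the undecided "other" response keeps code 0.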