Data collection and data entry
Data collection periods were linked to the school year (September to August) in which twins reached the age of 7 years. This ensured that all twins were in the same school year when the teacher data were collected. Hence, families were divided into cohorts according to when their twins were born (see table below for details). Whereas in earlier TEDS studies each family was contacted close to their twins' birthday, in this study (and subsequent studies) all families in a cohort were contacted at once, which simplified administration of the study. As a result, twin ages varied from roughly six and a half to seven and a half years at the time of data collection.
The September-95 to August-96 cohort was split into two sub-samples: cohort 3 (Sep-95 to Dec-95) and cohort 4 (Jan-96 to Aug-96). This aided administration because the twin telephone study included the 1994 and 1995 twin births (cohorts 1-3) but not the 1996 twin births (cohorts 4-5).
The measures used were identical in all cohorts, except that one of the twin telephone tests was dropped after cohort 1. The table and paragraphs below describe specific data collection and data entry procedures used in each cohort. Other pages describe coding issues relating to the 7 Year data, the data files in which the 7 Year data are stored, and the processing used to make the analysis dataset. More general pages (not specific to the 7 Year study) describe data entry and data cleaning issues.
Data collection and data entry procedures varied greatly between cohorts. The main variations are presented in the table below. More detailed descriptions of the procedures used in each cohort follow after the table.
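|Cohort:||Cohort 1||Cohort 2||Cohort 3||Cohort 4||Cohort 5|
|Twin birth dates:||Jan-94 to Aug-94||Sep-94 to Aug-95||Sep-95 to Dec-95||Jan-96 to Aug-96||Sep-96 to Dec-96|
|Number of families contacted:||3256||4862||1693||3090||1680|
|Period of data collection:||2000/01||2001/02||2002/03||2002/03||2003/04|
|Personnel used for phone calls:||NOP employees||TEDS employees||TEDS employees||(not applicable)||(not applicable)|
|Consent collection:||Written consent form, followed up by telephone calls.||Written consent form, followed up by telephone calls.||Written consent form, followed up by telephone calls.||Written consent form only (attached to parent booklet).||Written consent form only (attached to parent booklet).|
|Parent data collection:||By telephone interview. Data recorded electronically during interview.||Parents chose either a telephone interview or a paper booklet. For telephone interviews, interviewers recorded data in paper booklets.||Parents chose either a telephone interview or a paper booklet. For telephone interviews, interviewers recorded data in paper booklets.||By paper booklet sent to parents.||By paper booklet sent to parents.|
|Parent data entry:||Electronically entered by NOP at the time of interview.||Booklets scanned by Group Sigma.||Booklets scanned by Group Sigma.||Booklets scanned by Group Sigma.||Booklets scanned by Group Sigma.|
|Twin data collection:||By telephone interview. Data recorded electronically during interview.||By telephone interview. Data recorded in paper score sheets.||By telephone interview. Data recorded in paper score sheets.||(not applicable)||(not applicable)|
|Twin data entry:||Electronically entered by NOP at the time of interview.||Score sheets scanned by Group Sigma.||Score sheets scanned by Group Sigma.||(not applicable)||(not applicable)|
|Teacher data collection:||Paper questionnaires posted to teachers, and returned to NOP.||Paper questionnaires posted to teachers, and returned to NOP.||Paper questionnaires posted to teachers, and returned to the TEDS office.||Paper questionnaires posted to teachers, and returned to the TEDS office.||Paper questionnaires posted to teachers, and returned to the TEDS office.|
|Teacher data entry:||Questionnaires entered manually by NOP.||Questionnaires scanned by NOP.||Questionnaires scanned by Group Sigma.||Questionnaires scanned by Group Sigma.||Questionnaires scanned by Group Sigma.|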
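The cohort definitions in the table can be expressed as a simple lookup on twin birth date. The following is a minimal sketch and not part of the TEDS processing scripts: the function names are hypothetical, and it assumes that each birth-date range in the table covers whole calendar months.

```python
from datetime import date

# Cohort boundaries taken from the table above: (cohort, first birth date, last birth date).
# Assumption: each range runs from the first to the last day of the months shown.
COHORTS = [
    (1, date(1994, 1, 1), date(1994, 8, 31)),
    (2, date(1994, 9, 1), date(1995, 8, 31)),
    (3, date(1995, 9, 1), date(1995, 12, 31)),
    (4, date(1996, 1, 1), date(1996, 8, 31)),
    (5, date(1996, 9, 1), date(1996, 12, 31)),
]


def cohort_for(birth_date):
    """Return the 7 Year cohort number for a twin birth date, or None if out of range."""
    for cohort, first, last in COHORTS:
        if first <= birth_date <= last:
            return cohort
    return None


def school_year_at_seven(birth_date):
    """Return the September-to-August school year containing the 7th birthday, e.g. '2000/01'."""
    seventh_year = birth_date.year + 7
    # Birthdays from September onwards fall in the school year that starts in that calendar year;
    # earlier birthdays fall in the school year that started the previous September.
    start = seventh_year if birth_date.month >= 9 else seventh_year - 1
    return f"{start}/{(start + 1) % 100:02d}"
```

For example, `cohort_for(date(1995, 10, 15))` gives cohort 3, and `school_year_at_seven` returns the same school year for any birth date in cohorts 3 and 4, which is why those two cohorts share a data collection period.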
Twin telephone tests
The twin telephone tests were only carried out for twins born in 1994 or 1995, as shown in the table above. In cohort 1, these tests were carried out by callers at NOP, while in cohorts 2 and 3 the tests were carried out by TEDS callers.
Families that had given initial consent (in writing or by telephone) were sent the test stimulus booklet (pdf) before they were contacted for the tests themselves. Twins were directed to look at pages of the stimulus booklet during the tests. The instructions for carrying out the tests are contained in the twin interview script (pdf), which was given to each TEDS caller. NOP callers used computer aided telephone interviews (CATI), and an equivalent script appeared on screen for the NOP callers to follow. During the call, TEDS callers recorded the twin responses in the twin score sheet (pdf), while NOP callers recorded responses directly into their CATI system. The recorded responses (on the score sheets, or in the CATI system) consist partly of numeric scores or codes (coded during the interview itself), and partly of verbatim text responses. The latter were scored directly after interview by the callers themselves, and the scores recorded in the score sheets or CATI system.
Scoring rules for most tests are contained within the interview script, but there are separate, more detailed scoring rules for the similarities and vocabulary tests (pdfs). Further checks on test scoring were made later by TEDS staff, with corrections and feedback to callers where appropriate. During data entry (by scanning) from the score sheets, only the coded and scored data were taken up, not the verbatim text responses. Likewise, in the data returned by NOP from their CATI system, only the scored/coded data have been retained for analysis, and not the verbatim text responses. The coding and scoring used in the raw data and in the analysis dataset are shown in separate documents (pdfs).
Some of the tests had discontinue rules as follows:
- Conceptual Grouping test: from question 4 onwards, discontinue if three consecutive items are wrong, missing or "don't know" responses
- Similarities test: from question 3 onwards, discontinue if three consecutive items are wrong (zero score), missing or "don't know" responses
- Vocabulary test: from question 2 onwards, discontinue if three consecutive items are wrong (zero score), missing or "don't know" responses
- Picture Completion test: from question 3 onwards, discontinue if three consecutive items are wrong (zero score), missing, timed out, or "don't know" responses
These discontinue rules were intended to be enforced by the callers, in order to shorten the tests for twins that found them too difficult. It was later found that the rules had not been enforced uniformly, so that there were inconsistencies in the scoring. A systematic search for discrepancies in the application of discontinue rules was included in subsequent data cleaning: where it was found that a test had been continued in error, corrections were made by deleting responses and scores after the correct discontinue point.
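The discontinue check and the subsequent correction can be illustrated as follows. This is a sketch only, not the cleaning procedure actually used by TEDS: the function names and the data layout (one list entry per test item) are hypothetical, and it assumes that only items from the stated starting question onwards count towards a run of three failures.

```python
# Starting question for each test's discontinue rule, as listed above.
START_QUESTION = {
    "conceptual_grouping": 4,
    "similarities": 3,
    "vocabulary": 2,
    "picture_completion": 3,
}


def discontinue_point(failed, start_question, run_length=3):
    """Return the 1-based item number at which the test should have stopped, or None.

    `failed` holds one boolean per item, in order: True when the item was wrong
    (zero score), missing, timed out, or a "don't know" response.
    """
    run = 0
    for item_number, is_fail in enumerate(failed, start=1):
        if item_number < start_question:
            continue  # assumption: earlier items never count towards a run
        run = run + 1 if is_fail else 0
        if run == run_length:
            return item_number
    return None


def apply_discontinue(scores, failed, start_question):
    """Return a copy of `scores` with every score after the discontinue point set to None."""
    stop = discontinue_point(failed, start_question)
    if stop is None:
        return list(scores)
    return list(scores[:stop]) + [None] * (len(scores) - stop)
```

With scores `[1, 1, 0, 0, 0, 1, 1]` on the Similarities test, items 3 to 5 form a run of three failures, so the two scores after item 5 would be removed; under the Conceptual Grouping rule (starting at question 4) the same pattern does not trigger a discontinue.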
Cohort 1
Each family was initially contacted by post with a consent form (pdf). This version of the consent form is the one used in cohorts 2 and 3; in cohort 1, parents were not offered the option of filling in a booklet themselves. In the consent form, parents were asked for consent to participate in the telephone study and for their telephone numbers; they were also asked for consent to contact the twins' teachers and for teacher/school contact details. Up to two written reminders were sent to families that did not return the consent forms promptly. These were followed up by telephone calls to seek verbal consent. For families with phone problems, a further postcard reminder (called a "gold card") was sent. All families that gave written or verbal consent were sent a stimulus booklet (pdf) by post, in readiness for the twin telephone interviews. At the same time, t-shirts were sent to the twins as a reward for their participation. The families were then contacted for telephone interviews with the twins and a parent.
A small number (around 200) of families with special needs, for example twin medical conditions, were interviewed by TEDS staff. However, the vast majority were interviewed by employees of NOP. The necessary family details, such as IDs, names and telephone numbers, were sent to NOP, where families were allocated to specific callers. Scripts for the parent and twin interviews (pdfs) were provided for all callers.
NOP staff used Computer Aided Telephone Interviews (CATI) for collection of the data during a telephone call. This was essentially an electronic data entry system, which allowed callers to enter the data directly during the interview. In most cases, the parent and both twins of a family were interviewed during a single telephone call. Most data were coded/scored by the callers during the interview and data entry process; however, for some twin telephone tests in particular (notably the TOWRE, Similarities and Vocabulary tests), the callers recorded twin responses verbatim. Other data recorded by NOP (and returned to TEDS) were the dates and times of the interviews, and the identities of the callers. On completion of the twin telephone interviews, the twins were sent certificates as a reward.
One of the items in the parent interview requests the heights and weights of the twins, in either imperial or metric units. As parents often did not know these data, and could not be expected to measure their twins during the interview, the height and weight measurements were often missing in the parent interview data. Special postcards were therefore prepared to collect the twin heights and weights from parents. The TEDS address was printed on the postcard, with postage pre-paid; on the reverse were spaces for parents to record twin heights and weights, and the FamilyID was printed on the card in order to identify the family. The data from these postcards were electronically entered by hand in the TEDS office.
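Because heights and weights could be reported in either imperial or metric units, analysis would normally require them to be put on a common scale. The source does not describe how this was done; the sketch below simply shows the standard conversions, with hypothetical function names.

```python
CM_PER_INCH = 2.54          # exact by definition
KG_PER_POUND = 0.45359237   # exact by definition
INCHES_PER_FOOT = 12
POUNDS_PER_STONE = 14


def height_cm(feet=0, inches=0.0):
    """Convert a height in feet and inches to centimetres."""
    return (feet * INCHES_PER_FOOT + inches) * CM_PER_INCH


def weight_kg(stones=0, pounds=0.0):
    """Convert a weight in stones and pounds to kilograms."""
    return (stones * POUNDS_PER_STONE + pounds) * KG_PER_POUND
```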
For each caller, one or more telephone interviews were recorded onto audio cassette tape, and these tapes were returned to TEDS for caller evaluation. Each NOP caller was then graded for the quality of their interviews, and a grade from 1 (very good) to 8 (very bad) was recorded for the caller.
Data from the parent and twin interviews were exported from NOP's CATI instruments, and returned in the form of plain text files. These files were returned in several batches during the course of the study for cohort 1. These raw data files are described in more detail in the 7 Year data files page. Some twin responses returned in the files consisted of raw verbatim text; these verbatims were coded by TEDS staff (see Similarities and Vocabulary scoring (pdfs) for details). The field names and value codes used by NOP in the raw data are shown in blue in annotated versions of the parent booklet and twin score sheet (pdfs).
The teacher study included all families that had given consent and provided contact details for the twins' teachers and schools. Teacher questionnaires (pdf) were sent directly to teachers, and were not seen by the families themselves. Each teacher also received a return envelope addressed to NOP. Up to three written reminders were sent to teachers that had not returned their questionnaires promptly.
Teachers posted their completed questionnaires directly to NOP, where they were entered electronically by hand. The entered data for cohort 1 were returned to TEDS in a single Excel spreadsheet. The field names and value codes used by NOP in the raw data are shown in blue in an annotated version of the teacher questionnaire (pdf).
Cohorts 2 and 3
The data collection procedures in cohorts 2 and 3 were broadly similar to those used in cohort 1. However, there were many differences of detail, which are described here.
At the end of cohort 1, there were concerns about the quality of telephone interviews, and about some administrative procedures involving NOP. Therefore for cohorts 2 and 3, TEDS recruited its own callers and NOP were no longer used for telephone interviews.
Whereas in cohort 1 all parent data were collected via telephone interviews, in cohorts 2 and 3 parents were offered the alternative option of filling in a paper booklet (pdf). The items in the booklet were identical to the items in the interview. The choice of booklet or interview was offered by adding an item to the consent form (pdf). Where parents did not express a preference, they were sent a paper booklet. Parents who received the paper booklet but did not return it promptly were sent up to two written reminders, followed up by a telephone reminder.
As in cohort 1, for parents having telephone interviews, twin heights and weights were collected using postcards. This was not necessary for parents who completed the booklets.
In cohort 1, parent data were entered electronically during the interview using CATI (see above). In cohorts 2 and 3, however, where parents were interviewed by phone, callers recorded the data in paper booklets, and the data were electronically entered at a later stage. So from cohort 2 onwards, all parent data were initially recorded in paper booklets.
A similar approach was used for twin telephone interview data. Instead of using CATI, interviewers recorded the twin data in paper score sheets (pdf). Also, callers were asked to code the data themselves, using the score sheet, immediately after the interview, rather than returning the data to TEDS for coding. A random sample of each caller's score sheets was checked (by TEDS staff) for accuracy of coding; if necessary, feedback was given to the caller, and corrections were made.
In addition to the score sheets, callers were asked to report any concerns or problems with the interviews, which might have interfered with the quality of the data. Any such concerns were logged in the TEDS admin database, and have been coded as variables in the dataset.
The collection of teacher data in cohort 2 followed the same procedure as in cohort 1. However, from cohort 3 onwards the teacher questionnaires were returned directly to TEDS, not to NOP.
Data entry procedures in cohorts 2 and 3 were quite different from those used in cohort 1. As stated above, from cohort 2 onwards the parent and twin data were initially recorded on paper (in booklets and score sheets). These data were subsequently taken up electronically by optical scanning - the booklets and score sheets had been designed with this in mind. The scanning of the data was carried out by a company called Group Sigma. The booklets and score sheets were delivered to Group Sigma from the TEDS office in batches. After scanning each batch, Group Sigma returned the parent and twin data in plain text files (the data files are described in detail in another page). The parent booklet contains many free text fields, which could not be scanned; the verbatim text from these fields was instead entered manually by Group Sigma into Excel spreadsheets. The twin score sheets were fully coded before scanning, so only numeric data were taken up.
Whereas in cohort 1 the teacher questionnaires were electronically entered by hand, from cohort 2 onwards the questionnaires were optically scanned. In cohort 2, the scanning was carried out by NOP (the questionnaires had been returned directly to NOP by the teachers). This scanning was done in a single large batch, and the data were returned in a csv file (comma-delimited plain text). From cohort 3 onwards, the teacher questionnaires were returned directly to TEDS, and were then sent to Group Sigma for scanning. As with the parent and twin data, this was done in batches, and the data were returned in plain text files.
There are annotated versions of the parent booklet, twin score sheet and teacher questionnaire (pdfs). These documents show, in green, the scan positions used for each field in the Group Sigma raw data files. They also show, in blue, the field names and value codes that were used by NOP in their raw data files; these field names are still retained in the cleaned raw data.
Cohorts 4 and 5
In cohorts 4 and 5, the procedures used for collecting and entering the parent and teacher data were broadly the same as in cohort 3. Any differences are described below. Note that twin telephone interviews ceased after cohort 3.
For cohorts 4 and 5, parent data were collected entirely using the paper booklets, without the option of telephone interviews. No postcards for twin heights and weights were required.
As consent was no longer required in advance for telephone interviews, the consent form was not sent separately but was attached to the front of the parent booklet. Hence, parents completed and returned the consent and booklet together. The text of the consent form was modified to remove references to telephone interviews.
Parents that did not return the booklet/consent promptly were sent up to three written reminders (but were not reminded by telephone).
In cohort 5, the parent booklet/consent was sent simultaneously with the "8 Year" CAST questionnaire (twins were on average aged 7 not 8 years for this cohort). See the 8 Year Study for further details.