First letter prefix: TEDS study
One of the most consistent variable naming conventions, applying across all the main TEDS studies, is the use of the first letter to denote the TEDS study. The ordering 1 to 26 of the letters 'a' to 'z' in the English alphabet has been used to denote the twin age (in years) corresponding to each main TEDS study. Hence, in the 1st Contact dataset all variables start with 'a' because the study was planned to collect data when twins were aged roughly 1 year; in the 2 Year study, all variables start with 'b', and so on.
The correspondence between this first letter and actual twin age has become increasingly approximate in later studies. As described elsewhere, later studies have often involved multiple data collections at different times, and sometimes data have been collected from several TEDS cohorts at the same time, leading to a range of twin ages within each dataset. However, in all cases the conventional study name matches the variable name prefix, even where it does not match actual twin ages exactly.
Variable name prefix | TEDS study | Approximate actual twin ages (years) |
---|---|---|
a | 1st Contact | 1.5 |
b | 2 Year | 2 |
c | 3 Year | 3 |
d | 4 Year | 4 |
e | In Home | 4 to 5 |
g | 7 Year | 6.5 to 7.5 |
h | 8 Year | 7 to 9 |
i | 9 Year | 8.5 to 9.5 |
j | 10 Year | 9.5 to 10.5 |
l | 12 Year | 10.5 to 12.5 |
n | 14 Year | 12 to 14 |
p | 16 Year | 15 to 17 |
r | 18 Year | 18 to 20 |
u | 21 Year | 21 to 26 |
z | 26 Year | 26 to 30 |
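As a minimal sketch of the first-letter rule, the lookup below copies the prefixes and study names from the table above and computes the nominal age from the letter's position in the alphabet. The function names are illustrative only and are not part of any TEDS tooling. Variables without a study prefix (such as ID or background variables) may still begin with one of these letters, so a match is only a hint.

```python
# Study prefixes and names copied from the table above.
STUDY_BY_PREFIX = {
    "a": "1st Contact", "b": "2 Year", "c": "3 Year", "d": "4 Year",
    "e": "In Home", "g": "7 Year", "h": "8 Year", "i": "9 Year",
    "j": "10 Year", "l": "12 Year", "n": "14 Year", "p": "16 Year",
    "r": "18 Year", "u": "21 Year", "z": "26 Year",
}


def nominal_age(letter):
    """Alphabet position of the prefix letter: 'a' -> 1, ..., 'z' -> 26."""
    return ord(letter.lower()) - ord("a") + 1


def study_for(variable_name):
    """Return (study name, nominal age), or None if the first letter
    is not one of the study prefixes listed in the table."""
    letter = variable_name[:1].lower()
    if letter not in STUDY_BY_PREFIX:
        return None
    return STUDY_BY_PREFIX[letter], nominal_age(letter)


print(study_for("pcbhsdq20r"))  # ('16 Year', 16)
print(study_for("u2cvict05"))   # ('21 Year', 21)
```

The nominal age is the planned age implied by the letter, not the actual age range shown in the third column of the table.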
Additional prefixes: data collections
Later TEDS studies have often involved multiple data collections. Beginning with the 7 Year study, questionnaires were collected from twins' teachers as well as from parents and twins. Starting with the 9 Year study, some of the same measures were collected simultaneously from parents and from twins themselves. Starting with the 10 Year study, web data collections were used alongside paper booklet data collections. In the 16, 18 and 21 Year studies, there were multiple data collections from twins themselves, carried out independently and at different times.
To differentiate between variables from different data collections within the same main TEDS study, additional prefixes (directly after the first letter) have been used as shown in the table below. While these prefixes have not been used entirely systematically, they do at least serve to distinguish between similar or identical measures from different data collections within the same main study.
The most consistently used second letters for these prefixes are as follows:
- p: parent data
- c: twin (child) data
- t: teacher data
TEDS study | Variable name prefix | Data collection |
---|---|---|
In Home | ec | Twin tests |
In Home | ep | Parent questionnaire |
In Home | epv | Post-visit questionnaire |
7 Year | g | Parent questionnaire, twin phone interviews |
7 Year | gt | Teacher questionnaire |
9 Year | ip | Parent questionnaire |
9 Year | ic | Twin questionnaire |
9 Year | it | Teacher questionnaire |
10 Year | j | Twin web data |
10 Year | jpq | Parent web questionnaire |
10 Year | jt | Teacher questionnaire |
12 Year | lp | Parent questionnaire |
12 Year | lpnc | Parent reported NC levels |
12 Year | lt | Teacher questionnaire |
12 Year | lc | Twin questionnaire |
12 Year | l | Twin web and phone tests |
14 Year | np | Parent questionnaire |
14 Year | nsl | Parent SLQ |
14 Year | nt | Teacher questionnaire |
14 Year | nc | Twin questionnaire |
14 Year | n | Twin web tests |
16 Year | ppbh | Parent 'behaviour' questionnaire |
16 Year | ppl2 | Parent 'Leap-2' questionnaire |
16 Year | pp | Parent web questionnaire |
16 Year | pcex | Twin exam results |
16 Year | pcbh | Twin 'behaviour' questionnaire |
16 Year | pcl2 | Twin 'Leap-2' questionnaire |
16 Year | p | Twin web activities |
18 Year | rcq | Twin 18 year questionnaire |
18 Year | rcp | Twin Perception web study |
18 Year | rcb | Twin Bricks web study |
18 Year | rck | Twin Kings Challenge web study |
18 Year | rcn | Twin Navigation web study |
18 Year | rcf | Twin FFMP web study |
21 Year | u1p | Parent TEDS21 phase 1 questionnaire |
21 Year | u1c | Twin TEDS21 phase 1 questionnaire |
21 Year | u2c | Twin TEDS21 phase 2 questionnaire |
21 Year | ucg | Twin G-game web study |
21 Year | ucv1 | Twin Covid study phase 1 questionnaire |
21 Year | ucv2 | Twin Covid study phase 2 questionnaire |
21 Year | ucv3 | Twin Covid study phase 3 questionnaire |
21 Year | ucv4 | Twin Covid study phase 4 questionnaire |
26 Year | zmh | Twin TEDS26 mental health questionnaire (MHQ) |
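Because several prefixes in the table share their leading letters (for example p, pp and ppbh), one way to identify the data collection for a variable is to pick the longest listed prefix that the name starts with. The sketch below assumes that rule; it is not a documented TEDS algorithm, and the function name is illustrative. Since the prefixes have not been applied entirely systematically, a measure abbreviation that happens to begin with the same letters as a longer prefix could be misclassified.

```python
# (study, data collection) keyed by prefix, copied from the table above.
COLLECTION_BY_PREFIX = {
    "ec": ("In Home", "Twin tests"),
    "ep": ("In Home", "Parent questionnaire"),
    "epv": ("In Home", "Post-visit questionnaire"),
    "g": ("7 Year", "Parent questionnaire, twin phone interviews"),
    "gt": ("7 Year", "Teacher questionnaire"),
    "ip": ("9 Year", "Parent questionnaire"),
    "ic": ("9 Year", "Twin questionnaire"),
    "it": ("9 Year", "Teacher questionnaire"),
    "j": ("10 Year", "Twin web data"),
    "jpq": ("10 Year", "Parent web questionnaire"),
    "jt": ("10 Year", "Teacher questionnaire"),
    "lp": ("12 Year", "Parent questionnaire"),
    "lpnc": ("12 Year", "Parent reported NC levels"),
    "lt": ("12 Year", "Teacher questionnaire"),
    "lc": ("12 Year", "Twin questionnaire"),
    "l": ("12 Year", "Twin web and phone tests"),
    "np": ("14 Year", "Parent questionnaire"),
    "nsl": ("14 Year", "Parent SLQ"),
    "nt": ("14 Year", "Teacher questionnaire"),
    "nc": ("14 Year", "Twin questionnaire"),
    "n": ("14 Year", "Twin web tests"),
    "ppbh": ("16 Year", "Parent 'behaviour' questionnaire"),
    "ppl2": ("16 Year", "Parent 'Leap-2' questionnaire"),
    "pp": ("16 Year", "Parent web questionnaire"),
    "pcex": ("16 Year", "Twin exam results"),
    "pcbh": ("16 Year", "Twin 'behaviour' questionnaire"),
    "pcl2": ("16 Year", "Twin 'Leap-2' questionnaire"),
    "p": ("16 Year", "Twin web activities"),
    "rcq": ("18 Year", "Twin 18 year questionnaire"),
    "rcp": ("18 Year", "Twin Perception web study"),
    "rcb": ("18 Year", "Twin Bricks web study"),
    "rck": ("18 Year", "Twin Kings Challenge web study"),
    "rcn": ("18 Year", "Twin Navigation web study"),
    "rcf": ("18 Year", "Twin FFMP web study"),
    "u1p": ("21 Year", "Parent TEDS21 phase 1 questionnaire"),
    "u1c": ("21 Year", "Twin TEDS21 phase 1 questionnaire"),
    "u2c": ("21 Year", "Twin TEDS21 phase 2 questionnaire"),
    "ucg": ("21 Year", "Twin G-game web study"),
    "ucv1": ("21 Year", "Twin Covid study phase 1 questionnaire"),
    "ucv2": ("21 Year", "Twin Covid study phase 2 questionnaire"),
    "ucv3": ("21 Year", "Twin Covid study phase 3 questionnaire"),
    "ucv4": ("21 Year", "Twin Covid study phase 4 questionnaire"),
    "zmh": ("26 Year", "Twin TEDS26 mental health questionnaire (MHQ)"),
}


def data_collection(variable_name):
    """Return (prefix, study, collection) for the longest listed prefix
    that the name starts with, or None if no listed prefix matches."""
    name = variable_name.lower()
    matches = [p for p in COLLECTION_BY_PREFIX if name.startswith(p)]
    if not matches:
        return None
    best = max(matches, key=len)  # assumption: the longest prefix wins
    return (best, *COLLECTION_BY_PREFIX[best])


print(data_collection("ppbhconnt"))  # ('ppbh', '16 Year', "Parent 'behaviour' questionnaire")
print(data_collection("jpc07s"))     # ('j', '10 Year', 'Twin web data')
```

Studies that are absent from the table (1st Contact to 4 Year, and 8 Year) return None here, because their variables carry only the single-letter study prefix.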
Abbreviations for measures, items and scales
After the prefixes described above, in cases where the variables form a set of items from a named measure, the next parts of the variable name are typically an acronym or abbreviation of the measure name followed by an item number.
There are too many measures in the TEDS dataset to list their variable name abbreviations here. Some measures were included in many TEDS data collections, although for historical reasons the same measure does not always have the same variable name abbreviation in different datasets. For full details, see the study variables lists (links top left on this page) and other pages such as the questionnaires annotated with dataset variable names. Here are a few illustrative examples, showing the extended variable name prefix (study, measure, item):
- dbh09: 4 Year study, Behaviour section, item 9 (the Behaviour section in fact included several measures with items randomly mixed)
- gcg2c: 7 Year study, Conceptual Grouping measure (twin phone test), item 2, part c
- jpc07s: 10 Year study, Picture Completion web test, item 7, score
- ltaps19: 12 Year study, teacher questionnaire, APSD measure, item 19
- pcbhsdq20r: 16 Year study, twin 'Behaviour' questionnaire, SDQ measure, item 20, reversed version
- u2cvict05: 21 Year study, TEDS21 phase 2, twin questionnaire, Victimisation measure, item 5
The abbreviation denoting the measure typically has two, three or four letters as shown in these examples.
The item numbering reflects the ordering of items as presented to participants in the original questionnaire or test. Where there were fewer than 10 items, they are numbered 1-X with a single digit. Where there were 10 or more items, they are numbered 01-XX with two digits.
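The numbering rule can be sketched as follows. The stem "xab" is a made-up placeholder rather than a real TEDS measure, and the function name is illustrative.

```python
def item_names(stem, n_items):
    """Build item variable names for a measure with n_items items.

    Fewer than 10 items: single-digit numbers (1, 2, ...).
    10 or more items: zero-padded two-digit numbers (01, 02, ...).
    """
    width = 1 if n_items < 10 else 2
    return [f"{stem}{i:0{width}d}" for i in range(1, n_items + 1)]


print(item_names("xab", 3))       # ['xab1', 'xab2', 'xab3']
print(item_names("xab", 12)[:2])  # ['xab01', 'xab02']
```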
As in some of the examples above, the item number may be followed by one or more additional letters as further descriptors. Examples are the use of 'r' to denote a reverse-coded version of an item (e.g. pcbhsdq20r); the use of letters (a, b, c, ...) to denote parts of a multi-part question (e.g. gcg2c); the use of various letters to denote different measurements for a test item (for example, 's' for score, 'a' for answer, 'rt' for response time), e.g. jpc07s.
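An item variable name can be split into its parts by working from the end of the name: optional descriptor letters, then the item number, then everything before it (study and data collection prefix plus measure abbreviation). The sketch below is one possible approach, not an official parser. It assumes that the twin suffix 1 or 2 has already been removed, and that descriptors are at most two letters long, as in the examples above (r, a, b, c, s, rt); both are assumptions made for illustration.

```python
import re

# stem (must end in a letter) + item digits + up to two descriptor letters.
# The two-letter limit on descriptors is an assumption, not a TEDS rule.
ITEM_PATTERN = re.compile(
    r"(?P<stem>.*[a-z])(?P<number>\d+)(?P<descriptor>[a-z]{0,2})"
)


def parse_item(variable_name):
    """Return (stem, item number as written, descriptor), or None if the
    name does not look like an item variable."""
    match = ITEM_PATTERN.fullmatch(variable_name.lower())
    if match is None:
        return None
    return match.group("stem"), match.group("number"), match.group("descriptor")


print(parse_item("pcbhsdq20r"))  # ('pcbhsdq', '20', 'r')
print(parse_item("gcg2c"))       # ('gcg', '2', 'c')
print(parse_item("u2cvict05"))   # ('u2cvict', '05', '')
print(parse_item("lgktot"))      # None
```

The item number is kept as a string so that the zero padding is preserved. Digits inside a prefix, as in u2c or pcl2, are left in the stem because the pattern takes the last run of digits in the name as the item number.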
Many derived variables are computed as means or totals from the items of a measure. These derived variables are usually referred to as scales (total or mean of all items in a measure), subscales (total or mean of a subset of items), total scores (total of test item scores) and composites (typically derived from more than one measure). Where the variable is derived from items of a single measure, usually the variable name includes the same abbreviation of the measure as used in the items; the item number is then usually replaced by one of these suffixes:
- t: denotes a 'total' of measure items (although often computed as a re-scaled mean)
- tot: sometimes used instead of t for test total scores
- m: denotes a simple mean of measure items
- xxt or xxm: a subscale, where 'xx' is replaced by two or three letters forming an abbreviation of the subscale name
Here are some illustrative examples of scale/subscale/score variable names:
- lgktot: 12 Year study, General Knowledge twin web test, total score
- ppbhconnt: 16 Year study, parent 'Behaviour' questionnaire, Conners measure overall total score
- ppbhconnimpt: 16 Year study, parent 'Behaviour' questionnaire, Conners measure, subscale for impulsivity
- u1cpilm: 21 Year study, TEDS21 phase 1, twin questionnaire, Purpose In Life measure, mean score
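The way these names are assembled can be sketched as a simple concatenation of stem, optional subscale abbreviation, and suffix. The stems used below (lgk, ppbhconn, u1cpil) are inferred from the examples above by removing the suffix; the function itself is illustrative only, and real variable names do not always follow this pattern.

```python
SCALE_SUFFIXES = {"t", "tot", "m"}  # suffixes listed above


def scale_name(stem, suffix="t", subscale=""):
    """Assemble a scale or subscale name: stem + subscale abbreviation + suffix."""
    if suffix not in SCALE_SUFFIXES:
        raise ValueError(f"unknown scale suffix: {suffix!r}")
    return f"{stem}{subscale}{suffix}"


print(scale_name("lgk", "tot"))                      # lgktot
print(scale_name("ppbhconn", "t"))                   # ppbhconnt
print(scale_name("ppbhconn", "t", subscale="imp"))   # ppbhconnimpt
print(scale_name("u1cpil", "m"))                     # u1cpilm
```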
Suffix for twin variables
Twin-specific phenotypic data in the TEDS datasets always comprise 'double entered' variables (see glossary for further explanation). This means that any variable referring to a specific twin, including items and derived variables, and including data collected from parents and teachers as well as from twins themselves, is effectively duplicated within the dataset: the same variable value is shown once for the twin in question, and again as a co-twin variable for the other twin of the same pair.
Each twin variable therefore effectively appears as a pair of variables, distinguished by the variable name suffix: 1 or 2. A variable name ending in 1 contains data for the twin identified in any given row of data in the dataset; the same variable but with name ending in 2 contains data, from the same source, for the co-twin.
For this reason, a twin variable in a dataset may often be referenced as a pair of variables with '1/2' written at the end, e.g. u1cpilm1/2.
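A small helper can expand the '1/2' shorthand into the two actual variable names. This is a sketch with an illustrative function name. Note that the reverse operation is not safe in general: a name ending in 1 or 2 is not necessarily a twin variable, since a single-digit item number can also end a name.

```python
def expand_pair(pair_name):
    """Expand shorthand such as 'u1cpilm1/2' into (twin name, co-twin name)."""
    if not pair_name.endswith("1/2"):
        raise ValueError(f"not in '1/2' shorthand: {pair_name!r}")
    base = pair_name[: -len("1/2")]
    return base + "1", base + "2"


print(expand_pair("u1cpilm1/2"))  # ('u1cpilm1', 'u1cpilm2')
print(expand_pair("ltaps191/2"))  # ('ltaps191', 'ltaps192')
```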
Here are some illustrative examples, using some of the same variable name prefixes that were illustrated above.
Variable description | Twin variable name | Co-twin variable name | Twin pair variables |
---|---|---|---|
12 Year study, teacher questionnaire, APSD measure, item 19 | ltaps191 | ltaps192 | ltaps191/2 |
16 Year study, parent 'Behaviour' questionnaire, Conners measure overall total score | ppbhconnt1 | ppbhconnt2 | ppbhconnt1/2 |
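The double-entry layout described above can be illustrated with a toy example. The sketch below takes one value per twin and produces one row per twin, in which the variable ending in 1 holds that twin's own value and the variable ending in 2 holds the co-twin's value. The field name twin_id and the data values are invented for illustration and do not come from any TEDS dataset.

```python
def double_enter(pair, variable):
    """Build double-entered rows for one twin pair.

    pair: list of exactly two dicts, one per twin, each holding a
          hypothetical 'twin_id' and a value stored under `variable`.
    Returns two rows: <variable>1 is the row twin's own value and
    <variable>2 is the co-twin's value.
    """
    first, second = pair
    rows = []
    for twin, co_twin in ((first, second), (second, first)):
        rows.append({
            "twin_id": twin["twin_id"],
            variable + "1": twin[variable],
            variable + "2": co_twin[variable],
        })
    return rows


# Made-up values for a single pair.
pair = [{"twin_id": "A", "ltaps19": 0}, {"twin_id": "B", "ltaps19": 2}]
for row in double_enter(pair, "ltaps19"):
    print(row)
# {'twin_id': 'A', 'ltaps191': 0, 'ltaps192': 2}
# {'twin_id': 'B', 'ltaps191': 2, 'ltaps192': 0}
```

Each value therefore appears twice across the two rows of a pair, once under the 1 suffix and once under the 2 suffix.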
Wherever possible, variable name suffixes '1' and '2' have been avoided for variables that apply to the parent, the family or the twins as a pair rather than individually.
Exceptions
The TEDS dataset contains many thousands of variables, from data collections going back over many years. Some variables were named historically without thought of consistent and systematic naming across future data collections. Other variables simply do not fit into the patterns described above. There are therefore many exceptions to the variable naming conventions described above. This section briefly describes a few of the many types of exceptions to the variable naming rules.
Variables that do not originate from a specific TEDS study do not have the prefix a, b, c, etc. Examples of these are the background variables, ID variables, and non-phenotypic variables such as polygenic scores.
Item variable names may not follow the naming convention described above, of measure abbreviation followed by item number. In some cases this may be because the item is not part of a clearly defined measure. Even where the measure is clearly defined, the items may not follow a clearly numbered sequence or they may have widely-varying formats, so item numbering may not be used. The 1st Contact dataset contains many variables of these sorts; while these variables do have the 'a' prefix (denoting 1st Contact), and do have the 1/2 suffix if twin-specific, the remainder of the variable name is typically an abbreviation of some description of the question.
Some derived variables like twin ages do not relate to any given measure, even if they do relate to a specific study. Other derived variables, usually referred to as 'composites', are derived from more than one measure. Variables of these sorts will generally have abbreviated descriptive names although with prefixes and suffixes as above if appropriate. Examples include pcbhage1/2 (twin age in the 16 Year study when the twin 'Behaviour' questionnaire was returned) and drawg1/2 (4 Year study, general cognitive ability or 'g' composite, derived from several different cognitive tests).