TEDS Data Dictionary

Cleaning Raw Web Data

The web data collected by TEDS from twins (and occasionally from parents) are superficially clean in the following senses:

  • All data were recorded directly by the subjects themselves, eliminating the sorts of errors that normally occur during data entry from paper questionnaires.
  • Every web test was rigorously checked and tested during production, to ensure that (under normal conditions) the responses entered at the computer terminal were correctly translated into electronic coded form.
  • Test rules were rigorously programmed and tested, theoretically eliminating errors in the conduct of each test. Depending on the test, the programmed rules may govern properties such as response validation, response time limits, item scoring, branching, discontinuing, and computation of a total score.
  • Each instance of a twin web test at a given age is unique and correctly identified by twin ID. The rigorous web login procedure, and the programmed rules allowing a twin to attempt each test only once, ensure that neither duplicates nor unidentified test instances occurred.
  • Examination of the raw data gathered during web testing has confirmed that test rules (such as scoring, branching and discontinuing) had been correctly followed in the vast majority of cases.

However, although the web test data are very largely clean in the senses above, the raw data contain various types of peculiarities and anomalies:

  1. Tests may contain branching, discontinue and timeout rules that can result in missing item values in the raw data.
  2. Individual tests may have their own peculiarities that cause differences between tests in the way that item data are recorded and coded.
  3. Errors can occur within individual test items, through faults in the computer system or through actions (intentional or unintentional) of the twins while taking the tests.
  4. In rare cases an entire test may malfunction in some way: for example, a branching or timeout rule may fail to function correctly, or the test as a whole may crash, generally because of unforeseen technical difficulties.
  5. Twins may respond randomly or without apparent effort throughout most or all of a test, resulting in invalid data even if the items are technically error-free.

The cumulative effects of events such as item timeouts, item errors and random responding may seriously compromise test validity where those events are repeated within a given twin test. Problems such as these are not always obvious in the raw data, because of the large volume and complexity of the web test item data (for example, the PIAT test at age 10 has around 600 item variables for each twin, and the test has 7 branching points).

This page describes attempts to recognise the various types of peculiarities and anomalies that may occur in web tests, and how they have been recoded in a consistent way so as to make them recognisable in the analysis datasets. The recoding of the raw data items allows test-wide problems to be recognised and addressed. Badly-affected instances of tests are excluded from analysis by removal of the item data (recoding to missing), and such instances are flagged using test status variables.

Such data cleaning is carried out in the syntax used to build the analysis datasets. The cleaning mechanisms are therefore fully documented as part of the syntax itself. Furthermore, the cleaning can easily be modified or removed by changing the syntax. Hence the cleaning does not involve any permanent loss of raw data. The syntax involved is very lengthy and is not reproduced here in the data dictionary.

Effects of programmed test rules

Branching rules in questionnaires

In web/app questionnaires, notably in the TEDS21 study, measures may contain branching questions to help ensure consistent responding to follow-up questions. For example, an initial screening question may have a yes/no response, followed by further questions that are only applicable to those who answered 'yes'; the branching rule will then ensure that those follow-up questions are skipped for those who answered 'no'. The end result, in the data, is that the follow-up questions will necessarily have missing values for those twins who answered 'no' to the screening question.

In TEDS21, the same questionnaire (both in phase 1 and phase 2) was presented via the alternative means of a phone app and a more conventional web site. Great efforts were made, during planning and testing, to ensure not only that the questions and responses were consistent, but also that the branching mechanisms worked in the same way in both presentations.

In TEDS21, some twins who preferred not to use the web or app were allowed to complete a paper questionnaire instead. Here, of course, branching rules could not be enforced but could be encouraged using instructions like 'if yes ... '. Additional steps could then be taken, either at data entry or in the dataset, to ensure that branching rules were effectively mimicked by eliminating inconsistent patterns of response.

Branching rules in tests

Many of the web tests in the 10 and 12 Year studies, and a few tests in later studies, had built-in adaptive branching. The branching rules are peculiar to each test, and details can be found by following the links above.

Where an upward branch occurs, a twin may move directly to a more demanding set of items in the test, skipping a number of intermediate items. Unless the twin subsequently branches down again in the same test, the intermediate items will remain unanswered, and in theory these items should be credited as though they were answered correctly. In the raw data for many tests, such items are not explicitly coded, and there may be missing or uncoded item data. Such items may generally be identified in the raw data by one of the following methods:

  1. Where the raw data contains item order variables, the order in which items were presented may be deduced, and hence upward branches may be identified.
  2. Inspection of the scores for branch point items, in relation to the known branching rules, should reveal whether an upward branch should subsequently have occurred.

Having identified the items skipped due to upward branching, they are recoded in the datasets as follows:

  1. Item responses, which may be missing or may have some default coding, are recoded to the value -3.
  2. Item scores, which may be missing, are recoded to the appropriate full score for the item (usually 1).
  3. Item response times are generally missing, and are left as such.
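This recoding can be sketched as follows (a minimal Python illustration; the actual cleaning is done in the dataset-building syntax, and the item record structure and field names here are hypothetical):

```python
# Sketch of the recoding for items skipped by an upward branch.
# Item records and field names ("response", "score", "rt") are hypothetical;
# None marks a missing raw value.

SKIPPED_BY_BRANCH = -3  # response code used in the analysis datasets

def recode_skipped_items(items, skipped_indices, full_score=1):
    """Credit items that were skipped by an upward branch."""
    for i in skipped_indices:
        items[i]["response"] = SKIPPED_BY_BRANCH  # was missing or default-coded
        items[i]["score"] = full_score            # credited as if answered correctly
        # response times are generally missing already and are left as such
    return items

items = [
    {"response": 2, "score": 1, "rt": 3.4},
    {"response": None, "score": None, "rt": None},  # skipped by upward branch
    {"response": None, "score": None, "rt": None},  # skipped by upward branch
    {"response": 4, "score": 0, "rt": 5.1},
]
recode_skipped_items(items, skipped_indices=[1, 2])
```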

Discontinue rules

Many of the tests have discontinue rules, designed to shorten a test where a participant is performing badly and has reached a ceiling. Typically, such a rule causes the test to end when a participant has given x incorrect responses in y consecutive items. The discontinue rules are peculiar to each test, and details can be found by following the links above.

After discontinuing, the remaining items of the test are not attempted, and in theory these items should not contribute towards the participant's test score. In the raw data for many tests, such items are not explicitly coded, and there may be missing or uncoded item data. Such items may generally be identified in the raw data by one of the following methods:

  1. In some tests, the discontinued items may have entirely missing data and may easily be distinguished from other types of item events.
  2. Inspection of the scores for consecutive items, in relation to the known discontinue rules, should reveal whether a discontinue point should have occurred.

Having identified the discontinued items, they are recoded in the datasets as follows:

  1. Item responses, which may be missing or may have some default coding, are recoded to the value -2.
  2. Item scores, which may be missing, are recoded to 0.
  3. Item response times are generally missing, and are left as such.
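The second identification method, followed by the recoding above, can be sketched as follows (a hypothetical Python illustration, assuming incorrect items score 0; the rule parameters are placeholders, since the actual rules vary by test):

```python
DISCONTINUED = -2  # response code used in the analysis datasets

def find_discontinue_point(scores, x, y):
    """Index of the first item after which the test should have been
    discontinued (x incorrect responses in y consecutive items), or None."""
    for end in range(y, len(scores) + 1):
        if scores[end - y:end].count(0) >= x:
            return end
    return None

# Hypothetical rule: discontinue after 3 incorrect in any 4 consecutive items.
scores = [1, 1, 0, 0, 1, 0, 0, 0]
point = find_discontinue_point(scores, x=3, y=4)  # items from index 6 onwards

# Recode the discontinued items:
items = [{"response": None, "score": s} for s in scores]
for item in items[point:]:
    item["response"] = DISCONTINUED
    item["score"] = 0
```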

Timeout rules

Many of the tests have item timeout rules, such that if a response is not given within the time limit the item is forfeited and the next item is presented. The timeout rules are peculiar to each test, and details can be found by following the links above.

A timed out item is generally treated as an incorrect response with zero score. In the raw data for many tests, such items are not explicitly coded, and there may be missing or uncoded item data. Such items may generally be identified in the raw data by one of the following methods:

  1. In some tests, item responses are explicitly coded to indicate timeouts
  2. In most tests, the recorded item response time is greater than or equal to the item time limit. The item score is usually recorded correctly as 0.

Having identified the timed out items, they are recoded in the datasets as follows:

  1. Item responses, which may be missing or may have some default coding, are recoded to the value -1.
  2. Item scores, which may be missing, are recoded to 0.
  3. Item response times generally exceed the item time limit, and are recoded to missing.
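The identification and recoding of timed out items can be sketched as follows (a hypothetical Python illustration; item records and field names are assumptions):

```python
TIMED_OUT = -1  # response code used in the analysis datasets

def recode_timeouts(items, time_limit):
    """Flag items whose recorded response time reached the item time limit."""
    for item in items:
        rt = item["rt"]
        if rt is not None and rt >= time_limit:
            item["response"] = TIMED_OUT
            item["score"] = 0
            item["rt"] = None  # time exceeds the limit: recode to missing

items = [
    {"response": 3, "score": 1, "rt": 4.2},
    {"response": None, "score": None, "rt": 10.0},  # no response within the limit
]
recode_timeouts(items, time_limit=10.0)
```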

Note that there were some tests in which an item timeout did not necessarily lead to a score of 0. In these tests, twins were required to record a response then click on a button. If the time limit was reached and a response was recorded but the button had not been clicked, then the response was submitted and might receive a valid non-zero score. In such cases, a different approach was taken to item recoding: item responses were coded as timeouts (-1) only if no response had been recorded, or in some cases if a zero-scoring response had been recorded. One type of example is seen in the Maths/Understanding Number test (ages 10, 12 and 16) and the Science test (age 14), where in some items the response was a number entered from the keyboard, or a set of two or more clickable options. Similarly, in the Making Inferences and Listening Grammar tests (age 12), twins had to select two clickable options before clicking on the button. Another example is seen in the 2D and 3D Drawing tests (age 18), where twins were asked to draw a set of lines by clicking on adjacent dots and to click on the button when the drawing was completed.

Errors and anomalies in tests

Crashed and interrupted items

A test item may "crash" or malfunction for a variety of reasons, for example: a technical error may occur either on the twin's computer or on the server; there may be a loss of internet connection; the test may fail to function properly due to very slow connection speeds; or there may be an undetected bug in the test programming (sometimes related to browser issues).

A test item may be "interrupted" by twin actions, either intentional or unintentional: clicking the browser's Back or Refresh buttons, closing the browser or turning off the computer in mid-test. (Most web tests were deliberately programmed to recognise attempts to use the Back or Refresh buttons. The programming usually prevented the item from being repeated, resulting in a forfeited item.)

Item crashes and interruptions generally had the same effects on test items, and cannot be distinguished in the item data. In most tests, crashed or interrupted items may be identified as follows:

  1. Some item data are missing. This may affect the item response, score, response time and/or order number. The precise effects vary from test to test.
  2. The missing item data cannot be explained by test branching, discontinue or timeout rules as described above.

Having identified the crashed or interrupted items, they are recoded in the datasets as follows:

  1. Item responses are recoded to the value -4.
  2. Item scores are recoded to 0.
  3. Item response times are recoded to missing.
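In sketch form (hypothetical Python, applied after the other recodings above so that any remaining missing responses must be crashes or interruptions):

```python
CRASHED = -4  # response code used in the analysis datasets

def recode_crashes(items):
    """Treat remaining missing responses as crashed/interrupted items.
    Assumes timeouts (-1), discontinues (-2) and branch skips (-3) have
    already been recoded, so unexplained missing data must be crashes."""
    for item in items:
        if item["response"] is None:
            item["response"] = CRASHED
            item["score"] = 0
            item["rt"] = None

items = [
    {"response": 2, "score": 1, "rt": 3.0},
    {"response": None, "score": None, "rt": None},  # unexplained missing data
    {"response": -1, "score": 0, "rt": None},       # already recoded as timeout
]
recode_crashes(items)
```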

Other item anomalies

Other rare anomalies, sometimes peculiar to an individual test, have been detected. For example:

  1. Items may have abnormally large item response times if the item was started before midnight and finished after midnight.
  2. When a test item requires a single click for a response, very rapid repeated clicks may have abnormal effects on the submission of the response, either for the current item or for the following item.
  3. In very rare cases, item timeout rules seem not to have functioned, so that normal responses are recorded but with times in excess of the time limit.

Where detected, such anomalies can be cleaned up in the syntax. For example, extreme outlying response times are recoded to missing; and invalid item data may be treated in the same way as item crashes (responses recoded to -4).

The anomalies caused by very fast repeated clicks within a single item are often impossible to detect, but are thought to be rare. The outcome may be that the second click results in a second response (and/or score) being submitted and then recorded for the following item. This is only detectable in tests where successive items have different response/score formats, such that anomalies can be identified (e.g. Maths at 10 and 12, Science at 14). Where identified, such anomalous responses are recoded to -4 and treated as item crashes. Another possible outcome of such an event is disruption of the test branching or discontinue rules; errors in branching rules can sometimes be detected as described below.

Abandoned tests

Certain tests were deliberately programmed so that they would be "abandoned" if a crash or interruption occurred during a test:

  • Reading Fluency (ages 12 and 16)
  • Dot Number (age 16)
  • Number Sense (age 16)
  • Reaction Times (age 16)

In these tests, a crash or interruption caused the entire test to be abandoned; on resuming, a twin would continue with the next test in the battery. The items were not submitted individually to the web server but were gathered by JavaScript programs in the browser and only submitted at the end of the test. Hence, an interruption before the test's end would cause the loss of all the data collected so far. A deliberate decision was made not to require the twin to re-start the test in these circumstances. The tests listed above may be recognised in the raw data by the following means:

  1. In the 16 Year tests, the test status variable is explicitly coded with value 3
  2. Test variables such as the data flag and start and end times are non-missing, but all item responses have missing values

Having identified instances of abandoned tests, they are recoded in the datasets as follows:

  1. The test status variable is recoded to value 3, if it does not already have this value
  2. The test data flag is recoded from 1 to 0.
  3. Test-level items such as dates and times that are non-missing are recoded to missing

Later tests at age 18 (Perception, Bricks) worked in a similar way, using JavaScript to collect test data, such that an interruption would result in the loss of all test data. However, in these web batteries it was decided that the twin should re-start the test on resuming. Hence, abandoned tests do not appear in the raw data.

In all other tests, data were submitted to the web server item by item, so that a crash or interruption would generally affect only a single item and would not result in a test being abandoned.

Test malfunctions

Even after extensive testing, web tests may contain bugs that only come to light during the course of a twin study. Often, such bugs may only cause item crashes or anomalies as described above. In some circumstances a bug may cause more serious problems affecting an entire test, for example:

  • Certain tests did not have built-in pauses after an item timeout, but proceeded directly to the next item where a further timeout could occur if the test was left unattended. Hence, a long series of timeouts could occur leading to a serious loss of data.
  • A bug that causes repeated item crashes may compromise the entire test through loss of data
  • In tests with branching rules, item crashes sometimes result in branching errors, for example where a twin is branched up to harder items instead of down to easier items

The tests that could be affected by serial timeouts were Reading Fluency and Figurative Language (ages 12 and 16), Dot Number, Number Sense, Reaction Times and PVT (all at age 16), and Elithorn Mazes (age 18).

Excessive loss of data in a test may be detected by counting the number of items in various categories (meaningful responses, timeouts, discontinued, credited after branching, crashed or interrupted).

Branching errors may be detected by examining the scores within each branch point, and by determining whether the correct branch was subsequently made. Sometimes, a branch item that crashed or malfunctioned was treated (by the web server programs) as though it had been answered correctly for the purposes of the branching rule, even though such an item was treated as incorrect for the purpose of its contribution to the total test score. The outcome of such an event could therefore be an incorrect upward branch (depending also on other item scores within the same branch point, and on the strictness of the branching rule). An incorrect upward branch will in most cases result in an overestimate of the twin's test score, although in some tests a subsequent downward branch might correct this.

Where a test is compromised in these ways to the extent that the test data may be considered to be invalid, the data are recoded in the datasets as follows:

  1. The test status variable is recoded to value 3
  2. The test data flag is recoded from 1 to 0.
  3. Item data are recoded to missing.

The criteria for deciding the circumstances in which a test should be treated in this way are described below.

Random responding

A twin who responds randomly or without apparent effort in a given test (or questionnaire section) may be referred to as a "random responder" or a "clicker". A twin who responds in this way in one test may make a genuine effort in another test, so these instances must be dealt with on a test-by-test basis. Twins may respond in this way in order to move on quickly to the next test in the battery, either because the given test is felt to be too long, boring, difficult or otherwise inconvenient, or because the twin is impatient to claim the reward offered for completing the battery of tests.

We do not have direct evidence of twins responding in this way: the twins completed the tests remotely, on their home computers, and we received no reports of such behaviour from twins themselves or from their families. Evidence for such instances of "random responding" can therefore only come indirectly from patterns of response in the data. Such patterns may include the following:

  • Rapid responses (low response time values) in many of the items
  • Repeated clicks in the same part of the screen for many items
  • Linked to either of the above, incorrect responses in most items
  • Incorrect responses to quality control (QC) items, inserted into some activities from the 21 Year Study onwards

In the absence of stronger evidence, it is not possible to identify all random responders. Generally, only the more extreme cases have been identified and excluded, often using a combination of the factors above. A conservative approach has been taken in order to avoid excluding, for example, genuinely low-scoring twins who responded moderately rapidly, or rapidly-responding twins who were able to achieve moderately good test scores.

Every web test has its own characteristics, and the exact criteria for exclusion may be judged from co-distributions of appropriate variables (generally to identify groups of obvious outliers). Some tests, for example Reading Fluency (ages 12 and 16) and Dot Number (age 16), actively encouraged rapid responses to items that would have been relatively easy if presented without time limits; in these tests, clicking may still be detected either by extremes of rapid responding, by outlying low scores, or by repeated clicking in the same part of the screen. Some other tests, for example Maths (ages 10 and 12) and Science (age 14), do not have the same response format in all items, so it is impossible to identify twins who repeatedly click in the same part of the screen. Still other tests, such as Number Line (age 16) and 2D/3D Drawing (age 18), have very unusual response formats, and may require different approaches to identifying clickers.

Nevertheless, some consistent approaches to identifying random responders across different web tests and at different ages have been used. The methods used are described in more detail below. Where exclusions are identified, test instances are recoded as follows in the datasets:

  1. The test status variable is recoded from 2 to 4
  2. The test data flag is recoded from 1 to 0.
  3. Item data are recoded to missing.

In questionnaires at TEDS21 and later, a similar approach has been taken within each questionnaire section or 'theme': where random responding is detected, the theme status variable is recoded from 2 to 4, and the item data for that theme are recoded to missing. Exclusion at the level of the entire questionnaire (across all themes) is enforced if repeated theme exclusions are found, and in these cases the questionnaire data flag is recoded from 1 to 0.

Summary of web test item and test coding

Where detected, item events other than normal responses are generally coded as follows in the dataset:

Item event | Typical identification in the raw data | Response | Score | Response time
Timed out | Response time greater than item time limit. | -1 | 0 | missing
Discontinued | Preceding pattern of responses corresponds to discontinue rule. Item data are generally missing. | -2 | 0 | missing
Skipped due to upward branching | Preceding pattern of responses corresponds to branching rule. Score = 1, response and response time missing. | -3 | 1 | missing
Crashed, malfunctioned or interrupted | Response is missing, not explained by other events above. | -4 | 0 | missing

Where test instances are identified as having been compromised, they are coded as follows in the dataset.

Test outcome | Description | Status variable | Data flag variable | Item data
Not started | Not attempted by the twin | 0 | 0 | missing
Unfinished | Test was started but left unfinished | 1 | 0 | missing
Successfully completed | Completed, no problems identified | 2 | 1 | non-missing
Completed but compromised by missing data | Test abandoned or crashed, or many items without meaningful data | 3 | 0 | missing
Random responses | Twin apparently answered without thought or effort in most of the items | 4 | 0 | missing

Parameters for categorising compromised tests

An attempt has been made to use exclusion rules which are reasonably consistent across different tests from the various web studies. However, some flexibility in use of the rules is needed where different tests have quite different characteristics. Graphs are generally used to identify groups of extreme outliers and to adjust cut-offs used for exclusion.

Web tests: parameters for categorising random responding

Random responders or "clickers" in web tests are generally identified using some combination of the following three test characteristics, as appropriate to the test.

  1. Rapid responses
    1. Low mean item response time: use a cut-off between the 5%-ile and the 10%-ile of the distribution of mean item response times for the given test.
    2. High proportion of answered items have very fast response times. Define "very fast" response times using a cut-off that is around the 1%-ile of the distribution of mean item response times for the given test (this cut-off is around 1 second for many tests). Define a "high proportion" of answered items using a cut-off around 40%. Define this proportion using only items with meaningful responses (not timed out, discontinued or crashed/interrupted items).
  2. Low response variability
    Measure response variability using the standard deviation of item responses for all items in the given test. Define "low" variability using a cut-off roughly 2 standard deviations below the mean of this measurement, for all twins in the given test.
  3. Low test score
    Use a cut-off close to the 10%-ile of scores for the given test. Another useful guideline is the expected mean score for a completely random responder. Note that a low test score is usually only used as an exclusion rule in combination with one of the other rules above.

Generally, a low test score is only used as an exclusion rule in combination with one of the other factors (low response variability, or rapid responses). However, there are certain types of test where a very low score is in itself grounds for exclusion. For example, the Number Sense, Dot Number, Reaction Times and Reading Fluency tests all encouraged twins to make rapid responses to a series of easy items; in these tests, outlying low test scores may suggest twins who did not make a serious effort to respond correctly in many of the items, or may suggest a technical problem experienced during the test. Another example is the Author Recognition test, which was not timed: a twin who selected only one or two of the 42 displayed authors, or who selected all 42 of them, might be classified as a clicker.
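As an illustration, the three characteristics above might be combined as follows (a Python sketch; the cut-off values and the particular combination shown are placeholders, since in practice each cut-off is set per test from the observed distributions):

```python
from statistics import pstdev

def flag_clicker(responses, rts, score, *,
                 fast_rt=1.0,      # illustrative "very fast" cut-off (seconds)
                 fast_prop=0.40,   # "high proportion" of very fast responses
                 sd_cutoff=0.5,    # illustrative low-variability cut-off
                 score_cutoff=5):  # illustrative ~10%-ile score cut-off
    """One plausible combination of the criteria: a low test score
    combined with either low response variability or a high proportion
    of very fast responses."""
    answered = [t for t in rts if t is not None]
    high_fast = (len(answered) > 0 and
                 sum(t < fast_rt for t in answered) / len(answered) > fast_prop)
    low_sd = pstdev(responses) < sd_cutoff if len(responses) > 1 else False
    low_score = score < score_cutoff
    return low_score and (high_fast or low_sd)

# A twin clicking the same option very rapidly throughout is flagged:
clicker = flag_clicker([1] * 20, [0.3] * 20, score=2)
# A slower twin with varied responses and a modest score is not:
careful = flag_clicker([1, 3, 2, 4, 1, 2] * 4, [5.0] * 24, score=4)
```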

21 Year: parameters for categorising random responding

For the first time at age 21, quality control (QC) items were included within some questionnaire measures in TEDS21, and in some of the sub-tests in the g-game. A typical questionnaire QC item is "This is a quality control item, please select 'Sometimes'" (assuming that 'Sometimes' is one of the response categories in the given measure). Identification of careless or random responders was then based primarily, but not solely, on the occurrence of incorrect responses to such questions.

The TEDS21 questionnaires, which were lengthy, were divided into sections or 'themes'. An effort was made to place at least one QC item in each theme, with the aim of making exclusions by theme rather than necessarily across the entire questionnaire. It was found that QC items were best placed within measures having a long series of questions all with the same response options, as these are the measures in which clicking seems more likely to occur. The instruction given in the QC item should be to select a response which is unlikely to be the default response for the majority of questions within the measure; hence, the QC item should help identify someone who is giving the same default response throughout, without reading the questions carefully. The QC items should ideally be spread evenly through the questionnaire, and this may guide the placement of measures within the questionnaire. It was decided not to place QC items amongst highly personal or sensitive questions. There were a few themes in which QC items were not placed, because they lacked suitable measures and because the response options were highly variable between measures.

In the 21 Year g-game, a QC item was placed in 4 of the 5 sub-tests. Each QC item was designed to be trivially easy, and had the same presentation and response format as in the test items that preceded it.

Excluded random (or careless) responders in these studies are generally identified using combinations of the following characteristics within each questionnaire theme (section) or sub-test.

  1. QC item incorrect response
    Any response other than the one indicated in the QC question was counted as an incorrect response, providing evidence for careless responding.
  2. Patterns of uniform response
    Check the responses for the 4 items adjacent to the QC item (generally 2 before and 2 after). If at least 3 of the 4 responses were the same as the (incorrect) response to the QC question, then this provided strong evidence for careless responding.
  3. Fast theme/sub-test completion time
    The time taken to complete each theme was measured in electronic versions of each TEDS21 questionnaire; dividing by the number of questions actually answered (because of branching) gave the mean item response time within the theme. In the g-game, item response times were recorded, so the mean item response time within the sub-test was used. A 'fast' theme/sub-test time is generally judged to be one below the 20%-ile of this distribution. A twin making a QC item error, and having a 'fast' time within the same theme/sub-test, is quite likely to have been responding carelessly.
  4. Low test score
    This applies in the g-game but not in TEDS21 questionnaires. As in earlier web tests (see above), the guideline for low-scoring cut-offs is the expected mean score for a completely random responder. Note that a low sub-test score is usually only used as an exclusion rule in combination with one of the other rules above.

Hence, within each theme or sub-test, twins are excluded for careless responding if they make a QC error and either (a) show a pattern of uniform responding in the same measure, or (b) have a 'fast' response time within the theme. In the g-game, twins are additionally excluded for extremes of rapid responding.

Such exclusions, if isolated, were made within each theme/sub-test independently. However, a twin excluded in two or more themes/sub-tests of the same study can be judged to be a persistent careless responder, and as such is excluded across the entire dataset for that study.
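This decision rule can be sketched as follows (hypothetical Python; the argument names and the per-theme 'fast' cut-off value are assumptions):

```python
def careless_in_theme(qc_response, qc_correct, adjacent, mean_item_time, fast_cutoff):
    """Flag a theme/sub-test: a QC error combined with either a uniform
    response pattern (at least 3 of the 4 adjacent items matching the
    incorrect QC response) or a 'fast' mean item time for the theme."""
    qc_error = qc_response != qc_correct
    uniform = sum(r == qc_response for r in adjacent) >= 3
    fast = mean_item_time < fast_cutoff
    return qc_error and (uniform or fast)

def persistent_careless(theme_flags):
    """A twin flagged in two or more themes/sub-tests of the same study
    is excluded across the entire dataset for that study."""
    return sum(theme_flags) >= 2

# QC item asked for response 3 ('Sometimes'); the twin answered 2 throughout:
flagged = careless_in_theme(qc_response=2, qc_correct=3,
                            adjacent=[2, 2, 2, 1],
                            mean_item_time=4.0, fast_cutoff=2.5)
```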

Web tests: parameters for categorising loss of data

Most test exclusion rules relating to loss of data fall into one of the following categories (which are overlapping to some extent):

  1. Branching errors
    Exclude if an upward branch has occurred in error, resulting in a set of items being wrongly skipped and credited. The usual cause of such a branching error is a crash or malfunction in one or more of the items of the branch point.
  2. Insufficient responses
    Exclude if the number of items answered falls below some minimum number that is necessary for a meaningful test score (this minimum may be implied by the test rules). For example, some tests start with a branch point comprising X items, following which a twin is expected to branch to a higher or lower level of questions; in these cases, it may be appropriate to exclude twins with fewer than X meaningful item responses.
  3. Too many lost items
    A generally useful rule is to exclude test instances in which the number of crashed/malfunctioned/interrupted plus timed out items is greater than a quarter (0.25) of all the items presented to a twin in a given test. "Presented" items are defined as all those items that should have appeared on screen, hence not discontinued items.
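The 'too many lost items' rule can be sketched as follows (a Python illustration using the item response codes described above; treating branch-skipped items as not presented is an assumption here, since the rule as stated excludes only discontinued items):

```python
LOST = {-1, -4}           # timed out; crashed/malfunctioned/interrupted
NOT_PRESENTED = {-2, -3}  # discontinued; skipped by upward branching

def too_many_lost(responses, threshold=0.25):
    """True if lost items exceed a quarter of the items presented on screen."""
    presented = [r for r in responses if r not in NOT_PRESENTED]
    lost = sum(r in LOST for r in presented)
    return len(presented) > 0 and lost / len(presented) > threshold

# 2 of the 6 presented items were lost (0.33 > 0.25), so exclude:
exclude = too_many_lost([1, 2, -1, -4, 3, -2, -2, 4])
# Only 1 of the 8 presented items was lost (0.125), so keep:
keep = too_many_lost([1, 2, 3, 4, -1, 2, 3, 4])
```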

Summary of exclusions made in all web tests

The table below summarises the types of exclusions applied in each web test, and the number of twins excluded. The exclusion types are coded as follows:

  • A1 Missing data: abandoned test (complete loss of data)
  • A2 Missing data: items skipped due to branching errors
  • A3 Missing data: insufficient responses and/or too many items lost to timeouts/crashes
  • B1 Clicker: low response variability (* combined with low test score, ** combined with low mean response time)
  • B2 Clicker: low mean response time (* combined with low test score)
  • B3 Clicker: high % of very fast responses (* combined with low test score)
  • B4 Clicker: outlying low score (regardless of other factors)
  • C Careless questionnaire responder: QC error combined with either uniform responses or rapid responding
10 Year test Exclusion types made N and % of twins excluded
Missing data Clicker Missing data Clicker
PIAT A2, A3 B1*, B3* 20 (0.3%) 106 (1.7%)
Maths A2, A3 B3* 100 (1.8%) 42 (0.7%)
Ravens A2, A3 B1*, B3* 6 (0.1%) 96 (1.7%)
Picture Completion A2, A3 B1*, B3* 15 (0.3%) 28 (0.5%)
Vocabulary A2, A3 B1*, B3* 8 (0.15%) 51 (0.9%)
General Knowledge A2 B1*, B3* 4 (0.1%) 18 (0.3%)
Author Recognition - B4 - 386 (6.9%)
| 12 Year test | Missing data exclusions | Clicker exclusions | Missing data N (%) | Clicker N (%) |
|---|---|---|---|---|
| PIAT | A2, A3 | B1, B3* | 52 (0.5%) | 98 (0.9%) |
| Reading Fluency | A1, A3 | B1 | 190 (1.8%) | 17 (0.2%) |
| Reading Comprehension | A3 | B1, B3* | 6 (0.1%) | 121 (1.1%) |
| Maths | A2, A3 | B3* | 182 (1.7%) | 217 (2.1%) |
| Ravens | A3 | B1, B3 | 11 (0.1%) | 84 (0.9%) |
| Picture Completion | A3 | B1, B3* | 55 (0.6%) | 45 (0.5%) |
| General Knowledge | A2, A3 | B1, B3* | 20 (0.2%) | 17 (0.2%) |
| Vocabulary | A3 | B1, B3* | 10 (0.1%) | 45 (0.5%) |
| Figurative Language | A3 | B1 | 80 (0.9%) | 22 (0.2%) |
| Listening Grammar | A2, A3 | B3* | 177 (2%) | 114 (1.3%) |
| Making Inferences | A3 | B3* | 62 (0.7%) | 62 (0.7%) |
| Hidden Shapes | A2, A3 | B1, B3* | 45 (0.4%) | 81 (0.8%) |
| Jigsaws | A2, A3 | B1, B3* | 28 (0.3%) | 54 (0.5%) |
| Eyes | A3 | B1, B3* | 4 (0.04%) | 53 (0.5%) |
| Author Recognition | - | B4 | - | 663 (6.8%) |
| 14 Year test | Missing data exclusions | Clicker exclusions | Missing data N (%) | Clicker N (%) |
|---|---|---|---|---|
| Science | A3 | B3* | 23 (0.4%) | 281 (5%) |
| Vocabulary | A3 | B1, B3* | 2 (0.03%) | 47 (0.7%) |
| Ravens | A3 | B1, B2* | 4 (0.07%) | 92 (1.7%) |
| 16 Year test | Missing data exclusions | Clicker exclusions | Missing data N (%) | Clicker N (%) |
|---|---|---|---|---|
| Reading Fluency | A1, A3 | B2* | 175 (3.2%) | 42 (0.8%) |
| Passages | - | B2* | - | 605 (12.5%) |
| Understanding Number | A2, A3 | B2 | 97 (2%) | 82 (1.7%) |
| Dot Number | A1, A3 | B2* | 293 (5.1%) | 32 (0.6%) |
| Number Sense | A1, A3 | B2, B4 | 278 (5.1%) | 165 (3%) |
| Number Line | A3 | B1, B4 | 11 (0.2%) | 47 (0.8%) |
| PVT | - | B1, B2, B4 | 1 (0.02%) | 201 (3.9%) |
| Corsi Block | A3 | - | 16 (0.3%) | - |
| Reaction Times | A1, A3 | B4 | 75 (1.5%) | 20 (0.4%) |
| Ravens | - | B1, B2* | - | 86 (1.7%) |
| Mill Hill Vocabulary | - | B1, B2* | - | 68 (1.2%) |
| Figurative Language | - | B1 | - | 1 (0.02%) |
| EWQ part A | - | B1* | - | 31 (0.6%) |
| EWQ part B | - | B1* | - | 51 (1%) |
| EWQ part C | - | B1* | - | 78 (1.5%) |
| EWQ part D | - | B1* | - | 97 (2%) |
| 18 Year test | Missing data exclusions | Clicker exclusions | Missing data N (%) | Clicker N (%) |
|---|---|---|---|---|
| Perception: Faces | - | - | - | - |
| Perception: Cars | - | - | - | - |
| Bricks: 2r | A3 | B1, B3 | 3 (0.1%) | 55 (1.8%) |
| Bricks: 2rv | A3 | B1, B3 | 4 (0.1%) | 81 (2.7%) |
| Bricks: 2v | A3 | B1, B3 | 9 (0.3%) | 89 (2.9%) |
| Bricks: 3rv | A3 | B1, B3 | 1 (0.03%) | 46 (1.5%) |
| Bricks: 3r | A3 | B1, B3 | 1 (0.03%) | 69 (2.3%) |
| Bricks: 3v | - | B1, B3 | - | 67 (2.3%) |
| Kings Challenge: cs | A3 | B1**, B3* | 30 (1%) | 75 (2.6%) |
| Kings Challenge: 2d | A3 | B3* | 52 (1.8%) | 13 (0.4%) |
| Kings Challenge: pa | A3 | B1**, B3* | 74 (2.6%) | 53 (1.9%) |
| Kings Challenge: em | A3 | - | 375 (13.6%) | - |
| Kings Challenge: mr | A3 | B1**, B3* | 1 (0.04%) | 34 (1.2%) |
| Kings Challenge: pf | A3 | B1**, B3* | 65 (2.4%) | 48 (1.8%) |
| Kings Challenge: 3d | A3 | B3* | 147 (5.5%) | 62 (2.3%) |
| Kings Challenge: sr | A3 | B1**, B3* | 96 (3.6%) | 69 (2.6%) |
| Kings Challenge: pt | A3 | B1**, B3* | 56 (2.1%) | 85 (3.2%) |
| Kings Challenge: ma | A3 | B1**, B3* | 64 (2.4%) | 96 (3.6%) |
| Navigation: od1 | A1, A3 | - | 8 (0.3%) | - |
| Navigation: od2 | A1, A3 | - | 9 (0.3%) | - |
| Navigation: od3 | A1, A3 | - | 32 (1.1%) | - |
| Navigation: od4 | A1, A3 | - | 25 (0.9%) | - |
| Navigation: od5 | A1, A3 | - | 44 (1.7%) | - |
| Navigation: ol1 | A1, A3 | - | 6 (0.2%) | - |
| Navigation: ol2 | A1, A3 | - | 11 (0.4%) | - |
| Navigation: ol3 | A1, A3 | - | 12 (0.4%) | - |
| Navigation: ol4 | A1, A3 | - | 21 (0.8%) | - |
| Navigation: ol5 | A1, A3 | - | 21 (0.8%) | - |
| Navigation: mn1 | A1, A3 | - | 6 (0.2%) | - |
| Navigation: mn2 | A1, A3 | - | 21 (0.8%) | - |
| Navigation: mn3 | A1, A3 | - | 7 (0.2%) | - |
| Navigation: mn4 | A1, A3 | - | 11 (0.4%) | - |
| Navigation: mn5 | A1, A3 | - | 18 (0.7%) | - |
| Navigation: mw1 | A1, A3 | - | 4 (0.1%) | - |
| Navigation: mw2 | A1, A3 | - | 11 (0.4%) | - |
| Navigation: mw3 | A1, A3 | - | 17 (0.7%) | - |
| Navigation: mw4 | A1, A3 | - | 12 (0.5%) | - |
| Navigation: mw5 | A1, A3 | - | 11 (0.4%) | - |
| Navigation: sc1 | A1, A3 | B3* | 2 (0.1%) | 9 (0.3%) |
| Navigation: sc2 | A1, A3 | B3* | 1 (0.04%) | 7 (0.3%) |
| Navigation: sc3 | A1, A3 | B3* | 11 (0.4%) | 11 (0.4%) |
| Navigation: sc4 | A1, A3 | B3* | 7 (0.3%) | 15 (0.6%) |
| Navigation: sc5 | A1, A3 | B3* | 12 (0.5%) | 11 (0.4%) |
| Navigation: ps1 | A1, A3 | B3* | 6 (0.2%) | 13 (0.4%) |
| Navigation: ps2 | A1, A3 | B3* | 6 (0.2%) | 22 (0.8%) |
| Navigation: ps3 | A1, A3 | B3* | 6 (0.2%) | 16 (0.6%) |
| Navigation: ps4 | A1, A3 | B3* | 10 (0.4%) | 12 (0.4%) |
| Navigation: ps5 | A1, A3 | B3* | 11 (0.4%) | 120 (4.6%) |
| 21 Year measure | Missing data exclusions | Clicker exclusions | Missing data N (%) | Clicker N (%) |
|---|---|---|---|---|
| TEDS21 phase 1 theme 1 | - | C | - | 730 (5.1%) |
| TEDS21 phase 1 theme 2 | - | C | - | 252 (1.8%) |
| TEDS21 phase 1 theme 3 | - | C | - | 352 (2.5%) |
| TEDS21 phase 1 theme 6 | - | C | - | 83 (0.6%) |
| TEDS21 phase 1 theme 8 | - | C | - | 335 (2.4%) |
| TEDS21 phase 1 theme 9 | - | C | - | 143 (1.0%) |
| TEDS21 phase 2 theme 1 | - | C | - | 95 (0.7%) |
| TEDS21 phase 2 theme 2 | - | C | - | 46 (0.3%) |
| TEDS21 phase 2 theme 3 | - | C | - | 103 (0.7%) |
| TEDS21 phase 2 theme 4 | - | C | - | 121 (0.8%) |
| TEDS21 phase 2 theme 5 | - | C | - | 259 (1.8%) |
| TEDS21 phase 2 theme 6 | - | C | - | 140 (1.0%) |
| TEDS21 phase 2 theme 7 | - | C | - | 121 (0.8%) |
| G-game: voc | - | B2*, C | - | 17 (0.1%) |
| G-game: ist | - | B2*, C | - | 26 (0.2%) |
| G-game: mis | - | B2* | - | 8 (0.1%) |
| G-game: rav | - | B2*, C | - | 26 (0.2%) |
| G-game: ver | - | B2*, C | - | 8 (0.1%) |