21  Data Dictionary and Templates

21.1 Data overview

The primary deliverables for the project are two data files:

  • Participant data: 1 row per participant
  • Trial data: 1 row per trial

In addition, labs using eye trackers are asked to submit raw gaze data (either as a single file or as one file per participant).

21.2 MB5 Data Dictionary

The MB5 Data Dictionary lists all of the variables that should be included in the Participant Data and Trial Data files. Each row describes one variable: its specified format (e.g., string, integer), a set of example values, and a description. Your lab’s data must follow these specifications exactly so that it can be harmonized with the full dataset.

Important

The MB5 Data Dictionary is under development.

21.3 MB5 Data templates

IMPORTANT NOTE #1

These two files must use identical, anonymous participant identifiers (participant_id). We must be able to link participant- and trial-level data across files! You can use your lab’s normal participant numbering convention (e.g., 001, 002, 003), as long as participant_id DOES NOT include any private information (e.g., initials, birth date).
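Before submitting, it can help to confirm that the two files actually link up. The sketch below is a minimal, hypothetical check (the function name unlinked_ids and the trial_num column in the example are our own illustrations, not part of the MB5 materials). It assumes both files have a header row containing a participant_id column, and it reads IDs as text, so leading zeros such as 001 are preserved.

```python
import csv
import io
from typing import TextIO


def unlinked_ids(participant_file: TextIO, trial_file: TextIO) -> tuple[set[str], set[str]]:
    """Compare participant_id values across the two data files.

    Returns (ids that appear only in the trial data,
             ids that appear only in the participant data).
    Both sets should be empty for a correctly linked submission.
    """
    # The csv module yields every cell as a string, so "001" is not turned into 1.
    participant_ids = {row["participant_id"] for row in csv.DictReader(participant_file)}
    trial_ids = {row["participant_id"] for row in csv.DictReader(trial_file)}
    return trial_ids - participant_ids, participant_ids - trial_ids


# Example with small in-memory files (hypothetical data).
participants = io.StringIO("participant_id,age_days\n001,150\n002,162\n")
trials = io.StringIO("participant_id,trial_num\n001,1\n001,2\n003,1\n")
only_in_trials, only_in_participants = unlinked_ids(participants, trials)
print(only_in_trials, only_in_participants)  # {'003'} {'002'}
```

With real files, open each one with open(path, newline=""), as the Python csv documentation recommends.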

IMPORTANT NOTE #2

These files must contain de-identified data ONLY. Strip all potentially identifying information from your data files before submission. For example, you SHOULD NOT include birth dates or test dates in your Participant Data file. Instead, use an age calculator (e.g., https://www.calculator.net/age-calculator.html) to calculate each participant’s age in days, and report that value in the age_days variable. If you have any questions about de-identifying your data, email us.
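If you prefer to compute age offline rather than with a web calculator, the date arithmetic is simple. This is a minimal sketch, assuming that age in days means the number of calendar days between the birth date and the test date; compare a few results against the calculator above to confirm the convention matches. Run it on your lab’s local records only: the birth and test dates themselves must never appear in the submitted file.

```python
from datetime import date


def age_days(birth_date: date, test_date: date) -> int:
    """Age in days on the test date, for the age_days variable.

    Assumes age in days is the plain difference between the two calendar dates.
    """
    if test_date < birth_date:
        raise ValueError("test_date is earlier than birth_date")
    return (test_date - birth_date).days


# Hypothetical example: born 15 Jan 2023, tested 1 Jun 2023.
print(age_days(date(2023, 1, 15), date(2023, 6, 1)))  # 137
```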

IMPORTANT NOTE #3

It’s really important to remember that these files are designed to be read by a computer program, not a person. Anything that violates the template (e.g., values that aren’t of the specified type, extra formatting, comments) will cause errors. For example, cells in the lang1_exposure column of the Participant Data file must contain numbers. An entry such as “80 to 90” will cause errors because it contains text in addition to numbers. (To avoid NA responses, please check questionnaire responses before participants leave the lab.) If you have questions, comments, or calculations, please communicate them directly to the analysis team rather than embedding them in the data.
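One way to catch entries like “80 to 90” before submission is to scan each numeric column for anything that is not a number or “NA”. The sketch below is hypothetical: the function name and the number pattern (plain integers or decimals) are our assumptions, so check the Data Dictionary for each variable’s actual format. It uses only the Python standard library, and it also flags empty cells.

```python
import csv
import io
import re
from typing import Iterable

# A plain integer or decimal such as 80 or 87.5 (no units, ranges, or words).
NUMBER = re.compile(r"-?\d+(\.\d+)?")


def non_numeric_cells(rows: Iterable[dict[str, str]], column: str) -> list[tuple[int, str]]:
    """Return (data_row_number, value) for each cell in `column` that is
    neither a plain number nor the missing-value code "NA"."""
    problems = []
    for row_number, row in enumerate(rows, start=1):
        value = row.get(column) or ""  # treat a missing cell as empty text
        if value != "NA" and not NUMBER.fullmatch(value):
            problems.append((row_number, value))
    return problems


# Hypothetical participant data with one invalid entry.
data = io.StringIO("participant_id,lang1_exposure\n001,85\n002,80 to 90\n003,NA\n")
print(non_numeric_cells(csv.DictReader(data), "lang1_exposure"))  # [(2, '80 to 90')]
```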

IMPORTANT NOTE #4

Please do not leave any fields blank. If something does not logically have an answer, or if you did not collect this information, please mark it as “NA”.
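Blank cells are easy to miss in a spreadsheet view. The hypothetical helper below (the name blank_cells is ours, not part of the MB5 templates) lists every empty or whitespace-only cell in a CSV file so you can replace it with “NA” before submitting.

```python
import csv
import io
from typing import TextIO


def blank_cells(csv_file: TextIO) -> list[tuple[int, str]]:
    """Return (line_number, column) for every empty cell.

    Line numbers count the header as line 1, matching what a text editor shows
    (assuming no cell contains an embedded line break).
    """
    reader = csv.DictReader(csv_file)
    blanks = []
    for row in reader:
        for column in reader.fieldnames:
            value = row[column]  # None if the row is shorter than the header
            if value is None or value.strip() == "":
                blanks.append((reader.line_num, column))
    return blanks


# Hypothetical participant data with one blank age_days cell.
data = io.StringIO("participant_id,age_days,notes\n001,150,NA\n002,,NA\n")
print(blank_cells(data))  # [(3, 'age_days')]
```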

Please download a copy of each template and use them as guides for formatting your lab’s data.

Files must be in CSV (comma-separated values) format. PLEASE NOTE that saving in this format will remove any formulas or other non-plain-text features of your spreadsheet (e.g., color fills, formatting); all information should be captured in the text within each cell.
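If your lab prepares its data with a script rather than a spreadsheet, writing the CSV directly avoids formulas and formatting altogether. A minimal sketch with Python’s standard csv module follows; the function name is hypothetical, and the column names in the example are only illustrations, so take the real columns and their order from the templates. Commas inside a cell (e.g., in a notes column) are quoted automatically by the csv writer.

```python
import csv
import io
from typing import Iterable, TextIO


def write_data_csv(out: TextIO, fieldnames: list[str], rows: Iterable[dict[str, object]]) -> None:
    """Write rows as plain-text CSV in the given column order.

    Missing values (absent keys or None) are written as "NA" rather than left
    blank. A key that is not in `fieldnames` raises ValueError, which catches
    columns that are not part of the template.
    """
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="NA")
    writer.writeheader()
    for row in rows:
        writer.writerow({key: "NA" if value is None else value for key, value in row.items()})


# Hypothetical example written to an in-memory buffer.
buffer = io.StringIO()
write_data_csv(
    buffer,
    ["participant_id", "age_days"],
    [{"participant_id": "001", "age_days": 150}, {"participant_id": "002"}],
)
print(buffer.getvalue())
```

For a real file, open it with open("participant_data.csv", "w", newline=""); the newline="" argument is what the csv module documentation recommends, and the file name here is hypothetical.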

Participant Data

A .csv file with one row for each participant, and with columns showing participant-level variables (e.g., participant_id, age, demographic information, notes on the session).

Important

Participant Data template is under development.

Trial Data

A .csv file with one row for each trial, and with columns showing participant_id and trial-level variables (e.g., trial number, trial type, looking time).

Important

Trial Data template is under development.