After data collection
Please read the guidelines below carefully, then contact Martin Zettersten (martincz@princeton.edu) with any remaining questions about data preparation and submission.
- 17 Data Reporting Guidelines
- Data templates
- Data dictionary
Data templates
The primary deliverables for the project are two data files, which your lab fills out using the templates provided below. Please download a copy of each template and use it as a guide for formatting your lab’s data. We prefer files in CSV format. PLEASE NOTE that saving in this format removes any formulas and other non-plain-text features of your spreadsheet (e.g., color fills, formatting), so all information must be captured in the text of each cell.
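Before filling in a template, it can help to confirm that your file’s header row matches the template’s column names exactly. The Python sketch below compares the two header rows. The function names are hypothetical helpers, not part of any ManyBabies tool, and the shortened headers in the example are made up for illustration; the real templates contain more columns.

```python
import csv
import io


def read_header(csv_text: str) -> list[str]:
    """Return the column names in the first row of some CSV text."""
    return next(csv.reader(io.StringIO(csv_text)), [])


def compare_headers(template_text: str, data_text: str) -> dict[str, list[str]]:
    """List template columns missing from your file, and columns the template lacks."""
    template = read_header(template_text)
    data = read_header(data_text)
    return {
        "missing": [col for col in template if col not in data],
        "unexpected": [col for col in data if col not in template],
    }


# Hypothetical, shortened headers for illustration only.
template_csv = "participant_id,participant_age_days,lang1_exposure\n"
my_csv = "participant_id,age_days,lang1_exposure\n001,250,100\n"
print(compare_headers(template_csv, my_csv))
# {'missing': ['participant_age_days'], 'unexpected': ['age_days']}
```

To check real files, read each one as text (e.g., `open(path, newline="", encoding="utf-8").read()`) and pass it in. The MB Data Validator remains the authoritative check.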
- Participant Data: a .csv file with one row per participant and columns for the participant’s subject number, age, demographic information, session notes, etc. (see Participant Data - CSV template)
- Trial Data: a .csv file with one row per trial and columns for the participant’s subject number, trial number, trial type, looking time, etc. (see Trial Data - CSV template)
IMPORTANT NOTE 1: Both files must use identical, anonymous subject identifiers (‘participant_id’), so that participant-level and trial-level data can be linked across files. You may use your lab’s usual participant numbering convention (e.g., 001, 002, 003), as long as the IDs DO NOT contain any private information (e.g., initials, birth date, gender).
IMPORTANT NOTE 2: These files must contain de-identified data ONLY. Strip all potentially identifying information from your data files before submission. For example, DO NOT include birth date or test date in your Participant Data file. Instead, use an age calculator (e.g., https://www.calculator.net/age-calculator.html) to compute each participant’s age in days on the test date, and report that value in the ‘participant_age_days’ variable. If you have any questions about de-identifying your data, email contact@manybabies.org.
IMPORTANT NOTE 3: These files are designed to be read by a computer program, not a person, so anything that deviates from the template (e.g., values that are not of the specified type, cell formatting, comments) will cause errors. For example, cells in the “lang1_exposure” column of the Participant Data file must contain numbers only; an entry such as “80 to 90” will cause errors because it contains non-numeric characters. (To avoid NA responses, please check questionnaire responses before participants leave the lab.)
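Several of the checks in the notes above can be scripted. The sketch below, using hypothetical helper names, computes an age in days from a birth date and a test date (do this locally and submit only the resulting number), tests whether a cell holds a plain number, and lists participant IDs that appear in the trial data but have no row in the participant data. It assumes the age is the plain calendar-day difference between the two dates.

```python
import re
from datetime import date


def age_in_days(birth_date: date, test_date: date) -> int:
    """Calendar days from birth date to test date (value for participant_age_days)."""
    if test_date < birth_date:
        raise ValueError("test date is earlier than birth date")
    return (test_date - birth_date).days


def is_plain_number(cell: str) -> bool:
    """True for cells like '85' or '85.5'; False for '80 to 90', '85%' or '85,5'."""
    return re.fullmatch(r"-?\d+(?:\.\d+)?", cell) is not None


def unlinked_ids(participant_ids, trial_ids) -> set:
    """participant_id values in the trial data with no matching participant row."""
    return set(trial_ids) - set(participant_ids)


print(age_in_days(date(2023, 1, 1), date(2023, 4, 11)))  # 100
print(is_plain_number("80 to 90"))  # False
print(unlinked_ids(["001", "002"], ["001", "001", "002", "003"]))  # {'003'}
```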
If you have questions, comments, or calculations, please send them directly to the analysis team rather than embedding them in the data.
IMPORTANT NOTE 4: Please do not leave any fields blank. If a field does not logically apply, or if you did not collect that information, enter “NA”.
Language. If you collect data from children who are learning more than one language, please provide an approximate percentage of exposure to each language, based either on parental report or, if it is standard practice in your lab, on a day-in-the-life style questionnaire administered by the RA. The percentages across all languages should add up to 100%.
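A minimal sketch of the NA rule and the language-exposure total follows. Only lang1_exposure is named in these guidelines; lang2_exposure and lang3_exposure are assumed names, so check the MB5 Data Dictionary for the real columns. Fill blanks with NA only after confirming that the information is genuinely missing, since a blank cell may instead be a data-entry omission that should be corrected.

```python
# Hypothetical column names: only lang1_exposure appears in these guidelines;
# check the MB5 Data Dictionary for the actual names of the other columns.
LANG_COLUMNS = ["lang1_exposure", "lang2_exposure", "lang3_exposure"]


def fill_blanks_with_na(row: dict) -> dict:
    """Replace empty or whitespace-only cells with the literal string 'NA'."""
    return {col: "NA" if val is None or val.strip() == "" else val for col, val in row.items()}


def exposure_total(row: dict, columns=LANG_COLUMNS) -> float:
    """Sum the exposure percentages, ignoring columns that are absent or 'NA'."""
    return sum(float(row[col]) for col in columns if row.get(col, "NA") != "NA")


def exposure_adds_to_100(row: dict, columns=LANG_COLUMNS, tolerance: float = 1e-6) -> bool:
    """Check that the percentages add up to 100, allowing for float rounding."""
    return abs(exposure_total(row, columns) - 100.0) <= tolerance


row = {"participant_id": "001", "lang1_exposure": "70", "lang2_exposure": "30", "lang3_exposure": " "}
clean = fill_blanks_with_na(row)
print(clean["lang3_exposure"])  # NA
print(exposure_adds_to_100(clean))  # True
```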
Data dictionary
MB5 Data Dictionary: This spreadsheet lists all of the variables that must appear in the Participant Data and Trial Data files, with one worksheet (tab) for each data file. Each row gives a variable, its specified format (e.g., string, integer), a set of example values, and a description. Your lab’s data must follow these specifications exactly so that it can be harmonized with the full dataset.
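Because every variable has a specified format, a lab can check its rows against the dictionary before running the validator. The sketch below uses a hypothetical three-variable excerpt of the dictionary. The format name "numeric" is an assumption (only string and integer are mentioned here), and a value of "NA" is accepted for any variable.

```python
import re

# Hypothetical excerpt of the data dictionary (variable -> format). The real
# MB5 Data Dictionary defines the full variable list and its format names.
SPEC = {
    "participant_id": "string",
    "participant_age_days": "integer",
    "lang1_exposure": "numeric",
}

# One check per format name; "numeric" is an assumed name for decimal values.
FORMAT_CHECKS = {
    "string": lambda v: v.strip() != "",
    "integer": lambda v: re.fullmatch(r"-?\d+", v) is not None,
    "numeric": lambda v: re.fullmatch(r"-?\d+(?:\.\d+)?", v) is not None,
}


def spec_errors(row: dict, spec: dict = SPEC) -> list:
    """Return one message per cell that does not match its specified format."""
    errors = []
    for column, fmt in spec.items():
        value = row.get(column)
        if value is None:
            errors.append(f"{column}: column is missing")
        elif value == "NA":
            continue  # "NA" marks a missing value, so any format accepts it
        elif not FORMAT_CHECKS[fmt](value):
            errors.append(f"{column}: {value!r} does not match format {fmt!r}")
    return errors


print(spec_errors({"participant_id": "001", "participant_age_days": "250.5", "lang1_exposure": "NA"}))
# ["participant_age_days: '250.5' does not match format 'integer'"]
```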
Data Validation
After preparing your Participant Data and Trial Data files, please use the MB Data Validator to check that your lab’s data is in the correct format (see the MB Validator User Manual). Common issues:
- Decimal separators: if the validator rejects your CSV file, the cause may be a regional number format that uses commas as decimal points. Please use periods (‘.’), not commas (‘,’), as decimal points.
- Stray marks: make sure there are no stray marks in cells outside your data range. For example, a space entered in a cell of an otherwise empty row or column will cause an error.
If you encounter any unexpected issues, please email Martin Zettersten (martincz@princeton.edu). Your data files MUST pass validation before submission.
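The common issues listed above can be caught with a quick local scan. This sketch is not exhaustive and does not replace the MB Data Validator: it flags decimal commas, blank header cells, whitespace-only cells, and rows with no data. The looking_time column in the example is a hypothetical name.

```python
import csv
import io
import re

# A number written with a decimal comma, e.g. "12,5". "1,234" also matches and
# may be a thousands separator, so flagged cells need a manual look.
DECIMAL_COMMA = re.compile(r"-?\d+,\d+")


def find_format_problems(csv_text: str) -> list:
    """Flag decimal commas and stray marks (blank headers, whitespace-only cells, empty rows)."""
    problems = []
    for r, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if all(cell.strip() == "" for cell in row):
            problems.append(f"row {r}: no data in this row (possible stray mark)")
            continue
        for c, cell in enumerate(row, start=1):
            if r == 1 and cell.strip() == "":
                problems.append(f"row 1, column {c}: blank header (possible stray column)")
            elif cell != "" and cell.strip() == "":
                problems.append(f"row {r}, column {c}: cell contains only whitespace")
            elif DECIMAL_COMMA.fullmatch(cell):
                problems.append(f"row {r}, column {c}: decimal comma in {cell!r}")
    return problems


sample = 'participant_id,looking_time\n001,"12,5"\n002,8.4\n , \n'
for problem in find_format_problems(sample):
    print(problem)
# row 2, column 2: decimal comma in '12,5'
# row 4: no data in this row (possible stray mark)
```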
Data submission
Once your data files have passed validation, please upload both the Participant Data and Trial Data files using the MB5 Data Upload form. Please note the following:
- Both file names must include your ManyBabies LabID. Refer to the LabID list to find your lab’s unique LabID.
- Use the following naming convention for your Participant Data and Trial Data files:
  yourLabID_participant_data.csv (e.g., babylabPrinceton_participant_data.csv)
  yourLabID_trial_data.csv (e.g., babylabPrinceton_trial_data.csv)
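The naming convention is simple enough to generate rather than type by hand. The sketch below, with hypothetical helper names, builds both file names from a LabID and reports which expected files are missing from a list of names. It compares names exactly, including letter case, which may be stricter than the upload form requires.

```python
def expected_file_names(lab_id: str) -> list:
    """The two file names required for upload, built from your LabID."""
    return [f"{lab_id}_participant_data.csv", f"{lab_id}_trial_data.csv"]


def missing_files(file_names, lab_id: str) -> list:
    """Expected names not found among file_names (exact, case-sensitive match)."""
    present = set(file_names)
    return [name for name in expected_file_names(lab_id) if name not in present]


print(expected_file_names("babylabPrinceton"))
# ['babylabPrinceton_participant_data.csv', 'babylabPrinceton_trial_data.csv']
print(missing_files(["babylabPrinceton_participant_data.csv", "babylabprinceton_trial_data.csv"], "babylabPrinceton"))
# ['babylabPrinceton_trial_data.csv']
```

For example, after `import os`, `missing_files(os.listdir("."), "babylabPrinceton")` checks the files in the current folder.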
Video records of data collection (optional)
We strongly encourage you to share videos of your data collection if your ethics approval allows it. If your lab is a Databrary member, you can store the videos there. Please name your Databrary volume “ManyBabies5: yourLabID” (e.g., “ManyBabies5: babylabPrinceton”) so that all ManyBabies-related volumes are easy to find by search.