Preparing your data for deposit in the EMOTE database
Thank you for contributing your intensive longitudinal dataset to EMOTE! We appreciate
your dedication to open science principles by making your data available to others.
This guide outlines how to prepare your data for upload. If you have any questions throughout this process,
please get in touch with our team at emote-database@unimelb.edu.au.
Before you begin, please make sure your data meets these requirements
Collected from humans
Your data were collected from humans using intensive longitudinal methods (e.g. experience sampling, ecological
momentary assessment, diary methods, ambulatory assessment). These methods are characterized by multiple measurements
collected from the same participant during their everyday lives. We refer to this kind of data as
ESM data in this guide.
Collected ethically
Your study was approved by an ethics committee,
and your ethics clearance allows you to share de-identified data with other researchers.
De-identified
To ensure the anonymity of your participants, any identifying information must be
removed from the data.
De-identifying your data
Please follow these four steps to make sure you have removed all identifying information about participants:
Remove any specific identifying information: email addresses, names, addresses, IP addresses, and identification numbers from online data collection services like MTurk and Prolific.
Omit any demographic information that is very specific (e.g., a specific job title)
and enduring (i.e., stable across time), or recode these variables to make them more abstract
(e.g., recoding a job variable so it lists industry or job level, rather than specific job title).
Review any open-text or string variables to ensure that participants have not accidentally
included identifying information in these fields
(e.g., entering an email address by accident in a field meant for something else).
Remove any potentially identifying responses or variables or recode these variables into more abstract categories.
Review whether someone could slice across all the demographic variables
available to identify a participant.
One particular demographic variable might not be an issue, but in combination with other demographics in the dataset, and the year and context the data were collected in, there might be more cause for concern (particularly for participants from under-represented groups).
Remove any variables that contribute to this concern, or specific answers that might identify participants.
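As a rough aid for the third and fourth steps, the sketch below scans open-text fields for email-like strings and lists combinations of demographic values shared by only a few participants. It is an illustration, not an EMOTE tool: the function names, the column names (pid, age_group, industry, note), and the threshold k are all assumptions made for the example, and the pattern only catches common email formats. A manual review is still needed.

```python
import re
from collections import defaultdict

# Deliberately simple pattern (an assumption): it catches common email
# formats only and is not a full address validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(rows, text_columns):
    """Return (row_index, column, match) for each email-like string found."""
    hits = []
    for i, row in enumerate(rows):
        for col in text_columns:
            value = row.get(col)
            if value is None:
                continue
            for match in EMAIL_RE.findall(str(value)):
                hits.append((i, col, match))
    return hits


def small_cells(rows, id_column, demographic_columns, k=5):
    """Return demographic combinations shared by fewer than k participants.

    k is a hypothetical threshold chosen for illustration, not an EMOTE rule.
    """
    participants = defaultdict(set)
    for row in rows:
        combo = tuple(row.get(col) for col in demographic_columns)
        participants[combo].add(row[id_column])
    return {combo: len(ids) for combo, ids in participants.items() if len(ids) < k}


# Hypothetical long-format rows: one dict per measurement occasion.
rows = [
    {"pid": "a", "age_group": "18-25", "industry": "health", "note": "felt fine"},
    {"pid": "a", "age_group": "18-25", "industry": "health", "note": "mail me at jo@example.org"},
    {"pid": "b", "age_group": "18-25", "industry": "health", "note": ""},
    {"pid": "c", "age_group": "65+", "industry": "mining", "note": None},
]

print(find_emails(rows, ["note"]))
print(small_cells(rows, "pid", ["age_group", "industry"], k=2))
```

Any row or combination flagged here is a candidate for removal or recoding into a more abstract category; an empty result does not guarantee the data are de-identified.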
How should I format my data for upload?
Please follow these instructions to ensure your dataset can be uploaded quickly and is easily searchable by other users:
Create or relabel a randomly generated, de-identified participant ID as 'UUID'.
Move this variable to the first column in your dataset.
Add a 'Date_Local' and a 'Time_Local' variable, reflecting
the date and time each ESM survey was scheduled in the participant's local time zone.
Use NA for expired surveys.
Date_Local formatted as YYYY-MM-DD
Time_Local formatted as HH:MM:SS (or HH:MM), using 24-hour time
See Microsoft’s guide for transforming common date and time variables in Excel.
Include only raw data from your study. If you aren’t sure which variables to leave in or take out, check our FAQ.
Omit all data exclusions for withdrawals, technical issues, and careless responding. This information will be captured in the data deposit form.
Missing values may be left blank or empty, or replaced by NA or NaN. If you have used a different missing value flag (e.g., -999) please replace these with empty cells or NA.
Please translate any information provided to EMOTE into English (or omit information that cannot be translated).
Format the data as a single CSV file encoded in UTF-8 (see the encoding guide),
in long format, where each row represents a single measurement occasion.
Time-invariant data (e.g., baseline or follow-up measures) should be repeated in every row belonging to a given participant.
Example data including ESM measures, Baseline and Follow-up data. Baseline and Follow-up measures (in blue) were only collected once and are repeated for each row of ESM data (in green).
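The formatting instructions above could be applied with a short script along these lines. This is a minimal sketch under stated assumptions, not an official EMOTE script: the input column names (pid, scheduled, happy, baseline_age_group), the raw timestamp format, and the -999 missing flag are hypothetical. It also assumes the scheduled timestamps are already in each participant's local time zone and does no time zone conversion.

```python
import csv
import io
import uuid
from datetime import datetime

# Values treated as missing (assumed for this example); all are written as NA.
MISSING_FLAGS = {"", "-999"}


def is_missing(value):
    return value is None or str(value).strip() in MISSING_FLAGS


def format_rows(rows, id_column, scheduled_column, scheduled_format):
    """Return (fieldnames, rows) with UUID first, then Date_Local and Time_Local."""
    new_ids = {}  # original participant ID -> random, de-identified ID
    out = []
    for row in rows:
        original_id = row[id_column]
        if original_id not in new_ids:
            new_ids[original_id] = uuid.uuid4().hex
        new_row = {"UUID": new_ids[original_id]}

        raw = row.get(scheduled_column)
        if is_missing(raw):
            new_row["Date_Local"] = "NA"
            new_row["Time_Local"] = "NA"
        else:
            stamp = datetime.strptime(raw, scheduled_format)
            new_row["Date_Local"] = stamp.strftime("%Y-%m-%d")  # YYYY-MM-DD
            new_row["Time_Local"] = stamp.strftime("%H:%M:%S")  # 24-hour time

        # Copy the remaining variables, replacing missing flags with NA.
        for key, value in row.items():
            if key in (id_column, scheduled_column):
                continue
            new_row[key] = "NA" if is_missing(value) else value
        out.append(new_row)

    fieldnames = list(out[0]) if out else ["UUID", "Date_Local", "Time_Local"]
    return fieldnames, out


def write_csv(handle, fieldnames, rows):
    """Write the rows as CSV to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical raw export: long format, baseline value repeated per row.
raw_rows = [
    {"pid": "P01", "scheduled": "03/02/2021 21:30", "happy": "4", "baseline_age_group": "18-25"},
    {"pid": "P01", "scheduled": "04/02/2021 08:00", "happy": "-999", "baseline_age_group": "18-25"},
    {"pid": "P02", "scheduled": "", "happy": "", "baseline_age_group": "26-35"},
]

fieldnames, formatted = format_rows(raw_rows, "pid", "scheduled", "%d/%m/%Y %H:%M")
buffer = io.StringIO()
write_csv(buffer, fieldnames, formatted)
print(buffer.getvalue())
```

To produce the upload file, the same write_csv call can be pointed at a handle opened with open("emote_upload.csv", "w", newline="", encoding="utf-8"), where the file name is only an example. Do not keep the mapping from original to new IDs with the deposited data.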