In an effort to improve research reproducibility and align with the FAIR guiding principles, an increasing number of funders and publishers are asking researchers to share the de-identified data underlying their research. One example is the NIH Data Management and Sharing Policy (NIH DMSP), which expects researchers to take steps to maximize scientific data sharing. Any justifiable factors (i.e., ethical, legal, or technical) that necessitate limiting data sharing need to be described in the data management plan.
Institutional Review Board (IRB)
The UC Davis IRB determines whether researchers have included adequate provisions to protect the privacy of human subjects and to maintain the confidentiality of any identifiable data at each stage of the research lifecycle.
Informed Consent
Informed consent is one of the founding principles of research ethics. Its intent is that human participants can voluntarily enter research with full information about what it means for them to take part, and that they give consent before they enter the research study. It’s best to plan for sharing your data from the very start of your research project, including considerations for obtaining informed consent from human participants regarding the storage and sharing of research data for future use.
"Consumers viewed consent as the most important privacy protection. The central role of consent may reflect the value placed by consumers on preserving autonomy and the ability to choose whether and how their personal data are used."
Gupta R, Iyengar R, Sharma M, et al. Consumer Views on Privacy Protections and Sharing of Personal Digital Health Information. JAMA Netw Open. 2023;6(3):e231305. doi:10.1001/jamanetworkopen.2023.1305
Resources
De-Identification
Generally, the scientific data derived from human research participants, including qualitative data, should be adequately de-identified prior to sharing to ensure protection of research participants, maintain privacy, and mitigate risk, especially for vulnerable or marginalized groups.
Resources
- Curated Lists of Tools:
- Images
- Clinical Text
  - NLM-Scrubber – Clinical text de-identification tool developed by the National Library of Medicine
- Social Science Data
- De-Identification Certification
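As a rough illustration of what de-identifying tabular data can involve, the Python sketch below drops direct identifiers, replaces participant IDs with random study codes, and generalizes exact ages into ten-year bands. The field names (`participant_id`, `name`, `email`, `phone`, `age`) and the helper functions are hypothetical and are not part of any tool listed above. Removing direct identifiers alone does not guarantee that data are adequately de-identified, so treat this as a starting point rather than a complete method.

```python
import secrets

# Hypothetical field names; adapt them to your own dataset.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def age_band(age, width=10):
    """Generalize an exact age to a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def deidentify(records):
    """Return de-identified copies of the records plus the ID linkage key.

    The linkage key maps original participant IDs to random study codes.
    It should be stored separately from the shared data, or destroyed.
    """
    key = {}
    shared = []
    for record in records:
        pid = record["participant_id"]
        if pid not in key:
            # Draw random codes until one is unused, so codes never collide.
            code = "P-" + secrets.token_hex(4)
            while code in key.values():
                code = "P-" + secrets.token_hex(4)
            key[pid] = code
        # Copy everything except direct identifiers and the original ID.
        clean = {
            field: value
            for field, value in record.items()
            if field not in DIRECT_IDENTIFIERS and field != "participant_id"
        }
        clean["study_code"] = key[pid]
        if "age" in clean:
            clean["age_band"] = age_band(clean.pop("age"))
        shared.append(clean)
    return shared, key
```

Calling `deidentify(records)` on a list of dictionaries returns the records to share and, separately, the linkage key. Indirect identifiers (for example, rare diagnoses or free-text responses) still need case-by-case review.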
Controlled Access
Certain studies (e.g., qualitative or mixed-methods projects) may generate scientific data that are challenging to de-identify, or that still pose privacy risks after de-identification because they contain information from which a research participant’s identity can be inferred. Examples that may need special protections to ensure participant privacy include imaging data, rich clinical or phenotypic data, transcripts from focus groups or in-depth interviews, ethnographic observations, audio recordings of deliberative community-based engagements, and social media posts. In these instances, selecting a data repository with a controlled-access mechanism may be the most appropriate way to protect the participants in the study.
Data Collection
- Limit information collected to only what is necessary to address the research question(s)
- Avoid collecting identifiable information (including electronic identifiers) unless it is necessary for your research
  - Example: Qualtrics collects IP addresses by default, and some IRBs and international standards consider IP addresses to be personally identifiable information. Enabling the "Anonymize responses" survey setting prevents Qualtrics from collecting IP addresses.
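If a survey export already contains electronic identifiers, they can be stripped before the file is stored or shared. The sketch below assumes a CSV export and a hypothetical set of identifying column names (`IPAddress`, `LocationLatitude`, `LocationLongitude`, `RecipientEmail`). Check these names against the header row of your own file, since they may differ. It is no substitute for preventing collection in the first place.

```python
import csv
import io

# Assumed names of identifying columns; verify against your own export.
IDENTIFYING_COLUMNS = {
    "IPAddress",
    "LocationLatitude",
    "LocationLongitude",
    "RecipientEmail",
}


def strip_columns(source, target, drop=IDENTIFYING_COLUMNS):
    """Copy CSV rows from source to target, omitting the columns in drop.

    Returns the list of column names that were kept.
    """
    reader = csv.DictReader(source)
    kept = [name for name in (reader.fieldnames or []) if name not in drop]
    # extrasaction="ignore" silently skips the dropped columns in each row.
    writer = csv.DictWriter(target, fieldnames=kept, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return kept


if __name__ == "__main__":
    # Small in-memory example using a documentation-only IP address.
    raw = io.StringIO("ResponseId,IPAddress,Q1\nR_1,203.0.113.5,yes\n")
    cleaned = io.StringIO()
    print(strip_columns(raw, cleaned))
    print(cleaned.getvalue())
```

For real files, open the input and output with `open(path, newline="")` as recommended for the `csv` module, and keep the unstripped original in restricted storage or delete it.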