Please visit each month for a new case that may be used as a framework for a brief conversation about best research practices in your lab meeting, research conference, journal club, or any research meeting.

Case of the Month

May 2022 – Creating a Public Archive of Sensitive Data

Frances is a researcher studying the molecular basis of cancer. She plans to sequence the genomes of children with cancer. Frances also intends to make the sequencing data publicly available online. Internet-based DNA sequence databases would allow other scientists to analyze the data and ideally come up with important findings more quickly. This may lead to rapid identification of targets for new pediatric cancer treatments.

“Re-identification occurs when two conditions are satisfied: 1) uniqueness, and 2) linkage.
The first condition is satisfied when unique values exist in the shared medical records.
The second condition is satisfied when attributes in the medical record
can be used to link to identified information external to the medical record.”  
– Bradley Malin and Kathleen Benitez (Vanderbilt University)
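
The two conditions in the quotation above can be illustrated with a small sketch. Everything in it is hypothetical and invented for illustration: the attribute names (`zip`, `birth_year`, `sex`), the toy records, and the helper functions `unique_rows` and `link`. It is not a description of any real database or of how an actual re-identification attempt would be carried out; it only shows that a record becomes linkable when its combination of attributes is unique and the same attributes appear in an identified external list.

```python
from collections import Counter

# Hypothetical attributes present in both the released data and an external list.
KEYS = ("zip", "birth_year", "sex")


def key_of(row, keys=KEYS):
    """Return the tuple of attribute values used to compare records."""
    return tuple(row[k] for k in keys)


def unique_rows(released, keys=KEYS):
    """Condition 1 (uniqueness): rows whose attribute combination occurs once."""
    counts = Counter(key_of(r, keys) for r in released)
    return [r for r in released if counts[key_of(r, keys)] == 1]


def link(released, external, keys=KEYS):
    """Condition 2 (linkage): match unique released rows to named external rows."""
    by_key = {key_of(r, keys): r for r in unique_rows(released, keys)}
    return [
        (person["name"], by_key[key_of(person, keys)]["diagnosis"])
        for person in external
        if key_of(person, keys) in by_key
    ]


# Toy "de-identified" records (invented values, no names attached).
released = [
    {"zip": "32601", "birth_year": 2010, "sex": "F", "diagnosis": "leukemia"},
    {"zip": "32601", "birth_year": 2011, "sex": "M", "diagnosis": "neuroblastoma"},
    {"zip": "32601", "birth_year": 2011, "sex": "M", "diagnosis": "lymphoma"},
]

# Toy identified list from some other source (also invented).
external = [
    {"name": "Person A", "zip": "32601", "birth_year": 2010, "sex": "F"},
    {"name": "Person B", "zip": "32601", "birth_year": 2011, "sex": "M"},
]

print(link(released, external))
```

In this toy data only Person A is linked, because that record's combination of attributes is unique in the released set. Person B shares the same attribute values with two released records, so the uniqueness condition fails and no single record can be attributed to that person.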

If a large amount of sequence data is made available, it may eventually be possible to re-identify individuals, which could have negative consequences. For example, participants in childhood cancer studies could become known to future employers or insurers based on their genetic information (their history of cancer and their publicly archived data). Further, data obtained for one study might later reveal other information, such as susceptibility to other diseases or previously unknown family relationships. If a subject in a genome-wide sequencing study later released a small set of genetic information to another party for a different purpose, that information might be matched to the more extensive sequence data on the internet, revealing more about the subject to that party than the subject intended. Subjects of such research should be made aware of this risk during the informed consent process.

However, children, unlike adults, cannot legally consent. Publishing their personal DNA sequence would be based on parental permission. Since publication of such data is irreversible, parents would have to agree to this on their children’s behalf. 

Federal regulations permit pediatric research that has no direct benefit to the child only when risks are minimal. The determination that a study involves only minimal risk requires the evaluation of the magnitude of possible harms as well as the probability of such harms.

How should Frances proceed?

Discussion Questions

  • Can you describe scenarios in which the data could be re-identified?
  • What are some possible harms of re-identification?
  • What are ways in which publicly available DNA sequencing data are used by others?
  • How might a subject learn that a breach of confidentiality has actually occurred?
  • Do you think that children should have an opportunity to refuse requests to assent to the public use of their genetic information?

The above scenario was copied from the Office of Research Integrity “RCR Casebook: Stories about Researchers Worth Discussing”. Please see the Data Management Practices web page for more information about data management.

We believe that research integrity is not achieved by simply taking an RCR course and "checking the box" that training is done. Our vision is to maintain a research culture in our everyday lives as UF researchers and research trainees in which we naturally follow best practices to ensure that the research we do is responsible, rigorous, and reproducible.

