diff --git a/Human Participants Data Essentials Data Curation Primer/human-participants-data-essentials-data-curation-primer.md b/Human Participants Data Essentials Data Curation Primer/human-participants-data-essentials-data-curation-primer.md
index e883ada..60436ef 100644
--- a/Human Participants Data Essentials Data Curation Primer/human-participants-data-essentials-data-curation-primer.md
+++ b/Human Participants Data Essentials Data Curation Primer/human-participants-data-essentials-data-curation-primer.md
@@ -6,10 +6,10 @@
| Topic | Description |
| :------------- | :------------- |
-|Primary fields or areas of use|Any research on information or physical samples taken from human beings that is either subject to IRB or other regulatory approval, used consent forms, or where the data presents ethical quandaries due to human subjects content.
Example fields:
- **Health Sciences:** Clinical, Public Health, Neuroscience, Biomedical Engineering
- **Behavioral and Social Sciences:** Psychology, Sociology, Demography, Economics, Anthropology, Education, Social Work, etc. |
-|Key questions for curation review|
- Is there indication the research was governed by a review board (e.g., institutional, community, tribal) or other regulatory protocol?
- Is there a copy of the consent form included with the data?
- Is the data de-identified in consideration of both direct and indirect identifiers?
- Are other peripheral means of re-identification removed?|
+|Primary fields or areas of use|Any research on information or physical samples taken from human beings that is either subject to IRB or other regulatory approval, used consent forms, or where the data presents ethical quandaries due to human subjects content.
Example fields:
- **Health Sciences:** Biomedical Engineering, Clinical, Public Health, Neuroscience
- **Behavioral and Social Sciences:** Anthropology, Demography, Economics, Education, Linguistics, Psychology, Social Work, Sociology, etc. |
+|Key questions for curation review|
- Is there indication the research was governed by a review board (e.g., institutional, community, tribal) or other regulatory protocol?
- Is there evidence of a consent process?
- Is the data de-identified taking into account both direct and indirect identifiers?
- Are other peripheral means of re-identification removed?|
|Metadata-specific considerations |The level of detail in the metadata (or any lack of clarity that impedes understanding what metadata are present) may increase disclosure risk. Some datasets may also include hidden or embedded metadata (e.g., geolocation on images) that constitute a disclosure risk. See Brief Introduction to Identifiers and Communicating about De-identification with a Depositor, below.|
-|Context-specific considerations|
- Consent Form Review
- Screening for De-identification
- Suggesting Changes with Depositor|
+|Context-specific considerations|
- Consent Form Review
- Screening for De-identification
- Suggesting Changes to Depositor|
|Tools for curation review|
- [ARX Data Anonymization Tool](https://arx.deidentifier.org): Full-featured freeware for statistical risk assessment and anonymization. Requires knowledge of techniques.
- [The sdcMicro package in R](https://cran.r-project.org/web/packages/sdcMicro/) includes disclosure control and cell suppression techniques for tabular data.
- [PARAT Core](https://privacy-analytics.com/health-data-privacy/health-data-software/eclipse-risk/) (Privacy Analytics Eclipse): Commercial service for risk analysis and anonymization oriented to structured medical records. Typically available via institutional subscription.
- [Spirion.com](https://www.spirion.com): Fee-based. Covers only direct identifiers at the enterprise network level. Not recommended.
- [NLM-Scrubber](https://scrubber.nlm.nih.gov): Highlights direct identifiers and typical medical identifiers for redaction. ASCII text input.|
|Date Created|March 2, 2020|
|Created by|
- Jenn Darragh, Duke University
- Alicia Hofelich Mohr, University of Minnesota
- Shanda Hunt, University of Minnesota
- Rachel Woodbrook, University of Michigan
- Dave Fearon, Johns Hopkins University
- Jennifer Moore, Washington University in St. Louis
- Hannah Hadley, Pennsylvania State University|
@@ -23,7 +23,7 @@ _This work was created by the Data Curation Network’s curator subgroup (Human
[Summary](#summary)
-[Introduction to Human Subjects](#introduction-to-human-subjects)
+[Introduction to Human Participants Data](#introduction-to-human-participants-data)
[Key Questions to Ask Yourself](#key-questions-to-ask-yourself)
@@ -35,7 +35,7 @@ _This work was created by the Data Curation Network’s curator subgroup (Human
[Next Steps](#next-steps)
-[Other Considerations](#other-considerations)
+[Educational Opportunities](#educational-opportunities)
[Glossary of terms](#glossary-of-terms)
@@ -45,7 +45,7 @@ _This work was created by the Data Curation Network’s curator subgroup (Human
[Appendix B Links to sources on consent documentation](#appendix-b-links-to-sources-on-consent-documentation)
-[Appendix C Human Participant CURATED checklist](#appendix-c-human-subjects-curated-checklist)
+[Appendix C Human Participant CURATED checklist](#appendix-c-human-participant-curated-checklist)
# Summary
@@ -87,13 +87,13 @@ At each step in the curation process, it is important to be cognizant of ethical
#### What was the consent process?
- Was a consent form, participant information sheet, or other participant agreement used during data collection?
- Seek this information in the documentation provided by the researcher with the dataset.
- - If the form itself is not included, we encourage requesting it (as with other standard documentation), or even requiring its submission.
+ - If a blank version of the form itself is not included, we encourage requesting it (as with other standard documentation), or even requiring its submission.
- Are there any other indications of how participant information will be used in other documentation of the dataset? (Focus group transcript, questionnaire, etc.?). Do they contradict consent documentation?
- Institutional or repository policies may differ, from simply storing the consent form as documentation to assessing its content. We suggest a minimum standard of checking that there is no language explicitly stating the data will not be shared. If vetting the consent form, see the section on “Brief Introduction to Consent Review,” below. Be sure to work with the depositor and involve their IRB and compliance offices whenever in doubt about permissions for releasing data.
#### Is any directly identifiable information present in the data?
- The [HIPAA privacy rule](https://privacyruleandresearch.nih.gov/pr_08.asp) is a good standard to follow when looking for direct identifiers, even if the data are not necessarily subject to the HIPAA privacy rule. However, be aware that these regulations were created in the 1990s, when information moved very differently than it does today, and there is evidence that even medical data de-identified to current HIPAA standards may expose patients to re-identification risks (Yoo, 2018).
-- For qualitative data, any video or voice recordings are considered inherently identifiable.
+- For qualitative data, any images or video of people, or voice recordings, are considered inherently identifiable.
#### Are there any indirect (quasi) identifiers present in the data?
- See “Brief Introduction to Identifiers” section below for more details and examples.
@@ -107,13 +107,15 @@ At each step in the curation process, it is important to be cognizant of ethical
# Brief Introduction to Consent Review and Communicating about Informed Consent with a Depositor
-In order to determine whether participants consented to data sharing, it is important to look at the informed consent, consent information sheet (for exempt studies), or participant agreement (for non-IRB reviewed studies) where one exists. Several repositories require these documents to be submitted along with the data at deposit.
+In order to determine whether participants consented to data sharing, it is important to look at the informed consent, consent information sheet (for exempt studies), or participant agreement (for non-IRB reviewed studies) where one exists. Some repositories require these documents to be submitted along with the data at deposit.
Example of required consent for deposit to repository: The National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) Central Repository states that “All requests for samples and data are carefully reviewed against the consent forms by the Repository and NIDDK staff members to ensure adherence.” (NIDDK, n.d.)
Other repositories ask for these documents upon submission of data that appear to be collected from humans. There are repositories that do not review or collect the consent form, but we do not recommend this practice.
-### Reviewing the consent form - there are three common scenarios regarding the mention of data in consent forms:
+### Reviewing the consent form
+
+There are three common scenarios regarding the mention of data in consent forms:
#### Consent form does not mention data at all.
@@ -131,16 +133,16 @@ Other repositories ask for these documents upon submission of data that appear t
- Bring your concerns back to the researcher who submitted the data.
- Repositories may consider a formal policy regarding the review of the consent form.
-- Data deposit agreements may include a statement that appropriate consent for data sharing has been obtained from all participants.
+- Data deposit agreements for the repository may be updated to include a statement attesting that appropriate consent for data sharing has been obtained from all participants.
- Depositors may be required to upload the consent form template as part of the data submission process, either for public dissemination or internal curation reference.
- Repositories may offer consent consulting services to ensure depositors use appropriate language in consent forms.
- IRBs are the authority on consent forms and should be consulted whenever in doubt about permissions to share data.
-For greater detail and more resources on reviewing the consent form as part of the curation process, please see the primer on Curation of Data Collected via Informed Consent (Please check back, this primer is coming soon!).
+For greater detail and more resources on reviewing the consent form as part of the curation process, please see the [Curation of Data Collected by Informed Consent Primer](https://github.com/DataCurationNetwork/data-primers/blob/master/Consent%20Forms%20Data%20Curation%20Primer/consent-forms-data-curation-primer.md).
# Brief Introduction to Identifiers and Communicating about De-identification with a Depositor
-De-identification is performed by the depositor or their proxy, but the data curator should be aware of considerations for this process that may need to be discussed with depositors. Human participant data that has not been de-identified may not be shared openly or even under restricted access conditions in many cases. This section will therefore introduce a high-level description of content that can be shown to the depositor as evidence of the need for de-identifying actions. This is an introduction to a complex subject, and de-identification methods depend entirely on the data content and context.
+De-identification is performed by the depositor or their proxy, but the data curator should be aware of considerations for this process that may need to be discussed with depositors. In many cases, human participant data that has not been de-identified may not be shared openly or even under restricted access conditions. This section therefore introduces a high-level description of content that can be shown to the depositor as evidence of the need for de-identifying actions. This is an introduction to a complex subject, and de-identification methods depend entirely on the data content and context.
### Types of identifiers
@@ -179,7 +181,7 @@ These tend to be more nuanced and may be difficult for you, as a curator, to rec
# Steps for Screening De-identified Data for Remaining Risk
-The following is an initial set of procedures for reviewing data deposits for remaining direct and indirect (quasi) identifiers that could be considered Personally Identifiable Information (PII) or Personal Health Information (PHI) that might remain in the dataset and pose a risk level that is beyond the threshold for public access. We welcome our data curation community to build these initial steps into guidelines that can be thoroughly and efficiently applied. A particular need is an efficient and accurate method to calculate risk thresholds of indirect (quasi) identifiers that meet privacy industry standards of professional data repositories for public access data. The steps discussed here are currently “manual” visual screening. The software listed in this primer is not currently sophisticated to a point where such screening can be automated to any degree, nor used without sufficient knowledge and experience of privacy risk assessment and anonymization techniques.
+The following is an initial set of procedures for reviewing data deposits for remaining direct and indirect (quasi) identifiers that could be considered Personally Identifiable Information (PII) or Personal Health Information (PHI) and that pose a risk level beyond the threshold for public access. We welcome our data curation community to build these initial steps into guidelines that can be thoroughly and efficiently applied. A particular need is an efficient and accurate method to calculate risk thresholds of indirect (quasi) identifiers that meet privacy industry standards of professional data repositories for public access data. (Update: an initial resource for this purpose has been created and released by the Portage Network's Sensitive Data Expert Group as the ["Sensitive Data Toolkit for Researchers Part 2: Human Participant Research Data Risk Matrix"](https://zenodo.org/records/4088954).) The steps discussed below are “manual” visual screening. The software listed in this primer is not yet sophisticated enough for such screening to be automated to any degree, nor can it be used without sufficient knowledge and experience of privacy risk assessment and anonymization techniques.
### Privacy risk screening steps:
@@ -189,23 +191,23 @@ As addressed in the prior section.
#### 2. Ask for the codebook, data dictionary, or other documentation of variables and data elements.
-The codebook, especially one with sufficient description and parameters of variables and collected data elements, is particularly useful for an initial review of potential identifiers. The codebook should show which are direct identifiers that should have been removed or masked with pseudonyms or codes. The codebook should also indicate which variables are indirect (quasi) identifiers. Note which elements could be linked to external information that could be publicly known. Also consider how combinations of indirect (quasi) identifiers could make one or more records more uniquely identifiable than others. These indirect (quasi) identifiers, individually or in combination, will have lower counts of records under some parameters, such as extreme age, specific locations, demographic or other features that are unusual for some participants for that sampled population. Such participants are at more risk for reidentification from publicly knowable information, be it from hackers or inadvertently from family members or Facebook posts.
+The codebook, especially one with sufficient description and parameters of variables and collected data elements, is particularly useful for an initial review of potential identifiers. The codebook should indicate which variables are direct identifiers that should be removed or masked with pseudonyms or codes. The codebook should also help determine which variables are indirect (quasi) identifiers. Note which elements could be linked to external information that could be publicly known. Also consider how combinations of indirect (quasi) identifiers could make one or more records more uniquely identifiable than others. These indirect (quasi) identifiers, individually or in combination, will have lower counts of records under some parameters, such as extreme age, specific locations, demographic or other features that are unusual for participants from the sampled population. Such participants are at more risk for reidentification from publicly knowable information, be it from hackers or inadvertently from family members or Facebook posts.
-Consider creating a version of the codebook for reviewing the data and potentially sending back to the depositor with comments (see Step 5). Mark which variables to check in the dataset that are potential direct or indirect identifiers and include a brief description of the risk to potentially report back to the depositor. Ideally, the depositor can also provide documentation on which variables were transformed or de-identified.
+Consider creating a copy of the codebook for reviewing the data and potentially sending it back to the depositor with comments (see Step 5). Mark which variables to check in the dataset that are potential direct or indirect identifiers and include a brief description of the risk to potentially report back to the depositor. Ideally, the depositor can also provide documentation on which variables were transformed or de-identified.
-#### 3. Review data for remaining direct identifiers.
+#### 3. Review data for direct identifiers.
-All depositors should have removed direct identifiers or masked them with codes or pseudonyms. If any apparent direct identifiers remain in the data, send the files back to the depositor for remediation. (Direct identifiers should also be removed for any restricted access repository.)
+All depositors should have removed direct identifiers or masked them with codes or pseudonyms before submitting. If any apparent direct identifiers remain in the data, send the files back to the depositor for remediation. (Direct identifiers should also be removed for any restricted access repository.)
-It is essential to securely delete (i.e. remove all backups) the deposited data that contains direct identifiers because these may well contain privacy violations. Two areas to check specifically as potential quasi-direct identifiers with re-identification risk are dates and geography more specific than US States, including datasets “skewed” toward populations from a particular area, such as students of a particular faculty member’s department.
+It is essential to securely delete (i.e. remove all backups) the deposited data that contains direct identifiers because these may well contain privacy violations.
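As a rough first pass over text-based files, a curator or depositor could flag common direct-identifier formats with a short script before manual review. A minimal sketch in Python — the patterns below are illustrative only, would need tuning to the dataset and country, and cannot replace visual screening:

```python
import re

# Illustrative patterns only -- not an exhaustive direct-identifier list.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_phone": re.compile(r"\b(?:\+1[ .-]?)?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def screen_text(text):
    """Return (pattern_name, matched_string) pairs flagged for human review."""
    hits = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((name, match.group()))
    return hits
```

A hit list like this is only a prompt for follow-up with the depositor; an empty result does not mean the file is free of direct identifiers (names, for example, cannot be caught by simple patterns).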
-#### 4. Locate all indirect (quasi) identifiers that could possibly link to external datasets.
+#### 4. Locate all indirect (quasi) identifiers that could link to external datasets.
This step is often the most challenging. It may be difficult to determine the degree of risk a given indirect (quasi) identifier may have of being linked to knowable external information. Also, calculating risk of combined indirect (quasi) identifiers is time consuming, often not easily accomplished with software, and requires some expertise in privacy risk assessment to properly evaluate.
Ideally, indirect (quasi) identifier risk should be measured against a risk threshold. The K-Anonymity level, developed by Latanya Sweeney (2000), is an example of a more basic risk threshold measure. A K-Anonymity level of 3, for example, means that there should be no fewer than 3 records (i.e., participants in the dataset) that match either a single indirect (quasi) identifier, or set of matching indirect (quasi) identifiers that have potential risk. The privacy industry and professional data repositories, however, typically set risk thresholds at levels of K=11 to 20. It may be challenging for researchers to meet these levels, especially for smaller datasets. It is equally challenging for data curators to calculate risk thresholds in datasets.
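As a rough illustration, the k-anonymity count described above can be computed for a small tabular dataset in a few lines of Python. The field names are hypothetical, and a count like this does not substitute for expert privacy risk assessment:

```python
from collections import Counter

def k_anonymity(records, quasi_ids):
    """Smallest number of records sharing any one combination of
    quasi-identifier values; higher means lower re-identification risk."""
    combos = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return min(combos.values())

def below_threshold(records, quasi_ids, k=3):
    """Records whose quasi-identifier combination occurs fewer than k
    times -- candidates for generalization, suppression, or follow-up
    with the depositor."""
    combos = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return [r for r in records
            if combos[tuple(r[q] for q in quasi_ids)] < k]
```

Running `below_threshold` with k set to the repository's chosen threshold (the text above notes that the privacy industry often uses K=11 to 20) surfaces the specific records to discuss with the researcher.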
-Often the most practical and responsible approach is to point out which indirect (quasi) identifiers or combinations of such seem potentially risky and ask the researcher to review that risk. The depositor, with your help when possible, should be willing to apply remediation steps, and/or explain any transformation steps already applied to those indirect (quasi) identifiers or sets of records with potential risk. When remediation is not feasible, such as when requiring advanced statistical anonymization techniques, suggest that the researcher consider restricted access repositories.
+Often the most practical and responsible approach is to point out which indirect (quasi) identifiers or combinations of them seem potentially risky and ask the researcher to review that risk. (Two areas to check specifically as potential quasi-direct identifiers with re-identification risk are dates and geography more specific than US states, including datasets “skewed” toward populations from a particular area, such as students of a particular faculty member’s department.) The depositor, with your help when possible, should be willing to apply remediation steps and/or explain any transformation steps already applied to those indirect (quasi) identifiers or sets of records with potential risk. When remediation is not feasible, such as when it would require advanced statistical anonymization techniques, suggest that the researcher consider restricted access repositories.
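Where remediation is possible, common transformations include truncating dates to the year, coarsening geography to a broader region, and top-coding extreme ages (the latter two echo HIPAA Safe Harbor conventions). A minimal sketch of what a depositor might apply, assuming hypothetical field names and cutoffs:

```python
def generalize(record):
    """Coarsen risky quasi-identifiers before release.
    Field names and cutoffs here are illustrative, not prescriptive."""
    out = dict(record)  # leave the original record untouched
    out["visit_date"] = record["visit_date"][:4]           # full date -> year only
    out["zip"] = record["zip"][:3] + "XX"                  # 5-digit ZIP -> 3-digit prefix
    out["age"] = "90+" if record["age"] >= 90 else record["age"]  # top-code extreme ages
    return out
```

Which fields to coarsen, and how far, depends on the dataset and the sampled population; the depositor should document each transformation applied.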
#### 5. Report back to the depositor any variables or data elements that appear to pose a risk.
@@ -215,7 +217,7 @@ Consider formatting the report as a simple table based on the codebook or data d

-Depositors will then need to decide about what remediation they can make, and then resubmit a new version of the data, ideally with documentation of the changes they applied. Curators should then give the new version another round of screening.
+Depositors will need to decide what remediation they can implement, and then resubmit a new version of the data, ideally with documentation of the changes they applied. Curators should give the new version another round of screening.
In responding to depositors, make clear that, as librarians and data curators, we can only give opinions about remaining risks, and we have not been authorized by IRB or other compliance offices to officially declare a dataset free of privacy risk. Often, however, IRB and compliance officers have no training in privacy risk screening or anonymization techniques. In such cases, consider reminding the depositor that researchers, and ultimately the project principal investigator, take final responsibility for violations of privacy from remaining risk in data released publicly. This should ideally be stated in a deposit agreement signed by the depositor and/or principal investigator.
@@ -237,17 +239,15 @@ Recommendations/further conversation: Depending on what is discovered during the
*Example:* The consent form of an IRB-submitted study specifies that data will only be shared with the research team, but the researcher would like to distribute de-identified data publicly. You refer the depositor back to the IRB. The IRB instructs the depositor...
-- to re-consent participants if they wish to share data more widely. The researcher determines that the effort required would be prohibitive, and decides not to distribute their data.
-- that they may share their data. The depositor would like you to publish their data based on this approval.
-- that the study was exempted, so the IRB declines to provide guidance. The depositor returns to you for advice.
+- ...to re-consent participants if they wish to share data more widely. The researcher determines that the effort required would be prohibitive, and decides not to distribute their data. OR
+- ...that they may share their data. The depositor would like you to publish their data based on this approval. OR
+- ...that the study was exempted, so the IRB declines to provide guidance. The depositor returns to you for advice.
**5. Deposit rejection:** You should know and be able to articulate under what conditions your repository will not accept a deposit due to human participant considerations. This information should also be clearly available to potential depositors before they submit data for curation. Even in situations where a deposit has to be turned away or accepted in less-than-ideal condition, conversations started as part of the curation process can lead to improvements in future practices. Consider what information you are providing about policies or guidelines for datasets accepted, and in what venues.
# Educational opportunities
-Educational opportunities may present themselves via direct communication with researchers before or during a data submission, or as part of campus presentations. You may want to be prepared to answer questions and concerns related to data sharing resistance. Some literature shows that researchers believe participants won’t allow their data to be shared if given the opportunity, but the opposite may be true. One research study found that less than 1% of survey respondents said they think data should be destroyed, and over 90% said data should be made available for verification or reuse. “I think it should be stored and shared. Why would you create a study without intending to share the data?” (Bottesini, Rhemtulla, and Vazire, 2018). There's also a study on clinical trial data that says similar things - very few respondents were concerned about the risks of data sharing, and only 8% thought the risks outweighed the benefits (Mello, Lieou, and Goodman, 2018). And at times, populations will declare a desire and need for openly shared data about themselves. Indeed, the National Center for Transgender Equality calls for increased research and data with trans people because “we lack official information about unemployment rates, income and poverty, drug and alcohol abuse, suicide, and all other data that are regularly measured in the general population” (n.d.).
-
-
+Educational opportunities may present themselves via direct communication with researchers before or during data submission, or as part of campus presentations. You may want to be prepared to answer questions and concerns related to data sharing (or resistance to it). Some literature shows that researchers believe participants won’t allow their data to be shared if given the opportunity, but the opposite may be true. One research study found that less than 1% of survey respondents said they think data should be destroyed, and over 90% said data should be made available for verification or reuse: “I think it should be stored and shared. Why would you create a study without intending to share the data?” (Bottesini, Rhemtulla, and Vazire, 2018). A study on clinical trial data reports similar findings: very few respondents were concerned about the risks of data sharing, and only 8% thought the risks outweighed the benefits (Mello, Lieou, and Goodman, 2018). And at times, populations will declare a desire and need for openly shared data about themselves. Indeed, the National Center for Transgender Equality calls for increased research and data regarding trans people because “we lack official information about unemployment rates, income and poverty, drug and alcohol abuse, suicide, and all other data that are regularly measured in the general population” (n.d.).
Some general guidelines for depositors to consider in planning for future data collection (see Meyer, 2018) include refraining from explicit statements about destroying or not sharing data, or promising that analysis of collected data will be limited to certain topics (unless there are strong reasons for this, and they have a concrete plan for how to enact it). On the other hand, requesting consent to retain and share data, incorporating data retention and sharing clauses into IRB templates or applications, and working with a data repository before data collection can all help ensure that data are able to be appropriately shared at the close of research.
@@ -255,29 +255,29 @@ Some general guidelines for depositors to consider in planning for future data c
# Glossary of terms
-**Data anonymization:** The process of encrypting or removing personally identifiable information within a data source. Personally identifiable information may include direct or indirect (quasi) identifiers. Pseudonymization is a similar process to make data less identifiable, but this data may still be tracked back to an individual. In contrast, anonymized data is not personally identifiable. Because of the difficulty in making human participant data completely anonymous, the term "de-identification" is often used for this process, especially in the United States.
+**Data anonymization:** The process of encrypting or removing personally identifiable information within a data source. Personally identifiable information may include direct or indirect (quasi-) identifiers. Pseudonymization is a similar process to make data less identifiable, but this data may still be tracked back to an individual. In contrast, anonymized data is not personally identifiable. Because of the difficulty in making human participant data completely anonymous, the term "de-identification" is often used for this process, especially in the United States.
**Data disclosure risk:** An assessment of the potential for a participant’s identity to be discovered and shared without their explicit permission. A curator may be able to assess risk simply by checking for direct and indirect (quasi) identifiers, but a deeper assessment of disclosure risk may require more advanced methods that need to be performed by a statistician, honest broker, or other expert (see Statistical Disclosure Control).
-**De-identification:** A standard by which potentially sensitive information is evaluated and personally identifying content is removed before data may be shared. Personally identifying content may include direct or indirect (quasi) identifiers. Example methods include expert determination and safe harbor. This process is not performed by the data curator, but exposed identifiers may be brought up to the data depositor in communications about publishing and reuse. The data depositor or their proxy would need to perform de-identification of the data.
+**De-identification:** A standard by which potentially sensitive information is evaluated and personally identifying content is removed before data may be shared. Personally identifying content may include direct or indirect (quasi-) identifiers. Example methods include expert determination and safe harbor (see Expert determination and Safe Harbor, below). This process is not performed by the data curator, but exposed identifiers may be brought up to the data depositor in communications about publishing and reuse. The data depositor or their proxy would need to perform de-identification of the data.
**Direct identifiers:** Information that when used alone may identify specific individuals, such as a name, telephone number or address.
-**Expert determination:** A HIPAA de-identification standard that relies on statistical and scientific methodologies. This method may be more suitable than Safe Harbor to address indirect (quasi) identifiers.
+**Expert determination:** A HIPAA de-identification standard that relies on statistical and scientific methodologies. This method may be more suitable than Safe Harbor to address indirect (quasi) identifiers. See also [Guidance on Satisfying the Expert Determination Method](https://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html#guidancedetermination).
-**HIPAA:** (Health Insurance Portability and Accountability Act of 1996) United States law that provides privacy standards to protect identifying health information. HIPAA establishes conditions for sharing and reuse of health information by researchers. See: [Research/HHS.gov](Research/HHS.gov).
+**HIPAA:** The Health Insurance Portability and Accountability Act of 1996. HIPAA is a United States law that provides privacy standards to protect identifying health information. It establishes conditions for sharing and reuse of health information by researchers. See the Research page at HHS.gov.
**Honest broker:** A third-party group or individual, working with the researcher, who manages the de-identification of data so that only non-identifying data reaches the research team and any other appropriate outlets.
**Indirect identifiers:** Information that can be combined with other information to identify specific individuals. For example, a birth date combined with a geographic location may narrow the possibilities enough to identify a study participant. Identification of study participants must be avoided through de-identification of data. See also: Quasi-identifiers.
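The risk described above can be sketched in a few lines. This is a minimal illustration with hypothetical records: count how many records share each combination of indirect identifiers; any combination held by a single record is unique, and re-identifiable by anyone who knows that person's birth year and ZIP code.

```python
from collections import Counter

# Hypothetical survey records: (birth_year, zip_code, response)
records = [
    (1984, "55455", "yes"),
    (1984, "55455", "no"),
    (1991, "55414", "yes"),
    (1975, "55455", "no"),
]

# Count how many records share each combination of quasi-identifiers.
groups = Counter((year, zcode) for year, zcode, _ in records)

# A combination held by only one record is unique, so that respondent
# could be re-identified by anyone who knows those two facts about them.
unique = [combo for combo, n in groups.items() if n == 1]
print(unique)  # two of the three combinations appear only once
```

The same counting logic underlies k-anonymity checks, where every combination of quasi-identifiers must be shared by at least k records before release.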
-**Quasi-identifiers:** An alternate term for indirect identifiers that has the same meaning. Usage of this term includes professionals engaged in curation for human subjects data. However, many help sources use the common term indirect identifiers.
-
-**Statistical Disclosure Control (aka SDC):** Advanced statistical techniques used in quantitative research to ensure that no person or organization is identifiable from the results of an analysis of survey or administrative data, or in the release of microdata (individual or household level data rather than aggregate statistics).
+**Quasi-identifiers:** An alternate term for indirect identifiers. Some professionals engaged in curation of human subjects data prefer this term, but many help sources use the more common term indirect identifiers.
**PHI (Personal Health Information; also Protected Health Information):** Information about an individual’s health and the provision or payment of their healthcare. The HIPAA privacy rule provides United States federal protections for PHI.
-**Safe Harbor:** A HIPAA de-identification standard consisting of a list of 18 criteria that may increase the risk of identification of individuals. This method is popular and simplistic, but minimally addresses indirect (quasi) identifiers.
+**Safe Harbor:** A HIPAA de-identification standard consisting of a list of 18 types of identifiers that must be removed or generalized before data are considered de-identified. This method is popular and simple, but minimally addresses indirect (quasi-) identifiers. See also [Guidance on Satisfying the Safe Harbor Method](https://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html#safeharborguidance).
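Two of the Safe Harbor generalizations can be sketched as follows. This is a partial illustration on a hypothetical record (the field names are invented): dates are reduced to year, and ZIP codes are truncated to their first three digits. It omits other requirements of the rule, such as zeroing three-digit ZIP prefixes covering fewer than 20,000 people and aggregating ages of 90 and over.

```python
# A minimal sketch of two Safe Harbor generalizations on a hypothetical
# record. Real de-identification must address all 18 identifier types.
def generalize(record):
    return {
        "birth_year": record["birth_date"][:4],  # keep year only
        "zip3": record["zip_code"][:3] + "00",   # first three ZIP digits
        "diagnosis": record["diagnosis"],        # non-identifying field kept
    }

row = {"birth_date": "1984-03-17", "zip_code": "55455", "diagnosis": "J45"}
print(generalize(row))
# {'birth_year': '1984', 'zip3': '55400', 'diagnosis': 'J45'}
```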
+
+**Statistical Disclosure Control (aka SDC):** Advanced statistical techniques used in quantitative research to ensure that no person or organization is identifiable from the results of an analysis of survey or administrative data, or in the release of microdata (individual or household level data rather than aggregate statistics).
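One common SDC technique, threshold-based cell suppression, can be sketched briefly. This is a simplified illustration with a hypothetical frequency table and an illustrative threshold of 5; production SDC also requires secondary suppression so that hidden cells cannot be recovered from row and column totals.

```python
# A minimal sketch of primary cell suppression: counts below a minimum
# threshold are withheld so small, identifiable groups are not released.
THRESHOLD = 5  # illustrative value; agencies set their own rules

table = {"city A": 120, "city B": 4, "city C": 37}

published = {
    cell: (count if count >= THRESHOLD else "suppressed")
    for cell, count in table.items()
}
print(published)  # {'city A': 120, 'city B': 'suppressed', 'city C': 37}
```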
# Bibliography and Further Reading
@@ -311,19 +311,22 @@ Sweeney, L. (2000). Simple Demographics Often Identify People Uniquely. Carnegie
Yoo, J. S., Thaler, A., Sweeney, L., & Zang, J. (2018). Risks to Patient Privacy: A Re-identification of Patients in Maine and Vermont Statewide Hospital Data. Technology Science. [https://techscience.org/a/2018100901](https://techscience.org/a/2018100901)
-# Appendix A Links to sources on de-identification
+# Appendix A: Links to sources on de-identification
El Emam, Khaled; Arbuckle, Luk. (2013). Anonymizing health data: Case studies and methods to get you started. Sebastopol, California: O’Reilly Media.
El Emam, Khaled. (2013). Guide to the de-identification of personal health information. Boca Raton, Florida: CRC Press.
+Fearon, Dave. (2019). “Guides: Protecting Identifiers in Human Subjects Data.” Accessed February 7, 2020.
+
HITRUST De-identification Methodology Training [https://hitrustalliance.net/hitrust-academy/](https://hitrustalliance.net/hitrust-academy/)
Office for Civil Rights (OCR). (2012, September 7). Methods for De-identification of PHI. Retrieved September 11, 2019, from HHS.gov website: [https://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html](https://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html)
-Fearon, Dave. (2019). “Guides: Protecting Identifiers in Human Subjects Data.” Accessed February 7, 2020. []().
+# Appendix B: Links to sources on consent documentation
-# Appendix B Links to sources on consent documentation
+#### DCN
+Hunt, Shanda; Hofelich Mohr, Alicia; and Woodbrook, Rachel. (2021). [Curation of Data Collected by Informed Consent Data Curation Primer](https://github.com/DataCurationNetwork/data-primers/blob/master/Consent%20Forms%20Data%20Curation%20Primer/consent-forms-data-curation-primer.md). Data Curation Network GitHub Repository.
#### ICPSR
@@ -339,7 +342,7 @@ Fearon, Dave. (2019). “Guides: Protecting Identifiers in Human Subjects Data.
- [IRB-HSBS Biospecimen Consent Template](https://research-compliance.umich.edu/new-irb-hsbs-biospecimen-consent-template) with data sharing language
- [IRB-HSBS General Informed Consent Template](https://research-compliance.umich.edu/new-irb-hsbs-general-informed-consent-template) with data sharing language
-# Appendix C Human Participant CURATED checklist
+# Appendix C: Human Participant CURATED checklist
Adapted from the [CURATED steps and checklists](https://datacurationnetwork.org/outputs/workflows/) by Data Curation Network.