Learning Data Language

We at the Alberta First Nations Information Governance Centre are compiling a glossary of terms often used in our industry that can be a bit confusing to the average person. Every sector can get caught up in its own jargon and forget that others do not use the same terms every day.

These first few terms have to do with data and information privacy. They help you understand what happens to your information when you fill out a regional health survey or participate in research as a First Nations person in Alberta.

But first, a little story to show these terms in action.

When Francis participated in a Regional Health Survey (RHS), she was excited to learn how much her contribution mattered: the survey aimed to gather crucial data on the health and well-being of First Nations communities across the province. As she filled out the questionnaire, Francis shared details about her health status, lifestyle, and access to medical services. Although she shared personal and sensitive information, she was reassured by the research team’s commitment to data privacy and de-identification.

De-identification: De-identification is the process of removing personally identifiable details from information, making it difficult or impossible to trace the data back to an individual. This process typically involves stripping out or masking names, addresses, registration numbers, and other unique identifiers. De-identification is a critical step in protecting privacy while still allowing for the use and analysis of the data for research, statistical, or other purposes. For example, medical records can be de-identified so that researchers can study health trends without compromising patient privacy.
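For readers who like to see the idea in practice, here is a minimal sketch of de-identification in Python. It is an illustration only, not the process used for any real survey. The field names, the list of identifiers, and the sample response are all made up for this example.

```python
# Hypothetical list of fields that directly identify a person.
# A real survey would define its own list.
DIRECT_IDENTIFIERS = {"name", "address", "registration_number"}


def de_identify(record: dict) -> dict:
    """Return a copy of the record with direct identifiers stripped out."""
    return {
        field: value
        for field, value in record.items()
        if field not in DIRECT_IDENTIFIERS
    }


# A made-up survey response, for illustration only.
response = {
    "name": "Francis",
    "address": "123 Example Road",
    "registration_number": "0000000000",
    "age": 34,
    "has_family_doctor": True,
}

cleaned = de_identify(response)
print(cleaned)
```

The original response is left untouched, and the cleaned copy keeps only the answers that are useful for studying health trends.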

How your privacy is managed

After Francis submitted her responses, the research team went to work. They anonymized the data by removing all personally identifiable information, such as her name, address, and any unique identifiers. Instead of listing her specific age, the data recorded her age range. Her responses were then combined with those of thousands of other participants at the community, regional, and national levels. This process ensured that no single individual’s data could be traced back to them. The anonymized data was crucial for identifying health trends, disparities, and areas needing improvement without compromising the privacy of individuals like Francis.

Anonymization: Like de-identification, anonymization is the process of removing or altering personal information from a data set so that individuals cannot be identified directly or indirectly. The goal of anonymization is to protect individual privacy while allowing the data to be used for research, analysis, and other purposes. Unlike pseudonymization, anonymization is intended to be irreversible, ensuring that once the data is anonymized, it is not possible to re-identify individuals. For example, in a health study, anonymized data might include patient age ranges and general health conditions, but not specific dates of birth or medical record numbers.
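The two ideas in this definition, replacing exact values with ranges and combining many responses, can be sketched in a few lines of Python. This is a simplified illustration under assumed field names and a ten-year age band. The sample records are invented.

```python
from collections import Counter


def age_range(age: int, width: int = 10) -> str:
    """Replace an exact age with a range, e.g. 34 becomes '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def anonymize(records: list[dict]) -> list[dict]:
    """Keep only generalized fields; everything else is dropped for good."""
    return [
        {"age_range": age_range(r["age"]), "condition": r["condition"]}
        for r in records
    ]


def summarize(anonymized: list[dict]) -> Counter:
    """Count how many participants fall into each (age range, condition) group."""
    return Counter((r["age_range"], r["condition"]) for r in anonymized)


# Made-up records, for illustration only.
records = [
    {"record_number": "A1", "date_of_birth": "1990-01-01", "age": 34, "condition": "diabetes"},
    {"record_number": "A2", "date_of_birth": "1987-01-01", "age": 37, "condition": "diabetes"},
    {"record_number": "A3", "date_of_birth": "1972-01-01", "age": 52, "condition": "asthma"},
]

anonymized = anonymize(records)
summary = summarize(anonymized)
print(summary)
```

Because no record number or date of birth is carried over, there is nothing left in the output that links a row back to a person. In practice, very small groups can still point to an individual indirectly, so a real process would need further checks beyond this sketch.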

When we conduct the Alberta Regional Health Survey, each participant is given a Respondent Unique Identifier so that if anyone chooses to have their responses removed later on, we can locate and destroy their data. This means that RHS data is pseudonymized.

Pseudonymization: Pseudonymization is a data processing technique where personally identifiable information within a data set is replaced with artificial identifiers or pseudonyms. This process makes it more difficult to identify individuals from the data without additional information that is kept separately. Unlike anonymization, which aims to remove all identifiable information permanently, pseudonymization allows data to be re-identified, if necessary, by using separate information. This technique enhances data privacy while enabling data to be used for research, analysis, and other purposes where direct identification of individuals is not required. For instance, in a medical study, patient names and addresses might be replaced with unique codes, allowing researchers to analyze the data without knowing the identities of the patients.
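Here is a minimal Python sketch of pseudonymization, including the ability to remove one participant's responses later, much like the Respondent Unique Identifier described above. It is an illustration under assumptions, not the actual RHS process: the field names, the random code format, and the two separate tables are all hypothetical.

```python
import secrets

# Hypothetical fields that identify a person directly.
IDENTIFYING_FIELDS = ("name", "address")


def pseudonymize(records: list[dict]) -> tuple[dict, dict]:
    """Split records into research data and a separately kept key table.

    Both are keyed by a random code that stands in for the person.
    """
    research_data = {}
    key_table = {}
    for record in records:
        code = secrets.token_hex(8)
        while code in key_table:  # guard against an (unlikely) repeat
            code = secrets.token_hex(8)
        key_table[code] = {f: record[f] for f in IDENTIFYING_FIELDS}
        research_data[code] = {
            k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS
        }
    return research_data, key_table


def withdraw(code: str, research_data: dict, key_table: dict) -> None:
    """Remove one participant's responses and their entry in the key table."""
    research_data.pop(code, None)
    key_table.pop(code, None)


# Made-up response, for illustration only.
research, keys = pseudonymize(
    [{"name": "Francis", "address": "123 Example Road", "age_range": "30-39"}]
)
code = next(iter(research))
print(research[code])
```

Researchers would work only with the research data, while the key table is stored separately. Calling withdraw with a participant's code removes their responses from both.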

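In short, de-identification and anonymization both mean removing personal information from data sets, with anonymization intended to be irreversible. Pseudonymization means replacing personal identifiers with artificial ones, so that a person can be re-identified only by using information that is kept separately.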

So, what happened to Francis’ data?

The anonymized data was then used to create comprehensive reports and analyses. These reports helped healthcare providers and policymakers understand the unique health challenges faced by First Nations communities in Alberta and across Canada. They informed decisions on resource allocation, the development of targeted health programs, and advocacy for necessary funding. For Francis, knowing that her pseudonymized data contributed to meaningful improvements in community health was empowering. It highlighted the importance of participating in such surveys and the trustworthiness of the processes in place to protect her privacy.

Stay tuned for more data language articles to come. If there are terms you have always wanted to know more about, with context and illustration, let us know!