For the CANDID-4-AI project, you can upload several types of clinical data files. Each file undergoes the date offset captured in MRN/Base Exam Date above. As a result of that date offset and anonymization, the anonymization preview shows offset dates and other anonymized data per the project data protocol.
The date offsets and anonymizations are applied only when the clinical data file templates specific to this project are used to upload data.
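The effect of a date offset can be illustrated with a minimal sketch. It assumes the offset is a fixed number of days applied to every date belonging to one patient; the project data protocol defines the actual calculation, and `apply_date_offset` and the offset value below are hypothetical.

```python
from datetime import date, timedelta


def apply_date_offset(original: date, offset_days: int) -> date:
    """Shift a date by a fixed number of days (assumed offset model)."""
    return original + timedelta(days=offset_days)


# Hypothetical per-patient offset; the real value comes from the project protocol.
offset_days = -17

contact_date = date(2021, 3, 15)
exam_date = date(2021, 4, 2)

shifted_contact = apply_date_offset(contact_date, offset_days)
shifted_exam = apply_date_offset(exam_date, offset_days)

# Under this model the interval between two events for one patient is unchanged.
assert exam_date - contact_date == shifted_exam - shifted_contact
```

Under this assumption the shifted dates no longer match the real calendar dates, but the time between a patient's events stays the same.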
Comorbidities
For Comorbidities clinical data files, the headers should be MRN, DIAGNM, ICD10CODE, CONTACTDATECOM. The contents of the CONTACTDATECOM field must be formatted as a date for the offset to apply.
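A pre-upload check of the header row can catch template mismatches early. The sketch below is illustrative only: it assumes the file is a CSV, and `check_headers` is a hypothetical helper, not part of the project tooling. It does not enforce column order, because the requirements above do not say whether order matters.

```python
import csv
import io

# Headers listed above for Comorbidities files.
EXPECTED_HEADERS = ["MRN", "DIAGNM", "ICD10CODE", "CONTACTDATECOM"]


def check_headers(csv_text: str, expected: list[str]) -> list[str]:
    """Return a list of header problems; an empty list means the row matches."""
    reader = csv.reader(io.StringIO(csv_text))
    actual = [h.strip() for h in next(reader, [])]
    problems = []
    for name in expected:
        if name not in actual:
            problems.append(f"missing column: {name}")
    for name in actual:
        if name not in expected:
            problems.append(f"unexpected column: {name}")
    return problems


# Hypothetical file content with a misnamed date column.
sample = "MRN,DIAGNM,ICD10CODE,CONTACTDATE\n"
print(check_headers(sample, EXPECTED_HEADERS))
```

The same function can be reused for the other file types by passing their header lists.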
CT Exams
For CT Exams clinical data files, the headers should be MRN, LURADS, DOE, ENCOUNTERID, STUDYDESC, EXAMREASON, EXAMTYPE. The contents of the DOE field must be formatted as a date for the offset to apply.
ICDO3 Codes
For ICDO3 Codes clinical data files, the headers should be MRN, ICDO3, CONTACTDATEICDO3. The contents of the CONTACTDATEICDO3 field must be formatted as a date for the offset to apply.
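Because the offset applies only to values formatted as dates, it can help to scan the date column before uploading. This is a sketch under stated assumptions: the accepted date formats are not specified here, so `ASSUMED_FORMATS` is a guess, and the dictionary keys and helper names are hypothetical.

```python
from datetime import date, datetime
from typing import Optional

# Date column for each file type described so far (keys are illustrative labels).
DATE_COLUMNS = {
    "comorbidities": "CONTACTDATECOM",
    "ct_exams": "DOE",
    "icdo3_codes": "CONTACTDATEICDO3",
}

# Assumption: the upload accepts ISO and US-style dates. Adjust as needed.
ASSUMED_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(value: str) -> Optional[date]:
    """Return the parsed date, or None if no assumed format matches."""
    for fmt in ASSUMED_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date()
        except ValueError:
            continue
    return None


def rows_with_bad_dates(rows: list[dict], date_column: str) -> list[int]:
    """Return 1-based row numbers whose date column does not parse."""
    bad = []
    for number, row in enumerate(rows, start=1):
        if parse_date(row.get(date_column) or "") is None:
            bad.append(number)
    return bad


# Hypothetical CT Exams rows; only the date column is shown.
ct_rows = [{"DOE": "2021-04-02"}, {"DOE": "04/02/2021"}, {"DOE": "April 2"}]
print(rows_with_bad_dates(ct_rows, DATE_COLUMNS["ct_exams"]))
```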
Demographics
For Demographics clinical data files, the headers should be MRN, COHORT, PRIORT, SITEID, DOB, SEX, ADDR, RACE, ETHN, EDU, DD, STG-CP, STG, LFU, SS, PY, YSQ. The contents of the DOB field must be formatted as a date for the offset to apply.
Uploading the Demographics file initiates API calls to retrieve publicly available socioeconomic status (SES) data from Census. These calls are made before anonymization and calculation of the date offset, and they use the patient’s address in the ADDR column to return 7 SES variables. These variables are inserted into the file:
- Median Household Income Value
- Median House Value
- Median Gross Rent
- % Below 150% Poverty
- Education Index
- % Unemployed
- % Working Class
The contents of the ADDR field are removed except for the State, and the column header is changed from “ADDR” to “STATE”. If the ADDR field is empty, the API call retrieves no data. In these instances, the contents are marked “Not Found”.
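The ADDR handling described above can be sketched as follows. Several details are assumptions, not documented behavior: the address is taken to end in a two-letter state code followed by a ZIP code, and for an empty ADDR both STATE and the seven SES variables are marked “Not Found”, which is one reading of the rule. `reduce_address` is a hypothetical name.

```python
import re

SES_COLUMNS = [
    "Median Household Income Value",
    "Median House Value",
    "Median Gross Rent",
    "% Below 150% Poverty",
    "Education Index",
    "% Unemployed",
    "% Working Class",
]

# Assumed address layout: "..., City, ST 12345" or "..., City, ST 12345-6789".
STATE_ZIP = re.compile(r"\b([A-Z]{2})\s+\d{5}(?:-\d{4})?\s*$")


def reduce_address(row: dict) -> dict:
    """Replace ADDR with STATE in place, keeping the other columns in order."""
    address = (row.get("ADDR") or "").strip()
    match = STATE_ZIP.search(address)
    state = match.group(1) if match else "Not Found"

    out = {}
    for key, value in row.items():
        if key == "ADDR":
            out["STATE"] = state
        else:
            out[key] = value

    # Assumption: with no address there is no SES data to insert.
    if not address:
        for column in SES_COLUMNS:
            out[column] = "Not Found"
    return out


# Fictitious example row.
example = {"MRN": "1", "ADDR": "123 Main St, Springfield, IL 62704", "SEX": "F"}
print(reduce_address(example))
```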
Contents of the date of birth field (“DOB”) are anonymized to show only the year of birth.
Diagnosis date is anonymized using the date offset.
An “AGE” column is added to capture the patient’s age at the time of diagnosis, and the Rule of 89 is applied: if the patient is determined to be 89 years or older at the time of diagnosis, AGE is set to “90” and DOB is set to “1934” by default. NOTE: This year will increment by 1 each year.
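The Rule of 89 described above can be sketched as follows. This is an illustration, not the project's implementation: it assumes age is computed from the original (pre-offset) dates, and that the capped birth year equals the current calendar year minus 90, which gives 1934 when run in 2024 and moves forward by one each year. The function names are hypothetical.

```python
from datetime import date

AGE_CAP_THRESHOLD = 89  # ages at or above this are capped
CAPPED_AGE = 90         # value reported for capped patients


def age_on(dob: date, on: date) -> int:
    """Whole years elapsed between dob and the given date."""
    years = on.year - dob.year
    if (on.month, on.day) < (dob.month, dob.day):
        years -= 1  # birthday not yet reached in that year
    return years


def anonymize_age_and_dob(dob: date, diagnosis_date: date, today: date) -> dict:
    """Return the AGE and DOB values as they would appear after anonymization."""
    age = age_on(dob, diagnosis_date)
    if age >= AGE_CAP_THRESHOLD:
        # Assumed rule for the default year: current year minus the capped age.
        return {"AGE": str(CAPPED_AGE), "DOB": str(today.year - CAPPED_AGE)}
    return {"AGE": str(age), "DOB": str(dob.year)}


# Fictitious patients, evaluated as if run on 1 January 2024.
run_date = date(2024, 1, 1)
print(anonymize_age_and_dob(date(1930, 5, 1), date(2020, 6, 1), run_date))
print(anonymize_age_and_dob(date(1960, 5, 1), date(2020, 4, 30), run_date))
```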