Combining knowledge- and data-driven methods for de-identification of clinical narratives

Dehghan A.; Kovačević, Aleksandar; Karystianis G.; Keane J.; Nenad, Grba

Please use this identifier to cite or link to this item: https://open.uns.ac.rs/handle/123456789/5928

DC Field	Value	Language
dc.contributor.author	Dehghan A.	en
dc.contributor.author	Kovačević, Aleksandar	en
dc.contributor.author	Karystianis G.	en
dc.contributor.author	Keane J.	en
dc.contributor.author	Nenad, Grba	en
dc.date.accessioned	2019-09-30T08:51:18Z	-
dc.date.available	2019-09-30T08:51:18Z	-
dc.date.issued	2015-12-01	en
dc.identifier.issn	15320464	en
dc.identifier.uri	https://open.uns.ac.rs/handle/123456789/5928	-
dc.description.abstract	© 2015 Elsevier Inc. The recent promise of large-scale access to unstructured clinical data from electronic health records has revitalized interest in automated de-identification of clinical notes, which includes the identification of mentions of Protected Health Information (PHI). We describe the methods developed and evaluated as part of the i2b2/UTHealth 2014 challenge to identify PHI defined by 25 entity types in longitudinal clinical narratives. Our approach combines knowledge-driven (dictionaries and rules) and data-driven (machine learning) methods with a large range of features to address de-identification of specific named entities. In addition, we have devised a two-pass recognition approach that creates a patient-specific run-time dictionary from the PHI entities identified with high confidence in the first pass, which is then used in the second pass to identify mentions that lack specific clues. The proposed method achieved overall micro F1-measures of 91% on strict and 95% on token-level evaluation on the test dataset (514 narratives). Whilst most PHI entities can be reliably identified, mentions of Organizations and Professions were particularly challenging. Still, the overall results suggest that automated text mining methods can be used to reliably process clinical notes to identify personal information, thus providing a crucial step in large-scale de-identification of unstructured data for further clinical and epidemiological studies.	en
dc.relation.ispartof	Journal of Biomedical Informatics	en
dc.title	Combining knowledge- and data-driven methods for de-identification of clinical narratives	en
dc.type	Journal/Magazine Article	en
dc.identifier.doi	10.1016/j.jbi.2015.06.029	en
dc.identifier.pmid	58	en
dc.identifier.scopus	2-s2.0-84939864819	en
dc.identifier.url	https://api.elsevier.com/content/abstract/scopus_id/84939864819	en
dc.relation.lastpage	S59	en
dc.relation.firstpage	S53	en
dc.relation.volume	58	en
item.fulltext	No Fulltext	-
item.grantfulltext	none	-
crisitem.author.dept	Departman za računarstvo i automatiku	-
crisitem.author.dept	Departman za hemiju, biohemiju i zaštitu životne sredine	-
crisitem.author.parentorg	Fakultet tehničkih nauka	-
crisitem.author.parentorg	Prirodno-matematički fakultet	-
Appears in Collections:	PMF Publikacije/Publications
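
The two-pass recognition idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the title-based rule, the confidence values, the 0.9 threshold, and every function name below are hypothetical stand-ins chosen only to show how a patient-specific run-time dictionary built in a first pass might recover mentions that lack explicit clues in a second pass.

```python
import re
from typing import List, Set, Tuple

# A mention is (start, end, text, confidence). All names are illustrative assumptions.
Mention = Tuple[int, int, str, float]

# Hypothetical first-pass rule: a capitalized word following an explicit title clue.
TITLE_RULE = re.compile(r"\b(?:Dr|Mr|Mrs|Ms)\.\s+([A-Z][a-z]+)")


def first_pass(text: str) -> List[Mention]:
    """Toy stand-in for the rule/ML recognizers: tag names introduced by a title."""
    # 0.95 is an arbitrary confidence assigned to rule matches in this sketch.
    return [(m.start(1), m.end(1), m.group(1), 0.95) for m in TITLE_RULE.finditer(text)]


def build_runtime_dictionary(mentions: List[Mention], threshold: float = 0.9) -> Set[str]:
    """Keep only the surface forms recognized with high confidence."""
    return {text for (_, _, text, conf) in mentions if conf >= threshold}


def second_pass(text: str, dictionary: Set[str], known: List[Mention]) -> List[Mention]:
    """Find dictionary terms that the first pass missed (no explicit clue nearby)."""
    covered = {(start, end) for (start, end, _, _) in known}
    extra = []
    for term in dictionary:
        for m in re.finditer(r"\b" + re.escape(term) + r"\b", text):
            if (m.start(), m.end()) not in covered:
                # 0.5 is an arbitrary lower confidence for dictionary-only matches.
                extra.append((m.start(), m.end(), term, 0.5))
    return extra


def two_pass(notes: List[str]) -> List[List[Mention]]:
    """Run both passes over all notes belonging to one patient."""
    first = [first_pass(note) for note in notes]
    dictionary: Set[str] = set()
    for mentions in first:
        dictionary |= build_runtime_dictionary(mentions)
    return [sorted(found + second_pass(note, dictionary, found))
            for note, found in zip(notes, first)]


if __name__ == "__main__":
    notes = ["Seen by Dr. Smith today.", "Smith reviewed the labs."]
    for note, mentions in zip(notes, two_pass(notes)):
        print(note, [text for (_, _, text, _) in mentions])
```

In this toy example the second note contains "Smith" without a title, so only the second pass, driven by the dictionary collected from the first note, can tag it.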


SCOPUS^TM Citations: 42 (checked on May 3, 2024)

Page view(s): 36 (last week: 6; last month: 8; checked on May 3, 2024)