Record Linkage - Introduction, Recent Advances, and Privacy Issues

The aim is to make this tutorial as accessible as possible to a wide-ranging audience from various backgrounds. The content will focus on concepts and techniques rather than on the details of algorithms. A basic understanding of databases, algorithms, and probability will be beneficial but is not required. The tutorial will be based on the book “Data Matching – Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection” (Springer, 2012), written by the presenter.

Peter Christen is a Professor at the Research School of Computer Science at the Australian National University. He received his Diploma in Computer Science Engineering from ETH Zurich in 1995 and his PhD in Computer Science from the University of Basel in 1999.

His research interests are in data mining and data matching (record linkage), with a special focus on privacy-preserving techniques for record linkage, data integration, and data mining. He has published over 140 articles in these areas, as well as the book “Data Matching”, published by Springer in 2012. He is the principal developer of Febrl (Freely Extensible Biomedical Record Linkage), an open source system for data cleaning, deduplication and record linkage.

For more details see:

Record Linkage - Introduction, Recent Advances, and Privacy Issues

Date: 2 July 2019
Time: 11:00 to 15:00
Language: English
Venue: Assembly hall (Salón de actos)
Headquarters of the Instituto de Estadística y Cartografía de Andalucía
Pabellón de Nueva Zelanda
C/ Leonardo Da Vinci, 21. Isla de La Cartuja
Registration: Free admission with prior registration via the following form.
Getting there: TUSSAM city bus lines C1 and C2. Stop at Facultad de Comunicación.

Tutorial outline:

  • Part 1: Record linkage introduction, short history of record linkage, applications, and the record linkage process (overview of the main steps)
  • Part 2: Detailed discussion of all steps of the record linkage process (data cleaning and standardisation, indexing/blocking, field and record comparisons, classification, and evaluation), and core techniques used in these steps.
  • Part 3: Advanced record linkage techniques with a focus on linking databases that contain personal information (such as health and national census databases), including collective, group and graph linking techniques, advanced indexing techniques that enable large-scale record linkage, and, if time permits, the linking of temporal and dynamic data as well as real-time record linkage.
  • Part 4: Major concepts, protocols and challenges in privacy-preserving record linkage, which aims to link databases across organisations without revealing any private or confidential information.
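
For readers new to the topic, the steps named in Part 2 (cleaning and standardisation, indexing/blocking, field comparison, classification, and evaluation) can be illustrated with a very small Python sketch. It is not taken from the tutorial, the book, or Febrl: the field names, the first-letter blocking key, the `difflib` similarity measure, the 0.8 threshold, and the toy records are all assumptions chosen only to make the example self-contained.

```python
from difflib import SequenceMatcher
from itertools import product

FIELDS = ("given", "surname", "city")  # hypothetical field names


def clean(record):
    # Cleaning/standardisation: trim whitespace and lower-case every field.
    return {k: v.strip().lower() for k, v in record.items()}


def block_key(record):
    # Indexing/blocking: only records sharing this key are compared
    # (assumed key: first letter of the surname).
    return record["surname"][:1]


def compare(a, b):
    # Field comparison: approximate string similarity in [0, 1] per field.
    return [SequenceMatcher(None, a[f], b[f]).ratio() for f in FIELDS]


def classify(similarities, threshold=0.8):
    # Classification: simple threshold on the average similarity.
    return sum(similarities) / len(similarities) >= threshold


def link(db_a, db_b, threshold=0.8):
    db_a = [clean(r) for r in db_a]
    db_b = [clean(r) for r in db_b]
    matches = []
    for (i, a), (j, b) in product(enumerate(db_a), enumerate(db_b)):
        if block_key(a) != block_key(b):
            continue  # pair pruned by blocking, never compared in detail
        if classify(compare(a, b), threshold):
            matches.append((i, j))
    return matches


def evaluate(found, truth):
    # Evaluation: precision and recall against known true matches.
    found, truth = set(found), set(truth)
    true_positives = len(found & truth)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Toy data, invented for illustration only.
db_a = [
    {"given": "Peter ", "surname": "Smith", "city": "Canberra"},
    {"given": "Maria", "surname": "Garcia", "city": "Sevilla"},
]
db_b = [
    {"given": "maria", "surname": "Garcia", "city": "Sevilla"},
    {"given": "Pete", "surname": "Smith", "city": "Canberra"},
    {"given": "Paul", "surname": "Stone", "city": "Sydney"},
]

matches = link(db_a, db_b)
precision, recall = evaluate(matches, [(0, 1), (1, 0)])
print(matches, precision, recall)
```

Real systems replace each of these placeholder choices with more careful techniques, for example phonetic or multi-pass blocking and probabilistic or learned classifiers; the sketch only shows how the steps fit together.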