A Novel Temporal Clustering Technique and Quality Evaluation Measure for Group Record Linkage

Research in the social sciences is increasingly based on large and complex data collections, where individual data sets from different domains need to be linked to allow advanced data analysis. Historical registries containing birth, death, and marriage certificates are a popular type of data in this context. Once such data sets are linked, pedigrees for entire populations can be constructed. These will facilitate novel studies that investigate, for example, how education, health, mobility, and employment have influenced people's lives over several generations. In this talk we will present our recently developed temporal clustering approach, which links the records of a group of individuals, such as all births to the same mother, while enforcing temporal constraints, such as the intervals between births. We then present a new cluster quality evaluation measure that categorises each individual record according to the quality of the cluster into which it has been linked. Experiments on a real Scottish data set show that our temporal clustering approach outperforms a previous approach to group record linkage. These experiments also highlight the need for new quality evaluation measures for group record linkage. This work was conducted with Ms Charini Nanayakkara and Dr Thilina Ranbaduge.
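The abstract does not spell out either algorithm, so the following is only a minimal Python sketch of the two ideas it names, written under our own assumptions. The first function checks whether a candidate group of births to one mother respects simple temporal constraints; the second labels each record by how its predicted cluster relates to its true cluster. The function names, the thresholds `MIN_BIRTH_GAP_DAYS` and `MAX_CHILDBEARING_SPAN_DAYS`, and the four record categories are hypothetical illustrations, not the authors' temporal clustering method or their evaluation measure.

```python
from datetime import date

# Hypothetical thresholds, chosen for illustration only (not from the talk).
MIN_BIRTH_GAP_DAYS = 270               # roughly nine months between births
MAX_CHILDBEARING_SPAN_DAYS = 25 * 365  # rough upper bound on a mother's span


def is_temporally_consistent(birth_dates):
    """Return True if a candidate group of births to one mother satisfies
    the illustrative temporal constraints above."""
    dates = sorted(birth_dates)
    if not dates:
        return True
    # The whole group must fit into a plausible childbearing span.
    if (dates[-1] - dates[0]).days > MAX_CHILDBEARING_SPAN_DAYS:
        return False
    for earlier, later in zip(dates, dates[1:]):
        gap = (later - earlier).days
        # Same-day births are treated as twins; otherwise require a minimum gap.
        if 0 < gap < MIN_BIRTH_GAP_DAYS:
            return False
    return True


def categorise_records(predicted, truth):
    """Label each record by comparing its predicted cluster with its true one.

    predicted, truth: dicts mapping record id -> cluster id (same keys assumed).
    Hypothetical categories:
      'exact'  - predicted cluster equals the true cluster
      'split'  - predicted cluster is a strict subset of the true cluster
      'merged' - predicted cluster strictly contains the true cluster
      'mixed'  - neither: wrong records included and true records missing
    """
    def groups(assignment):
        # Invert record -> cluster into cluster -> set of records.
        result = {}
        for rec, cid in assignment.items():
            result.setdefault(cid, set()).add(rec)
        return result

    pred_groups = groups(predicted)
    true_groups = groups(truth)
    labels = {}
    for rec in predicted:
        p = pred_groups[predicted[rec]]
        t = true_groups[truth[rec]]
        if p == t:
            labels[rec] = 'exact'
        elif p < t:
            labels[rec] = 'split'
        elif p > t:
            labels[rec] = 'merged'
        else:
            labels[rec] = 'mixed'
    return labels


# Usage: the last two births are only 114 days apart, so the group is rejected.
print(is_temporally_consistent([date(1860, 3, 1), date(1862, 5, 10), date(1862, 9, 1)]))

predicted = {'a': 1, 'b': 1, 'c': 2, 'd': 3, 'e': 3}
truth = {'a': 'x', 'b': 'x', 'c': 'y', 'd': 'y', 'e': 'z'}
print(categorise_records(predicted, truth))
```

Treating same-day births as twins and using fixed day thresholds are simplifying choices for this sketch; real historical records would also need some tolerance for errors in recorded dates.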

For details about the temporal techniques see:

Peter Christen is a Professor at the Research School of Computer Science at the Australian National University. He received his Diploma in Computer Science Engineering from ETH Zurich in 1995 and his PhD in Computer Science from the University of Basel in 1999. His research interests are in data mining and record linkage, with a focus on machine learning and privacy-preserving techniques for record linkage. He has published over 140 articles in these areas, including the book 'Data Matching' (Springer, 2012).

Charini Nanayakkara is a PhD student at the Australian National University (ANU), where her research focuses on record linkage techniques for complex historical birth, marriage, death, and census data. She received her BSc (Hons) degree in Computer Science from the University of Colombo School of Computing, Sri Lanka, in 2016. Before joining the ANU as a PhD student in March 2018, she worked as a Software Engineer at WSO2 Lanka Pvt. Ltd for two years. Charini's research is part of the Digitising Scotland project (

Thilina Ranbaduge is a research fellow at the Australian National University (ANU) Research School of Computer Science. His research interests are in data mining and in multidatabase and privacy-preserving record linkage. He received his PhD in Computer Science from the ANU in 2018, and completed his PG.Dip and BSc (Hons) at the University of Moratuwa, Sri Lanka, in 2013 and 2009, respectively.


Date: 2 July 2019
Time: 9:00
Language: English
Venue: Salón de actos (assembly hall)
Headquarters of the Instituto de Estadística y Cartografía de Andalucía
Pabellón de Nueva Zelanda
C/ Leonardo Da Vinci, 21. Isla de La Cartuja
Registration: Free admission, subject to prior registration via the following form.
Getting there: TUSSAM city bus lines C1 and C2, stop at Facultad de Comunicación.