clusterMLD: An Efficient Hierarchical Clustering Method for Multivariate Longitudinal Data.

Pubmed ID: 37859643

Pubmed Central ID: PMC10584088

Journal: Journal of computational and graphical statistics : a joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America

Publication Date: Jan. 1, 2023

Grants: U24 AA026969, R01 HL095086, R01 NS103475, U54 GM115458

Authors: Zhou J, Zhang Y, Tu W

Cite As: Zhou J, Zhang Y, Tu W. clusterMLD: An Efficient Hierarchical Clustering Method for Multivariate Longitudinal Data. J Comput Graph Stat 2023;32(3):1131-1144. Epub 2023 Jan 12.

Abstract

Longitudinal data clustering is challenging because the grouping has to account for the similarity of individual trajectories in the presence of sparse and irregular times of observation. This paper puts forward a hierarchical agglomerative clustering method based on a dissimilarity metric that quantifies the cost of merging two distinct groups of curves, which are represented by B-splines fitted to the repeatedly measured data. Extensive simulations show that the proposed method has superior performance in determining the number of clusters, in classifying individuals into the correct clusters, and in computational efficiency. Importantly, the method is suitable not only for clustering multivariate longitudinal data with sparse and irregular measurements but also for densely measured functional data. To this end, we provide an R package for the implementation of such analyses. To illustrate the use of the proposed clustering method, two large data sets from real-world clinical studies are analyzed.
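
The merging idea described in the abstract can be illustrated with a minimal sketch. It is not the clusterMLD implementation and does not reproduce the paper's dissimilarity metric. Several things here are assumptions made only for illustration: the merge cost is taken to be the increase in residual sum of squares when two groups are pooled and refitted with a single curve, a quadratic polynomial basis stands in for the B-spline basis, there is a single outcome variable, and the number of clusters is supplied by the user rather than estimated. The function names (`basis`, `rss`, `merge_cost`, `agglomerate`) are hypothetical.

```python
import numpy as np

def basis(t, degree=2):
    """Polynomial design matrix (assumed stand-in for a B-spline basis)."""
    return np.vander(np.asarray(t, dtype=float), degree + 1, increasing=True)

def rss(t, y):
    """Residual sum of squares of one least-squares curve fitted to pooled points."""
    X = basis(t)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def merge_cost(a, b):
    """Assumed cost of merging: pooled RSS minus the RSS of the separate fits."""
    t = np.concatenate([a[1], b[1]])
    y = np.concatenate([a[2], b[2]])
    return rss(t, y) - rss(a[1], a[2]) - rss(b[1], b[2])

def agglomerate(subjects, n_clusters):
    """Greedy agglomeration: repeatedly merge the cheapest pair of clusters."""
    # Each cluster is (member ids, pooled observation times, pooled values).
    clusters = [([i], np.asarray(t, dtype=float), np.asarray(y, dtype=float))
                for i, (t, y) in enumerate(subjects)]
    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: merge_cost(clusters[p[0]], clusters[p[1]]))
        ids_i, t_i, y_i = clusters[i]
        ids_j, t_j, y_j = clusters[j]
        merged = (ids_i + ids_j,
                  np.concatenate([t_i, t_j]),
                  np.concatenate([y_i, y_j]))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    labels = np.empty(len(subjects), dtype=int)
    for k, (ids, _, _) in enumerate(clusters):
        labels[ids] = k
    return labels

# Synthetic example: two groups of trajectories observed at irregular times.
rng = np.random.default_rng(0)
subjects = []
for g in range(2):
    for _ in range(3):
        t = np.sort(rng.uniform(0, 1, size=8))   # irregular observation times
        mean = t if g == 0 else 3 - t             # group-specific mean curve
        subjects.append((t, mean + rng.normal(0, 0.1, size=8)))

labels = agglomerate(subjects, n_clusters=2)
print(labels)
```

Because each cluster keeps its pooled observations, subjects never need to share a common time grid, which is what makes a cost of this kind usable with sparse and irregular measurements. Under the same assumptions, a multivariate version could sum the cost over outcome variables.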