On Efficient and Accurate Calculation of Significance P-Values for Sequence Kernel Association Testing of Variant Set.

Pubmed ID: 26757198

Pubmed Central ID: PMC4761292

Journal: Annals of human genetics

Publication Date: March 1, 2016

MeSH Terms: Humans; Algorithms; Computer Simulation; Blood Glucose; Atherosclerosis; Software; Models, Genetic; Genetic Variation; Genetic Association Studies; Exome; Glucose-6-Phosphatase

Grants: HHSN268201100005C, HHSN268201100006C, HHSN268201100007C, HHSN268201100008C, HHSN268201100009C, HHSN268201100010C, HHSN268201100011C, HHSN268201100012C, HHSN268201100001C, HHSN268201100002C, RC2 HL102419, HHSN268201000010C, R01 CA134848, GM083345, CA134848, R01 GM083345, HHSN268201100009I, HHSN268201100005G, HHSN268201100008I, HHSN268201100011I, HHSN268201100005I, HHSN268201100007I, HHSN268201100001I, HHSN268201100002I, HHSN268201000012C, HHSN268201000011C

Authors: Wu B, Guan W, Pankow JS

Cite As: Wu B, Guan W, Pankow JS. On Efficient and Accurate Calculation of Significance P-Values for Sequence Kernel Association Testing of Variant Set. Ann Hum Genet 2016 Mar;80(2):123-35. Epub 2016 Jan 12.

Abstract

The objective of this paper is to discuss and develop alternative computational methods to accurately and efficiently calculate significance P-values for the commonly used sequence kernel association test (SKAT) and adaptive sum of SKAT and burden test (SKAT-O) for variant set association. We show that the existing software can lead to either conservative or inflated type I errors. We develop alternative and efficient computational algorithms that quickly compute the SKAT P-value and have well-controlled type I errors. In addition, we derive an alternative and simplified formula for calculating the significance P-value of SKAT-O, which sheds light on the development of efficient and accurate numerical algorithms. We implement the proposed methods in a publicly available R package that can be readily used or adapted for large-scale sequencing studies. Given the growing number of large-scale exome and whole-genome sequencing and re-sequencing studies, the proposed methods are of considerable practical importance. We conduct extensive numerical studies to investigate the performance of the proposed methods, and we illustrate their usefulness by applying them to associations between rare exonic variants and fasting glucose levels in the Atherosclerosis Risk in Communities (ARIC) study.
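
For readers unfamiliar with the computation the abstract refers to: under the null hypothesis, the SKAT statistic is distributed as a weighted sum of independent 1-df chi-square variables, and the P-value is the upper-tail probability of that mixture. The sketch below is not the authors' algorithm and not the code of their R package. It illustrates only the simplest textbook approach, a Satterthwaite two-moment approximation, which matches the mixture to a scaled chi-square and is known to lose accuracy in the far tail. The function names and the assumption that the mixture weights (eigenvalues) are already available are illustrative choices, and the code uses only the Python standard library.

```python
import math


def _upper_regularized_gamma(a, x, eps=1e-14, max_iter=10000):
    """Return Q(a, x) = Gamma(a, x) / Gamma(a) for a > 0 and x >= 0."""
    if x <= 0.0:
        return 1.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Power series for the lower function P(a, x); fast when x < a + 1.
        term = 1.0 / a
        total = term
        for n in range(1, max_iter):
            term *= x / (a + n)
            total += term
            if abs(term) < abs(total) * eps:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction for Q(a, x), evaluated with the modified Lentz method.
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h * math.exp(log_prefactor)


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with df degrees of freedom."""
    return _upper_regularized_gamma(df / 2.0, x / 2.0)


def skat_pvalue_satterthwaite(q, lambdas):
    """Approximate P(sum_i lambda_i * chi2_1 > q) by matching two moments.

    The mixture has mean sum(lambda) and variance 2 * sum(lambda^2); a scaled
    chi-square scale * chi2_df with the same mean and variance is used instead.
    """
    c1 = sum(lambdas)
    c2 = sum(lam * lam for lam in lambdas)
    scale = c2 / c1
    df = c1 * c1 / c2
    return chi2_sf(q / scale, df)


if __name__ == "__main__":
    # Hypothetical mixture weights and observed statistic, for illustration only.
    weights = [2.5, 1.2, 0.6, 0.3, 0.1]
    print(skat_pvalue_satterthwaite(12.0, weights))
```

When all weights are equal the mixture is itself a scaled chi-square, so the approximation is exact in that case; with unequal weights it is only a rough guide, which is one reason more careful numerical methods for the tail are of interest.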