GeneProtocols: a reference for the latest scientific findings, methods, protocols, books, and atlases

  • geneprotocols@gmail.com

Heterogeneity in Statistical Genetics: How to Assess and Account for Mixtures in Association Studies


Table of contents:

1 Introduction to Heterogeneity in Statistical Genetics
1.1 Different Types of Heterogeneity
1.2 A Note on Definitions and Notation Throughout This Book
1.3 Hardy–Weinberg Equilibrium (HWE) Proportions and Their Importance in Gene-Mapping
1.4 Determination of Conditional Genotype Frequencies
1.4.1 Genetic Model-Free Approaches
1.4.1.1 Locus Genotype Frequencies Follow HWE Proportions in Both Populations
1.4.1.2 Locus Genotype Frequencies Follow HWE Proportions in One Population but Not Both
1.4.1.3 Locus Genotype Frequencies Follow HWE Proportions in Neither Population
1.4.2 Genetic Model-Based Approach Through the Use of Genotype Relative Risks
1.4.2.1 Genetic Model-Based Approach Through the Use of Logistic Model
1.5 The Box (and Whiskers) Plot as a Tool for Visualizing Empirical Data Distributions
1.6 Power and Minimum Sample Size (MSSN) for Different Statistical Tests of Genetic Association
1.6.1 Contingency Table for Organizing Categorical Phenotype and Genomic-Data
1.6.1.1 Formula for Chi-Square Test of Independence
1.6.1.2 (Cochran-Armitage) Test of Trend
1.6.1.3 The Transmission Disequilibrium Test for Detecting Linkage in the Presence of Association
1.6.1.4 Computing Power and MSSN for Tests of Genetic Association
1.7 The Expectation–Maximization (EM) Algorithm
1.7.1 Example Application
1.7.1.1 Implementation of the Algorithm
References

2 Overview of Genomic Heterogeneity in Statistical Genetics
2.1 Heterogeneity Due to SNP Genotype Misclassification
2.2 Examples of How Genotype Misclassification May Arise in Practice
2.3 Mathematical Models of Genotype Misclassification
2.4 Genotype Misclassification for Genomic Data with Three or More Categories
2.5 Effects of Misclassification on Statistical Tests
2.5.1 Non-differential Misclassification Error
2.5.2 Differential Misclassification Error
2.5.3 Non-differential Misclassification in Family-Based Tests of Association
2.6 Errors in Next-Generation Sequencing (NGS)
2.6.1 Definitions and Notation
2.6.1.1 What Are Estimated NGS Probabilities for Empirical Data?
2.6.2 Mathematical Model for NGS Data
2.6.3 Empirical Type I Error for Test Statistics Applied to NGS Data with Sequence Error: Simulation Results
2.7 Non-misclassification Forms of Heterogeneity
2.7.1 Mathematical Model for Heterogeneity
2.7.1.1 Notation
2.7.1.2 Mathematical Model for Locus Heterogeneity: Equations
References

3 Phenotypic Heterogeneity
3.1 Phenotype Misclassification
3.2 How Phenotype Misclassification May Arise in Practice
3.2.1 Lack of Access to Gold-Standard Classification
3.2.2 Variability of Phenotype Expression over Time
3.2.3 Variable Age of Onset
3.2.4 Incomplete Knowledge of Gold-Standard Classifications
3.2.5 Model Misspecification
3.3 Effects of Misclassification on Statistical Tests
3.3.1 Non-differential Misclassification Error Example for Single-Stage Genetic Association
3.3.2 Why Do We Observe Such Large Power Loss/MSSN Increase for Phenotype Misclassification?
3.3.3 Multi-stage Phenotype Classification and Limits of Observed Genotype Frequencies
3.3.3.1 Conditional Genotype Frequencies in Presence of Conditionally Independent Phenotype Classification
3.3.3.2 Conditional Genotype Frequencies in the Presence of Biased Phenotype Classification
3.4 Non-misclassification Forms of Heterogeneity
3.5 Summary
References

4 Association Tests Allowing for Heterogeneity
4.1 Introduction
4.2 Statistical Tests that Use Genotype Data
4.2.1 Likelihood Ratio Test that Allows for Random Phenotype and Genotype Misclassification Error (LRTae)
4.2.1.1 Notation and Definitions
4.2.1.2 Log-Likelihoods of the Observed Data
4.2.1.3 Test Statistic: Likelihood Ratio Test Allowing for Error (LRTae)
4.2.1.4 Example Application
4.2.2 Trend Statistic that Allows for Random Phenotype and Genotype Misclassification Error
4.2.2.1 Notation and Definitions
4.2.2.2 Log-Likelihoods of the Observed Data
4.2.2.3 Test Statistic: (Linear) Test of Trend Allowing for Error (LTTae)
4.2.3 Likelihood Ratio Statistic for Family-Based Association that Incorporates Genotype Misclassification Errors (TDTae)
4.2.3.1 Notation
4.2.3.2 Determination of the Bayesian Posterior Probabilities (BPPs) $\tau^{(r)}_{(abc)(xyz)}$
4.2.3.3 TDTae Parameter Estimates
4.2.3.4 Log-Likelihood of Observed Data
4.2.3.5 The TDTae Statistic

4.3 Statistical Tests that Consider Heterogeneity Other Than Misclassification
4.3.1 Mixture Likelihood Ratio Test (MLRT) for Genetic Association in the Presence of Locus Heterogeneity
4.3.1.1 Notation and Definitions
4.3.1.2 Log-Likelihoods of the Observed Data
4.3.1.3 Example Application
4.3.1.4 Computing the MLRT Statistic for the Example Data
4.3.2 Transmission Disequilibrium Test that Allows for Locus Heterogeneity (TDT-HET)
4.3.2.1 Notation
4.3.2.2 Determination of the BPPs $\tau^{(r)}_{(m)(abc)}$
4.3.2.3 TDT-HET Parameter Estimates
4.3.2.4 Log-Likelihood of Observed Data
4.3.2.5 Computing the TDT-HET Statistic
4.3.2.6 Example Calculation
4.3.2.7 How TDT-HET Permutation p-Values Are Computed
4.3.2.8 A Proof of the Robustness of the TDT-HET Statistic’s Type I Error When Null Data Are Drawn from Multiple Sub-populations
4.3.3 Tests that Incorporate Phenotype Heterogeneity
4.3.3.1 Analysis of Data with R (Greater Than Two) Phenotypes and C (Greater Than One) Genomic Data Categories
4.3.3.2 Example Application of Chi-Square Test of Independence to Multiple Phenotypes
4.3.3.3 Does Modeling Phenotypic Heterogeneity Increase Power for Detecting Association? Results from Example Data
4.3.3.4 Other Methods for Addressing Phenotype Heterogeneity
4.3.3.5 Morton’s M-Test for Heterogeneity Applied to Different Groups of Phenotypes

4.4 Statistical Tests that Use Sequence Data
4.4.1 Single-Variant and Multiple Variant Tests of Trend for Genetic Association that Allows for Random and Differential NGS Error (LTTae,NGS)
4.4.1.1 Notation and Definitions
4.4.1.2 Log-Likelihood of the Observed Data
4.4.1.3 LTTae,NGS Parameter Estimates
4.4.1.4 LTTae,NGS Statistic
4.4.2 Transmission Disequilibrium Test that Allows for Next-Generation Sequence Error (TDT1-NGS)
4.4.2.1 Notation and Definitions
4.4.2.2 Log-Likelihood of the Observed Data
4.4.2.3 TDT1-NGS Parameter Estimates
4.4.2.4 TDT1-NGS Statistic
4.4.2.5 Example Calculations
References

5 Designing Genetic Linkage and Association Studies that Maintain Desired Statistical Power in the Presence of Mixtures
5.1 Parameter Settings, for Example, Calculations
5.1.1 Example Parameter Settings to Compute Power for a Fixed Sample Size and Significance Level
5.1.2 Example Parameter Settings to Compute MSSN for a Fixed Power and Significance Level
5.2 Statistical Tests that Use Genotype Data
5.2.1 Power and MSSN for Population-Based Data in the Presence of Non-differential Genotype Misclassification
5.2.1.1 Example Power Calculation
5.2.2 Power and MSSN for Population-Based Data in the Presence of Non-differential Phenotype Misclassification
5.2.2.1 Example MSSN Calculation
5.2.3 Likelihood Ratio Test that Allows for Random Phenotype and Genotype Misclassification Error (LRTae): Empirical Power
5.2.3.1 Genetic Model Parameters Determined Using Two Loci
5.2.3.2 Conditional Two-Locus Genotype Frequencies Based on Affection Status
5.2.3.3 Results of Simulations for LRTae Under Two-Locus Model
5.2.4 Trend Statistic that Allows for Random Phenotype and Genotype Misclassification Error
5.2.5 Family-Based Tests of Association: Analytic Solution to Increase in Rejection Rate for TDT in the Presence of Genotype Misclassification Errors
5.2.5.1 Non-centrality Parameter and Inflation in Rejection Rate
5.2.6 Family-Based Tests of Association: Analytic Solution to Increase in Rejection Rate for TDT in the Presence of Phenotype Misclassification Errors
5.2.6.1 Example MSSN Calculations for TDT in the Presence of Phenotype Misclassification
5.3 Statistical Tests that Consider Heterogeneity Other Than Misclassification
5.3.1 Sample Size Calculations in the Presence of Locus Heterogeneity: Population-Based Tests of Genetic Association
5.3.1.1 Example Power Calculation: Test of Trend
5.3.1.2 Factors that Most Significantly Influence MSSN
5.3.2 Power and Sample Size Calculations for Chi-Square Tests of Independence on Allele and Genotype Data for Phenotype Heterogeneity
5.3.3 Family-Based Test of Linkage/Association
5.3.3.1 Example MSSN Calculations for TDT in the Presence of Locus Heterogeneity
5.4 Power Calculations in the Presence of NGS Misclassification
5.4.1 Test of Trend Applied to Multiple NGS Data for SNP Loci
5.4.2 Increase in Interest for NGS Statistics
5.4.3 Empirical Null and Power Simulations for the LTTae,NGS Statistic
5.4.3.1 Empirical Type I Error (Null) Results
5.4.3.2 Empirical Power Results
5.4.3.3 Additional Investigation of Three Factors
5.4.4 Factors that Most Significantly Affect Power for NGS-Based TDT
References

6 Threshold-Selected Quantitative Trait Loci and Pleiotropy
6.1 Quantitative Trait Locus with Single Phenotype
6.1.1 Notation
6.1.2 Conditional Genotype Frequencies for Threshold-Selected Phenotypes
6.1.3 Example Sample Size Calculation for Threshold-Selected Phenotypes
6.1.4 Why Use Threshold-Selected Dichotomous Phenotypes as Compared with Quantitative Phenotypes? Power Comparison with ANOVA
6.2 Quantitative Trait Locus with Multiple Phenotypes
6.2.1 Notation for Multivariate Quantitative Traits
6.2.2 Methods
6.2.3 Thresholds
6.2.4 Example MSSN Calculation
6.2.5 A Final Note on Advantages of the Threshold-Selected Approach
References

Bibliography

Index

File details

Title: Heterogeneity in Statistical Genetics_ How to Assess, Address, and Account for Mixtures in Association Studies
File name: 599-www.GeneProtocols.ir-Heterogeneity in Statistical Genetics_ How to Assess, Address, and Account for Mixtures in Association Studies(2020).pdf
Title in Persian: ناهمگونی در ژنتیک آماری - نحوه ارزیابی و محاسبه اختلاط در مطالعات همخوانی
Authors: Derek Gordon, Stephen J. Finch, Wonkuk Kim
Language: English
Year of publication: 2020
ISBN: 3030611205, 9783030611200
Document type: Book
File extension: PDF
File size: 6.22
Book length in pages: 366