
Institute of Statistics and Data Science, National Taiwan University


[Seminar Announcement] 9/22 Seminar | Prof. Chen-Tuo Liao (廖振鐸): Composition and sample size determination for training set in genomic prediction

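Speaker: Prof. Chen-Tuo Liao (廖振鐸), Professor, Department of Agronomy, National Taiwan University; jointly appointed faculty, Institute of Statistics and Data Science
Time: Thursday, September 22, 2022 (ROC year 111), 13:30
Venue: Room 601, Cosmology Hall (次震宇宙館), National Taiwan University
Title: Composition and sample size determination for training set in genomic prediction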
Abstract: Genomic prediction (GP) is a statistical method used to select for quantitative traits in animal or plant breeding. For this purpose, a GP model is first built using phenotypic and genotypic data in a training set. The trained model is then used to predict genomic estimated breeding values (GEBVs) for individuals with genotypic data alone. For a specified test set, we develop a highly efficient algorithm to determine an optimal subset from a candidate set in which the individuals have been genotyped but not yet phenotyped. The chosen subset serves as the training set to be phenotyped, and the GP model is then built using its phenotypic and genotypic data. In this study, we propose an optimality criterion, called the r-score, to determine the required training set. The r-score criterion is derived directly from Pearson’s correlation between the GEBVs and the phenotypic values of the test set. The proposed method is shown to be advantageous over existing ones, mainly because it fully uses the genomic relationship between the test set and the training set by taking into account both the variance and the bias in predicting the GEBVs. By applying the logistic growth curve to draw a connection between the r-score and the training set size, a practical approach is proposed to determine the sample size of the optimal training set. Some real genome datasets are used to illustrate the proposed approach.
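As a rough illustration of the last step in the abstract, the sketch below shows one way a logistic growth curve could link a score to the training set size and yield a sample size. The abstract does not give the curve's parameterization or the stopping rule, so everything here is an assumption: the form r(n) = K / (1 + a·exp(−b·n)), the rule "smallest n that reaches a fraction p of the plateau K", the function names, and the parameter values, which are made up and not taken from any dataset.

```python
import math


def logistic(n, K, a, b):
    """Assumed logistic growth curve: r(n) = K / (1 + a * exp(-b * n))."""
    return K / (1.0 + a * math.exp(-b * n))


def size_for_fraction(p, a, b):
    """Smallest integer n with r(n) >= p * K, for 0 < p < 1, a > 0, b > 0.

    Solving K / (1 + a * exp(-b * n)) = p * K for n gives
    n = ln(a * p / (1 - p)) / b, which does not depend on K.
    """
    n_star = math.log(a * p / (1.0 - p)) / b
    return max(0, math.ceil(n_star))


# Hypothetical curve parameters, chosen only for illustration.
K, a, b = 0.8, 9.0, 0.02

# Training set size at which the curve reaches 95% of its plateau.
n95 = size_for_fraction(0.95, a, b)
print(n95, round(logistic(n95, K, a, b), 4))
```

In practice the parameters K, a, and b would have to be estimated from observed (training set size, score) pairs before such a rule could be applied; that fitting step is not shown here.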