Input/Output/Program flow

 

  • ONETOOL performs the pedigree information (PEDINFO) analysis when no options other than the two input files are specified.
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf
  • Using PLINK binary files instead of a .vcf file
  • onetool --fam test_miss0.fam --bed test_miss0.bed --bim test_miss0.bim
  • onetool --bfile test_miss0
  • Analysing a trait from the alternative phenotype file
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --pheno test_miss0_phen.txt --pname sbp
  • Including covariate(s) from the alternative phenotype file
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --pheno test_miss0_phen.txt --pname sbp --cname t2d
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --pheno test_miss0_phen.txt --pname sbp --cname t2d,weight
  • Using a script file including all command-line options
  • onetool --script test.txt
  • Specifying a root name for all output files of a run
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --out tout
  • Pedigree plot generation (only available in R-linked ONETOOL)
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --plot
  • When sex or case/control status is coded as 0/1 (e.g., 0=control; 1=case) instead of 1/2 (e.g., 1=control; 2=case)
  • onetool --fam test_miss0_1sex.fam --vcf test_miss0.vcf --1sex
  • onetool --fam test_miss0_1case.fam --vcf test_miss0.vcf --1case
  • onetool --fam test_miss0_1sex1case.fam --vcf test_miss0.vcf --1sex --1case
  • Specifying a genetic model (additive, dominant, recessive, or multiplicative)
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --model dominant
  • Specifying the number of threads to use for a run
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --thread 5

Important checkpoints

 

  • Before-analysis checkpoints
    • Is the dataset compatible with ONETOOL?
    • What kind of analysis needs to be performed?
    • Does the dataset contain related individuals or unrelated individuals?
    • Does the analysis require genotypes, phenotypes, or both?
    • What kind of phenotype needs to be analyzed? Continuous or binary?
    • Are there additional variables for adjustment (covariate adjustment)?
    • What options should be specified to perform the analysis?
    • If the analysis requires additional input, is it formatted in ONETOOL's required format?
    • Is the output prefix (--out) assigned?
    • Does the output path exist?
  • After-analysis checkpoints
    • Did ONETOOL finish without any error message?
    • Is there any message that starts with "WARN" in the log file?
    • Was the analysis result generated with the output prefix?
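
Some of the checkpoints above can be scripted. The following is a minimal sketch, not part of ONETOOL itself: the helper names (`out_dir_exists`, `count_warnings`, `has_output`) are hypothetical, and it assumes that warning lines in the log begin with `WARN`. The log file name is not specified in this document, so the path is passed in explicitly.

```shell
# Helper functions for the before/after-analysis checkpoints.
# ASSUMPTION: warning lines in the log start with "WARN".

# Succeeds if the directory part of the output prefix exists.
out_dir_exists() {
    [ -d "$(dirname -- "$1")" ]
}

# Prints the number of lines in the given log file that start with "WARN".
count_warnings() {
    grep -c '^WARN' -- "$1" || true
}

# Succeeds if at least one file whose name starts with the prefix exists.
has_output() {
    for f in "$1"*; do
        [ -e "$f" ] && return 0
    done
    return 1
}
```

For a run started with `--out tout`, one might call `out_dir_exists tout` before the run, then `has_output tout` and `count_warnings` on the log file afterwards.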

InfoQC analysis

 

  • Computing principal components
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --pca --npc 10
  • Computing the relationship matrix using genetic relationship matrix (GRM)
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --makecor
  • Computing the relationship matrix using kinship matrix
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --makecor --kinship
  • Computing the relationship matrix using identity-by-state (IBS) matrix
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --makecor --ibs
  • Plotting pedigrees
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --plot --out tout

  • What is "individual relatedness" in ONETOOL?
  • Individual relatedness in ONETOOL specifies how the individuals in the dataset are related to each other. It is represented by an N×N symmetric matrix, where N is the number of individuals in the dataset, and each element indicates the genetic similarity between a pair of individuals. In genetic analysis, the relationship matrix is used to parameterize the phenotypic correlation; if it is misspecified, the type-1 error, the type-2 error, or both cannot be appropriately controlled. The same statistic can have a different meaning depending on which relationship matrix is used, so the matrix should be selected carefully. The following statistical analyses can be conducted after the relationship matrix is chosen:
    • Heritability estimation
    • Estimation of phenotypic variance attributable to the observed genotypes
    • Association analysis under population stratification
    • Family-based association analysis
    ONETOOL can calculate the individual relatedness in several ways. By default, the genetic relationship matrix (GRM) is calculated and used as the individual relatedness, unless another measure of individual relatedness is explicitly specified.
  • Using the kinship coefficient as individual relatedness and generating the result
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --kinship --makecor
  • Using IBS as individual relatedness
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --ibs [other_options...]
  • Using user-defined individual relatedness
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --cor user_own_indiv_rel.txt [other_options...]
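
As a point of reference for the GRM and kinship options above, the two measures are commonly defined as shown below. This is a sketch under the assumption that ONETOOL follows the GCTA definition of the GRM (Yang et al., 2011, cited in the Trait analysis section) and uses twice the kinship coefficient for pedigree-based relatedness; this document does not confirm either. The symbols are introduced here for illustration: x_{ij} is the allele count (0, 1, or 2) of individual j at variant i, p_i is the frequency of that allele, M is the number of variants, and \phi_{jk} is the kinship coefficient between individuals j and k.

```latex
% GRM entry for two distinct individuals j and k over M variants (GCTA definition)
A_{jk} = \frac{1}{M} \sum_{i=1}^{M}
         \frac{(x_{ij} - 2p_i)(x_{ik} - 2p_i)}{2p_i(1 - p_i)}

% Pedigree-based relatedness derived from the kinship coefficient
K_{jk} = 2\phi_{jk}, \qquad
\phi_{\text{parent-offspring}} = \phi_{\text{full sibs}} = \tfrac{1}{4}
\;\Rightarrow\; K_{jk} = \tfrac{1}{2}
```

The GRM is estimated from the observed genotypes, whereas the kinship-based matrix depends only on the pedigree structure, which is one reason the two can give different results for the same statistic.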

Trait analysis

 

  • Computing familial correlations of a trait from the phenotype file
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --pheno test_miss0_phen.txt --pname t2d --fcor
  • Skipping the computation of standard errors of familial correlations to decrease the running time
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --pheno test_miss0_phen.txt --pname t2d --fcor --fcorStdErrOff
  • Computing heritability using GRM
    • If the dataset contains family-based samples, the estimate is the heritability; if it contains only independent samples (the GRM values between all pairs of subjects are less than 0.05), the estimate is the SNP-based heritability (Yang et al., Am J Hum Genet, 2011: GCTA).
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --pheno test_miss0_phen.txt --pname t2d --heritability
  • Computing heritability using kinship matrix
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --pheno test_miss0_phen.txt --pname t2d --heritability --kinship
  • Computing heritability using IBS matrix
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --pheno test_miss0_phen.txt --pname t2d --heritability --ibs
  • Running a commingling analysis
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --pheno test_miss0_phen.txt --pname t2d --segreg
  • Running a segregation analysis
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --pheno test_miss0_phen.txt --pname t2d --segreg --par segreg.par

Linkage analysis

 

  • Running model-based two-point linkage analysis (LODLINK); the type probability file from the segreg analysis must be specified
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --lodlink --typ test_segreg.typ
  • Performing the linkage test using sex-specific recombination fractions
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --lodlink --typ test_segreg.typ --lodlinkLinkageSexSpecific
  • Running model-free multipoint linkage analysis (MERLIN); the map file must be specified
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --merlin --map test_miss0_alt.map

Association analysis, common variant (for independent samples)

 

  • Simple logistic regression analysis
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --pheno test_miss0_phen.txt --pname t2d --regression
  • Simple linear regression analysis
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --pheno test_miss0_phen.txt --pname sbp --regression
  • Logistic regression analysis with covariates
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --pheno test_miss0_phen.txt --pname t2d --cname weight,sbp --regression
  • Linear regression analysis with covariates
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --pheno test_miss0_phen.txt --pname sbp --cname weight,t2d --regression
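
The examples above use the same `--regression` flag for both a binary trait (`t2d`) and a continuous trait (`sbp`), so running several traits is a matter of looping over `--pname`. The sketch below is one hedged way to do that. The function name `regress_traits`, the `ONETOOL` variable, and the `reg_<trait>` output prefixes are hypothetical, and combining `--regression` with `--out` is assumed to work as `--out` does in the Input/Output examples.

```shell
# ASSUMPTION: "onetool" is on the PATH; set ONETOOL="echo onetool" for a dry run.
ONETOOL=${ONETOOL:-onetool}

# regress_traits COVARIATES TRAIT...
# Runs one regression per trait, each with its own output prefix.
regress_traits() {
    covars=$1
    shift
    for trait in "$@"; do
        $ONETOOL --fam test_miss0.fam --vcf test_miss0.vcf \
            --pheno test_miss0_phen.txt --pname "$trait" \
            --cname "$covars" --regression --out "reg_$trait" || return 1
    done
}
```

For example, `regress_traits weight sbp t2d` would run the regression for `sbp` and then for `t2d`, each adjusted for `weight`. A covariate must not also appear as the trait in the same run.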

Association analysis, common variant (for related/family samples)

 

  • MQLS test for a binary trait; the prevalence value must be specified (See Thornton et al., 2007 AJHG for detailed methods)
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --kinship --mqls --prevalence 0.1
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --pheno test_miss0_phen.txt --pname t2d --kinship --mqls --prevalence 0.1
  • FQLS test for a binary trait; the prevalence and heritability values must be specified (See Park et al., 2015 BMC Med Genet for detailed methods)
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --kinship --fqls --prevalence 0.1 --heri 0.4
  • FQLS test for a continuous trait; the heritability value must be specified (See Park et al., 2015 BMC Med Genet for detailed methods)
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --pheno test_miss0_phen.txt --pname sbp --kinship --fqls --heri 0.4
  • GEMMA test for a continuous trait (See Zhou et al., 2012 Nat Genet for detailed methods)
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --pheno test_miss0_phen.txt --pname sbp --gemma
  • Multi-FQLS test for binary or continuous traits (See Won et al., 2016 BMC Bioinfo for detailed methods)
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --pheno test_miss0_phen.txt --set test_gene.txt --pname sbp,dbp --multifqls

Association analysis, rare variants (gene-based, for independent samples)

 

To apply ONETOOL to an analysis of unrelated (independent) samples, or to treat the dataset as a set of unrelated samples, use the --indep option.

  • Running ONETOOL default gene-based association test (CMC and Collapsing tests) with gene-wise summary output
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --genetest --genesummary --indep --set test_gene.txt
  • SKAT for a continuous trait (See Wu et al., 2011 AJHG for detailed methods)
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --pheno test_miss0_phen.txt --pname sbp --genetest --skat --indep --set test_gene.txt
  • SKAT-O (--skato) for a continuous trait with covariates (See Wu et al., 2011 AJHG for the SKAT method it builds on)
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --pheno test_miss0_phen.txt --pname sbp --cname height --genetest --skato --indep --set test_gene.txt
  • VT for a binary trait (See Price et al., 2010 AJHG for detailed methods)
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --indep --pheno test_miss0_phen.txt --pname t2d --genetest --vt --set test_gene.txt
  • KBAC for a binary trait (See Liu et al., 2010 PLoS Genet for detailed methods)
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --indep --pheno test_miss0_phen.txt --pname t2d --genetest --kbac --set test_gene.txt

Association analysis, rare variants (gene-based, for related/family samples)

 

  • Running ONETOOL default gene-based association test with gene-wise summary output
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --genetest --genesummary --set test_gene.txt
  • PEDCMC for a binary trait (See Zhu and Xiong, 2012 Am J Hum Genet for detailed methods)
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --genetest --pedcmc --set test_gene.txt
  • FARVAT for a continuous trait (See Choi et al., 2014 Bioinformatics for detailed methods)
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --pheno test_miss0_phen.txt --pname sbp --genetest --farvat --set test_gene.txt
  • FARVAT for a continuous trait with covariates (See Choi et al., 2014 Bioinformatics for detailed methods)
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --pheno test_miss0_phen.txt --pname sbp --cname height,t2d --genetest --farvat --set test_gene.txt
  • Multi-FQLS test for multiple SNPs (See Won et al., 2016, BMC Bioinfo for detailed methods)
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --pheno test_miss0_phen.txt --set test_gene.txt --pname weight,height --multifqls
  • mFARVAT for binary or continuous traits (See Wang et al., 2016 Genet Epidemiol for detailed methods)
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --pheno test_miss0_phen.txt --pname t2d --genetest --mfarvat --set test_gene.txt
  • FARVAT-X for association analysis of X-linked SNVs (See Choi et al., 2016 Genet Epidemiol for detailed methods)
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --pheno test_miss0_phen.txt --pname height --genetest --farvatx --set test_gene.txt
  • FB-SKAT for association analysis of a binary trait (See Ionita-Laza et al., 2013 EJHG for detailed methods)
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --fbskat --set test_gene.txt
  • rv-TDT for association analysis of a binary trait (See He et al., 2014 AJHG for detailed methods)
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --rvtdt --set test_gene.txt

Epistasis analysis, common variant (for independent samples)

 

  • Running the ONETOOL second-order exhaustive epistasis test on a binary phenotype, using Multifactor Dimensionality Reduction (MDR) (See Ritchie et al., AJHG 2001 for detailed methods)
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --mdr --order 2
  • Two-way exhaustive MDR on a binary phenotype, reporting only the top 10 results based on balanced accuracy (BA)
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --mdr --order 2 --top 10
  • Two-way exhaustive Generalized MDR (GMDR), reporting only the top 10 results (See Lou et al., AJHG 2007 for detailed methods)
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --pheno test_miss0_phen.txt --pname sbp --gmdr --order 2 --top 10
  • Two-way exhaustive GMDR on a continuous phenotype, adjusted for covariates
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --gmdr --order 2 --top 10 --pheno test_miss0_phen.txt --pname height --cname sbp,dbp

Data management (sample filtering)

 

  • ONETOOL can filter or select samples from the specified input files.
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf [filtering options...]
  • Removing individuals `SAMP7_2` and `SAMP8_2`
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --remsamp SAMP7_2,SAMP8_2
  • Selecting individuals only listed in `test_sample_list.txt`
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --selsamp test_sample_list.txt
  • Selecting samples included in the family `FAM_1`, `FAM_4` or `FAM_7`
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --selfam FAM_1,FAM_4,FAM_7
  • Removing individuals whose FID is listed in `test_family_list.txt`
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --remfam test_family_list.txt
  • Removing samples whose genotype calling rate is under 90%
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --filgind "<0.9"
  • Selecting samples whose genotype calling rate is >=90% and <99%
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --filgind "[0.9,0.99)"
  • Gender-based filtering
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --filmale
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --filfemale
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --filnosex
  • Phenotype-based filtering
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --filmispheno
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --pheno test_miss0_phen.txt --pname sbp --filmispheno
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --pheno test_miss0_phen.txt --mispheno NA --pname sbp --filmispheno
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --filcase
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --filcontrol
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --pheno test_miss0_phen.txt --pname medi01 --1case --filcontrol
  • Randomly selecting 80% of samples
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --sampresize 0.8
  • Randomly selecting eleven samples
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --sampresize 11
  • Removing/selecting samples by Mendelian error rate
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --filmendelfam ">=0.5"
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --incmendelfam "(0.1,0.25]"
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --filmendelsamp "<0.5"
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --incmendelsamp ">0.1"
  • Removing nonfounders
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --filnf
  • Removing missing founders (individuals that appear only in the paternal/maternal relationship)
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --filmf
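
The range expressions used by the filtering options above (and by the variant filters in the next section) come in two forms: a comparison such as `"<0.9"` and an interval such as `"[0.9,0.99)"`. The sketch below illustrates how such an expression can be read. It assumes the usual mathematical convention, where a square bracket includes the endpoint and a parenthesis excludes it, which matches the ">=90% and <99%" description given for `"[0.9,0.99)"`. The function `in_range` is a hypothetical helper for illustration, not ONETOOL's own parser.

```shell
# in_range EXPR VALUE
# Succeeds if VALUE satisfies EXPR, where EXPR is a comparison such as
# "<0.9" or ">=0.5", or an interval such as "[0.9,0.99)".
# ASSUMPTION: "[" / "]" include the endpoint and "(" / ")" exclude it.
in_range() {
    awk -v expr="$1" -v v="$2" 'BEGIN {
        v += 0
        first = substr(expr, 1, 1)
        if (first == "[" || first == "(") {
            last = substr(expr, length(expr), 1)
            split(substr(expr, 2, length(expr) - 2), b, ",")
            lo = b[1] + 0
            hi = b[2] + 0
            if (first == "[") ok_lo = (v >= lo); else ok_lo = (v > lo)
            if (last == "]")  ok_hi = (v <= hi); else ok_hi = (v < hi)
            ok = (ok_lo && ok_hi)
        } else {
            # Split the expression into operator and threshold
            op = expr
            sub(/[0-9.eE+-]+$/, "", op)
            x = substr(expr, length(op) + 1) + 0
            if (op == "<")       ok = (v < x)
            else if (op == "<=") ok = (v <= x)
            else if (op == ">")  ok = (v > x)
            else if (op == ">=") ok = (v >= x)
        }
        exit !ok
    }'
}
```

With this reading, `in_range "[0.9,0.99)" 0.9` succeeds and `in_range "[0.9,0.99)" 0.99` fails, because the upper bound is excluded.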

Data management (variant filtering)

 

  • ONETOOL can filter or select variants from the specified input files.
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf [filtering options...]
  • Removing/selecting variants by minor allele frequency or minor allele count
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --filfreq "<0.05"
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --incfreq "[0.05,0.5]"
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --filmac "<2"
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --incmac "[10,100)"
  • Removing/selecting variants with Hardy-Weinberg Equilibrium test
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --filhwe "<1e-7"
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --inchwe "(0.05,1]"
  • Selecting SNVs/indels from dataset
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --snvonly
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --indelonly
  • VCF-specific removing/selecting variants conditions
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --vcfqc
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --incfreq ">=50"
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --filfreq "[0,30)"
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --phasedonly
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --unphasedonly
  • Removing variants listed in 'remlist_variant.txt'
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --remvariant remlist_variant.txt
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --autoonly
  • Removing variants whose genotype calling rate is under 90%
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --filgvar "<0.9"
  • Selecting variants whose genotype calling rate is >10% and <=50%
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --incgvar "(0.1,0.5]"
  • Running an analysis after removing variants whose test p-value is < 0.05
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --filmistest "<0.05"
  • Selecting variants whose test p-value is > 0.05
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --filmistest "(0.05,1]"
  • Selecting variants `rs8385` and `rs93851`
  • Randomly selecting 10% of variants from the dataset
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --varresize 0.1
  • Randomly selecting one thousand variants from the dataset
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --varresize 1000
  • Selecting a subset of chromosomes for analysis
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --chr 1-10
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --chr 3,5-8,X
  • onetool --fam test_miss0.fam --vcf test_miss0.vcf --sexonly

Data management (LD pruning)

 

  • LD-based pruning and filtering

  • When the dataset is generated by chip-based technology, it may yield a biased distribution of genetic variants. In such a situation, some genetic analyses, such as the calculation of the genetic relationship matrix (GRM), can be affected. LD-based pruning can be applied in this situation by removing variants with high correlation or a high variance inflation factor (VIF).
    • 1-1. Perform variant pruning by r^2. The command below prunes variants with r^2>0.8, using a window of 100bp and a step size of 50bp.
    • onetool --fam test_miss0.fam --vcf test_miss0.vcf --prunepw 100,50,0.8 --out prune_res
    • 1-2. Perform variant pruning by VIF. The command below prunes variants with VIF>3, using a window of 100bp and a step size of 50bp.
    • onetool --fam test_miss0.fam --vcf test_miss0.vcf --prunevif 100,50,3 --out prune_res
    • 2. Restrict the analysis to the variants retained by pruning.
    • onetool --fam test_miss0.fam --vcf test_miss0.vcf --selvariant prune_res.prune.in.lst
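
The two pruning steps can be chained so that the second step runs only if the first succeeds. This is a minimal sketch: the function name `prune_then_select` and the `ONETOOL` variable are hypothetical, and it assumes that step 1 writes `<prefix>.prune.in.lst`, as the file name used in step 2 suggests.

```shell
# ASSUMPTION: "onetool" is on the PATH; set ONETOOL="echo onetool" for a dry run.
ONETOOL=${ONETOOL:-onetool}

# prune_then_select FAM VCF PREFIX
# Step 1: r^2-based pruning (window 100, step 50, threshold 0.8).
# Step 2: rerun on the retained variants only.
# ASSUMPTION: step 1 produces PREFIX.prune.in.lst.
prune_then_select() {
    $ONETOOL --fam "$1" --vcf "$2" --prunepw 100,50,0.8 --out "$3" &&
    $ONETOOL --fam "$1" --vcf "$2" --selvariant "$3.prune.in.lst"
}
```

Calling `prune_then_select test_miss0.fam test_miss0.vcf prune_res` reproduces steps 1-1 and 2; any further analysis options would be appended to the second command.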