Title: | Reconstruction of Causal Networks for Data with Random Genetic Effects |
---|---|
Description: | Implements the pcgen algorithm, which is a modified version of the standard pc-algorithm, with specific conditional independence tests and modified orientation rules. pcgen extends the approach of Valente et al. (2010) <doi:10.1534/genetics.109.112979> with reconstruction of direct genetic effects. |
Authors: | Willem Kruijer, Pariya Behrouzi, Maria Xose Rodriguez-Alvarez |
Maintainer: | Pariya Behrouzi <[email protected]> |
License: | GPL-3 |
Version: | 0.2.0 |
Built: | 2025-01-29 05:10:48 UTC |
Source: | https://github.com/cran/pcgen |
Given output from pcgen or pcgenFast, this function checks whether the estimated graph is consistent with the set of traits having significant genetic variance. The function detects traits that have significant genetic variance but for which there is no partially directed path from G.
checkG(pcgen.output, suffStat, alpha = 0.01, covariates = NULL)
pcgen.output |
A graph with nodes G (genotype) and a number of traits. Typically output from pcgen or pcgenFast. |
suffStat |
A data.frame, of which the first column is the factor G (genotype), and subsequent columns contain the traits, and optionally some QTLs. The name of the first column should be G. |
alpha |
The significance level used in each conditional independence test. Default is 0.01. |
covariates |
A data.frame containing covariates, to be used in each conditional independence test.
Cannot contain factors. Should be either |
A logical matrix of dimension (p + 1) x (p + 1), p being the number of traits. Most entries are FALSE, except those in the first row and column for which there are conflicts. Entries [1, j] and [j, 1] are TRUE if the jth trait has significant genetic variance, but there is no partially directed path from G towards that trait. The matrix can then be used in a subsequent run of pcgen or pcgenFast, in the fixedEdges argument. The arguments suffStat, alpha and covariates should stay the same throughout (first run of pcgen, checkG, second run of pcgen).
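The shape of this matrix and the three-step workflow (pcgen, checkG, pcgen again) can be sketched as follows. This is a minimal illustration, not output from the package: the node names, the dimnames, and the flagged trait Y3 are hypothetical, and the layout assumes one row and column per node (G first, then the traits). The guarded block only runs when pcgen is installed and uses the function signatures listed in the usage lines.

```r
# Hypothetical node names: G plus three traits, as in simdata.
nodes <- c("G", "Y1", "Y2", "Y3")
p <- length(nodes) - 1

# Assumed layout: one row/column per node, all FALSE by default.
conflicts <- matrix(FALSE, nrow = p + 1, ncol = p + 1,
                    dimnames = list(nodes, nodes))

# Suppose (hypothetically) Y3 has significant genetic variance but no
# partially directed path from G: flag both symmetric entries.
conflicts["G", "Y3"] <- TRUE
conflicts["Y3", "G"] <- TRUE

# The actual workflow, run only if the pcgen package is available.
if (requireNamespace("pcgen", quietly = TRUE)) {
  library(pcgen)
  data(simdata)
  fit1 <- pcgen(suffStat = simdata, alpha = 0.01)
  real_conflicts <- checkG(pcgen.output = fit1, suffStat = simdata,
                           alpha = 0.01)
  # Same suffStat and alpha as in the first run, as required above.
  fit2 <- pcgen(suffStat = simdata, alpha = 0.01,
                fixedEdges = real_conflicts)
}
```

Keeping suffStat, alpha and covariates identical across the three calls matters because checkG re-tests the genetic variances with the same settings that produced the first graph.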
Willem Kruijer and Pariya Behrouzi. Maintainers: Willem Kruijer [email protected] and Pariya Behrouzi [email protected]
Kruijer, W., Behrouzi, P., Rodriguez-Alvarez, M. X., Wit, E. C., Mahmoudi, S. M., Yandell, B., Van Eeuwijk, F., (2018, in preparation), Reconstruction of networks with direct and indirect genetic effects.
For each pair of traits in suffStat, we fit a bivariate mixed model, and perform a likelihood ratio test for the null-hypothesis of zero genetic covariance.
gencovTest(suffStat, max.iter = 200, out.cor = TRUE)
suffStat |
A data.frame with (p + 1) columns, of which the first column is the factor G (genotype), and subsequent p columns contain traits. It should not contain covariates or QTLs. |
max.iter |
Maximum number of iterations in the EM-algorithm, used to fit the bivariate mixed model. |
out.cor |
If |
A list with elements pvalues and out.cor, which are both p x p matrices
Kruijer, W., Behrouzi, P., Rodriguez-Alvarez, M. X., Wit, E. C., Mahmoudi, S. M., Yandell, B., Van Eeuwijk, F., (2018, in preparation), Reconstruction of networks with direct and indirect genetic effects.
data(simdata)
test <- gencovTest(suffStat = simdata, max.iter = 200, out.cor = TRUE)
Residuals from the best linear unbiased predictor of the genetic effects (GBLUP), which is computed given REML-estimates of the variance components.
getResiduals(suffStat, covariates = NULL, cov.method = "uni", K = NULL)
suffStat |
A data.frame, of which the first column is the factor G (genotype), and subsequent columns contain the traits. The name of the first column should be G. |
covariates |
A data.frame containing covariates, that should always be used in each conditional independence test. Should be either |
cov.method |
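A string, specifying which method should be used to compute the GBLUP. Options are "uni" (the default; a separate univariate GBLUP for each trait) and "us" (a multivariate GBLUP for all traits simultaneously). |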
K |
A genetic relatedness matrix. If |
If cov.method = "uni", the GBLUP and the residuals are computed separately for each trait in suffStat. The covariance of each trait is then assumed to be $\sigma_G^2 Z K Z^t + \sigma_E^2 I_n$, where $Z$ is a binary incidence matrix, assigning plants or plots to genotypes. $Z$ is based on the first column in suffStat. If there is a single observation per genotype (typically a genotypic mean), $Z$ is the identity matrix, and the relatedness matrix $K$ should be specified. If there are replicates for at least some of the genotypes, and no $K$ is provided, independent genetic effects are assumed ($K$ will be the identity matrix). It is also possible to have replicates and specify a non-diagonal $K$.
Whenever $K$ is specified, sommer (mmer2) will be used; otherwise lmer (lme4). mmer2 is also used when cov.method = "us", in which case the multivariate GBLUP is computed for all traits in suffStat simultaneously. This is only possible for a limited number of traits.
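To make the covariance structure concrete, the base-R sketch below builds an incidence matrix from a genotype factor and assembles the covariance of one trait. It assumes the usual mixed-model form sigma_G^2 * Z K Z^t + sigma_E^2 * I_n; the toy genotype labels and the variance values are hypothetical, and nothing here calls pcgen, sommer or lme4.

```r
# Toy design: three genotypes with two replicates each (hypothetical labels).
G <- factor(c("g1", "g1", "g2", "g2", "g3", "g3"))

# Binary incidence matrix Z (6 x 3): row i has a 1 in the column of its genotype.
Z <- model.matrix(~ 0 + G)

# No relatedness matrix supplied: independent genetic effects, so K = identity.
K <- diag(nlevels(G))

# Hypothetical variance components (in practice these are REML estimates).
sigma2_g <- 2
sigma2_e <- 1

# Covariance of one trait: sigma_G^2 * Z K Z^t + sigma_E^2 * I_n.
V <- unname(sigma2_g * Z %*% K %*% t(Z) + sigma2_e * diag(nrow(Z)))
print(V)
```

With K equal to the identity, replicates of the same genotype share covariance sigma_G^2 while observations on different genotypes are uncorrelated; a non-diagonal K would add covariance between genotypes.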
A data-frame with the residuals.
Covarrubias-Pazaran, G., 2016. Genome-assisted prediction of quantitative traits using the R package sommer. PloS one, 11(6), p.e0156744.
data(simdata)
rs <- getResiduals(suffStat = simdata)
Reconstruction of directed networks with random genetic effects, based on phenotypic observations. The pcgen algorithm is a modification of the pc-stable algorithm of Colombo & Maathuis (2014). It is assumed that there are replicates, and independent genetic effects.
pcgen(suffStat, covariates = NULL, QTLs = integer(), alpha = 0.01, m.max = Inf, fixedEdges = NULL, fixedGaps = NULL, verbose = FALSE, use.res = FALSE, res.cor = NULL, max.iter = 50, stop.if.significant = TRUE, return.pvalues = FALSE)
suffStat |
A data.frame, of which the first column is the factor G (genotype), and subsequent columns contain the traits, and optionally some QTLs. The name of the first column should be G. Should not contain covariates. |
covariates |
A data.frame containing covariates, that should always be used in each conditional independence test. Should be either |
QTLs |
Column numbers in |
alpha |
The significance level used in each conditional independence test. Default is 0.01 |
m.max |
Maximum size of the conditioning sets |
fixedEdges |
A logical matrix of dimension |
fixedGaps |
A logical matrix of dimension |
verbose |
If |
use.res |
If |
res.cor |
If |
max.iter |
Maximum number of iterations in the EM-algorithm, used to fit the bivariate mixed model (when |
stop.if.significant |
If |
return.pvalues |
If |
The pcgen function is based on the pc function from the pcalg package (Kalisch et al. (2012) and Hauser and Buhlmann (2012)).
If return.pvalues = FALSE, the output is a graph (an object with S3 class "pcgen"). If return.pvalues = TRUE, the output is a list with elements gr (the graph) and pMax (a matrix with the p-values).
1. Kruijer, W., Behrouzi, P., Rodriguez-Alvarez, M. X., Wit, E. C., Mahmoudi, S. M., Yandell, B., Van Eeuwijk, F., (2018, in preparation), Reconstruction of networks with direct and indirect genetic effects.
2. Colombo, D. and Maathuis, M.H., 2014. Order-independent constraint-based causal structure learning. The Journal of Machine Learning Research, 15(1), pp.3741-3782.
3. Kalisch, M., Machler, M., Colombo, D., Maathuis, M.H. and Buhlmann, P., 2012. Causal inference using graphical models with the R package pcalg. Journal of Statistical Software, 47(11), pp.1-26.
4. Hauser, A. and Buhlmann, P., 2012. Characterization and greedy learning of interventional Markov equivalence classes of directed acyclic graphs. Journal of Machine Learning Research, 13(Aug), pp.2409-2464.
data(simdata)
out <- pcgen(simdata)
rs <- getResiduals(suffStat = simdata)
pc.fit1 <- pcgen(suffStat = simdata, alpha = 0.01, verbose = TRUE, use.res = TRUE, res.cor = cor(rs))
Runs the pcgen algorithm, starting from a skeleton estimated with the standard pc-algorithm applied to residuals from the GBLUP.
pcgenFast(suffStat, alpha = 0.01, m.max = Inf, res.m.max = Inf, verbose = FALSE, covariates = NULL, fixedEdges = NULL, QTLs = integer(), max.iter = 50, stop.if.significant = TRUE, cov.method = 'uni', use.res = FALSE, return.pvalues = FALSE)
suffStat |
A data.frame, of which the first column is the factor G (genotype), and subsequent columns contain the traits, and optionally some QTLs. The name of the first column should be G. |
alpha |
The significance level used in each conditional independence test. Default is 0.01. |
m.max |
Maximum size of the conditioning set, in the pcgen algorithm. |
res.m.max |
Maximum size of the conditioning set, in the pc-algorithm on the residuals (used for prior screening). |
verbose |
If |
covariates |
A data.frame containing covariates, to be used in each conditional independence test. Cannot contain factors. Should be either |
fixedEdges |
A logical matrix of dimension |
QTLs |
Column numbers in |
max.iter |
Maximum number of iterations in the EM-algorithm, used to fit the bivariate mixed model (when |
stop.if.significant |
If |
cov.method |
A string, specifying which method should be used to compute the GBLUP. Options are "uni" (the default) and "us". |
use.res |
If |
return.pvalues |
If |
If return.pvalues = FALSE, the output is a graph (an object with S3 class "pcgen"). If return.pvalues = TRUE, the output is a list with elements gr (the graph) and pMax (a matrix with the p-values).
1. Kruijer, W., Behrouzi, P., Rodriguez-Alvarez, M. X., Wit, E. C., Mahmoudi, S. M., Yandell, B., Van Eeuwijk, F., (2018, in preparation), Reconstruction of networks with direct and indirect genetic effects.
2. Colombo, D. and Maathuis, M.H., 2014. Order-independent constraint-based causal structure learning. The Journal of Machine Learning Research, 15(1), pp.3741-3782.
data(simdata)
out <- pcgenFast(suffStat = simdata, alpha = 0.01, verbose = FALSE, use.res = TRUE)
This performs the conditional independence test used in the pcgen algorithm, assuming there are replicates, and independent genetic effects.
pcgenTest(x, y, S, suffStat, QTLs = integer(), covariates = NULL, alpha = 0.01, max.iter = 50, stop.if.significant = TRUE, use.res = FALSE, res.cor = NULL)
x, y |
Column numbers in |
S |
vector of integers defining the conditioning set, where the integers refer to column numbers in |
suffStat |
A data.frame, of which the first column is the factor G (genotype), and subsequent columns contain the traits, and optionally some QTLs. The name of the first column should be G. It should not contain covariates. |
QTLs |
Column numbers in |
covariates |
A data.frame containing covariates. It should be either |
alpha |
The significance level used in the test. The test itself of course does not depend on this, but it is used in the EM-algorithm to speed up calculations. When |
max.iter |
Maximum number of iterations in the EM-algorithm, used to fit the bivariate mixed model (when |
stop.if.significant |
If |
use.res |
If |
res.cor |
If |
pcgenTest tests for conditional independence between x and y given S. It distinguishes 2 situations: (i) if one of x and y (say x) is the factor G, pcgenTest will test if the genetic variance in y is zero, given the traits in S. (ii) if x and y are both traits, pcgenTest tests if the residual covariance between them is zero, given the traits in S and the factor G. The factor G is automatically included in the conditioning set S (S does not need to contain the integer 1). This test is either based on a bivariate mixed model (when use.res = FALSE), or on residuals from GBLUP (use.res = TRUE), obtained with the getResiduals function. In the latter case, res.cor must be provided.
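For intuition about the residual-based variant in situation (ii), the sketch below implements a Fisher z-test for a zero partial correlation, computed from a correlation matrix such as cor(rs). This is the standard Gaussian conditional independence test; whether pcgenTest uses exactly this statistic is an assumption. The function name fisher_z_pvalue, the toy correlation matrix and the sample size are hypothetical, and the indices refer to columns of the correlation matrix, not to columns of suffStat.

```r
# Fisher z-test for zero partial correlation between columns x and y of the
# correlation matrix C, given the columns in S. n is the number of observations.
fisher_z_pvalue <- function(C, n, x, y, S = integer()) {
  idx <- c(x, y, S)
  # Partial correlation of x and y given S, via the precision matrix.
  P <- solve(C[idx, idx, drop = FALSE])
  r <- -P[1, 2] / sqrt(P[1, 1] * P[2, 2])
  # Fisher z-transform and two-sided normal p-value.
  z <- 0.5 * log((1 + r) / (1 - r))
  stat <- sqrt(n - length(S) - 3) * abs(z)
  2 * pnorm(stat, lower.tail = FALSE)
}

# Toy correlation matrix consistent with a chain Y1 -> Y2 -> Y3:
# cor(Y1, Y3) equals cor(Y1, Y2) * cor(Y2, Y3).
C <- matrix(c(1.00, 0.50, 0.25,
              0.50, 1.00, 0.50,
              0.25, 0.50, 1.00), nrow = 3, byrow = TRUE)

p_marginal <- fisher_z_pvalue(C, n = 400, x = 1, y = 3)         # Y1 vs Y3
p_given_2  <- fisher_z_pvalue(C, n = 400, x = 1, y = 3, S = 2)  # given Y2
```

In this toy chain the marginal test rejects independence of Y1 and Y3, while conditioning on Y2 removes the association, which is the pattern the pc-type algorithms use to delete the edge between Y1 and Y3.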
A p-value
Kruijer, W., Behrouzi, P., Rodriguez-Alvarez, M. X., Wit, E. C., Mahmoudi, S. M., Yandell, B., Van Eeuwijk, F., (2018, in preparation), Reconstruction of networks with direct and indirect genetic effects.
data(simdata)
rs <- getResiduals(suffStat = simdata)
pcgenTest(suffStat = simdata, x = 2, y = 3, S = 4)
pcgenTest(suffStat = simdata, x = 2, y = 3, S = c(1, 4))
pcgenTest(suffStat = simdata, x = 2, y = 3, S = 4, use.res = TRUE, res.cor = cor(rs))
pcgenTest(suffStat = simdata, x = 2, y = 1, S = 4)
The standard pc algorithm applied to GBLUP residuals, or to the GBLUP itself.
pcRes(suffStat, alpha= 0.01, K = NULL, m.max = Inf, verbose = FALSE, covariates = NULL, QTLs = integer(), cov.method = "uni", use.GBLUP = FALSE, return.pvalues = FALSE)
suffStat |
A data.frame, of which the first column is the factor G (genotype), and subsequent columns contain the traits, and optionally some QTLs. The name of the first column should be G. |
alpha |
The significance level used in the test. Default is 0.01. |
K |
A genetic relatedness matrix. If |
m.max |
Maximum size of the conditioning set, in the pc-algorithm on the residuals. |
verbose |
If |
covariates |
A data.frame containing covariates, that should always be used in each conditional independence test. Should be either |
QTLs |
Column numbers in suffStat that correspond to QTLs. |
cov.method |
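A string, specifying which method should be used to compute the GBLUP. Options are "uni" (the default; a separate univariate GBLUP for each trait) and "us" (a multivariate GBLUP for all traits simultaneously). |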
use.GBLUP |
Use the GBLUP itself, instead of the residuals |
return.pvalues |
If |
If use.GBLUP = FALSE, GBLUP residuals are used as input for the pc-stable algorithm of Colombo and Maathuis (2014). This closely resembles the residual networks of Valente et al. (2010) and Topner et al. (2017), who used different ways to predict the genetic effects, and applied other causal inference algorithms to the residuals. When use.GBLUP = TRUE, pc-stable is applied to the GBLUP itself, which resembles the genomic networks of Topner et al. (2017).
If cov.method = "uni", the GBLUP and the residuals are computed separately for each trait in suffStat. The covariance of each trait is assumed to be $\sigma_G^2 Z K Z^t + \sigma_E^2 I_n$, where $Z$ is a binary incidence matrix, assigning plants or plots to genotypes. $Z$ is based on the first column in suffStat. If there is a single observation per genotype (typically a genotypic mean), $Z$ is the identity matrix, and the relatedness matrix $K$ should be specified. If there are replicates for at least some of the genotypes, and no $K$ is provided, independent genetic effects are assumed ($K$ will be the identity matrix). It is also possible to have replicates and specify a non-diagonal $K$. Whenever $K$ is specified, sommer (mmer2) will be used; otherwise lmer (lme4).
mmer2 is also used when cov.method = "us", in which case the multivariate GBLUP is computed for all traits in suffStat simultaneously. This is only possible for a limited number of traits.
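The difference between the GBLUP and its residuals can be illustrated in base R for the simplest case: one trait, balanced replicates, independent genetic effects, and known variance components. Under those assumptions the BLUP of each genetic effect is the genotype mean, centred and shrunk by r * sigma_G^2 / (r * sigma_G^2 + sigma_E^2), with r the number of replicates. The package itself estimates the variance components by REML (via lme4 or sommer); the simulated data and the variance values below are hypothetical.

```r
set.seed(1)
n_geno <- 50
n_rep  <- 2
sigma2_g <- 1.0   # hypothetical genetic variance (treated as known)
sigma2_e <- 0.5   # hypothetical residual variance (treated as known)

# Genotype factor with levels in order of appearance.
labels <- paste0("g", seq_len(n_geno))
G <- factor(rep(labels, each = n_rep), levels = labels)

# Simulate one trait: intercept + genetic effect + residual error.
g_eff <- rnorm(n_geno, sd = sqrt(sigma2_g))
y <- 10 + g_eff[as.integer(G)] + rnorm(n_geno * n_rep, sd = sqrt(sigma2_e))

# Shrinkage factor for balanced data with n_rep replicates per genotype.
shrink <- n_rep * sigma2_g / (n_rep * sigma2_g + sigma2_e)

# GBLUP per genotype, then expanded to one value per observation.
geno_mean <- tapply(y, G, mean)
gblup <- as.numeric(shrink * (geno_mean - mean(y)))[as.integer(G)]

# Residuals: what is left after removing the mean and the predicted genetic effect.
res <- y - mean(y) - gblup
```

Roughly speaking, use.GBLUP = FALSE corresponds to running the pc-algorithm on a matrix of columns like res (one per trait), while use.GBLUP = TRUE corresponds to using columns like gblup.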
If return.pvalues = FALSE, the output is a graph (an object with S3 class "pcgen"). If return.pvalues = TRUE, the output is a list with elements gr (the graph) and pMax (a matrix with the p-values).
1. Colombo, D. and Maathuis, M.H., 2014. Order-independent constraint-based causal structure learning. The Journal of Machine Learning Research, 15(1), pp.3741-3782.
2. Kruijer, W., Behrouzi, P., Rodriguez-Alvarez, M. X., Wit, E. C., Mahmoudi, S. M., Yandell, B., Van Eeuwijk, F., (2018, in preparation), Reconstruction of networks with direct and indirect genetic effects.
3. Topner, K., Rosa, G.J., Gianola, D. and Schon, C.C., 2017. Bayesian Networks Illustrate Genomic and Residual Trait Connections in Maize (Zea mays L.). G3: Genes, Genomes, Genetics, pp.g3-117.
4. Valente, B.D., Rosa, G.J.M., de los Campos, G., Gianola, D. and Silva, M.A., 2010. Searching for recursive causal structures in multivariate quantitative genetics mixed models. Genetics, 185(2), pp.633-644.
data(simdata)
out <- pcRes(suffStat = simdata, alpha = 0.01, verbose = FALSE)
Simulated data, for two replicates of genotypes g1,...,g200. Three traits were simulated (Y1, Y2 and Y3), using a structural equation model defined by Y1 -> Y2 -> Y3, and direct genetic effects on Y1 and Y3.
data(simdata)
A data frame of dimension 400 x 4. The first column is the factor G (genotype); the subsequent columns contain Y1, Y2 and Y3.
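A data set with the same layout can be simulated in base R, which may help when testing code without the packaged data. The sketch below follows the description above (200 genotypes, two replicates, Y1 -> Y2 -> Y3, direct genetic effects on Y1 and Y3), but the path coefficients, variances and seed are hypothetical and are not the values used to generate simdata.

```r
set.seed(123)
n_geno <- 200
n_rep  <- 2
n      <- n_geno * n_rep

# Genotype factor g1, ..., g200 with two replicates each.
labels <- paste0("g", seq_len(n_geno))
G <- factor(rep(labels, each = n_rep), levels = labels)
idx <- as.integer(G)

# Direct genetic effects on Y1 and Y3 only (hypothetical unit variances).
g1 <- rnorm(n_geno)
g3 <- rnorm(n_geno)

# Structural equations for Y1 -> Y2 -> Y3 (hypothetical coefficients of 0.8).
Y1 <- g1[idx] + rnorm(n)
Y2 <- 0.8 * Y1 + rnorm(n)             # no direct genetic effect on Y2
Y3 <- 0.8 * Y2 + g3[idx] + rnorm(n)

# Same layout as simdata: factor G first, then the traits.
sim <- data.frame(G = G, Y1 = Y1, Y2 = Y2, Y3 = Y3)
str(sim)
```

Y2 still has genetic variance in this setup, but only indirectly through Y1, which is the distinction between direct and indirect genetic effects that the package aims to recover.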
data(simdata)
out <- pcgen(simdata)
out2 <- pcRes(suffStat = simdata, alpha = 0.01, verbose = FALSE)