Title: | Network-Based Genome Wide Association Studies |
---|---|
Description: | A multi-core R package that contains a set of tools based on copula graphical models for accomplishing the three interrelated goals in genetics and genomics in an unified way: (1) linkage map construction, (2) constructing linkage disequilibrium networks, and (3) exploring high-dimensional genotype-phenotype network and genotype- phenotype-environment interactions networks. The 'netgwas' package can deal with biparental inbreeding and outbreeding species with any ploidy level, namely diploid (2 sets of chromosomes), triploid (3 sets of chromosomes), tetraploid (4 sets of chromosomes) and so on. We target on high-dimensional data where number of variables p is considerably larger than number of sample sizes (p >> n). The computations is memory-optimized using the sparse matrix output. The 'netgwas' implements the methodological developments in Behrouzi and Wit (2017) <doi:10.1111/rssc.12287> and Behrouzi and Wit (2017) <doi:10.1093/bioinformatics/bty777>. |
Authors: | Pariya Behrouzi [aut, cre]
|
Maintainer: | Pariya Behrouzi <[email protected]> |
License: | GPL-3 |
Version: | 1.14.3 |
Built: | 2025-02-17 05:15:54 UTC |
Source: | https://github.com/cran/netgwas |
The R
package netgwas provides a set of tools based
on undirected graphical models for accomplishing three important
and interrelated goals in genetics: (1) linkage map construction,
(2) reconstructing intra- and inter-chromosomal conditional
interactions (linkage disequilibrium) networks, and (3) exploring
high-dimensional genotype-phenotype network and genotype-phenotype-
environment interactions network. The netgwas can deal with biparental
species with any ploidy level.
The package implemented the recent improvements both for construction
of linkage maps in diploid and polyploid species in Behrouzi and Wit(2017b),
and in reconstructing networks for non-Gaussian data, ordinal data, and
mixed continuous and discrete data in Behrouzi and Wit (2017a). One
application is to uncover epistatic interactions network, where the network
captures the conditionally dependent short- and long-range linkage disequilibrium
structure of a genomes and reveals aberrant marker-marker associations.
In addition, Behrouzi and Wit(2017c) implemented their proposed method to explore
genotype-phenotype networks where nodes are either phenotypes or genotypes, and each
phenotype is connected by an edge to a genotype or a group of genotypes if
there is a direct association between them, given the rest of the variables.
Different phenotypes may also interconnect. The conditionally dependent relationships
between markers on a genome and phenotypes is determined through Gaussian copula graphical model.
We remark that environmental variables can also be included along with genotype-phenotype
input data to reconstruct networks between genotypes, phenotypes, and environment
variables. Beside, the package contains functions for simulation and visualization,
as well as three multivariate datasets taken from literature.
Pariya Behrouzi and Ernst C. Wit
Maintainers: Pariya Behrouzi [email protected]
1. Behrouzi, P., and Wit, E. C. (2019). Detecting epistatic selection with partially observed genotype data by using copula graphical models. Journal of the Royal Statistical Society: Series C (Applied Statistics), 68(1), 141-160.
2. Behrouzi, P., and Wit, E. C. (2018). De novo construction of polyploid linkage maps using discrete graphical models. Bioinformatics.
3. Behrouzi, P., Arends, D., and Wit, E. C. (2023). netgwas: An R Package for Network-Based Genome-Wide Association Studies. The R journal, 14(4), 18-37.
## Not run: install.packages("netgwas") library(netgwas) ## End(Not run)
## Not run: install.packages("netgwas") library(netgwas) ## End(Not run)
Implements different algorithms for detecting linkage groups and ordering markers in each linkage group.
buildMap( res, opt.index, min.m = NULL, use.comu = FALSE)
buildMap( res, opt.index, min.m = NULL, use.comu = FALSE)
res |
An object with S3 class "netgwasmap" |
opt.index |
An index of a desired regularization parameter. |
min.m |
Expected minimum number of markers in a chromosome. Optional |
use.comu |
Using community detection algorithm to detect linkage groups. Default is FALSE. |
This function determines linkage groups and order markers within each linkage group for class "netgwasmap".
An object with S3 class "netgwasmap"
is returned:
map |
Constructed linkage map associated with |
opt.index |
The index of a desired 3-D map to construct linkage map. |
cross |
The specified cross type by user. |
allres |
A list containing results for different regularization parameter. Belongs to class "netgwas". To visualize a path of different 3D maps consider function |
man |
stays TRUE. |
Pariya Behrouzi and Ernst C.Wit
Maintainer: Pariya Behrouzi [email protected]
1. Behrouzi, P., and Wit, E. C. (2018). De novo construction of polyploid linkage maps using discrete graphical models. Bioinformatics.
2. Behrouzi, P., and Wit, E. C. (2017c). netgwas: An R Package for Network-Based Genome-Wide Association Studies. arXiv preprint, arXiv:1710.01236.
## Not run: data(CviCol) #Randomly change the order of markers across the genome cvicol <- CviCol[ ,sample(ncol(CviCol))] #Constructing linkage map for Cvi x Col genotype data out <- netmap(cvicol, cross= "inbred", ncores=1); out plot(out) map <- out$map; map #Visualization of other networks plot(out$allres) #Constructing a linkage map for 5th network bm <- buildMap(out, opt.index=5); bm plot(bm, vis= "summary") #or plot(bm, vis= "interactive", label.vertex="all") ## End(Not run)
## Not run: data(CviCol) #Randomly change the order of markers across the genome cvicol <- CviCol[ ,sample(ncol(CviCol))] #Constructing linkage map for Cvi x Col genotype data out <- netmap(cvicol, cross= "inbred", ncores=1); out plot(out) map <- out$map; map #Visualization of other networks plot(out$allres) #Constructing a linkage map for 5th network bm <- buildMap(out, opt.index=5); bm plot(bm, vis= "summary") #or plot(bm, vis= "interactive", label.vertex="all") ## End(Not run)
Calculation of genetic map distances for an estimated markers order from either net.map
or buildMap
functions. This function is only for diploid populations. We note that the output of net.map
and buildMap
functions include estimated linkage groups and estimated markers order within each linkage group.
cal.pos (netgwasmap, pop.type= NULL , map.func = "haldane", chr )
cal.pos (netgwasmap, pop.type= NULL , map.func = "haldane", chr )
netgwasmap |
A |
pop.type |
Character string specifying the population type of the genotype data. Accepted values are "DH" (doubled haploid), "BC" (backcross), "RILn" (non-advanced RIL population with n generations of selfing) and "ARIL" (advanced RIL) (see Details). |
map.func |
Character string defining the distance function used for calculation of genetic distances. Options are "kosambi", "haldane", and "morgan". Default is "haldane". |
chr |
A character string of linkage group names that require calculating of their genetic map distances. |
In qtl package, the genotype data for a backcross is coded as NA = missing, 1 = AA, 2 = AB. For an F2 intercross, the coding is NA = missing, 1 = AA, 2 = AB, 3 = BB, 4 = not BB (i.e. AA or AB), 5 = not AA (i.e. AB or BB).
If pop.typ = "RILn"
the number of generations of selfing is limited to 20 to ensure sensible input. The constructed object is returned as a R/qtl cross
object with the appropriate class structure. For "RILn"
populations the constructed object is given the class "bcsft"
by using the qtl package conversion function convert2bcsft
with arguments F.gen = n
and BC.gen = 0
. For "ARIL"
populations the constructed object is given the class "riself"
.
This function uses the Viterbi algorithm implemented in argmax.geno
of the qtl package to estimate genetic distances. Initial conservative estimates of the map distances are calculated from inverting recombination fractions outputted from est.rf
. These are then passed to argmax.geno
and imputation of missing allele scores is performed along with re-estimation of map distances. This is an adapted version of quickEst
function from ASMap package.
The netgwas
constructed linkage map is returned as a R/qtl
cross object. The object is a list with usual components "pheno"
and "geno"
.
geno |
The |
pheno |
Character string containing the genotype names. |
Pariya Behrouzi
Maintainer: Pariya Behrouzi [email protected]
## Not run: sim <- simRIL(d=25, n=200, g=5, cM=100, selfing= 2) # to use the same genotyping coding as qtl package (See details) sim$data <- (sim$data) + 1 #Estimate linkage groups and order markers within each LG out <- netmap(sim$data, cross = "inbred") map <- out$map; map plot(out) #Calculate map positions and convert the map to cross object from qtl package pos.map <- cal.pos(netgwasmap = out, pop.type= "RIL2", map.func = "haldane" ) plotMap(pos.map) ## End(Not run)
## Not run: sim <- simRIL(d=25, n=200, g=5, cM=100, selfing= 2) # to use the same genotyping coding as qtl package (See details) sim$data <- (sim$data) + 1 #Estimate linkage groups and order markers within each LG out <- netmap(sim$data, cross = "inbred") map <- out$map; map plot(out) #Calculate map positions and convert the map to cross object from qtl package pos.map <- cal.pos(netgwasmap = out, pop.type= "RIL2", map.func = "haldane" ) plotMap(pos.map) ## End(Not run)
cross
object to netgwas
data frameConverts cross
object from R/qtl package to netgwas
dataframe
cross2netgwas (cross.obj)
cross2netgwas (cross.obj)
cross.obj |
An object of class |
An () matrix corresponds to a genotype data matrix (
is the sample size and
is the number of variables). This matrix can be as an input data for
netmap
, and netsnp
functions.
Pariya Behrouzi
Maintainer: Pariya Behrouzi [email protected]
Calculates cut-points of ordinal variables with respect to the Gaussian copula.
cutoffs(y)
cutoffs(y)
y |
An ( |
The relationship between th variable and
th latent variable is expressed through this set of cut-points.
cutoffs |
A |
Pariya Behrouzi and Ernst C. Wit
Maintainer: Pariya Behrouzi <[email protected]>
1. Behrouzi, P., and Wit, E. C. (2019). Detecting epistatic selection with partially observed genotype data by using copula graphical models. Journal of the Royal Statistical Society: Series C (Applied Statistics), 68(1), 141-160.
2. Behrouzi, P., and Wit, E. C. (2018). De novo construction of polyploid linkage maps using discrete graphical models. Bioinformatics.
3. Behrouzi, P., and Wit, E. C. (2017c). netgwas: An R Package for Network-Based Genome-Wide Association Studies. arXiv preprint, arXiv:1710.01236.
lower.upper
, simgeno
and netgwas-package
.
D <- simgeno(p = 100, n = 50, k = 3) cutoffs(D$data)
D <- simgeno(p = 100, n = 50, k = 3) cutoffs(D$data)
The genotype data of the Cvi-0 Col-0 Recombinant Inbred Line (RIL) population.
data(CviCol)
data(CviCol)
The format is a matrix containing 90 single-nucleotide polymorphism (SNP) markers for 367 individuals.
The Arabidopsis thaliana genotype data is derived from a RIL cross between Columbia-0 (Col-0) and the Cape Verde Island (Cvi-0), where 367 individuals were genotyped for 90 genetic markers. This is a diploid population with three possible genotpe states (k = 3)
, where the genotypes coded as 0, 1, 2
, where 0 and 2 represent the homozygous genotypes and 1 defines the heterozygous genotype.
This data set can be used to detect epistatic selection, short- and long- range linkage disequilibrium between 90 SNP markers.
Pariya Behrouzi and Ernst C. Wit
Maintainer: Pariya Behrouzi [email protected]
Simon, M., et al. "QTL mapping in five new large RIL populations of Arabidopsis thaliana genotyped with consensus SNP markers." Genetics 178 (2008): 2253-2264. It is publicly available at http://publiclines.versailles.inra.fr/page/8
data(CviCol) dim(CviCol) head(CviCol, n=3)
data(CviCol) dim(CviCol) head(CviCol, n=3)
Calculates a LOD score for each genotype, measuring the evidence for genotyping errors. This uses calc.errorlod
function from R/qtl package.
detect.err(netgwas.map, err.prob= 0.01, cutoff= 4, pop.type= NULL, map.func= "haldane")
detect.err(netgwas.map, err.prob= 0.01, cutoff= 4, pop.type= NULL, map.func= "haldane")
netgwas.map |
An object of class |
err.prob |
Assumed genotyping error rate used in the calculation of the penetrance Pr(observed genotype | true genotype). |
cutoff |
Only those genotypes with error LOD scores above this cutoff will be listed. |
pop.type |
Character string specifying the population type of the genotype data. Accepted values are "DH" (doubled haploid), "BC" (backcross), "RILn" (non-advanced RIL population with n generations of selfing) and "ARIL" (advanced RIL) (see Details). |
map.func |
Character string defining the distance function used for calculation of genetic distances. Options are "kosambi", "haldane", and "morgan". Default is "haldane". |
A data.frame with 4 columns, whose rows correspond to the genotypes that are possibly in error. The four columns give the chromosome number, individual number, marker name, and error LOD score.
## Not run: sim <- simRIL(d=25, n=200, g=5, cM=100, selfing= 2) # to use the same genotyping coding as R/qtl package (See details) sim$data <- (sim$data) + 1 #Estimate linkage groups and order markers within each LG out <- netmap(sim$data, cross = "inbred") map <- out$map; map plot(out) # A list of genotyoing error detect.err(out, pop.type = "RIL2") ## End(Not run)
## Not run: sim <- simRIL(d=25, n=200, g=5, cM=100, selfing= 2) # to use the same genotyping coding as R/qtl package (See details) sim$data <- (sim$data) + 1 #Estimate linkage groups and order markers within each LG out <- netmap(sim$data, cross = "inbred") map <- out$map; map plot(out) # A list of genotyoing error detect.err(out, pop.type = "RIL2") ## End(Not run)
Calculates lower and upper bands for each data point, using a set of cut-points which is obtained from the Gaussian copula.
lower.upper(y)
lower.upper(y)
y |
An ( |
lower |
A |
upper |
A |
Pariya Behrouzi and Ernst C. Wit
Maintainer: Pariya Behrouzi <[email protected]>
Behrouzi, P., and Wit, E. C. (2019). Detecting epistatic selection with partially observed genotype data by using copula graphical models. Journal of the Royal Statistical Society: Series C (Applied Statistics), 68(1), 141-160.
cutoffs
and netgwas-package
.
D <- simgeno(p = 100, n = 50, k = 3) lower.upper(D$data)
D <- simgeno(p = 100, n = 50, k = 3) lower.upper(D$data)
Convertes netgwasmap
object from net.map
or buildMap
functions to cross
object from R/qtl package.
netgwas2cross(netgwasmap, pop.type= NULL, map.func = "haldane")
netgwas2cross(netgwasmap, pop.type= NULL, map.func = "haldane")
netgwasmap |
A |
pop.type |
Character string specifying the population type of the genotype data. Accepted values are "DH" (doubled haploid), "BC" (backcross), "RILn" (non-advanced RIL population with n generations of selfing) and "ARIL" (advanced RIL). |
map.func |
Character string defining the distance function used for calculation of genetic distances. Options are "kosambi", "haldane", and "morgan". Default is "haldane". |
If pop.typ = "RILn"
the number of generations of selfing is limited to 20 to ensure sensible input. The constructed object is returned as a R/qtl cross
object with the appropriate class structure. For "RILn"
populations the constructed object is given the class "bcsft"
by using the qtl package conversion function convert2bcsft
with arguments F.gen = n
and BC.gen = 0
. For "ARIL"
populations the constructed object is given the class "riself"
.
In R/qtl package, the genotype data for a backcross is coded as NA = missing, 1 = AA, 2 = AB. For an F2 intercross, the coding is NA = missing, 1 = AA, 2 = AB, 3 = BB, 4 = not BB (i.e. AA or AB), 5 = not AA (i.e. AB or BB).
The netgwas.map
object is returned as a cross
object form R/qtl. The object is a list with usual components "pheno"
and "geno"
.
geno |
The |
pheno |
Character string containing the genotype names. |
Pariya Behrouzi
Maintainer: Pariya Behrouzi [email protected]
## Not run: sim <- simRIL(d=25, n=200, g=5, cM=100, selfing= 2) # to use the same genotyping coding as R/qtl package (See details) sim$data <- (sim$data) + 1 #Estimate linkage groups and order markers within each LG out <- netmap(sim$data, cross = "inbred") map <- out$map; map plot(out) #Calculate map positions and convert the map to cross object from qtl package map <- netgwas2cross(netgwasmap = out, pop.type= "RIL2", map.func = "haldane" ) plotMap(map) ## End(Not run)
## Not run: sim <- simRIL(d=25, n=200, g=5, cM=100, selfing= 2) # to use the same genotyping coding as R/qtl package (See details) sim$data <- (sim$data) + 1 #Estimate linkage groups and order markers within each LG out <- netmap(sim$data, cross = "inbred") map <- out$map; map plot(out) #Calculate map positions and convert the map to cross object from qtl package map <- netgwas2cross(netgwasmap = out, pop.type= "RIL2", map.func = "haldane" ) plotMap(map) ## End(Not run)
This is one of the main functions of netgwas package. This function reconstructs linkage maps for biparental diploid and polyploid organisms using three methods.
netmap(data, method = "npn", cross= NULL, rho = NULL, n.rho = NULL, rho.ratio = NULL, min.m= NULL, use.comu= FALSE, ncores = "all", em.iter = 5, verbose = TRUE)
netmap(data, method = "npn", cross= NULL, rho = NULL, n.rho = NULL, rho.ratio = NULL, min.m= NULL, use.comu= FALSE, ncores = "all", em.iter = 5, verbose = TRUE)
data |
An ( |
method |
Three available methods to construct linkage map: "gibbs", "approx", and "npn". Default is "npn" |
rho |
A decreasing sequence of non-negative numbers that control the sparsity level. Leaving the input as |
n.rho |
The number of regularization parameters. The default value is |
rho.ratio |
Determines distance between the elements of |
cross |
To be specified either "inbred" or "outbred". |
min.m |
Expected minimum number of markers in a chromosome. Optional |
use.comu |
Use community detection algorithm to detect linkage groups. Default is FALSE. |
ncores |
The number of cores to use for the calculations. Using |
em.iter |
The number of EM iterations. The default value is 5. |
verbose |
Providing a detail message for tracing output. The default value is |
Constructing linkage maps for diploid and polyploid organisms. Diploid organisms contain two sets of chromosomes, one from each parent, whereas polyploids contain more than two sets of chromosomes. Inbreeding is mating between two parental lines where they have recent common biological ancestors. If they have no common ancestors up to roughly e.g. 4-6 generations, this is called outcrossing. In both cases the genomes of the derived progenies are random mosaics of the genome of the parents. However, in the case of inbreeding parental alleles are distinguishable in the genome of the progeny; in outcrossing this does not hold.
An object with S3 class "netgwasmap"
is returned:
map |
Constructed linkage map. |
opt.index |
The index of selected graph using model selection. |
cross |
The pre-specified cross type. |
allres |
A list containing results for different regularization parameter. Belongs to class "netgwas". To visualize a path of different 3D maps consider function |
man |
Stays FALSE. |
Pariya Behrouzi and Ernst C. Wit
Maintainers: Pariya Behrouzi [email protected]
1. Behrouzi, P., and Wit, E. C. (2018). De novo construction of polyploid linkage maps using discrete graphical models. Bioinformatics.
2. Behrouzi, Pariya, and Ernst C. Wit. "netgwas: An R Package for Network-Based Genome-Wide Association Studies." arXiv preprint arXiv:1710.01236 (2017).
3. Guo, Jian, Elizaveta Levina, George Michailidis, and Ji Zhu. "Graphical models for ordinal data." Journal of Computational and Graphical Statistics 24, no. 1 (2015): 183-204.
4. Liu, Han, Fang Han, Ming Yuan, John Lafferty, and Larry Wasserman. "High-dimensional semiparametric Gaussian copula graphical models." The Annals of Statistics 40, no. 4 (2012): 2293-2326.
5. Witten, Daniela M., Jerome H. Friedman, and Noah Simon. "New insights and faster computations for the graphical lasso." Journal of Computational and Graphical Statistics 20, no. 4 (2011): 892-900.
## Not run: data(CviCol) #Randomly change the order of markers across the genome cvicol <- CviCol[ ,sample(ncol(CviCol))] #Constructing linkage map using gibbs method out <- netmap(cvicol, cross= "inbred", ncores=1); out #Estimated linkage map map <- out$map; map #Plot the associated network plot(out) #Visualizing the path networks plot(out$allres) #Build a linkage map for 5th networks bm <- buildMap(out, opt.index=5); bm #################### #Constructing linkage map using approx method out2 <- netmap(cvicol, method="approx", cross= "inbred", ncores=1); out2 #Estimated linkage map map2 <- out2$map; map2 #Plot the related network plot(out2) #Visualize other networks plot(out2$allres) #Build a linkage map for 5th network bm2 <- buildMap(out2, opt.index=5); bm2 #Constructing linkage map using npn method out3 <- netmap(cvicol, method="npn", cross= "inbred", ncores=1); out3 #Estimated linkage map map3 <- out3$map; map3 #Plot the related network plot(out3) ## End(Not run)
## Not run: data(CviCol) #Randomly change the order of markers across the genome cvicol <- CviCol[ ,sample(ncol(CviCol))] #Constructing linkage map using gibbs method out <- netmap(cvicol, cross= "inbred", ncores=1); out #Estimated linkage map map <- out$map; map #Plot the associated network plot(out) #Visualizing the path networks plot(out$allres) #Build a linkage map for 5th networks bm <- buildMap(out, opt.index=5); bm #################### #Constructing linkage map using approx method out2 <- netmap(cvicol, method="approx", cross= "inbred", ncores=1); out2 #Estimated linkage map map2 <- out2$map; map2 #Plot the related network plot(out2) #Visualize other networks plot(out2$allres) #Build a linkage map for 5th network bm2 <- buildMap(out2, opt.index=5); bm2 #Constructing linkage map using npn method out3 <- netmap(cvicol, method="npn", cross= "inbred", ncores=1); out3 #Estimated linkage map map3 <- out3$map; map3 #Plot the related network plot(out3) ## End(Not run)
This is one of the main functions of the netgwas package. This function reconstructs a conditional independence network between genotypes and phenotypes for diploids and polyploids. Three methods are available to reconstruct networks, namely (i) Gibbs sampling, (ii) approximation method, and (iii) nonparanormal approach within the Gaussian copula graphical model. The first two methods are able to deal with missing genotypes. The last one is computationally faster.
netphenogeno(data, method = "gibbs", rho = NULL, n.rho = NULL, rho.ratio = NULL, ncores = 1, em.iter = 5, em.tol=.001, verbose = TRUE)
netphenogeno(data, method = "gibbs", rho = NULL, n.rho = NULL, rho.ratio = NULL, ncores = 1, em.iter = 5, em.tol=.001, verbose = TRUE)
data |
An ( |
method |
Reconstructing both genotype-phenotype interactions network and genotype-phenotype-environment interactions network with three methods: "gibbs", "approx", and "npn". For a medium (~500) and a large number of variables we recommend to choose "gibbs" and "approx", respectively. Choosing "npn" for a very large number of variables (> 2000) is computationally efficient. The default method is "gibbs". |
rho |
A decreasing sequence of non-negative numbers that control the sparsity level. Leaving the input as |
n.rho |
The number of regularization parameters. The default value is |
rho.ratio |
Determines distance between the elements of |
ncores |
The number of cores to use for the calculations. Using |
em.iter |
The number of EM iterations. The default value is 5. |
em.tol |
A criteria to stop the EM iterations. The default value is .001. |
verbose |
Providing a detail message for tracing output. The default value is |
This function reconstructs both genotype-phenotype network and genotype-phenotype-environment interactions network. In genotype-phenotype networks nodes are either markers or phenotypes; each phenotype is connected by an edge to a marker if there is a direct association between them given the rest of the variables. Different phenotypes may also interconnect. In addition to markers and phenotypes information, the input data can include environmental variables. Then, the interactions network shows the conditional dependence relationships between markers, phenotypes and environmental factors.
An object with S3 class "netgwas"
is returned:
Theta |
A list of estimated p by p precision matrices that show the conditional independence relationships patterns among measured items. |
path |
A list of estimated p by p adjacency matrices. This is the graph path corresponding to |
ES |
A list of estimated p by p conditional expectation corresponding to |
Z |
A list of n by p transformed data based on Gaussian copula. |
rho |
A |
loglik |
A |
data |
The |
This function estimates a graph path . To select an optimal graph please refer to selectnet
.
Pariya Behrouzi and Ernst C. Wit
Maintainers: Pariya Behrouzi [email protected]
1. Behrouzi, P., and Wit, E. C. (2019). Detecting epistatic selection with partially observed genotype data by using copula graphical models. Journal of the Royal Statistical Society: Series C (Applied Statistics), 68(1), 141-160.
2. Behrouzi, P., and Wit, E. C. (2017c). netgwas: An R Package for Network-Based Genome-Wide Association Studies. arXiv preprint, arXiv:1710.01236.
3. D. Witten and J. Friedman. New insights and faster computations for the graphical lasso. Journal of Computational and Graphical Statistics, to appear, 2011.
4. Guo, Jian, et al. "Graphical models for ordinal data." Journal of Computational and Graphical Statistics 24.1 (2015): 183-204.
data(thaliana) head(thaliana, n=3) #Construct a path for genotype-phenotype interactions network in thaliana data res <- netphenogeno(data = thaliana); res plot(res) #Select an optimal network sel <- selectnet(res) #Plot selected network and the conditional correlation (CI) relationships plot(sel, vis="CI") plot(sel, vis="CI", n.mem = c(8, 56, 31, 33, 31, 30), w.btw =50, w.within= 1) #Visualize interactive plot for the selected network #Color "red" for 8 phenotypes, and different colors for each chromosome. cl <- c(rep("red", 8), rep("white",56), rep("tan1",31), rep("gray",33), rep("lightblue2",31), rep("salmon2",30)) #The IDs of phenotypes and SNPs to be shown in the network id <- c("DTF_LD","CLN_LD","RLN_LD","TLN_LD","DTF_SD","CLN_SD","RLN_SD", "TLN_SD","snp15","snp16","snp17","snp49","snp50","snp60","snp75", "snp76","snp81","snp83","snp84","snp86","snp82", "snp113","snp150", "snp155","snp159","snp156","snp161","snp158","snp160","snp162","snp181") plot(sel, vis="interactive", n.mem = c(8, 56, 31, 33, 31, 30), vertex.color= cl, label.vertex= "some", sel.nod.label= id, edge.color= "gray", w.btw= 50, w.within= 1) #Partial correlations between genotypes and phenotypes in the thaliana dataset. library(Matrix) image(sel$par.cor, xlab="geno-pheno", ylab="geno-pheno", sub="")
data(thaliana) head(thaliana, n=3) #Construct a path for genotype-phenotype interactions network in thaliana data res <- netphenogeno(data = thaliana); res plot(res) #Select an optimal network sel <- selectnet(res) #Plot selected network and the conditional correlation (CI) relationships plot(sel, vis="CI") plot(sel, vis="CI", n.mem = c(8, 56, 31, 33, 31, 30), w.btw =50, w.within= 1) #Visualize interactive plot for the selected network #Color "red" for 8 phenotypes, and different colors for each chromosome. cl <- c(rep("red", 8), rep("white",56), rep("tan1",31), rep("gray",33), rep("lightblue2",31), rep("salmon2",30)) #The IDs of phenotypes and SNPs to be shown in the network id <- c("DTF_LD","CLN_LD","RLN_LD","TLN_LD","DTF_SD","CLN_SD","RLN_SD", "TLN_SD","snp15","snp16","snp17","snp49","snp50","snp60","snp75", "snp76","snp81","snp83","snp84","snp86","snp82", "snp113","snp150", "snp155","snp159","snp156","snp161","snp158","snp160","snp162","snp181") plot(sel, vis="interactive", n.mem = c(8, 56, 31, 33, 31, 30), vertex.color= cl, label.vertex= "some", sel.nod.label= id, edge.color= "gray", w.btw= 50, w.within= 1) #Partial correlations between genotypes and phenotypes in the thaliana dataset. library(Matrix) image(sel$par.cor, xlab="geno-pheno", ylab="geno-pheno", sub="")
This is one of the main functions of the netgwas package. This function can be used to reconstruct the intra- and inter-chromosomal interactions among genetic loci in diploids and polyploids. The input data can be belong to any biparental genotype data which contains at least two genotype states. Two methods are available to reconstruct the network, namely (1) approximation method, and (2) gibbs sampling within the Gaussian copula graphical model. Both methods are able to deal with missing genotypes.
netsnp(data, method = "gibbs", rho = NULL, n.rho = NULL, rho.ratio = NULL, ncores = 1, em.iter = 5, em.tol = .001, verbose = TRUE)
netsnp(data, method = "gibbs", rho = NULL, n.rho = NULL, rho.ratio = NULL, ncores = 1, em.iter = 5, em.tol = .001, verbose = TRUE)
data |
An ( |
method |
Reconstructs intra- and inter- chromosomal conditional interactions (epistatic selection) network with three methods: "gibbs", "approx", and "npn". For a medium (~500) and a large number of variables we would recommend to choose "gibbs" and "approx", respectively. For a very large number of variables (> 2000) choose "npn". The default method is "gibbs". |
rho |
A decreasing sequence of non-negative numbers that control the sparsity level. Leaving the input as |
n.rho |
The number of regularization parameters. The default value is |
rho.ratio |
Determines the distance between the elements of |
ncores |
The number of cores to use for the calculations. Using |
em.iter |
The number of EM iterations. The default value is 5. |
em.tol |
A criteria to stop the EM iterations. The default value is .001. |
verbose |
Providing a detail message for tracing output. The default value is |
Viability is a phenotype that can be considered. This function detects the conditional dependent short- and long-range linkage disequilibrium structure of genomes and thus reveals aberrant marker-marker associations that are due to epistatic selection. This function can be used to estimate conditional independence relationships between partially observed data that not follow Gaussianity assumption (e.g. continuous non-Gaussian, discrete, or mixed dataset).
An object with S3 class "netgwas"
is returned:
Theta |
A list of estimated p by p precision matrices that show the conditional independence relationships patterns among genetic loci. |
path |
A list of estimated p by p adjacency matrices. This is the graph path corresponding to |
ES |
A list of estimated p by p conditional expectation corresponding to |
Z |
A list of n by p transformed data based on Gaussian copula. |
rho |
A |
loglik |
A |
data |
The |
This function estimates a graph path . To select an optimal graph please refer to selectnet
.
Pariya Behrouzi and Ernst C. Wit
Maintainers: Pariya Behrouzi [email protected]
1. Behrouzi, P., and Wit, E. C. (2019). Detecting epistatic selection with partially observed genotype data by using copula graphical models. Journal of the Royal Statistical Society: Series C (Applied Statistics), 68(1), 141-160.
2. Behrouzi, P., and Wit, E. C. (2017c). netgwas: An R Package for Network-Based Genome-Wide Association Studies. arXiv preprint, arXiv:1710.01236.
3. D. Witten and J. Friedman. New insights and faster computations for the graphical lasso. Journal of Computational and Graphical Statistics, to appear, 2011.
4. Guo, Jian, et al. "Graphical models for ordinal data." Journal of Computational and Graphical Statistics 24.1 (2015): 183-204.
data(CviCol) out <- netsnp(CviCol); out plot(out) #select optimal graph epi <- selectnet(out) plot(epi, vis="CI", xlab="markers", ylab="markers", n.mem = c(24,14,17,16,19), vertex.size=4) #Visualize interactive plot of the selected network #Different colors for each chromosome cl <- c(rep("red", 24), rep("white",14), rep("tan1",17), rep("gray",16), rep("lightblue2",19)) plot(epi, vis="interactive", vertex.color= cl) #Partial correlations between markers on genome image(as.matrix(epi$par.cor), xlab="markers", ylab="markers", sub="")
data(CviCol) out <- netsnp(CviCol); out plot(out) #select optimal graph epi <- selectnet(out) plot(epi, vis="CI", xlab="markers", ylab="markers", n.mem = c(24,14,17,16,19), vertex.size=4) #Visualize interactive plot of the selected network #Different colors for each chromosome cl <- c(rep("red", 24), rep("white",14), rep("tan1",17), rep("gray",16), rep("lightblue2",19)) plot(epi, vis="interactive", vertex.color= cl) #Partial correlations between markers on genome image(as.matrix(epi$par.cor), xlab="markers", ylab="markers", sub="")
Plot the graph path which is the output of two functions netsnp
, netphenogeno
.
## S3 method for class 'netgwas' plot( x, n.markers=NULL , ... )
## S3 method for class 'netgwas' plot( x, n.markers=NULL , ... )
x |
An object from "netgwas" class. |
n.markers |
A vector containing number of variables/markers in each group/chromosome. For example, the CviCol dataset that is provided in the package contains 5 chromosomes/ groups which the total number of markers is |
... |
System reserved (No specific usage) |
Pariya Behrouzi and Ernst C. Wit
Maintainer: Pariya Behrouzi [email protected]
Behrouzi, P., and Wit, E. C. (2017c). netgwas: An R Package for Network-Based Genome-Wide Association Studies. arXiv preprint, arXiv:1710.01236.
Plot the graph associated with constructed linkage map via function netmap
.
## S3 method for class 'netgwasmap' plot(x, vis= NULL, layout= NULL, vertex.size= NULL, label.vertex = "none", label.size= NULL, vertex.color= NULL, edge.color = "gray29", sel.ID = NULL, ... )
## S3 method for class 'netgwasmap' plot(x, vis= NULL, layout= NULL, vertex.size= NULL, label.vertex = "none", label.size= NULL, vertex.color= NULL, edge.color = "gray29", sel.ID = NULL, ... )
x |
An object from "netgwasmap" class. |
vis |
Visualizing in four options: (i) "summary" plots the related network, conditional dependence relationships between markers before and after ordering markers; (ii) "interactive" plots the associated network, where it opens a new windows with interactive graph drawing facility; (iii) "unordered markers" plots the conditional dependence relationships between markers before ordering markers; (iv) "ordered markers" plots conditional dependence relationships between markers after ordering markers. Default is "summary". |
layout |
The vertex placement algorithm which is according to igraph package. The default layout is Fruchterman-Reingold layout. Other possible layouts are, for example, layout_with_kk, circle, and Reingold-Tilford graph in igraph package. |
vertex.size |
Optional integer to adjust vertex size in graph G. Default is 5. |
label.vertex |
Assign names to the vertices. There are three options: "none", "some", "all". (i) Specifying "none" omits vertex labels in the graph, (ii) using |
label.size |
Optional integer to adjust the size of node's label in graph G. Applicable when vertex.label is TRUE. Default is 0.8. |
vertex.color |
Optional integer vectors giving colors to the vertices. |
edge.color |
Optional integer vectors giving colors to edges. |
sel.ID |
ONLY applicable when |
... |
ONLY applicable when |
Pariya Behrouzi and Ernst C. Wit
Maintainer: Pariya Behrouzi [email protected]
1. Behrouzi, P., and Wit, E. C. (2018). De novo construction of polyploid linkage maps using discrete graphical models. Bioinformatics.
2. Behrouzi, P., and Wit, E. C. (2017c). netgwas: An R Package for Network-Based Genome-Wide Association Studies. arXiv preprint, arXiv:1710.01236.
Plot the optimal graph by model selection
## S3 method for class 'select' plot(x, vis= NULL, xlab= NULL, ylab= NULL, n.mem= NULL, vertex.label= FALSE , ..., layout= NULL, label.vertex= "all", vertex.size= NULL, vertex.color= NULL, edge.color= "gray29", sel.nod.label= NULL, label.size = NULL, w.btw= 800, w.within = 10, sign.edg= TRUE, edge.width= NULL, edge.label= NULL, max.degree= NULL, layout.tree= NULL, root.node= NULL, degree.node= NULL, curve= FALSE, delet.v =TRUE, pos.legend= "bottomleft", cex.legend= 0.8, iterl = NULL, temp = NULL, tk.width = NULL, tk.height= NULL)
## S3 method for class 'select' plot(x, vis= NULL, xlab= NULL, ylab= NULL, n.mem= NULL, vertex.label= FALSE , ..., layout= NULL, label.vertex= "all", vertex.size= NULL, vertex.color= NULL, edge.color= "gray29", sel.nod.label= NULL, label.size = NULL, w.btw= 800, w.within = 10, sign.edg= TRUE, edge.width= NULL, edge.label= NULL, max.degree= NULL, layout.tree= NULL, root.node= NULL, degree.node= NULL, curve= FALSE, delet.v =TRUE, pos.legend= "bottomleft", cex.legend= 0.8, iterl = NULL, temp = NULL, tk.width = NULL, tk.height= NULL)
x |
An object with S3 class "select" |
vis |
Visualizing the results as a graph (network) or as a matrix. There are 4 options to visulize the selected graph: (i) "CI": plotting conditional independence (CI) relationships between variables, (ii) "interactive": plotting the conditional independence network, where opens a new windows with interactive graph drawing facility, and (iii) "parcor.network": plots the estimated graph based on partial correlation values. (iv) "parcor.interactive": plots the estimated graph based on partial correlation matrix with an interactive graph drawing facility. Default is "CI". |
xlab |
ONLY applicable when |
ylab |
ONLY applicable when |
n.mem |
A vector of memberships. For example, the CviCol dataset, which is provided in the package, contain 5 chromosomes which the total number of markers is |
vertex.label |
ONLY applicable when |
... |
ONLY applicable when |
layout |
ONLY applicable when |
label.vertex |
ONLY applicable when |
vertex.size |
Optional. The size of vertices in the graph visualization. The default value is 7. |
vertex.color |
ONLY applicable when |
edge.color |
ONLY applicable when |
sel.nod.label |
ONLY applicable when |
label.size |
ONLY applicable for |
w.btw |
Distance between nodes from different memberships of |
w.within |
Distance of nodes within one membership of |
sign.edg |
Optional. ONLY applicable when |
edge.width |
Optional. ONLY applicable when |
edge.label |
Optional. ONLY applicable when |
max.degree |
Optional. ONLY applicable when |
layout.tree |
Optional. ONLY applicable when |
root.node |
Optional. ONLY applicable when |
degree.node |
Optional. ONLY applicable when |
curve |
Optional. ONLY applicable when |
delet.v |
Delete vertices with no edges. Default is "TRUE". |
pos.legend |
Applicable when |
cex.legend |
Applicable when |
iterl |
Optional. ONLY applicable when |
temp |
Optional. ONLY applicable when |
tk.width |
Optional. The size of the drawing area of interactive plot. |
tk.height |
Optional. The size of the drawing area of interactive plot. |
An object with S3 class "select"
is returned:
network |
Plot of a selected graph, when |
adjacency |
Conditional independence (CI) relationships between variables, when |
network |
Interactive plot of a selected graph with .eps format, when |
Pariya Behrouzi
Maintainer: Pariya Behrouzi [email protected]
Behrouzi, P., and Wit, E. C. (2017c). netgwas: An R Package for Network-Based Genome-Wide Association Studies. arXiv preprint, arXiv:1710.01236.
#simulate data data(CviCol) out <- netsnp(CviCol) sel <- selectnet(out) cl <- c(rep("palegoldenrod", 24), rep("white",14), rep("tan1",17), rep("gray",16), rep("lightblue2",19)) plot(sel, vis= "parcor.network", sign.edg = TRUE, layout = NULL, vertex.color = cl)
#simulate data data(CviCol) out <- netsnp(CviCol) sel <- selectnet(out) cl <- c(rep("palegoldenrod", 24), rep("white",14), rep("tan1",17), rep("gray",16), rep("lightblue2",19)) plot(sel, vis= "parcor.network", sign.edg = TRUE, layout = NULL, vertex.color = cl)
S3
class "simgeno"
Visualizes the pattern of the true graph, the adjacency matrix, precison matrix and the covariance matrix of the simulated data.
## S3 method for class 'simgeno' plot(x, layout = layout.fruchterman.reingold, ...)
## S3 method for class 'simgeno' plot(x, layout = layout.fruchterman.reingold, ...)
x |
An object of |
layout |
The default is |
... |
System reserved (No specific usage) |
Pariya Behrouzi and Ernst C. Wit
Maintainer: Pariya Behrouzi [email protected]
Behrouzi, P., and Wit, E. C. (2017c). netgwas: An R Package for Network-Based Genome-Wide Association Studies. arXiv preprint, arXiv:1710.01236.
## Not run: # Generating discrete ordinal data with "genome-like" graph structure data.sim <- simgeno(alpha = 0.01, beta = 0.02) plot( data.sim ) ## End(Not run)
## Not run: # Generating discrete ordinal data with "genome-like" graph structure data.sim <- simgeno(alpha = 0.01, beta = 0.02) plot( data.sim ) ## End(Not run)
Print a summary of results from function netsnp
, netphenogeno
.
## S3 method for class 'netgwas' print(x, ...)
## S3 method for class 'netgwas' print(x, ...)
x |
An object with S3 class "netgwas" |
... |
System reserved (No specific usage) |
Pariya Behrouzi and Ernst C. Wit
Maintainer: Pariya Behrouzi [email protected]
Behrouzi, P., and Wit, E. C. (2017c). netgwas: An R Package for Network-Based Genome-Wide Association Studies. arXiv preprint, arXiv:1710.01236.
Print a summary of results from function netmap
.
## S3 method for class 'netgwasmap' print(x, ...)
## S3 method for class 'netgwasmap' print(x, ...)
x |
An object with S3 class "netgwasmap" |
... |
System reserved (No specific usage) |
Pariya Behrouzi and Ernst C. Wit
Maintainer: Pariya Behrouzi [email protected]
Behrouzi, P., and Wit, E. C. (2017c). netgwas: An R Package for Network-Based Genome-Wide Association Studies. arXiv preprint, arXiv:1710.01236.
Print function for selectnet
.
## S3 method for class 'select' print(x, ...)
## S3 method for class 'select' print(x, ...)
x |
An object with S3 class "select" |
... |
System reserved (No specific usage) |
Pariya Behrouzi and Ernst C. Wit
Maintainer: Pariya Behrouzi [email protected]
Behrouzi, P., and Wit, E. C. (2017c). netgwas: An R Package for Network-Based Genome-Wide Association Studies. arXiv preprint, arXiv:1710.01236.
Print a summary of simulated data from function simgeno
.
## S3 method for class 'simgeno' print(x, ...)
## S3 method for class 'simgeno' print(x, ...)
x |
An object with S3 class |
... |
System reserved (No specific usage) |
Pariya Behrouzi and Ernst C. Wit
Maintainer: Pariya Behrouzi [email protected]
Behrouzi, P., and Wit, E. C. (2017c). netgwas: An R Package for Network-Based Genome-Wide Association Studies. arXiv preprint, arXiv:1710.01236.
This function implements the approximation method within the Gaussian copula graphical model to estimate the conditional expectation for the data that not follow Gaussianity assumption (e.g. ordinal, discrete, continuous non-Gaussian, or mixed dataset).
R.approx(y, Z = NULL, Sigma=NULL, rho = NULL, ncores = NULL )
R.approx(y, Z = NULL, Sigma=NULL, rho = NULL, ncores = NULL )
y |
An ( |
Z |
A ( |
Sigma |
The covariance matrix of the latent variable given the data. If |
rho |
A (non-negative) regularization parameter to calculate |
ncores |
If |
ES |
Expectation of covariance matrix( diagonal scaled to 1) of the Gaussian copula graphical model. |
Z |
New transformation of the data based on given or default |
Pariya Behrouzi and Ernst C. Wit
Maintainer: Pariya Behrouzi [email protected]
1. Behrouzi, P., Arends, D., and Wit, E. C. (2023). netgwas: An R Package for Network-Based Genome-Wide Association Studies. The R journal, 14(4), 18-37.
2. Behrouzi, P., and Wit, E. C. (2019). Detecting epistatic selection with partially observed genotype data by using copula graphical models. Journal of the Royal Statistical Society: Series C (Applied Statistics), 68(1), 141-160.
## Not run: D <- simgeno(p = 90, n = 50, k = 3) R.approx(D$data) ## End(Not run)
## Not run: D <- simgeno(p = 90, n = 50, k = 3) R.approx(D$data) ## End(Not run)
This function implements the Gibbs sampling method within Gaussian copula graphical model to estimate the conditional expectation for the data that not follow Gaussianity assumption (e.g. ordinal, discrete, continuous non-Gaussian, or mixed dataset).
R.gibbs(y, theta, gibbs.iter = 1000, mc.iter = 500, ncores = NULL, verbose = TRUE)
R.gibbs(y, theta, gibbs.iter = 1000, mc.iter = 500, ncores = NULL, verbose = TRUE)
y |
An ( |
theta |
A |
gibbs.iter |
The number of burn-in for the Gibbs sampling. The default value is 1000. |
mc.iter |
The number of Monte Carlo samples to calculate the conditional expectation. The default value is 500. |
ncores |
If |
verbose |
If |
This function calculates using Gibbs sampling method within the E-step of EM algorithm, where
which is the number of sample size and
is the latent variable which is obtained from Gaussian copula graphical model.
ES |
Expectation of covariance matrix ( diagonal scaled to 1) of the Gaussian copula graphical model |
Pariya Behrouzi, Danny Arends and Ernst C. Wit
Maintainers: Pariya Behrouzi [email protected]
1. Behrouzi, P., Arends, D., and Wit, E. C. (2023). netgwas: An R Package for Network-Based Genome-Wide Association Studies. The R journal, 14(4), 18-37.
2. Behrouzi, P., and Wit, E. C. (2019). Detecting epistatic selection with partially observed genotype data by using copula graphical models. Journal of the Royal Statistical Society: Series C (Applied Statistics), 68(1), 141-160.
D <- simgeno(p = 100, n = 50, k = 3) R.gibbs(D$data, ncores=1)
D <- simgeno(p = 100, n = 50, k = 3) R.gibbs(D$data, ncores=1)
Estimate the optimal regularization parameter at EM convergence based on different information criteria .
selectnet(netgwas.obj, opt.index= NULL, criteria= NULL, ebic.gamma=0.5, ncores= NULL, verbose= TRUE)
selectnet(netgwas.obj, opt.index= NULL, criteria= NULL, ebic.gamma=0.5, ncores= NULL, verbose= TRUE)
netgwas.obj |
An object with S3 class "netgwas" |
opt.index |
The program internally determines an optimal graph using |
criteria |
Model selection criteria. "ebic" and "aic" are available. BIC model selection can be calculated by fixing |
ebic.gamma |
The tuning parameter for ebic. The |
ncores |
The number of cores to use for the calculations. Using |
verbose |
If |
This function computes extended Bayesian information criteria (ebic), Bayesian information criteria, Akaike information criterion (aic) at EM convergence based on observed or joint log-likelihood. The observed log-likelihood can be obtained through
Where can be calculated from
netmap
, netsnp
, netphenogeno
function and H function is
The "ebic" and "aic" model selection criteria can be obtained as follow
where refers to the number of non-zeros offdiagonal elements of
, and
. Typical value for for
ebic.gamma
is 1/2, but it can also be tuned by experience. Fixing ebic.gamma = 0
results in bic model selection.
An obj with S3 class "selectnet" is returned:
opt.adj |
The optimal graph selected from the graph path |
opt.theta |
The optimal precision matrix from the graph path |
opt.sigma |
The optimal covariance matrix from the graph path |
ebic.scores |
Extended BIC scores for regularization parameter selection at the EM convergence. Applicable if |
opt.index |
The index of optimal regularization parameter. |
opt.rho |
The selected regularization parameter. |
par.cor |
A partial correlation matrix. |
V.names |
Variables name whose are not isolated. |
and anything else that is included in the input netgwas.obj
.
Pariya Behrouzi and Ernst C.Wit
Maintainer: Pariya Behrouzi [email protected]
1. BBehrouzi, P., and Wit, E. C. (2019). Detecting epistatic selection with partially observed genotype data by using copula graphical models. Journal of the Royal Statistical Society: Series C (Applied Statistics), 68(1), 141-160.
2. Behrouzi, P., Arends, D., and Wit, E. C. (2023). netgwas: An R Package for Network-Based Genome-Wide Association Studies. The R journal, 14(4), 18-37.
3. Ibrahim, Joseph G., Hongtu Zhu, and Niansheng Tang. (2012). Model selection criteria for missing-data problems using the EM algorithm. Journal of the American Statistical Association.
4. D. Witten and J. Friedman. (2011). New insights and faster computations for the graphical lasso. Journal of Computational and Graphical Statistics, to appear.
5. J. Friedman, T. Hastie and R. Tibshirani. (2007). Sparse inverse covariance estimation with the lasso, Biostatistics.
6. Foygel, R. and M. Drton. (2010). Extended bayesian information criteria for Gaussian graphical models. In Advances in Neural Information Processing Systems, pp. 604-612.
#simulate data D <- simgeno(p=50, n=100, k= 3, adjacent = 3, alpha = 0.06 , beta = 0.06) plot(D) #explore intra- and inter-chromosomal interactions out <- netsnp(D$data, n.rho= 5, ncores= 1) plot(out) #different graph selection methods sel.ebic1 <- selectnet(out, criteria = "ebic") plot(sel.ebic1, vis = "CI") sel.aic <- selectnet(out, criteria = "aic") plot(sel.aic, vis = "CI") sel.bic <- selectnet(out, criteria = "ebic", ebic.gamma = 0) plot(sel.bic, vis = "CI")
#simulate data D <- simgeno(p=50, n=100, k= 3, adjacent = 3, alpha = 0.06 , beta = 0.06) plot(D) #explore intra- and inter-chromosomal interactions out <- netsnp(D$data, n.rho= 5, ncores= 1) plot(out) #different graph selection methods sel.ebic1 <- selectnet(out, criteria = "ebic") plot(sel.ebic1, vis = "CI") sel.aic <- selectnet(out, criteria = "aic") plot(sel.aic, vis = "CI") sel.bic <- selectnet(out, criteria = "ebic", ebic.gamma = 0) plot(sel.bic, vis = "CI")
Generating discrete ordinal data based on underlying "genome-like" graph structure. The procedure of simulating data relies on a continues variable, which can be simulated from either multivariate normal distribution, or multivariate t-distribution with d
degrees of freedom.
simgeno( p = 90, n = 200, k = NULL, g = NULL, adjacent = NULL, alpha = NULL , beta = NULL, con.dist = "Mnorm", d = NULL, vis = TRUE)
simgeno( p = 90, n = 200, k = NULL, g = NULL, adjacent = NULL, alpha = NULL , beta = NULL, con.dist = "Mnorm", d = NULL, vis = TRUE)
p |
The number of variables. The default value is 90. |
n |
The number of sample size (observations). The default value is 200. |
k |
The number of states (categories). The default value is 3. |
g |
The number of groups (chromosomes) in the graph. The default value is about |
adjacent |
The number of adjacent variable(s) to be linked to a variable. For example, if |
alpha |
A probability that a pair of non-adjacent variables in the same group is given an edge. The default value is 0.01. |
beta |
A probability that variables in different groups are linked with an edge. The default value is 0.02. |
con.dist |
The distribution of underlying continuous variable. If |
d |
The degrees of freedom of the continuous variable, only applicable when
|
vis |
Visualize the graph pattern and the adjacency matrix of the true graph structure. The default value is TRUE. |
The graph pattern is generated as below:
genome-like: p
variables are evenly partitions variables into g
disjoint groups; the adjacent variables within each group are linked via an edge. With a probability alpha
a pair of non-adjacent variables in the same group is given an edge. Variables in different groups are linked with an edge with a probability of beta
.
An object with S3 class "simgeno" is returned:
data |
The generated data as an |
Theta |
A |
adj |
A |
Sigma |
A |
n.groups |
The number of groups. |
groups |
A vector that indicates each variable belongs to which group. |
sparsity |
The sparsity levels of the true graph. |
Pariya Behrouzi and Ernst C. Wit
Maintainer: Pariya Behrouzi <[email protected]>
1. Behrouzi, P., Arends, D., and Wit, E. C. (2023). netgwas: An R Package for Network-Based Genome-Wide Association Studies. The R journal, 14(4), 18-37.
2. Behrouzi, P., and Wit, E. C. (2019). Detecting epistatic selection with partially observed genotype data by using copula graphical models. Journal of the Royal Statistical Society: Series C (Applied Statistics), 68(1), 141-160.
netsnp
, and netgwas-package
#genome-like graph structure sim1 <- simgeno(alpha = 0.01, beta = 0.02) plot(sim1) #genome-like graph structure with more edges between variables in a same or different groups sim2 <- simgeno(adjacent = 3, alpha = 0.02 , beta = 0.03) plot(sim2) #simulate data D <- simgeno(p=50, n=100, g=5, k= 3, adjacent = 3, alpha = 0.06 , beta = 0.08) plot(D) #Reconstructing intra- and inter-chromosomal conditional interactions (LD) network out <- netsnp(data = D$data, n.rho= 4, ncores= 1) plot(out) #Select an optimal graph sel <- selectnet(out) plot(sel, vis= "CI" )
#genome-like graph structure sim1 <- simgeno(alpha = 0.01, beta = 0.02) plot(sim1) #genome-like graph structure with more edges between variables in a same or different groups sim2 <- simgeno(adjacent = 3, alpha = 0.02 , beta = 0.03) plot(sim2) #simulate data D <- simgeno(p=50, n=100, g=5, k= 3, adjacent = 3, alpha = 0.06 , beta = 0.08) plot(D) #Reconstructing intra- and inter-chromosomal conditional interactions (LD) network out <- netsnp(data = D$data, n.rho= 4, ncores= 1) plot(out) #Select an optimal graph sel <- selectnet(out) plot(sel, vis= "CI" )
Generating genotype data from a recombinant inbred line (RIL) population.
simRIL( d = 25, n = 200, g = 5, cM = 100, selfing=2 )
simRIL( d = 25, n = 200, g = 5, cM = 100, selfing=2 )
d |
The number of markers per chromosome. The default value is 25. |
n |
The number of sample size (observations). The default value is 200. |
g |
The number of linkage groups (chromosomes). The default value is 5. |
cM |
The length of each chromosome based on centiMorgan. |
selfing |
The number of selfing in RIL population. |
data |
The generated RIL genotype data as an |
map |
The genetic map of the data. |
Pariya Behrouzi
Maintainer: Pariya Behrouzi <[email protected]>
netmap
, netsnp
, and netgwas-package
#genome-like graph structure ril <- simRIL(g = 5, d = 25, cM = 100, n = 200, selfing = 2) geno <- ril$data; image(geno, xlab= "individuals", ylab="markers") map <- ril$map
#genome-like graph structure ril <- simRIL(g = 5, d = 25, cM = 100, n = 200, selfing = 2) geno <- ril$data; image(geno, xlab= "individuals", ylab="markers") map <- ril$map
Tetraploid potato (Solanum tuberosum L.) genotype data.
data(tetraPotato)
data(tetraPotato)
The format is a matrix containing 1972 single-nucleotide polymorphism (SNP) markers for 156 individuals.
The full-sib mapping population MSL603 consists of 156 F1 plants resulting from a cross between female parent "Jacqueline Lee" and male parent "MSG227-2". The obtained genotype data contain 1972 SNP markers with five allele dosages. This genotype data can be used to construct linkage map for tetraploid potato (see below example).
Massa, Alicia N., Norma C. Manrique-Carpintero, Joseph J. Coombs, Daniel G. Zarka, Anne E. Boone, William W. Kirk, Christine A. Hackett, Glenn J. Bryan, and David S. Douches. "Genetic linkage mapping of economically important traits in cultivated tetraploid potato (Solanum tuberosum L.)." G3: Genes, Genomes, Genetics 5, no. 11 (2015): 2357-2364.
data(tetraPotato) #Shuffle the order of markers potato <- tetraPotato[ , sample(ncol(tetraPotato))] #Constructing linkage map for tetraploid potato out <- netmap(potato, cross = "outbred"); out potato.map <- out$map; potato.map #plot(out)
data(tetraPotato) #Shuffle the order of markers potato <- tetraPotato[ , sample(ncol(tetraPotato))] #Constructing linkage map for tetraploid potato out <- netmap(potato, cross = "outbred"); out potato.map <- out$map; potato.map #plot(out)
The genotype data of the Kend-L x Col Recombinant Inbred Line (RIL) population along with flowering time and leaf numbers phenotype information.
data(thaliana)
data(thaliana)
The format is a matrix containing 181 single-nucleotide polymorphism (SNP) markers and 8 phenotypes information for 197 individuals.
The accession Kend-L (Kendalville-Lehle; Lehle-WT-16-03) is crossed to the common lab strain Col (Co\-lum\-bi\-a).
The resulting lines were taken through six rounds of selfing without any intentional selection. The resulting 282 KendC (Kend-L Col)
lines were genotyped at 181 markers. The flowering time was measured for 197 lines of this population in both long days, which promote rapid
flowering in many A. thaliana strains, and in short days. Flowering time was measured using days to flowering (DTF) as well as the
total number of leaves (TLN), partitioned into rosette and cauline leaves. In total eight phenotypes have been measured, namely days to
flowering (DTF), cauline leaf number (CLN), rosette leaf number (RLN), and total leaf number (TLN) in long days (LD), and DTF, CLN, RLN,
and TLN in short days (SD). Thus, the final dataset consist of 197 observations for 189 variables (8 phenotypes and 181 genotypes - SNP markers)
This data set can be used to reconstruct network among SNP markers and the measured phenotypes.
Balasubramanian, Sureshkumar, et al. (2009). "QTL mapping in new Arabidopsis thaliana advanced intercross-recombinant inbred lines." PLoS One 4.2: e4318.
## Not run: data(thaliana) # Graph path out <- netphenogeno(thaliana, ncores=1) plot(out) sel <- selectnet(out) plot(sel, vis= "interactive") ## End(Not run)
## Not run: data(thaliana) # Graph path out <- netphenogeno(thaliana, ncores=1) plot(out) sel <- selectnet(out) plot(sel, vis= "interactive") ## End(Not run)