In this post, we run BrainXcan with an example.
First of all, please make sure you’ve cloned Brainxcan code repository and installed all the BrainXcan software dependencies.
To perform BrainXcan analysis, we need to prepare the following files:
The BrainXcan database and meta files can be downloaded from brainxcan database zenodo link. The directory structure is
brainxcan_data/
|-- bxcan_vis
|-- geno_covar
|-- idp_gwas
|-- idp_weights
`-- mr
There is no need to change the structure and it is designed to be used asis.
See some detailed notes on the BrainXcan database at “Looking into BrainXcan database”.
In this example, we provide a pre-formatted GWAS file at here: preview link and direct download link, which is adapted from a schizophrenia GWAS by Ripke et al. (2020) (medRxiv link).
First, let’s download and take a look at it.
# download
wget https://uchicago.box.com/shared/static/s2yg1gvxm5vv1q3yijf5h389o3hkadwv.gz \
-O formatted.SCZ_PGC_2020.tsv.gz
# look at the first few rows
zcat < formatted.SCZ_PGC_2020.tsv.gz | head
# returns
variant_id chromosome non_effect_allele effect_allele effect_size standard_error zscore sample_size frequency
rs3131972 chr1 G A -0.0020020026706730793 0.0111 -0.18439957 71533 0.1652
rs3131969 chr1 G A -0.006601743634562364 0.0118 -0.56070304 71533 0.1339
rs3131967 chr1 C T -0.005796768847177342 0.0119 -0.48651794 71533 0.1171
rs1048488 chr1 T C 0.002696361547742533 0.0117 0.2303755 71533 0.1667
rs12562034 chr1 G A -0.003902375817241093 0.0156 -0.24843223 66899 0.1027
rs12124819 chr1 A G -0.011800104115750618 0.0181 -0.64967835 58941 0.2243
rs4040617 chr1 A G -0.0010993954433010642 0.0127 -0.09036144 71843 0.1295
rs4970383 chr1 C A 0.02810116511090993 0.0111 2.5433347 64548 0.1964
rs4475691 chr1 C T 0.012102946067745635 0.0121 0.9998151 65154 0.1429
TLDR: The configuration file is in YAML format and the one for example run can be downloaded from here. And there are a few fields that you need to fill in.
If you are not in a hurry, we recommend going through the following sections which explain the configuration file in details.
The configuration file contains three main parts:
We will go over them one by one.
datadir
: Path to the BrainXcan database folder.outdir
: Output directory name (if it does not exist, it
will be generated).prefix
: Prefix of the output files (to distinguish
between results in the same output folder).brainxcan_path
: Path to the BrainXcan code
repository.In the configuration YAML, we have:
datadir: 'path-to-braixcan-database/brainxcan_data'
outdir: 'brainxcan_test'
prefix: 'SCZ'
brainxcan_path: 'path-to-brainxcan-repository/brainxcan'
Here we want to tell the software how to understand the GWAS file.
gwas
: Path to GWAS file (formats:
csv(.gz)
, tsv(.gz)
,
parquet
).snpid
: Column name of GWAS SNP rsIDeffect_allele
: Column name of GWAS effect allelenon_effect_allele
: Column name of GWAS non-effect
allelechr
: Column name of chromosomeposition
: Column name of positionAlong with Group 1 or 2 (see details at here):
effect_size
: Column name of GWAS effect sizeeffect_size_se
: Column name of GWAS effect size
standard errorWhen both groups are specified, group 1 has high priority. For the example run, we go with group 1.
In the configuration YAML, we have:
gwas: 'formatted.SCZ_PGC_2020.tsv.gz'
snpid: 'variant_id'
effect_allele: 'effect_allele'
non_effect_allele: 'non_effect_allele'
chr: 'chromosome'
effect_size: 'effect_size'
effect_size_se: 'standard_error'
position: 'position'
There are several parameters underlying a BrainXcan run. They are associated with three aspects:
model_type
: IDP prediction model type:
ridge
or elastic_net
(Default
= ridge
)idp_type
: IDP sets to use: original
or
residual
(after PC adjustment) (Default =
residual
)spearman_cutoff
: CV Spearman cutoff on models (only
models passing this criteria will be shown) (Default =
0.1
)pheno_h2
: GWAS trait heritability (SNP heritability -
e.g. it can be estimated from LDSC regression)gwas_n
: GWAS effective sample sizebxcan_vc_z
: Set to True when want run this
adjustmentbxcan_vc_phi
: Path to the phi value filebxcan_ldblock_perm
: If want to run this adjustment, set
the path to the BED file (TAB-delimited base0 with header chr, start,
stop)bxcan_ldblock_perm_seed
: Random seedbxcan_ldblock_perm_nrepeat
: Number of permutation
across all IDPscorrection_factor_perm
: Correcting the standard error
of the simulated z-scores by multiplying with this factor (default =
1.1)bxcan_empirical_z
: Whether to run this adjustment or
notbxcan_empirical_z_seed
: Random seedbxcan_empirical_z_nrepeat
: Number of simulated
IDPscorrection_factor_emp
: Correcting the standard error of
the simulated z-scores by multiplying with this factorfdr_cutoff
: FDR cutoff to call an IDP significant
(Default = 0.1). This will overwrite
signif_pval
below.signif_pval
: P-value cutoff to call an IDP significant
(Default = 1e-5
)signif_max_idps
: Maximun number of IDPs to run MR
(discard by p-value if too many IDPs passing p-value filter)
(Default = 10
)ld_clump_yaml
: Parameters of LD clumping for defining
the instrument SNPs in MR (Default =
{datadir}/mr/ld_clump.yaml
)\(\dagger\)rscript_exe
: Path to the Rscript executable
(Default = Rscript
)python_exe
: Path to the Python executable
(Default = python
)plink_exe
: Path to the plink v1.9 executable
(Default = plink
)\(\dagger\) {datadir}
will be interpreted as
path-to-braixcan-database/brainxcan_data
internally since
we set
datadir: path-to-braixcan-database/brainxcan_data
.
In the configuration YAML, we have:
model_type: 'ridge'
idp_type: 'residual'
spearman_cutoff: 0.1
fdr_cutoff: 0.1
signif_pval: 0.01
signif_max_idps: 5
gwas_pop: 'EUR'
ld_clump_yaml: '{datadir}/mr/ld_clump.yaml'
rscript_exe: 'Rscript'
python_exe: 'python'
plink_exe: 'plink'
The full list of parameters available in a BrainXcan configuration YAML is explained (with default values) at here
Now we are ready to run BrainXcan pipeline. The general command is
# conda activate brainxcan
REPO=path-to-brainxcan-repository
CONFIG=path-to-config-file
snakemake \
-s $REPO/Snakefile \
--configfile $CONFIG \
[rule-name]
[rule-name]
can take the following four options:
rule.name | Association | Visualization | Mendelian.randomization | Automated.report |
---|---|---|---|---|
SBrainXcanOnly | x | |||
SBrainXcanAndVis | x | x | ||
SBrainXcanAndMR | x | x | ||
SBrainXcan | x | x | x | x |
Let’s say we want to run through everything, i.e. replacing
[rule-name]
with SBrainXcan
. So, we can
do:
snakemake \
-s $REPO/Snakefile \
--configfile $CONFIG \
-n -p \
SBrainXcan
We use -n
options in snakemake
to perform a
dry-run to see what will be run (-p
is to print more
information about each step). The MR runs won’t be printed since we’ve
not known which IDPs will be significant but these will be updated
automatically during the computation.
If everything looks good, we could start running the command
(-j1
tells snakemake to use one core).
snakemake \
-s $REPO/Snakefile \
--configfile $CONFIG \
-j1 -p \
SBrainXcan
It takes about 9 minutes with 4 Gb memory. The output files are put
in folder brainxcan_test/
with prefix SCZ
.
The output files of the above run can be downloaded from here. Here is an overview of the files.
brainxcan_test
├── SCZ.report.Rmd
├── SCZ.report.html
├── SCZ.sbrainxcan.csv
├── SCZ.vis.T1.pdf
├── SCZ.vis.dMRI.pdf
└── SCZ_files
├── MR_results
│ ├── dmri_IDP-25687.MR.rds
│ ├── dmri_IDP-25687.MR_sumstats.tsv
│ ├── dmri_IDP-25687.MR_vis.pdf
│ ├── dmri_PC-FA-TBSS-1.MR.rds
│ ├── dmri_PC-FA-TBSS-1.MR_sumstats.tsv
│ ├── dmri_PC-FA-TBSS-1.MR_vis.pdf
│ ├── dmri_PC-ICVF-ProbTrack-1.MR.rds
│ ├── dmri_PC-ICVF-ProbTrack-1.MR_sumstats.tsv
│ ├── dmri_PC-ICVF-ProbTrack-1.MR_vis.pdf
│ ├── dmri_PC-ICVF-TBSS-1.MR.rds
│ ├── dmri_PC-ICVF-TBSS-1.MR_sumstats.tsv
│ ├── dmri_PC-ICVF-TBSS-1.MR_vis.pdf
│ ├── t1_IDP-25020.MR.rds
│ ├── t1_IDP-25020.MR_sumstats.tsv
│ └── t1_IDP-25020.MR_vis.pdf
└── logs
├── dmri_IDP-25687.mr.log
├── dmri_IDP-25687.mr_vis.log
└── [more MR log files]
[prefix].sbrainxcan.csv
: BrainXcan association results
in CSV format.[prefix].vis.T1.pdf
: Visualization of associations of
structural IDPs.[prefix].vis.dMRI.pdf
: Visualization of associations of
diffusion IDPs (only TBSS IDPs).[prefix]_files/MR_results
: Mendelian randomization for
each significant IDPs
*.MR_vis.pdf
: Plotting effect sizes of instrument SNPs
in MR.*.MR_sumstats.tsv
: Results of MR tests summarized in a
table (TSV format).*.MR.rds
: Raw data used for MR test.[prefix].report.html
: The automated HTML report
summarizing the association and MR results (for diffusion IDPs, only
TBSS ones are shown).[prefix]_files/logs/
: Log files of each analysis
module.Let’s take a quick look at
brainxcan_test/SCZ.sbrainxcan.csv
(first two rows). The
descriptions of each columns are at the bottom of the table.
Error in dplyr::bind_rows()
: ! Can’t combine
..1$bhat
..2$bhat
Let’s take a quick look at
brainxcan_test/SCZ_files/MR_results/dmri_PC-FA-TBSS-1.MR_sumstats.tsv
(first two rows). The descriptions of each columns are at the bottom of
the table.
direction | method | nsnp | b | pval | Q | Q_df | Q_pval |
---|---|---|---|---|---|---|---|
IDP -> Phenotype | MR Egger | 48 | 0.105 | 0.462 | 228 | 46 | 6.66e-26 |
IDP -> Phenotype | Weighted median | 48 | 0.0324 | 0.258 | NA | NA | NA |
MR direction | MR method | Number of SNPs used in MR | Slope in MR | P-value in MR | Heterogenity test Q | Heterogenity test degree of freedom | Heterogenity test p-value |
We provide interactive HTMLs showing regions in the visualization:
These HTMLs can be generated along the BrainXcan visualization by
setting bxcan_region_vis=True
in the configuration
(default: False). The output files will be put inside the folder called
[prefix].vis.region_vis/
.
snakemake
The BrainXcan software is essentially a data pipeline containing
multiple steps and intermediate files. It is implemented using
snakemake
Mölder et al.
(2021) and the features provided by snakemake
are
directly applicable here.
For instance, the parameters in the configuration file can be
replaced by --config
, e.g. if we want to change the prefix
of the output, we can do it without changing the config file:
snakemake \
-s $REPO/Snakefile \
--configfile $CONFIG \
-j1 -p \
SBrainXcan \
--config prefix=another_test
More over, if the intermediate files are already present, snakemake
will not re-generate them. In other word, if you run
SBrainXcanOnly
first and then run
SBrainXcanAndVis
, it won’t run BrainXcan association again.
If you want to enforce re-runing, use the flag
--forceall
.
BrainXcan software is written with snakemake v6. For more cool features of snakemake, please refer to snakemake documentation.