In the last couple of weeks we have received many requests to provide the newly released protein-folding software package AlphaFold and its alternative RoseTTAFold. As these packages are currently published only as container images or conda environments, respectively, we are working on a cluster-wide installation suitable for an HPC environment.

Availability status (AVAIL = available, WIP = work in progress):

AlphaFold: Container AVAIL, Native AVAIL
AlphaFold 2.1.1 (Multimer): AVAIL
RoseTTAFold: Container WIP, Native AVAIL

Genetic Databases

Version	skylake (gpuv100)	zen3 (gpu2080, gputitanrtx, gpu3090, gpuv100, gpuhgx)
2.0.0
2.1.1
2.1.2		module load palma/2021a; module load foss/2021a; module load AlphaFold/2.1.2

AlphaFold

Detailed information can be found at: https://github.com/deepmind/alphafold

Genetic Databases

AlphaFold and RoseTTAFold use distinct databases, each optimized for the corresponding algorithm. The AlphaFold databases are located at the following path:

Codeblock

language	bash
theme	Midnight

/Applic.HPC/data/alphafold/ 
|-- bfd
|-- mgnify
|-- params
|-- pdb70
|-- pdb_mmcif
|-- pdb_seqres
|-- small_bfd
|-- uniclust30
|-- uniprot
`-- uniref90

The complete database set occupies at least 2.2 TB, and downloading and unpacking it takes more than 24 h. Therefore: PLEASE DO NOT DOWNLOAD THESE DATABASES AGAIN!
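Before starting a long run, it can be worth confirming that the shared databases are visible from the node you are working on. The following is a minimal sketch and not part of the official setup: the function name `check_alphafold_db` is hypothetical, and the list of sub-directories is simply taken from the listing above.

```shell
# Hypothetical helper (not part of the AlphaFold module): verify that every
# database sub-directory from the listing above exists under a base path.
# The base path defaults to the shared location given in this document.
check_alphafold_db() {
    db_base="${1:-/Applic.HPC/data/alphafold}"
    db_missing=0
    for db_name in bfd mgnify params pdb70 pdb_mmcif pdb_seqres \
                   small_bfd uniclust30 uniprot uniref90; do
        if [ ! -d "$db_base/$db_name" ]; then
            echo "missing: $db_base/$db_name" >&2
            db_missing=1
        fi
    done
    # Returns 0 if all sub-directories were found, 1 otherwise.
    return "$db_missing"
}
```

Called without arguments, for example as `check_alphafold_db && echo "databases found"`, it checks the shared path; a different base directory can be passed as the first argument.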

Native

Interactive session

AlphaFold has been updated to version 2.1.1, which includes the multimer feature, and has been compiled for the skylake GPU nodes as well as the Zen3 nodes.

Before you start, do the following steps:

Create a suitable directory for your calculations on scratch, e.g. /scratch/tmp/$USER/AlphaFold/
Create sub-directories for any locations you additionally want to use inside the container (here we create a results folder as well as a folder for storing the initial fasta file)
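The two preparation steps above can be sketched as a small shell function. This is only an illustration: the function name `prepare_alphafold_dirs` and the sub-directory names `fasta` and `results` are assumptions, so adapt them to your own layout.

```shell
# Hypothetical helper: create a working directory with one sub-directory
# for the input FASTA file and one for the results.
# The names "fasta" and "results" are illustrative, not prescribed.
prepare_alphafold_dirs() {
    if [ -z "${1:-}" ]; then
        echo "usage: prepare_alphafold_dirs BASE_DIR" >&2
        return 2
    fi
    af_base="$1"
    mkdir -p "$af_base/fasta" "$af_base/results" || return 1
    # Print the base directory so the caller can reuse it.
    printf '%s\n' "$af_base"
}
```

On the cluster this would typically be called as `prepare_alphafold_dirs "/scratch/tmp/$USER/AlphaFold"`.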

For an interactive session on a GPGPU node, the AlphaFold module can be loaded as follows:

Codeblock

language	bash
theme	Midnight

module load palma/2020b
module load fosscuda
module load AlphaFold

...

/2.1.1

To run AlphaFold (2.1.1), create a folder in your scratch directory and copy your sequence file (e.g. a FASTA file) into it.
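Since a run can take many hours, a quick sanity check of the sequence file before copying it may save a failed job. The sketch below is an assumption-laden example, not an official tool: `check_fasta` is a hypothetical name, and it only tests that the file starts with a `>` header line and contains some sequence data.

```shell
# Hypothetical helper: basic sanity check of a FASTA file.
# Prints the number of sequence records (header lines starting with ">").
check_fasta() {
    fa_file="${1:-}"
    if [ ! -r "$fa_file" ]; then
        echo "cannot read: $fa_file" >&2
        return 1
    fi
    # The first non-empty line must be a header line.
    fa_first=$(grep -v '^[[:space:]]*$' "$fa_file" | head -n 1)
    case "$fa_first" in
        '>'*) ;;
        *) echo "no FASTA header at top of: $fa_file" >&2; return 1 ;;
    esac
    # There must be at least one non-header line containing letters.
    if ! grep -v '^>' "$fa_file" | grep -q '[A-Za-z]'; then
        echo "no sequence data in: $fa_file" >&2
        return 1
    fi
    grep -c '^>' "$fa_file"
}
```

For example, `check_fasta my_protein.fasta && cp my_protein.fasta "/scratch/tmp/$USER/AlphaFold/"`, where `my_protein.fasta` is a placeholder for your own file.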

Submission to the batch system

For submission to the batch system, the following script can be adapted:

Note
Adjust the job script for your data! Don't just copy-paste it and expect it to work.

Codeblock

language	bash
theme	Midnight

#!/bin/bash
#SBATCH --partition=gpuv100
#SBATCH --nodes=1
#SBATCH --gres=gpu:1
#SBATCH --gpus=1
#SBATCH --gpus-per-node=1
#SBATCH --cpus-per-task=6
#SBATCH --mem=60G
#SBATCH --time=1-23:59:00
#SBATCH --job-name=alphafold
#SBATCH --mail-type=ALL
#SBATCH --mail-user=your_account@uni-muenster.de

# Load the toolchain and the AlphaFold module
module load palma/2021a
module load foss
module load AlphaFold/2.1.1-CUDA-11.3.1

# Location of the shared genetic databases
export ALPHAFOLD_DATA_DIR=/Applic.HPC/data/alphafold

# Replace Input_path with the path to your FASTA file.
# --model_preset defaults to monomer; multimer is selected here.
alphafold \
    --fasta_paths=Input_path \
    --model_preset=multimer \
    --output_dir=/scratch/tmp/$USER/Alphafold/Results \
    --max_template_date=2021-11-25 \
    --is_prokaryote_list=false \
    --db_preset=reduced_dbs \
    --data_dir=/Applic.HPC/data/alphafold
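Once a job has finished, AlphaFold writes its results into a sub-directory of the output directory named after the input FASTA file, with the ranked structures stored as `ranked_*.pdb`. The following sketch lists those files; the function name `list_ranked_models` is hypothetical, and the run directory you pass depends on your own `--output_dir` and FASTA file name.

```shell
# Hypothetical helper: list the ranked structure files of a finished run.
# Assumes the usual AlphaFold layout <output_dir>/<fasta_name>/ranked_N.pdb.
list_ranked_models() {
    run_dir="${1:-}"
    if [ ! -d "$run_dir" ]; then
        echo "no such run directory: $run_dir" >&2
        return 1
    fi
    # Only look at files directly inside the run directory.
    find "$run_dir" -maxdepth 1 -type f -name 'ranked_*.pdb' | sort
}
```

An empty result for an existing directory usually means the run has not completed or failed; the Slurm output file is then the first place to look.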

Container

Running Docker containers on the cluster is not allowed for security reasons. Therefore we provide a container image for Singularity (a containerization software for HPC purposes):

...

We created an AlphaFold module that automatically loads Singularity and sets the environment variable $ALPHAFOLD_SIFIMAGE to point to the correct path.
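A job script can check early that the module has been loaded correctly by testing this variable. The snippet below is a minimal sketch; `check_sif_image` is a hypothetical helper name, and only the variable `$ALPHAFOLD_SIFIMAGE` comes from the module itself.

```shell
# Hypothetical helper: confirm that the AlphaFold module has set
# $ALPHAFOLD_SIFIMAGE and that the image file it names is readable.
check_sif_image() {
    if [ -z "${ALPHAFOLD_SIFIMAGE:-}" ]; then
        echo "ALPHAFOLD_SIFIMAGE is not set; load the AlphaFold module first" >&2
        return 1
    fi
    if [ ! -r "$ALPHAFOLD_SIFIMAGE" ]; then
        echo "image not readable: $ALPHAFOLD_SIFIMAGE" >&2
        return 1
    fi
    # Print the image path so it can be reused by the caller.
    printf '%s\n' "$ALPHAFOLD_SIFIMAGE"
}
```

Placing `check_sif_image || exit 1` near the top of a job script stops the job immediately instead of failing later inside Singularity.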

Starting AlphaFold

You can find an example job script showing how to run AlphaFold on PALMA below. Before you start, do the following steps:

...
