
Over the last couple of weeks we have received quite a lot of requests to provide the newly released protein-folding software package AlphaFold and its alternative RoseTTAFold. As these packages are currently only published as a container image or a conda environment, respectively, we are working on a cluster-wide installation suitable for an HPC environment.


                Container    Native
AlphaFold       AVAIL        WIP
RoseTTAFold     WIP          WIP

Genetic Databases

The genetic databases can be used by both AlphaFold and RoseTTAFold and are located at the following path:

/Applic.HPC/data/alphafold/
|-- bfd
|-- mgnify
|-- params
|-- pdb70
|-- pdb_mmcif
|-- small_bfd
|-- uniclust30
`-- uniref90

The complete set of databases is around 2.2 TB in size, and downloading and unpacking it takes more than 24 hours. Therefore: PLEASE DO NOT DOWNLOAD THESE DATABASES AGAIN!
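If in doubt, you can verify directly on the cluster that the databases are already in place, for example:

# list the available databases and their sizes (read-only, nothing is downloaded)
ls /Applic.HPC/data/alphafold/
du -sh /Applic.HPC/data/alphafold/*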

AlphaFold

Detailed information can be found at: https://github.com/deepmind/alphafold

Container

Running Docker containers on the cluster is not allowed for security reasons. We therefore provide a container image for Singularity (a container platform designed for HPC environments):

  • for skylake nodes (normal, gpuv100): /Applic.HPC/container/alphafold_2.0.0.sif 
  • for ivybridge/sandybridge nodes (gputitanrtx, gpu2080): /Applic.HPC/container/alphafold_ivybridge-2.0.0.sif

We created an AlphaFold module that automatically loads Singularity and sets the environment variable $ALPHAFOLD_SIFIMAGE to the path of the image matching the node architecture.
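To check which image is picked up on a given node, you can, for example, load the module in an interactive shell and print the variable:

module load AlphaFold/2.0.0-singularity
echo $ALPHAFOLD_SIFIMAGE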

Starting AlphaFold

You can find an example job script for running AlphaFold on PALMA below. Before you start, complete the following steps:

  • Create a suitable directory for your calculations on scratch, e.g. /scratch/tmp/$USER/AlphaFold/
  • Create sub-directories for any locations you additionally want to use inside the container (here we create a results folder as well as a folder for storing the initial fasta file)
    • Those directories have to be bind-mounted into the container! (The -B flag in the singularity run command)
  • Create a job script or use an interactive SLURM session to request resources on the cluster. You should request a minimum of 8 cores and 64 GB of memory; GPUs are supported as well. (Example commands for these preparation steps are shown after this list.)
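
A minimal sketch of these preparation steps; the directory names, the FASTA file name, and the srun parameters are only examples and have to be adapted to your project:

# create a working directory on scratch with sub-directories for results and the input fasta file
mkdir -p /scratch/tmp/$USER/AlphaFold/{results,fasta}
cd /scratch/tmp/$USER/AlphaFold
# place your input sequence in the fasta sub-directory (file name is an example)
cp ~/my_protein.fasta fasta/

# instead of a job script, you can request an interactive session (values are examples)
srun --partition=gpu2080 --nodes=1 --gres=gpu:1 --cpus-per-task=8 --mem=64G --time=2:00:00 --pty bash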

Adjust the job script for your data! Don't just copy-paste it and expect it to work.


AlphaFold Example Job Script
#!/bin/bash

#SBATCH --partition=gpu2080
#SBATCH --nodes=1
#SBATCH --gres=gpu:4
#SBATCH --cpus-per-task=24
#SBATCH --mem=170G
#SBATCH --time=12:00:00
#SBATCH --job-name=alphafold


# load the AlphaFold module; this also loads Singularity and sets $ALPHAFOLD_SIFIMAGE
module load AlphaFold/2.0.0-singularity

# TF_FORCE_UNIFIED_MEMORY and XLA_PYTHON_CLIENT_MEM_FRACTION let JAX use host memory
# in addition to GPU memory for large targets; the -B flags bind-mount the databases,
# a writable /etc, and the results/fasta sub-directories of the working directory;
# --nv enables GPU support inside the container
singularity run \
    --env TF_FORCE_UNIFIED_MEMORY=1,XLA_PYTHON_CLIENT_MEM_FRACTION=4.0 \
    -B /Applic.HPC/data/alphafold:/data \
    -B .:/etc \
    -B ./results:/results \
    -B ./fasta:/fasta \
    --pwd /app/alphafold \
    --nv $ALPHAFOLD_SIFIMAGE \
    --fasta_paths /fasta/Chitin-synthase-deacetylase.fasta \
    --output_dir /results/ \
    --max_template_date 2021-07-31 \
    --data_dir /data/ \
    --uniref90_database_path /data/uniref90/uniref90.fasta \
    --mgnify_database_path /data/mgnify/mgy_clusters.fa \
    --small_bfd_database_path /data/small_bfd/bfd-first_non_consensus_sequences.fasta \
    --pdb70_database_path /data/pdb70/pdb70 \
    --template_mmcif_dir /data/pdb_mmcif/mmcif_files \
    --obsolete_pdbs_path /data/pdb_mmcif/obsolete.dat \
    --model_names model_1,model_2,model_3,model_4,model_5 \
    --preset reduced_dbs
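
Assuming the script is saved as alphafold.sh (the file name is just an example) in your working directory on scratch, it can be submitted and monitored as usual:

sbatch alphafold.sh
squeue -u $USER

The predicted structures are written to the results sub-directory, which is bind-mounted to /results inside the container.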




