DISOPRED, initially published by Jones and Ward in 2003, is a tool that predicts intrinsically disordered regions (IDRs) in proteins. The third major release, DISOPRED 3, was made public in 2015 (Jones & Cozzetto). It works not only as a standalone predictor but also as an essential component of other bioinformatics tools, e.g. ActiveDriver.
According to the authors, DISOPRED 3 offers a significant increase in prediction accuracy and efficiency over previous versions. Although a README.txt is provided, adopting DISOPRED 3 is not a piece of cake, because several critical steps are left out of the original installation guide.
After numerous attempts and failures, and with the aid of these two helpful articles (1, 2), I eventually streamlined the process of installing DISOPRED 3 on a typical Mac or Linux workstation. Here I present the complete guide to properly installing and configuring DISOPRED 3.
Please feel free to let me know via response if you have any questions.
BLAST and Database
Note: Make sure you are not installing BLAST+, as it is NOT yet supported by DISOPRED 3.
- Download BLAST 2.2.26 from NCBI. Make sure you download the correct binary for your operating system.
- Unzip the compressed file to a new folder called blast-2.2.26.
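Before moving on, it can help to confirm that the unpacked archive really contains the legacy tools this guide relies on. The following is a minimal sketch, assuming the binaries sit in a `bin` subfolder of blast-2.2.26; the function name `check_blast_bin` is hypothetical and only the two tools named in this guide (blastpgp and formatdb) are checked.

```bash
# check_blast_bin DIR: report whether the legacy BLAST tools used in this
# guide (blastpgp, formatdb) are present and executable in DIR.
# Returns 0 if both are found, 1 otherwise.
check_blast_bin() {
    dir=$1
    missing=0
    for tool in blastpgp formatdb; do
        if [ -x "$dir/$tool" ]; then
            echo "found: $dir/$tool"
        else
            echo "missing: $dir/$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example (the path is an assumption; adjust it to where you unpacked BLAST):
# check_blast_bin "$HOME/blast-2.2.26/bin"
```

If a tool is reported as missing, the most likely cause is that BLAST+ or a binary for a different operating system was downloaded instead of legacy BLAST 2.2.26.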
Note: please be patient, as formatting the database with formatdb (see the steps below) will take a while to complete.
- Create a new folder called blastdb to store all resources.
- Download the newest UniRef90 (FASTA format) from UniProt.
- Unzip the compressed file to a new folder named uniref90.
- Format the database using the formatdb tool in the BLAST toolkit:
  ```bash
  # Change directory to the folder with the unzipped database file
  cd uniref90
  # Replace uniref90.fasta with the name of the file you just downloaded
  formatdb -i uniref90.fasta
  ```
- Download matrices for blastpgp from NCBI to a new folder named data.
- Create a configuration file called ~/.ncbirc for BLAST.
- Put in the required parameters. Make sure to change your_name to the name of your home directory.
    ; Start the section for BLAST configuration
    [BLAST]
    ; Specifies the path where BLAST databases are installed
    BLASTDB=/home/your_name/blastdb/uniref90
    BLASTMAT=/home/your_name/blastdb/data
    ; Specifies the data sources to use for automatic resolution
    ; for sequence identifiers
    DATA_LOADERS=blastdb
    [NCBI]
    data=/home/your_name/blastdb/data
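To avoid typing the home directory by hand, the configuration above can also be generated from a base path. This is only a sketch: the function name `write_ncbirc` is hypothetical, and it assumes the blastdb folder contains the uniref90 and data subfolders created in the previous steps.

```bash
# write_ncbirc BASE OUT: write a BLAST configuration file to OUT that points
# at BASE/uniref90 (the formatted database) and BASE/data (the matrices).
write_ncbirc() {
    base=$1
    out=$2
    cat > "$out" <<EOF
; Start the section for BLAST configuration
[BLAST]
; Specifies the path where BLAST databases are installed
BLASTDB=$base/uniref90
BLASTMAT=$base/data
; Specifies the data sources to use for automatic resolution
; for sequence identifiers
DATA_LOADERS=blastdb

[NCBI]
data=$base/data
EOF
}

# Example (overwrites any existing ~/.ncbirc, so back it up first):
# write_ncbirc "$HOME/blastdb" "$HOME/.ncbirc"
```

Writing to a scratch file first and inspecting it before copying it to ~/.ncbirc is a safer way to try this out.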
- Download the latest DISOPRED 3 and dso_lib archives from UCL.
- Unzip these compressed files.
- Replace the DISOPRED/dso_lib folder with the newly downloaded dso_lib.
- If you are running macOS, please modify the source code:
- In src/disord_pred.c, find these lines
    #ifdef unix
    #define CLKRATE 1000000
    #endif
and replace with
    #ifdef unix
    #define CLKRATE 1000000
    #endif
    #ifdef __MACH__
    #define CLKRATE 1000000
    #endif
- In the Makefile under src/, find these lines
    install:
    	mkdir ../bin/
    	mv -t ../bin disopred2 diso_neu_net diso_neighb combine svm-predict
and replace with
    install:
    	mkdir ../bin/
    	mv disopred2 diso_neu_net diso_neighb combine svm-predict ../bin/
Install and Test
- Compile the source code.
    cd DISOPRED/src
    make clean
    make
    make install
- Configure the run_disopred.pl file with the locations of BLAST and the database, and the number of cores to use. Make sure to change your_name to the name of your home directory, and adjust the paths if you placed BLAST or the database elsewhere.
    ## IMPORTANT: Set the paths to folder with the NCBI executables and to the
    ## sequence database
    my $NCBI_DIR = "/Users/your_name/Desktop/blast-2.2.26/bin/";
    my $SEQ_DB = "/Users/your_name/Desktop/blastdb/uniref90/uniref90.fasta";
    ## IMPORTANT: Changing these flags will alter the behaviour of blastpgp
    ## You may want to use -a n to speed-up the search using n processors, if available
    my $PSIBLAST_PAR = "-a 1 -b 0 -j 3 -h 0.001";
- Run DISOPRED 3 with the example data.
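To choose a value for the `-a` flag in `$PSIBLAST_PAR`, you can ask the system how many processors are available. The sketch below assumes that `getconf _NPROCESSORS_ONLN` is supported on your machine (it is on typical Linux and macOS installations) and falls back to 1 otherwise; the function names `detect_cores` and `psiblast_par` are hypothetical.

```bash
# detect_cores: print the number of online processors, or 1 if it cannot
# be determined.
detect_cores() {
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
    case $n in
        ''|*[!0-9]*) n=1 ;;   # empty or non-numeric answer: fall back to 1
    esac
    echo "$n"
}

# psiblast_par N: print the blastpgp flags used in run_disopred.pl with the
# processor count (-a) set to N; the other flags are kept as in the guide.
psiblast_par() {
    echo "-a $1 -b 0 -j 3 -h 0.001"
}

# Example: print the string to paste into $PSIBLAST_PAR
# psiblast_par "$(detect_cores)"
```

On a shared workstation you may want to pass a smaller number than the one detected, so that other jobs are not starved.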
- Jones, D. T., & Ward, J. J. (2003). Prediction of disordered regions in proteins from position specific score matrices. Proteins: Structure, Function, and Bioinformatics, 53(S6), 573–578.
- Jones, D. T., & Cozzetto, D. (2015). DISOPRED3: precise disordered region predictions with annotated protein-binding activity. Bioinformatics, 31(6), 857–863.