Readme for NCBI blast ftp site
Last updated on February 15, 2004
This file lists the subdirectories and files found on the NCBI BLAST
ftp site (ftp://ftp.ncbi.nlm.nih.gov/blast/). It provides the basic
information on file content, and on how the files should be used.
1. Introduction
The NCBI BLAST ftp site provides standalone blast, client-server blast
(netblast), and wwwblast packages for different platforms. It also
provides commonly used blast databases in both preformatted and FASTA
format, along with documentation on the blast executables and related
subjects.
2. File list and content
The files are described in the tables below, one table for each
directory or subdirectory.
2.1 ftp://ftp.ncbi.nlm.nih.gov/blast/ directory content
The blast ftp directory contains several subdirectories, each holding a
specific set of files.
+------------------+-------------------------------------------------+
|Name              |Content                                          |
+------------------+-------------------------------------------------+
blastftp.txt        this file
db                  subdirectory with databases, in preformatted or
                    FASTA format
demo                demonstration programs and documents from blast
                    developers
documents           documentation for the standalone blast, netblast,
                    and wwwblast programs
executables         archives for binary distribution of blast programs
matrices            protein and nucleotide scoring matrices; only a
                    subset is supported by blast
temp                temporary directory for miscellaneous files
+------------------+-------------------------------------------------+
2.2 File content for ftp://ftp.ncbi.nlm.nih.gov/blast/db/ subdirectory
Databases larger than two gigabytes (2 GB) are formatted in multiple
volumes, which are named using the "database.##.tar.gz" convention.
All volumes of a database are required. An alias file (.nal for
nucleotide, .pal for protein) is provided so that the database can be
called by its alias name without the extension. For example, to use the
est database, simply give the option -d est on the command line.
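Because a multi-volume database only works when every volume is present,
it can help to check a download directory before extracting anything.
The Python sketch below assumes only the "database.##.tar.gz" naming
described above. The caller must supply the expected number of volumes
(for example, three for est in the table below), and the helper names
find_volumes and missing_volumes are hypothetical.

```python
import re

def find_volumes(base, filenames):
    """Return the sorted volume numbers present for database `base`.

    Matches names such as "est.00.tar.gz". Subset archives such as
    "est_human.tar.gz" do not match.
    """
    # Assumed naming: <base>.<two digits>.tar.gz
    pattern = re.compile(rf"^{re.escape(base)}\.(\d{{2}})\.tar\.gz$")
    return sorted(int(m.group(1)) for m in map(pattern.match, filenames) if m)

def missing_volumes(base, filenames, expected_count):
    """Return the volume numbers 0 .. expected_count-1 that are absent."""
    present = set(find_volumes(base, filenames))
    return [v for v in range(expected_count) if v not in present]
```

For example, missing_volumes("est", os.listdir("."), 3) returns an empty
list only when est.00.tar.gz, est.01.tar.gz and est.02.tar.gz are all
in the current directory.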
Certain databases are subsets of a larger parent database. For those
databases, mask files, rather than actual databases, are provided. A
mask file needs its parent database to function properly, and the
parent database should be generated on the same day as the mask file.
For example, to use the swissprot preformatted database,
swissprot.tar.gz, you also need nr.tar.gz with the same date stamp.
To use a preformatted blast database file, first decompress it using
gzip (Unix, Linux), WinZip (Windows), or StuffIt Expander (Mac), then
extract the component files from the resulting tar file using tar
(Unix, Linux), WinZip (Windows), or StuffIt Expander (Mac). The
resulting files are ready for BLAST.
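On any platform with Python, the two steps above (decompress, then
untar) can be combined in one pass with the standard tarfile module,
whose "r:gz" mode reads gzip-compressed tar archives. The following is a
minimal sketch. The function name extract_blast_db and the destination
directory are our own choices, not part of the blast distribution.

```python
import tarfile
from pathlib import Path

def extract_blast_db(archive, dest="."):
    """Decompress and unpack a preformatted BLAST database archive.

    Returns the names of the extracted component files.
    Only use this on archives from a trusted source.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    # "r:gz" = gunzip + tar extraction in a single step
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(path=dest)
    return names
```

For example, extract_blast_db("swissprot.tar.gz", "blastdb") would unpack
that archive into a blastdb directory. On Python 3.12 and later you may
also pass filter="data" to extractall to reject unsafe paths.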
+---------------------+----------------------------------------------+
|Name                 |Content                                       |
+---------------------+----------------------------------------------+
FASTA                 subdirectory with databases in FASTA format
blastdb.txt           content list of the blast databases
est.00.tar.gz         first volume of the est database
est.01.tar.gz         second volume of the est database
est.02.tar.gz         third volume of the est database
                      all volumes are needed to reconstitute the
                      complete est database
est_human.tar.gz      human est database; a mask file that requires
                      all volumes of est to work
est_mouse.tar.gz      mouse est database; a mask file that requires
                      all volumes of est to work
est_others.tar.gz     est database without human/mouse entries; a
                      mask file that requires all volumes of est
gss.tar.gz            genomic survey sequence database
htgs.00.tar.gz        first volume of the htgs database
htgs.01.tar.gz        second volume of the htgs database
htgs.02.tar.gz        third volume of the htgs database
htgs.03.tar.gz        fourth volume of the htgs database
                      all volumes are needed to reconstitute the
                      complete htgs database
human_genomic.tar.gz  human chromosome database containing
                      concatenated contigs with adjusted gaps
                      represented by N's
nr.tar.gz             non-redundant protein database
nt.00.tar.gz          first volume of the nucleotide nr database
nt.01.tar.gz          second volume of the nucleotide nr database
nt.02.tar.gz          third volume of the nucleotide nr database
                      all volumes are needed to reconstitute the
                      complete nt database
other_genomic.tar.gz  chromosome database for organisms other than
                      human
pataa.tar.gz          patent protein database
patnt.tar.gz          patent nucleotide database
pdbaa.tar.gz          protein sequence database for pdb entries; a
                      mask file that requires nr.tar.gz
pdbnt.tar.gz          nucleotide sequence database for pdb entries.
                      These are not the coding sequences for the
                      corresponding protein structure entries!
sts.tar.gz            sequence tagged site database
swissprot.tar.gz      swissprot sequence database, last major
                      release; a mask file that requires nr.tar.gz
                      to work properly
taxdb.tar.gz          taxonomy id database for use with the new
                      version of the blast database (not fully
                      implemented yet)
wgs.00.tar.gz         first volume of the wgs assembly database
wgs.01.tar.gz         second volume of the wgs assembly database
wgs.02.tar.gz         third volume of the wgs assembly database
wgs.03.tar.gz         fourth volume of the wgs assembly database
wgs.04.tar.gz         fifth volume of the wgs assembly database
wgs.05.tar.gz         sixth volume of the wgs assembly database
                      all volumes are needed
+---------------------+----------------------------------------------+
2.2.1 File content for ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA
subdirectory
The FASTA database files are now stored in this subdirectory, which also
contains some additional databases that are not available via the NCBI
BLAST pages. Because of file size limits, the full est database is not
provided. To get the complete est database, download the three subsets
(est_human.gz, est_mouse.gz, est_others.gz) and concatenate them.
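The concatenation step can be sketched in Python as follows, assuming
the three subset files have been downloaded under the names listed in
the table below. The function name and the output file name est are
hypothetical choices. Each part is decompressed and appended in order,
and a newline is inserted between parts if one is missing, so that a
">" header line never ends up glued to the previous sequence line.

```python
import gzip

def concat_gz_to_fasta(parts, out_path, chunk_size=1 << 20):
    """Decompress each gzipped FASTA file in `parts` and append it to out_path."""
    last = b"\n"  # last byte written so far
    with open(out_path, "wb") as out:
        for part in parts:
            if last != b"\n":
                out.write(b"\n")  # keep the next '>' header on its own line
                last = b"\n"
            with gzip.open(part, "rb") as src:
                while True:
                    chunk = src.read(chunk_size)
                    if not chunk:
                        break
                    out.write(chunk)
                    last = chunk[-1:]
```

Usage: concat_gz_to_fasta(["est_human.gz", "est_mouse.gz",
"est_others.gz"], "est") writes one uncompressed FASTA file that can then
be formatted with formatdb.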
These databases need to be formatted with the formatdb program found in
the standalone blast executable package. The recommended command lines
are:
formatdb -i input_db -p F -o T      for nucleotide
formatdb -i input_db -p T -o T      for protein
For additional information on formatdb, please see the formatdb.txt
document in the /blast/documents/ directory.
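When formatting several databases from a script, building the command
as an argument list avoids shell quoting problems, and it also avoids a
common copy-and-paste problem: a typographic dash pasted in place of the
ASCII hyphen before i, p and o will not be recognized as an option. A
minimal Python sketch, assuming formatdb from the standalone package is
on the PATH; it uses only the -i, -p and -o options shown above, and the
helper names are hypothetical.

```python
import subprocess

def formatdb_args(input_db, protein):
    """Build the recommended formatdb command line as an argument list."""
    # -p T for protein, -p F for nucleotide; -o T as recommended above
    return ["formatdb", "-i", input_db, "-p", "T" if protein else "F", "-o", "T"]

def run_formatdb(input_db, protein):
    """Run formatdb; raises CalledProcessError if it exits with an error."""
    subprocess.run(formatdb_args(input_db, protein), check=True)
```

For example, run_formatdb("est", protein=False) runs
formatdb -i est -p F -o T.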
+------------------+--------------------------------------------------+
|Name              |Content                                           |
+------------------+--------------------------------------------------+
alu.a.gz            proteins translated from alu.n
alu.n.gz            alu repeat sequences
drosoph.aa.gz       Drosophila proteins from genome annotation
drosoph.nt.gz       Drosophila genome
ecoli.aa.gz         E. coli K-12 proteins from genome annotation
ecoli.nt.gz         E. coli K-12 genomic contigs
est_human.gz        human subset of the est database
est_mouse.gz        mouse subset of the est database
est_others.gz       subset of est other than human or mouse entries
gss.gz              Genomic Survey Sequences (mostly BAC ends)
htgs.gz             High Throughput Genomic Sequences
human_genomic.gz    Human chromosomes formed by concatenating genomic
                    contig assemblies (NT_######) and adjusting the
                    gaps with N's
igSeqNt.gz          Immunoglobulin nucleotide sequences
igSeqProt.gz        Immunoglobulin protein sequences
mito.aa.gz          proteins from the annotated mitochondrial genomes
mito.nt.gz          mitochondrial genomes
month.aa.gz         protein sequences released or updated in the past
                    30 days
month.est_human.gz  human subset of EST released/updated in the past
                    30 days
month.est_mouse.gz  mouse subset of EST released/updated in the past
                    30 days
month.est_others.gz EST, without entries from human or mouse, released
                    or updated in the past 30 days
month.gss.gz        gss entries released/updated in the past 30 days
month.htgs.gz       htgs entries released/updated in the past 30 days
month.nt.gz         subset of nt released/updated in the past 30 days
nr.gz               non-redundant protein sequence database
nt.gz               nucleotide database from GenBank excluding the
                    batch divisions (htgs, est, gss, sts, pat) and
                    wgs entries. Not non-redundant.
other_genomic.gz    Chromosome entries other than human
pataa.gz            Patent protein sequence database
patnt.gz            Patent nucleotide sequence database
pdbaa.gz            protein sequences for pdb entries
pdbnt.gz            nucleotide sequences for pdb entries. They are NOT
                    the coding sequences for the corresponding
                    protein entries
sts.gz              Sequence Tagged Sites database
swissprot.gz        swissprot database, last major release
vector.gz           vector sequences from the synthetic (syn) division
                    of GenBank
wgs.gz              Whole Genome Shotgun sequence assemblies
yeast.aa.gz         protein translations from yeast genome annotation
yeast.nt.gz         yeast genomic sequence
+------------------+--------------------------------------------------+
2.3 File content for ftp://ftp.ncbi.nlm.nih.gov/blast/demo/ directory
This directory contains some technical presentations from the BLAST
developers along with some demo tools or documentation relevant to BLAST.
+------------------------+-----------------------------------------------+
|Name                    |Content                                        |
+------------------------+-----------------------------------------------+
README.blast_demo         readme for the blast_demo package
README.first              readme for this directory
README.parse_blast_xml    readme for the parse_blast_xml package
blast_demo.tar.gz         blast_demo package on blast db, blast object,
                          and reformatting blast alignments from a
                          blastobj file
blast_exercises.doc       blast exercise questions and answers
blast_programming.ppt     PowerPoint presentation on BLAST programming
blast_talk.ppt            PowerPoint presentation (O'Reilly conference)
ieee_blast.final.ppt      PowerPoint presentation (IEEE conference)
ieee_talk.pdf             the above IEEE presentation in PDF format
parse_blast_xml.tar.gz    demo package on parsing XML-style blast output
splitd.ppt                PowerPoint presentation on the NCBI BLAST
                          server's splitd implementation
test_suite.tar.gz         test package
+------------------------+-----------------------------------------------+
2.4 File content for ftp://ftp.ncbi.nlm.nih.gov/blast/documents/ directory
This directory contains copies of the documentation on different BLAST
programs distributed from this ftp site under the /blast/executables/
directory. blast.txt also contains a detailed release history.
+------------------------+-----------------------------------------------+
|Name                    |Content                                        |
+------------------------+-----------------------------------------------+
blast.txt                 readme for blastall and blastpgp
blastclust.txt            readme for blastclust
developer                 subdirectory with additional documentation
blast_seqalign.txt        describing the seqalign function
readdb.txt                describing the readdb function
urlapi.txt                a short introduction to the BLAST URL API,
                          which supersedes blasturl
formatdb.txt              readme for the formatdb program
impala.txt                readme for impala
megablast.txt             readme for megablast
netblast.txt              readme for netblast (blastcl3)
rpsblast.txt              readme for rpsblast
xml                       subdirectory with .dtd and .mod field
                          description files for blast XML output
xml/NCBI_BlastOutput.dtd  DTD file for blast XML output
xml/NCBI_BlastOutput.mod  mod file for blast XML output
xml/NCBI_Entity.mod       mod file for NCBI XML files
xml/README.blxml          readme on blast XML output
+------------------------+-----------------------------------------------+
2.5 File content for ftp://ftp.ncbi.nlm.nih.gov/blast/executables/
directory
This directory contains several subdirectories, each holding a specific
subset of the executable BLAST programs:
/LATEST-BLAST subdirectory contains the standalone blast binaries from
the latest major versioned release.
/LATEST-NETBLAST subdirectory contains the netblast binaries from the
latest major versioned release.
/LATEST-WWWBLAST subdirectory contains the wwwblast binaries from the
latest major versioned release.
/release subdirectory contains the different releases, with the latest
one linked to the LATEST directories.
/snapshot subdirectory contains patches or intermediate updates put up
between major releases.
For previous releases, go to the release subdirectory, where the old
major releases are archived back to version 2.0.10.
2.5.1 File content for ftp://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST-BLAST,
/LATEST-NETBLAST, and /LATEST-WWWBLAST subdirectories
All three subdirectories link to the latest release directory, which
contains the standalone BLAST executables package (archives whose names
begin with blast), the blastcl3 client (archives beginning with
netblast), and server blast (archives beginning with wwwblast).
The standalone archive is needed to set up BLAST locally on the user's
own machine. It also provides the tools needed to prepare custom
databases and to retrieve sequences from them. Archives are available
for commonly used platforms.
The blast client archive contains the blastcl3 program, which formulates
a BLAST search locally and forwards it to the NCBI BLAST server for
processing. The search results returned by the NCBI BLAST server are
saved to a user-specified file on the local disk.
The server blast archive contains web pages with embedded blast search
forms, similar to those at NCBI, that process BLAST search requests
against a local set of databases and return the results to a browser
window. wwwblast is now in sync with the NCBI toolkit and the two
packages above.
+------------------------------------+-------------------------------+
|Name                                |Content                        |
+------------------------------------+-------------------------------+
MD5SUM.txt                            MD5 checksums for the archives
                                      in this directory
blast-2.2.8-alpha-osf1.tar.gz         Standalone for COMPAQ/HP Alpha
                                      machine (OSF 5.1 and above)
blast-2.2.8-amd64-linux.tar.gz        Standalone for AMD 64-bit PC
                                      running Linux
blast-2.2.8-ia32-freebsd.tar.gz       Standalone for Intel Pentium PC
                                      running FreeBSD
blast-2.2.8-ia32-linux.tar.gz         Standalone for Intel Pentium PC
                                      running Linux
blast-2.2.8-ia32-win32.exe            Standalone for Intel Pentium PC
                                      running Windows
blast-2.2.8-ia64-linux.tar.gz         Standalone for Intel Itanium PC
                                      running Linux
blast-2.2.8-mips-irix-32-bit.tar.gz   Standalone for 32-bit SGI
blast-2.2.8-mips-irix.tar.gz          Standalone for 64-bit SGI
blast-2.2.8-powerpc-macosx.tar.gz     Standalone for Mac OS X
                                      (terminal)
blast-2.2.8-sparc-solaris.tar.gz      Standalone for Sun Sparc station
                                      running Solaris
netblast-2.2.8-alpha-osf1.tar.gz      netblast for COMPAQ/HP Alpha
                                      machine (OSF 5.1 and above)
netblast-2.2.8-amd64-linux.tar.gz     netblast for AMD 64-bit PC
                                      running Linux
netblast-2.2.8-ia32-freebsd.tar.gz    netblast for Intel Pentium PC
                                      running FreeBSD
netblast-2.2.8-ia32-linux.tar.gz      netblast for Intel Pentium PC
                                      running Linux
netblast-2.2.8-ia32-win32.exe         netblast for Intel Pentium PC
                                      running Windows
netblast-2.2.8-ia64-linux.tar.gz      netblast for Intel Itanium PC
                                      running Linux
netblast-2.2.8-mips-irix.tar.gz       netblast for 32-bit SGI system
netblast-2.2.8-powerpc-macosx.tar.gz  netblast for Mac OS X
netblast-2.2.8-sparc-solaris.tar.gz   netblast for Sun Sparc station
                                      running Solaris
wwwblast-2.2.8-alpha-osf1.tar.gz      wwwblast for COMPAQ/HP Alpha
                                      machine (OSF 5.1 and above)
wwwblast-2.2.8-amd64-linux.tar.gz     wwwblast for AMD 64-bit PC
                                      running Linux
wwwblast-2.2.8-ia32-freebsd.tar.gz    wwwblast for Intel Pentium PC
                                      running FreeBSD
wwwblast-2.2.8-ia32-linux.tar.gz      wwwblast for Intel Pentium PC
                                      running Linux
wwwblast-2.2.8-ia64-linux.tar.gz      wwwblast for Intel Itanium PC
                                      running Linux
wwwblast-2.2.8-mips-irix.tar.gz       wwwblast for 32-bit SGI system
wwwblast-2.2.8-powerpc-macosx.tar.gz  wwwblast for Mac OS X
wwwblast-2.2.8-sparc-solaris.tar.gz   wwwblast for Sun Sparc station
                                      running Solaris
+------------------------------------+-------------------------------+
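MD5SUM.txt can be used to confirm that an archive downloaded intact. Its
exact layout is not described here, so the Python sketch below assumes
the common md5sum output format, one line per file of the form
"<32 hex digits>  <file name>". The helper names md5_of, parse_md5sum
and verify are hypothetical.

```python
import hashlib
from pathlib import Path

def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def parse_md5sum(text):
    """Parse assumed md5sum-style lines into {file name: digest}."""
    sums = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            # a leading '*' marks binary mode in md5sum output
            sums[parts[-1].lstrip("*")] = parts[0].lower()
    return sums

def verify(path, sums):
    """True if the file's MD5 matches the entry for its base name."""
    return sums.get(Path(path).name) == md5_of(path)
```

For example, verify("blast-2.2.8-ia32-linux.tar.gz", sums), where sums is
the result of parsing MD5SUM.txt, returns True only if the archive's
digest matches its listed entry.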
2.5.2 File content for ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release
subdirectory
This directory contains past major releases of BLAST, as far back as
version 2.0.10. Each release is in its own subdirectory.
2.5.3 File content for ftp.ncbi.nlm.nih.gov/blast/executables/snapshot
subdirectory
This subdirectory contains enhanced or patched intermediate archives
released after the last major release. They are organized by date and
contain binaries only for the affected platforms.
2.5.4 File content for ftp.ncbi.nlm.nih.gov/blast/executables/special
subdirectory
From time to time, we make binaries for some rare platforms under
special circumstances. Those files are archived here.
2.6 File content for ftp://ftp.ncbi.nlm.nih.gov/blast/matrices directory
This directory contains scoring matrices, which BLAST uses to score
alignments. The files are plain text files in a special format and can
be viewed directly in a browser.
For valid statistical analysis, blastn uses only an identity matrix, and
blastp supports only a limited subset of the BLOSUM and PAM matrices:
BLOSUM45, BLOSUM62, BLOSUM80, PAM30, and PAM70.
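To inspect a matrix programmatically, the Python sketch below assumes
the usual NCBI matrix text layout: "#" comment lines, one header row of
residue letters, then one row per residue beginning with that residue's
letter. Check this against an actual file before relying on it. It also
checks a matrix name against the blastp list given above. The function
names are hypothetical.

```python
# blastp-supported matrices, as listed in this section
BLASTP_MATRICES = {"BLOSUM45", "BLOSUM62", "BLOSUM80", "PAM30", "PAM70"}

def parse_matrix(text):
    """Parse an NCBI-style matrix file into {(row, col): score}."""
    cols = None
    scores = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split()
        if cols is None:
            cols = fields  # header row: column residue letters
            continue
        row, values = fields[0], fields[1:]
        if len(values) != len(cols):
            raise ValueError(f"row {row!r}: expected {len(cols)} scores, got {len(values)}")
        for col, value in zip(cols, values):
            scores[(row, col)] = int(value)
    return scores

def is_blastp_matrix(name):
    """True if `name` (e.g. "BLOSUM62" or "blosum 62") is in the blastp list."""
    return name.upper().replace(" ", "") in BLASTP_MATRICES
```

A dictionary keyed by residue pairs keeps lookups simple, for example
scores[("W", "W")]. It works for any alphabet in the header row,
including the "*" column.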
2.7 File content of the ftp://ftp.ncbi.nlm.nih.gov/blast/temp
subdirectory
A leftover subdirectory of miscellaneous files or tools.
3. Technical Support
Additional questions and comments on this ftp site should be directed to
the NCBI blast-help group at:
blast-help@ncbi.nlm.nih.gov
Other questions on general NCBI resources should be directed to:
info@ncbi.nlm.nih.gov