File Name           File Size   Date
Parent directory/   -           -
2017_01/            -           15-Feb-2017 16:00
2017_05/            -           25-May-2017 12:16
2018_01/            -           11-Jan-2018 18:15
current_release/    -           11-Jan-2018 18:15
LICENSE             20K         15-Feb-2017 15:55

Metaclust Downloads

Bioinformatic Methods

Metaclust was created by clustering and assembling 1.59 billion protein sequence fragments predicted by Prodigal in ~2200 metagenomic and metatranscriptomic datasets.

We offer two FASTA files: (1) a non-redundant (nr) version, from which we eliminated subfragments that could be aligned to a longer sequence with 99% of their residues and a sequence identity of 95%, and (2) a version clustered to 50% sequence identity at 90% coverage.
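The redundancy criterion for the nr file can be illustrated with a small sketch. It assumes that both thresholds are minimums (at least 99% of the subfragment's residues aligned, at least 95% sequence identity) and that coverage is measured relative to the subfragment's own length. The function and parameter names are hypothetical and are not part of Linclust or MMseqs2.

```python
def is_redundant_subfragment(
    aligned_residues: int,
    fragment_length: int,
    seq_identity: float,
    min_coverage: float = 0.99,  # assumption: 99% is a lower bound
    min_identity: float = 0.95,  # assumption: 95% is a lower bound
) -> bool:
    """Return True if a fragment would count as a redundant subfragment.

    Coverage is taken here as the fraction of the fragment's residues
    that align to the longer sequence (an assumption of this sketch).
    """
    if fragment_length <= 0:
        raise ValueError("fragment_length must be positive")
    coverage = aligned_residues / fragment_length
    return coverage >= min_coverage and seq_identity >= min_identity


# Fully aligned fragment with 97% identity: treated as redundant.
fully_covered = is_redundant_subfragment(200, 200, 0.97)
# Only 75% of the residues align: kept, despite the high identity.
partly_covered = is_redundant_subfragment(150, 200, 0.99)
# Fully aligned but only 90% identical: kept.
low_identity = is_redundant_subfragment(200, 200, 0.90)
```

Under these assumptions, only the first fragment would be removed from the nr file.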

We clustered the data using Linclust, which is integrated into our free GPLv3-licenced MMseqs2 software suite. The source code and binaries for Linclust can be downloaded from GitHub.

File format

Each file contains the representative sequences of every cluster in FASTA format. Each FASTA entry has a unique numeric identifier.
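A minimal sketch of reading such a file with the Python standard library follows. It assumes the usual FASTA conventions (a header line starting with ">" followed by one or more sequence lines). The two records are made-up examples, not real Metaclust entries, and `read_fasta` is a hypothetical helper.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an open FASTA text stream."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


# Made-up example data with numeric identifiers, for illustration only.
example = io.StringIO(">1\nMKV\nLAA\n>2\nMSTT\n")
records = list(read_fasta(example))

# Check that the identifiers are numeric and unique, as described above.
identifiers = [header.split()[0] for header, _ in records]
all_numeric = all(identifier.isdigit() for identifier in identifiers)
all_unique = len(set(identifiers)) == len(identifiers)
```

For a downloaded file, the same function can be applied to a handle from `open(path)`. Because the function is a generator, it does not hold the whole file in memory.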