The Bologna Annotation Resource (BAR 3.0): improving protein functional annotation

Giuseppe Profiti; Pier Luigi Martelli; Rita Casadio

doi:10.1093/nar/gkx330

Nucleic Acids Res. 2017 Jul 3;45(W1):W285-W290. doi: 10.1093/nar/gkx330.

Authors

Giuseppe Profiti¹, Pier Luigi Martelli¹, Rita Casadio¹

Affiliation

¹ Biocomputing Group, BiGeA/CIG, 'Luigi Galvani' Interdepartmental Center for Integrated Studies of Bioinformatics, Biophysics and Biocomplexity, University of Bologna, Bologna 40126, Italy.

Abstract

BAR 3.0 updates our server BAR (Bologna Annotation Resource) for predicting protein structural and functional features from sequence. This release increases the data volume, the query capabilities and the information conveyed to the user. The core of BAR 3.0 is a graph-based clustering of UniProtKB sequences under strict pairwise similarity criteria (sequence identity ≥40% with alignment coverage ≥90%). Each cluster collects the available annotation downloaded from UniProtKB, GO, PFAM and PDB. GO terms and PFAM domains that pass statistical validation are treated as cluster-specific and are transferred to new sequences that enter the cluster by satisfying the similarity constraints. BAR 3.0 includes 28 869 663 sequences in 1 361 773 clusters; 22.2% of the clusters (22 241 661 sequences) and 47.4% of the clusters (24 555 055 sequences) have at least one validated GO term and one PFAM domain, respectively. 1.4% of the clusters (36% of all sequences) include PDB structures; each of these clusters is associated with a hidden Markov model that allows building template-target alignments suitable for structural modeling. A further 3 399 026 sequences are singletons. BAR 3.0 offers an improved search interface that allows queries by UniProtKB accession, FASTA sequence, GO term, PFAM domain, organism, PDB entry and ligand(s). When evaluated on the CAFA2 targets, BAR 3.0 largely outperforms our previous version and scores among state-of-the-art methods. BAR 3.0 is publicly available at http://bar.biocomp.unibo.it/bar3.
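The clustering step can be illustrated with a minimal sketch. It assumes that clusters correspond to connected components of a graph whose edges link sequence pairs meeting both thresholds; the abstract says only that the procedure is graph-based, so this is an assumption and not a description of the BAR 3.0 implementation. The function name `cluster_sequences`, the tuple layout of `pairwise_hits` and the example identifiers are hypothetical.

```python
from collections import defaultdict

MIN_IDENTITY = 0.40  # sequence identity threshold quoted in the abstract
MIN_COVERAGE = 0.90  # alignment coverage threshold quoted in the abstract


def cluster_sequences(ids, pairwise_hits):
    """Group sequence ids into connected components of the similarity graph.

    pairwise_hits is an iterable of (id_a, id_b, identity, coverage) tuples
    (a hypothetical layout; identity and coverage are fractions in [0, 1]).
    Returns (clusters, singletons).
    """
    parent = {s: s for s in ids}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity, coverage in pairwise_hits:
        # An edge exists only if BOTH similarity criteria are satisfied.
        if identity >= MIN_IDENTITY and coverage >= MIN_COVERAGE:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = defaultdict(set)
    for s in ids:
        groups[find(s)].add(s)

    clusters = [g for g in groups.values() if len(g) > 1]
    singletons = [next(iter(g)) for g in groups.values() if len(g) == 1]
    return clusters, singletons


# Toy example with made-up identifiers and scores.
example_ids = ["P1", "P2", "P3", "P4", "P5"]
example_hits = [
    ("P1", "P2", 0.55, 0.95),  # passes both thresholds
    ("P2", "P3", 0.42, 0.91),  # passes both thresholds
    ("P3", "P4", 0.80, 0.50),  # fails on coverage
    ("P4", "P5", 0.35, 0.99),  # fails on identity
]
example_clusters, example_singletons = cluster_sequences(example_ids, example_hits)
```

In this toy input, P1, P2 and P3 end up in one cluster even though P1 and P3 are never compared directly, while P4 and P5 remain singletons. Transitive grouping of this kind is a property of the connected-component assumption made here.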

Publication types

Evaluation Study
Research Support, Non-U.S. Gov't

MeSH terms

Cluster Analysis
Internet
Molecular Sequence Annotation*
Proteins / chemistry
Proteins / physiology
Sequence Analysis, Protein*
Software*

Substances

Proteins