Data Mining Approaches to Identify Biomineralization Related Sequences

Françoise Immel, Frédéric Marin

2016

Proteomics is an efficient high-throughput technique for identifying proteins from a crude extract by sequence homology. Advances in Next Generation Sequencing (NGS) have increased knowledge of several non-model species. In the field of calcium carbonate biomineralization, the paucity of available sequences (such as those of mollusc shells) is still a bottleneck in most proteomic studies, because this technique requires protein databases in which to find homologous sequences. The aim of this study was to apply different data mining approaches in order to identify novel shell proteins. To this end, we used several publicly available databases of non-model molluscs. Previously identified molluscan shell matrix sequences were used as keywords to query annotated databases. BLAST tools and the KAAS program (KEGG Automatic Annotation Server) were used to analyse the other, non-annotated databases. Our results suggest that the efficiency of these methods depends on the quality of the shared data. Finally, an in-house shell matrix protein database was generated.
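The keyword-based query of annotated databases can be illustrated with a minimal Python sketch. It assumes that each annotated database is available as FASTA text whose header lines carry the functional annotation. The function names, the case-insensitive whole-word matching rule, and the example keywords (nacrein, perlucin) are illustrative choices, not the authors' actual pipeline. The toy records are placeholders, not real sequences. The BLAST and KAAS analyses of non-annotated data depend on those external tools and are not shown.

```python
import re
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def parse_fasta(text: str) -> List[Record]:
    """Split FASTA-formatted text into (header, sequence) pairs."""
    records: List[Record] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def keyword_hits(records: List[Record], keywords: List[str]) -> Dict[str, List[str]]:
    """Map each keyword to the headers whose annotation mentions it (whole word, any case)."""
    hits: Dict[str, List[str]] = {k: [] for k in keywords}
    for header, _seq in records:
        for k in keywords:
            if re.search(r"\b" + re.escape(k) + r"\b", header, flags=re.IGNORECASE):
                hits[k].append(header)
    return hits


def build_inhouse_db(records: List[Record], keywords: List[str]) -> str:
    """Return FASTA text of records matching any keyword, each record kept once."""
    matched = {h for hs in keyword_hits(records, keywords).values() for h in hs}
    return "".join(f">{h}\n{s}\n" for h, s in records if h in matched)


# Toy placeholder records (hypothetical, not real sequence data).
toy = """>seq1 nacrein-like protein [toy]
MKT
AYIA
>seq2 hypothetical protein [toy]
MSSL
>seq3 Perlucin [toy]
MGGK
"""
records = parse_fasta(toy)
print(keyword_hits(records, ["nacrein", "perlucin"]))
print(build_inhouse_db(records, ["nacrein", "perlucin"]))
```

Matching whole words on the header keeps "nacrein-like" as a hit while avoiding accidental substring matches. Keyword search can only be as good as the annotations it reads, so it cannot recover sequences that are unannotated or misannotated.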
