Wednesday, July 26, 2017

On Very Large Databases


The American Society for Biochemistry and Molecular Biology has this prescription for a periodic table of proteins, one that organizes protein complexes based on simple rules. With tens of thousands of protein complexes, each with its own 3-D structure, let us recall the hypothetical smell network of all possible smells as they occur to all people: the Lingua Anosmia.

There is a strong connection between olfaction and the growing databases of bioinformatics, because smells are organic entities themselves.

There is another database I envy, the human metabolome. It contains roughly 40,000 entries, covering the metabolites that exist in and on the human body. This one has an even closer affinity with olfaction, because lots of metabolites smell; and if they don't smell, they are the molecules that eventually separate and combine to make something that does. Knowing the relationships among the molecules associated with smelly activity can help to organize the smells that result from that metabolic activity.

Your body odor does not come from your body, unless we consider our microbiome to be part of our body. Molecules that exit your body via sweat are deposited on the skin, a buffet plate for the colonies of bacteria that live with us. They eat your sweat and shit the body odor that you tend to consider yours. The smell of the beach works the same way: it is a secondary metabolite of seaweed, which means that sea bacteria eat the waste, or the metabolites, of seaweed.

Yes, that beautiful, intoxicating, deep and alluring scent of the seashore is to the ocean what body odor is to our bodies.

In conclusion, metabolites, and many things biological, in their new supersized, databasable format, bring us a step closer to the realization of the hypothetical smell network, the Lingua Anosmia.

I’d like to ask the driven and capable reader to hook up this human metabolome with some smell data; I’d love to see it. Had I the time and expertise, I'd hook it up myself, but alas, it's on my list.
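For the reader tempted to try, here is a minimal sketch of what such a hook-up might look like. Everything in it is an assumption made for illustration: the records are a handful of hand-typed samples rather than an export from the metabolome database, the odor descriptors are informal notes rather than data from any smell dataset, and the function names `link_smells` and `index_by_smell` are hypothetical. The idea is simply a join on metabolite name, followed by an inverted index from smell word back to molecule.

```python
from collections import defaultdict

# Hypothetical, hand-typed sample records; NOT an export from any real database.
metabolites = [
    {"name": "Isovaleric acid"},
    {"name": "Dimethyl sulfide"},
    {"name": "Trimethylamine"},
    {"name": "Glucose"},
]

# Informal odor descriptors keyed by lowercase name (illustrative assumption).
smell_notes = {
    "isovaleric acid": ["sweaty", "cheesy"],
    "dimethyl sulfide": ["sulfurous", "marine"],
    "trimethylamine": ["fishy"],
}


def link_smells(metabolites, smell_notes):
    """Left join: keep every metabolite, attach descriptors when we have them."""
    linked = []
    for entry in metabolites:
        key = entry["name"].strip().lower()
        # Metabolites with no known descriptor get an empty list.
        linked.append({**entry, "smells": smell_notes.get(key, [])})
    return linked


def index_by_smell(linked):
    """Invert the join: map each smell word to the metabolites that carry it."""
    index = defaultdict(list)
    for row in linked:
        for descriptor in row["smells"]:
            index[descriptor].append(row["name"])
    return dict(index)


linked = link_smells(metabolites, smell_notes)
by_smell = index_by_smell(linked)
print(by_smell)
```

Keeping the odorless entries in the join is deliberate: a molecule that does not smell today may still be a precursor to one that does, so dropping it would throw away exactly the relationships worth organizing.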

“We’re bringing a lot of order into the messy world of protein complexes”
-Sebastian Ahnert

Long-form description of the Human Metabolome Database:
The database is designed to contain or link three kinds of data: 1) chemical data, 2) clinical data, and 3) molecular biology/biochemistry data. The database contains 41,993 metabolite entries including both water-soluble and lipid-soluble metabolites as well as metabolites that would be regarded as either abundant (> 1 µM) or relatively rare (< 1 nM). Additionally, 5,701 protein sequences are linked to these metabolite entries. Each MetaboCard entry contains more than 110 data fields with 2/3 of the information being devoted to chemical/clinical data and the other 1/3 devoted to enzymatic or biochemical data. Many data fields are hyperlinked to other databases (KEGG, PubChem, MetaCyc, ChEBI, PDB, UniProt, and GenBank) and a variety of structure and pathway viewing applets. The HMDB database supports extensive text, sequence, chemical structure and relational query searches. Four additional databases, DrugBank, T3DB, SMPDB and FooDB are also part of the HMDB suite of databases. DrugBank contains equivalent information on ~1600 drug and drug metabolites, T3DB contains information on ~3600 common toxins and environmental pollutants, SMPDB contains pathway diagrams for ~700 human metabolic and disease pathways, while FooDB contains equivalent information on ~28,000 food components and food additives.

Citing the Human Metabolome Database:
1. Wishart DS, Tzur D, Knox C, et al., HMDB: the Human Metabolome Database. Nucleic Acids Res. 2007 Jan;35(Database issue):D521-6. 17202168
2. Wishart DS, Knox C, Guo AC, et al., HMDB: a knowledgebase for the human metabolome. Nucleic Acids Res. 2009 37(Database issue):D603-610. 18953024
3. Wishart DS, Jewison T, Guo AC, Wilson M, Knox C, et al., HMDB 3.0 — The Human Metabolome Database in 2013. Nucleic Acids Res. 2013. Jan 1;41(D1):D801-7. 23161693
