Thursday, January 23, 2020

The Dream of Olfaction Prediction



You might want to take this post in doses, because it's a mouthful. I tried to help by adding some totally unrelated but beautiful images from Richard Pousette-Dart, a founder of the New York School of art.

I've been pushing this off for years now, waiting for my schedule to allow me to dive in and give it the respect it deserves.

We're looking at the DREAM challenge, run by a science and technology research consortium that set its sights on olfactory perception a couple of years ago. (DREAM stands for Dialogue on Reverse Engineering Assessment and Methods.)

Forever, olfaction has been an unruly member of the human sensory suite, refusing to offer any insight into how we perceptually organize odors. Colors have a spectrum and sounds have frequencies, but smells are simply un-organizable.

I'll take a portion of the abstract from the winning team, because they've written a concise, comprehensive explanation of the problem of olfactory recognition:
The olfactory stimulus-percept problem has been studied for more than a century, yet it is still hard to precisely predict the odor given the large-scale chemoinformatic features of an odorant molecule. A major challenge is that the perceived qualities vary greatly among individuals due to different genetic and cultural backgrounds. Moreover, the combinatorial interactions between multiple odorant receptors and diverse molecules significantly complicate the olfaction prediction.
 Some structurally similar compounds display distinct odor profiles, whereas some dissimilar molecules exhibit almost the same smell. Many attempts have been made to establish structure-odor relationships for intensity and pleasantness, but no models are available to predict the personalized multi-odor attributes of molecules.

But, some recent advancements in the field have made it worth trying again. Number one is the Dragon software. It's a database of chemicals big enough to be worthy of the Big Data era. Each of its hundreds of odorous chemicals has thousands of features like functional group, boiling point, etc. It's a lot easier to find patterns in the chemicals when you have this much correlating data.

The number two development is a new set of odor words. Just about all olfactory perception science since the 1980's has been using one specific set of odor/names, called the Dravnieks set. Some use the Arctander set, but the Dravnieks has ASTM behind it, so it's usually the main one. The thing is, it's now almost 40 years old. And that means a lot when it comes to smells, because the language of smell is a very dynamic thing.

I'll give a quick example. The first commercial toothpaste ever invented, Pepsodent, was called "minty," but you know what it was made with? Sarsaparilla, like root beer. Who knew "root beer" and "minty" were the same thing? They were at that time and in that place. And that's how smells work. The language we use to talk about smells is not so much related to the molecules themselves but to our experiences with them.

You know how baggy pants are popular sometimes (1995), and then later on (2015) they make you look homeless? That's similar to the way our odor lexicon changes. The words themselves are just as fashionable and ephemeral as the fragrance market itself. From the authors of the new study: "Another problem with verbal descriptors is that they are culturally biased. The current standard set of 146 Dravnieks descriptors was developed in the United States in the mid-1980's and is increasingly semantically and culturally obsolete." (Keller 2016 below)

Also, let's not forget that the entire Oceanic/Ozonic/Marine class of fragrance aromas (Cool Water, Acqua Di Gio) didn't exist until the chemical Calone was discovered by a pharmaceutical company researching benzodiazepine derivatives for anti-depression meds circa 1990.

So finally, a bunch of vigilant olfactory enthusiasts got together and generated a killer dataset for smellable molecules and the words we use to describe them (Keller et al 2016). This new set leaves Dravnieks in the dust. It's got 480 molecules tested on 55 subjects. Dravnieks had 146 smell-word combinations and the subjects were all American/Western European. It's important to get the subjects to be as diverse as possible, because whether it's cultural or genetic, we all smell things differently from each other and we all use different words to describe those sensations. Pigeon-holing your demographic yields a pretty distorted dataset.

Other ways they out-did the Dravnieks dataset: they used odorless compounds (like water), they included molecules with unfamiliar smells, they included familiarity ratings (we'll see why this is important later), and they extracted both population average data AND individual reporting data.

Summary: updated datasets, both on the chemical-feature side and on the odor-descriptor side. Now for the DREAM Challenge itself. This is where crowdsourcing, which I guess is now just another word for "competition," narrows down the best approach to tying together categories of chemical features and the words we use to describe the way they smell.

 

Let's start with a basic pair, to get an idea. Sulfur smells like rotten eggs. Simple, right? If a molecule has got a sulfur atom in it, it probably smells 1. bad and 2. like rotten eggs.

Actually, according to this new dataset, it's related more to "garlic"-smell than anything else. ... but that's because "garlic" was one of the pre-determined descriptors that the participants were allowed to choose from; "rotten egg" was not on that list.

Let's get a bit more complicated. Below I'll give bulleted summaries of three things: first the Challenge itself, then the new and improved dataset used in the challenge, and finally the winner of the challenge.

And for the record, I'd really like to see this kind of work done using not just the Dragon database of chemoinformatics, but with the almighty Human Metabolome Database which contains 40,000 entries of all the metabolites that exist within and among the human body. Because that would be interesting to see.


The DREAM Olfaction Prediction Challenge
This challenge aims to develop the most comprehensive computational approach to date to predict olfactory perception based on the physical features of the stimuli.

Teams developed machine learning algorithms that predict the sensory attributes of molecules from their chemoinformatic features. The goal is to predict the perceptual qualities of virtually any molecule with high accuracy, and also to reverse-engineer the smell of a molecule.

Predicting human olfactory perception from chemical features of odor molecules. Keller A, Gerkin RC, Guan Y, Dhurandhar A, Turu G, Szalai B, Mainland JD, Ihara Y, Yu CW, Wolfinger R, Vens C, Schietgat L, De Grave K, Norel R, DREAM Olfaction Prediction Consortium, Stolovitzky G, Cecchi GA, Vosshall LB, Meyer P. Science. 2017 Feb 24; 355(6327):820-826.
  


The Dataset Used in the DREAM Challenge
[aka the Rockefeller University Smell Study]
[aka The New Dravnieks]

Their dataset captured the sensory perception of 480 different molecules (249 cyclic molecules, 52 organosulfur molecules, 165 ester molecules) each with 4884 corresponding chemical features, at two different concentrations, experienced by 55 demographically diverse healthy human subjects (really 49 because some were removed). Subjects rated intensity (0-100), pleasantness (0-100), and familiarity, were asked to apply 20 pre-determined semantic odor quality descriptors to these stimuli, and were offered the option to describe the smell in their own words.

Pre-determined semantic attributes: edible, bakery, sweet, fruit, fish, garlic, spices, cold, sour, burnt, acid, warm, musky, sweaty, ammonia/urinous, decayed, wood, grass, flower, and chemical.
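To make the shape of this dataset a bit more concrete, here is a minimal sketch of how one rating could be stored, and how the population average falls out of the individual reports. The `Rating` record, its field names, and the toy numbers are all my own assumptions for illustration; this is not the study's actual file format.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Rating:
    """One subject's rating of one attribute for one stimulus (hypothetical layout)."""
    subject: int      # subject identifier
    molecule: str     # molecule name
    dilution: str     # which of the two concentrations, e.g. "high" or "low"
    attribute: str    # e.g. "intensity", "pleasantness", "garlic"
    value: float      # rating; a 0-100 scale is assumed here for every attribute


def population_average(ratings):
    """Average the individual ratings per (molecule, dilution, attribute)."""
    grouped = defaultdict(list)
    for r in ratings:
        grouped[(r.molecule, r.dilution, r.attribute)].append(r.value)
    return {key: mean(values) for key, values in grouped.items()}


# Toy values, invented for illustration only.
toy = [
    Rating(1, "diethyl disulfide", "high", "garlic", 80.0),
    Rating(2, "diethyl disulfide", "high", "garlic", 40.0),
    Rating(1, "vanillin", "high", "sweet", 70.0),
    Rating(2, "vanillin", "high", "sweet", 90.0),
]
averages = population_average(toy)
```

Keeping every individual row and deriving the average from it is what lets you study both views of the data: the stable population picture and the messy personal one.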

Findings in General

·      Familiarity had a strong effect on the ability of subjects to describe a smell.
·      Many subjects used commercial products to describe familiar odorants, highlighting the role of prior experience in verbal reports of olfactory perception.
·      Nonspecific descriptors like "chemical" were applied frequently to unfamiliar odorants.
·      Unfamiliar odorants were generally rated as neither pleasant nor unpleasant.
·      Many molecules had unfamiliar smells: of the stimuli that subjects could perceive, 70% were rated as unknown and were given low familiarity ratings.
·      Together, these findings highlight the dominant role of familiarity and experience in assigning verbal descriptors to odorants.

Findings Specific

·      Compounds that contain sulfur or nitrogen (amines) are probably unpleasant
·      Compounds that contain oxygen are probably pleasant
·      If it's got sulfur atoms, there's a good chance someone will choose "garlic" from the list of descriptors (note "rotten eggs" is not on that list)
·      The number of sulfur atoms in a molecule was correlated with the odor quality descriptors "garlic" "fish" and "decayed"
·      Large and structurally complex molecules were perceived to be more pleasant.
·      Vanillin (and ethyl vanillin) was the most likely to be rated as pleasant
·      Vanillin was likely to be called “edible”, “bakery”, “sweet”
·      Vanillin acetate was rated the “warmest” stimulus
·      (−)-Carvone and various esters were the rest of the pleasant odors
·      Methyl thiobutyrate was the least pleasant, also the most intense
·      Methyl thiobutyrate most likely to be called "Decayed"
·      Isovaleric acid received the highest rating for both “musky” and “sweaty”
·      Others of the least pleasant compounds were sulfur-containing (4 in total) and carboxylic acids (4 in total)
·      Benzenethiol, 3-pentanone, and androstadienone had the most variable intensity perception
·      The most commonly used descriptor was “chemical”
·      The least frequently used descriptor was “fish” 
·      "Chemical" was used most often for unfamiliar odors
·      "Edible" was used most often for familiar odors
·      Words least likely to be used for the same compound (negatively correlated) were:
o   edible/chemical
o   sweet/musky
o   sweet/sweaty
·      When describing in their own words, participants often used:
o   “sweet”
o   “burnt”
o   “grass”
o   “candy”
o   “vanilla”
·      Women used their own words more than men
·      Commercial names and trade names (like Vicks VapoRub) were used a lot.
·      In concert with Dravnieks, the most representative descriptor/molecule pairs:
o   “garlic”            (diethyl disulfide)
o   “flower”          (2-phenylethanol)
o   “decayed”       (methyl thiobutyrate)
o   “sweaty”         (isovaleric acid)
o   “spicy”             (eugenol)


Special Note 1

"Only descriptors with an unambiguous reference odorant can be predicted based on molecular features." (For example, garlic means something pretty specific, but chemical is as ambiguous as it gets.)

The winners of the competition in their own paper mentioned this: "The large differences may result from the relative ambiguity of the word “warm” to describe odor." (Li et al 2018)

This is one of the most important conclusions to come out of this study, because it shows us how olfaction and language really work together. You can't name smells you've never smelled before. And you need very specific references to develop a useful lexicon. This has a lot to do with why commercial products are used in these cases (like in the 2016 World Coffee Research Sensory Lexicon). It's better to say McDonald's Chicken McNuggets or Vicks VapoRub or Hasbro's Play-Doh because they are highly controlled substances (in terms of quality not illegality!) and so they are exactly the same every time.

It also suggests that any universal odor lexicon needs to have an ambiguity rating next to each word.

(FYI: Play-Doh is one of the only branded scents, ever, because you can't have copyright protection for smells, and the brand Mama Celeste's microwave pizza is the World Coffee reference standard for "Cardboard" aroma, poor Mama!)


Special Note 2

This last one is great, for me at least, because it echoes many ideas already posted on this weblog. Here, taken from the authors:
However, we also found marked differences in how descriptors were used by our untrained subjects and experts. For example, subjects used “musky” to describe unpleasant body odors. In contrast, experts use “musky” to describe compounds naturally sourced from animal glands or their synthetic analogues. These are often used as base notes in perfumery, and experts associate musks with pleasant descriptors such as “sweet,” “powdery,” and “creamy.” However for our subjects, “musky” had a negative correlation with pleasantness, and was instead correlated with the descriptor “sweaty.”
  
The molecule rated as most “musky” in this study was isovaleric acid, which experts do not rate as “musky” (Dravnieks). The five molecules that Dravnieks lists as representative of the “musk” descriptor are also rated “fragrant” and “perfumery” by experts (Dravnieks).
  
Therefore, the word “musky” has a colloquial meaning that is different from its technical meaning in perfumery.

Olfactory perception of chemically diverse molecules. Keller A, Vosshall LB. BMC Neurosci. 2016 Aug 8; 17(1):55.

Here is a link to my previous post on the topic, from 2017:


DREAM Challenge Winners

The winners were from the University of Michigan and used a random forest-type machine learning algorithm. Their model won 1st place for predicting individual responses and 2nd place for predicting population responses.

Right off the bat, one of the important things they do is to combine the (stable) population average with the (highly varied) individual responses. This is a big deal because there is so much variety in individual responses, as explained above, because of either culture or genetics. We all smell things differently, and we all use different words to refer to those smells. And that difference is large enough to make the data messy as heck. So this winning team introduced a weighting value, alpha, to balance the two, and it works like this:

"When α equals 0, only population ratings are considered. Conversely, when α equals 1, only individual ratings are used (see the “Methods”). Surprisingly, a small α = 0.2 achieves the largest Pearson's correlation coefficient (Fig. 3B). Without population information (α = 1.0), the correlation of predicting the 19 semantic descriptors is the lowest. This reveals that population perceptions play a crucial role when individual responses display large fluctuations."

And this is a great improvement, because cultural influence / cultural conditioning is so influential on our own subjective perception (see Greta Garbo and the Vermeer forgeries).
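The α weighting quoted above can be sketched in a few lines. This is only my reading of the quote, assuming a plain linear blend of one individual rating and the matching population rating; the function name `blend` and the toy numbers are mine, and the winning team's actual pipeline may differ.

```python
def blend(individual, population, alpha=0.2):
    """Mix an individual rating with the population rating.

    alpha = 0 -> only the population rating is used
    alpha = 1 -> only the individual rating is used
    (assumed linear blend; see the hedge in the text above)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * individual + (1.0 - alpha) * population


# Toy numbers, invented for illustration only.
individual_rating = 90.0   # one subject found the smell very "garlic"
population_rating = 40.0   # the group on average was less convinced
blended = blend(individual_rating, population_rating)
```

With the small α = 0.2 from the quote, the blended value sits much closer to the population rating than to the individual one, which is the whole point: the crowd steadies the noisy individual.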

The next thing to note about the winning team's algorithm is that it performs just like you would expect it to, in that the results seem to make more sense to a machine than to a person.

For anyone familiar with recent examples of machine learning, you hear a lot of 1. it's like a black box and we can't see what it's doing to make its decisions, or 2. it's like the adversarial image hack where they do what looks like absolutely nothing to the image, and yet the network reads it as something wildly different than what it is.

In this case, the algorithm found that the most "obvious" patterns in chemical features did not correspond to things we already know about chemicals. Sure, sulfur atoms correlate to bad smells, but the 2nd and 3rd most correlated features had nothing to do with features we would associate with odor.



I guess one of the main reasons for this disconnect is the idea of degeneracy, which is a word that refers to the fact that so many molecules actually have identical or similar values for simple features. So if the point of the algorithm is to predict a smell based on chemical information alone, but then you have a feature that belongs to more than one smell group, then it sure won't help you to predict which smell it's going to be from the chemoinformatics. So let's say a chemical has an oxygen atom. Well, lots of different-smelling chemicals have oxygen in them, so we can't use that simple feature as a way to organize.
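Here is a toy illustration of that oxygen example. It groups a handful of molecules from this post by one simple feature and collects the descriptors that land in each group. The record layout, the `has_oxygen` field, and the one-word smell labels are my own simplification (the labels are the descriptors this post pairs with each molecule), not anything taken from the challenge data.

```python
from collections import defaultdict


def labels_per_feature_value(molecules, feature):
    """Group molecules by one feature value and collect the smell labels seen."""
    groups = defaultdict(set)
    for mol in molecules:
        groups[mol[feature]].add(mol["smell"])
    return dict(groups)


# Hypothetical records: a simple yes/no feature plus one descriptor each.
toy_molecules = [
    {"name": "vanillin", "has_oxygen": True, "smell": "sweet"},
    {"name": "isovaleric acid", "has_oxygen": True, "smell": "sweaty"},
    {"name": "2-phenylethanol", "has_oxygen": True, "smell": "flower"},
    {"name": "eugenol", "has_oxygen": True, "smell": "spicy"},
    {"name": "diethyl disulfide", "has_oxygen": False, "smell": "garlic"},
]

by_oxygen = labels_per_feature_value(toy_molecules, "has_oxygen")
# A feature value shared by many different smells is "degenerate":
# knowing it tells you very little about which smell to predict.
degenerate_values = [value for value, smells in by_oxygen.items() if len(smells) > 1]
```

In this toy set, "has oxygen" covers four different descriptors at once, so on its own it can't sort sweet from sweaty.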

After all this work, one point should be noted: the top 5 features achieve similar performance as random forest with all 4884 features for almost all olfactory qualities (with the exception of “intensity,” for which the top 15 features are adequate).
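To get a feel for what "keeping only the top few features" means, here is a rough sketch. Note the swap: the winning team used a random forest, while this example ranks features by the absolute Pearson correlation with a rating, which is a simpler stand-in that I chose so the code runs with nothing but the standard library. The feature names and numbers are invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no information
    return cov / sqrt(var_x * var_y)


def top_k_features(features, target, k=5):
    """Return the k feature names most correlated (in absolute value) with target."""
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], target)),
        reverse=True,
    )
    return ranked[:k]


# Toy table: one value per molecule for each hypothetical feature.
toy_features = {
    "n_sulfur": [0, 0, 1, 2],
    "n_oxygen": [3, 2, 1, 0],
    "constant": [1, 1, 1, 1],
}
toy_garlic_rating = [0.0, 5.0, 40.0, 80.0]  # invented ratings, one per molecule

best = top_k_features(toy_features, toy_garlic_rating, k=2)
```

The useless constant column drops to the bottom of the ranking, and a model trained on just the survivors has far less to chew on than one fed all 4884 columns.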

Accurate prediction of personalized olfactory perception from large-scale chemoinformatic features. Hongyang Li, Bharat Panwar, Gilbert S Omenn, and Yuanfang Guan. Gigascience. 2018 Feb; 7(2): 1–11.
  
Notes:
Dravnieks A. Atlas of odor character profiles. Philadelphia: ASTM; 1985.
Arctander S. Perfume and flavor chemicals (aroma chemicals). Montclair, NJ: Author; 1969.
Keller A, Vosshall LB. Olfactory perception of chemically diverse molecules. BMC Neurosci. 2016 Aug 8; 17(1):55. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4977894/

Here is a link to my original post on this DREAM Challenge from 2017
