You might want to take this post in doses, because
it's a mouthful. I tried to help by adding some totally unrelated but beautiful
images from Richard Pousette-Dart, a founder of the New York School of art.
I've been putting this off for years now, waiting for my
schedule to allow me to dive in and give it the respect it deserves.
We're looking at the DREAM challenge, a science and
technology research consortium that set its sights on olfactory perception a
couple of years ago. (DREAM stands for Dialogue on Reverse Engineering Assessment and Methods.)
For as long as it has been studied, olfaction has been an unruly member of the human
sensory suite, refusing to offer any insight into how we perceptually organize
odors. Colors have a spectrum and sounds have frequencies, but smells have no
obvious organizing dimension.
I'll quote a portion of the abstract from the winning team,
because they've written a concise, comprehensive explanation of the
olfactory stimulus-percept problem:
The olfactory stimulus-percept problem has been studied for
more than a century, yet it is still hard to precisely predict the odor given
the large-scale chemoinformatic features of an odorant molecule. A major
challenge is that the perceived qualities vary greatly among individuals due to
different genetic and cultural backgrounds. Moreover, the combinatorial
interactions between multiple odorant receptors and diverse molecules
significantly complicate the olfaction prediction.
Some structurally similar compounds display distinct odor
profiles, whereas some dissimilar molecules exhibit almost the same smell. Many
attempts have been made to establish structure-odor relationships for intensity
and pleasantness, but no models are available to predict the personalized
multi-odor attributes of molecules.
But some recent advancements in the field have made it
worth trying again. Number one is the Dragon software, which computes
chemoinformatic descriptors at a scale worthy of the Big Data era. Each of the
hundreds of odorous chemicals in the study comes with thousands of features:
functional groups, boiling point, and so on. It's a lot easier to find patterns
in the chemicals when you have this much correlating data.
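To make that concrete, here's a minimal sketch of what a Dragon-style feature matrix looks like. The feature names and values below are illustrative stand-ins, not actual Dragon output:

```python
# A toy, Dragon-style feature table: each row is a molecule, each column
# one of the thousands of computed chemoinformatic descriptors.
# Feature names and values are illustrative, not real Dragon output.
features = {
    "vanillin":            {"mol_weight": 152.15, "n_oxygen": 3, "n_sulfur": 0},
    "methyl thiobutyrate": {"mol_weight": 118.19, "n_oxygen": 1, "n_sulfur": 1},
    "isovaleric acid":     {"mol_weight": 102.13, "n_oxygen": 2, "n_sulfur": 0},
}

# With this much structured data per molecule, pattern-hunting becomes
# ordinary table work, e.g. "which of these molecules contain sulfur?"
sulfurous = [name for name, f in features.items() if f["n_sulfur"] > 0]
print(sulfurous)  # ['methyl thiobutyrate']
```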
Development number two is a new set of odor words. Just
about all olfactory perception science since the 1980s has used one
specific set of odor names, called the Dravnieks set. Some use the Arctander
set, but Dravnieks has ASTM behind it, so it's usually the main one.
The thing is, it's now almost 40 years old, and that means a lot when it comes
to smells, because the language of smell is a very dynamic thing.
I'll give a quick example. The first commercial toothpaste
ever invented, Pepsodent, was called "minty," but you know what it
was made with? Sarsaparilla, like root beer. Who knew "root beer" and
"minty" were the same thing? They were at that time and in that
place. And that's how smells work. The language we use to talk about smells is
not so much related to the molecules themselves as to our experiences with them.
You know how baggy pants are popular sometimes (1995), and
then later on (2015) they make you look homeless? That's similar to the way our
odor lexicon changes. The words themselves are just as fashionable and
ephemeral as the fragrance market itself. From the authors of the new study:
"Another problem with verbal descriptors is that they are culturally
biased. The current standard set of 146 Dravnieks descriptors was developed in
the United States in the mid-1980's and is increasingly semantically and
culturally obsolete." (Keller 2016 below)
Also, let's not forget that the entire Oceanic/Ozonic/Marine
class of fragrance aromas (Cool Water, Acqua Di Gio) didn't exist until its
signature molecule was discovered by a pharmaceutical company researching
benzodiazepine derivatives for anti-depression meds circa 1990.
So finally, a bunch of vigilant olfactory enthusiasts got
together and generated a killer dataset for smellable molecules and the words
we use to describe them (Keller et al. 2016). This new set leaves Dravnieks in
the dust: it's got 480 molecules tested on 55 subjects, where Dravnieks had 146
smell-word combinations and subjects who were all American or Western European.
It's important for the subjects to be as diverse as possible, because
whether it's cultural or genetic, we all smell things differently from each other
and we all use different words to describe those sensations. Pigeonholing your
demographic yields a pretty distorted dataset.
Other ways they outdid the Dravnieks dataset: they included
odorless compounds (like water), they included molecules with unfamiliar
smells, they included familiarity ratings (we'll see why this is important
later), and they extracted both population-average data AND individual reports.
Summary: updated datasets, both on the chemical-feature side
and on the odor-descriptor side. Now for the DREAM Challenge itself. This is
where crowdsourcing, which I guess is now just another word for
"competition," narrows down the best approach to tying together
categories of chemical features and the words we use to describe the way they smell.
Let's start with a basic pair, to get an idea. Sulfur smells
like rotten eggs. Simple, right? If a molecule has a sulfur atom, it probably
smells 1. bad and 2. like rotten eggs.
Actually, according to this new dataset, sulfur is related more
to "garlic"-smell than anything else... but that's because
"garlic" was one of the pre-determined descriptors that the
participants were allowed to choose from; "rotten egg" was not on that list.
Let's get a bit more complicated. Below I'll give bulleted
summaries of the three steps: first the Challenge itself, then the new
and improved dataset used in the challenge, and finally the winning approach.
And for the record, I'd really like to see this kind of work
done using not just the Dragon database of chemoinformatics but also the
almighty Human Metabolome Database, which contains 40,000 entries covering the
metabolites that exist within the human body. Because that would be interesting to see.
The DREAM Olfaction Prediction Challenge
This challenge aims to develop the most comprehensive
computational approach to date for predicting olfactory perception from the
physical features of the stimuli.
Teams developed machine learning algorithms that predict the
sensory attributes of molecules from their chemoinformatic features, with the
goal of predicting the perceptual qualities of virtually any molecule with high
accuracy, and ultimately reverse-engineering the smell of a molecule.
Predicting human olfactory
perception from chemical features of odor molecules. Keller A, Gerkin RC, Guan
Y, Dhurandhar A, Turu G, Szalai B, Mainland JD, Ihara Y, Yu CW, Wolfinger R,
Vens C, Schietgat L, De Grave K, Norel R, DREAM Olfaction Prediction
Consortium., Stolovitzky G, Cecchi GA, Vosshall LB, Meyer P. Science. 2017 Feb
The Dataset Used in the DREAM Challenge
[aka the Rockefeller University Smell Study]
Their dataset captured the sensory perception of 480
different molecules (249 cyclic molecules, 52 organosulfur molecules, 165 ester
molecules), each with 4884 corresponding chemical features, at two different
concentrations, experienced by 55 demographically diverse healthy human
subjects (really 49, because some were removed). Subjects rated intensity
(0-100), pleasantness (0-100), and familiarity, were asked to apply 20
pre-determined semantic odor quality descriptors to these stimuli, and were
offered the option to describe the smell in their own words.
Pre-determined semantic attributes: edible, bakery, sweet, fruit,
fish, garlic, spices, cold, sour, burnt, acid, warm, musky, sweaty,
ammonia/urinous, decayed, wood, grass, flower, and chemical.
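To picture the shape of the data, here is a sketch of a single subject-molecule record; the numbers are invented for illustration, not taken from the study:

```python
# One hypothetical subject-molecule record in a Keller/DREAM-style dataset.
# All ratings are on 0-100 scales; the numbers here are made up.
record = {
    "subject_id": 7,
    "molecule": "vanillin",
    "dilution": "1/1,000",   # each molecule was tested at two concentrations
    "intensity": 62,
    "pleasantness": 88,
    "familiarity": 75,
    # every pre-determined descriptor gets its own applicability rating;
    # only three of the twenty are shown here
    "descriptors": {"sweet": 90, "bakery": 80, "garlic": 0},
}

# The prediction task: recover the perceptual ratings from the
# molecule's chemoinformatic features alone.
targets = ["intensity", "pleasantness"] + list(record["descriptors"])
print(targets)  # ['intensity', 'pleasantness', 'sweet', 'bakery', 'garlic']
```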
- Familiarity had a strong effect on the ability of subjects to describe a smell.
- Many subjects used commercial products to describe familiar odorants, highlighting the role of prior experience in verbal reports of olfactory perception.
- Nonspecific descriptors like "chemical" were applied frequently to unfamiliar odorants.
- Unfamiliar odorants were generally rated as neither pleasant nor unpleasant.
- Many molecules had unfamiliar smells: of the stimuli that subjects could perceive, 70% were rated as unknown and were given low familiarity ratings.
- Together, these findings point to the dominant role of familiarity and experience in assigning verbal descriptors.
- Compounds that contain sulfur or nitrogen (amines) are probably unpleasant.
- Compounds that contain oxygen are probably pleasant.
- If a molecule has sulfur atoms, there's a good chance someone will choose "garlic" from the list of descriptors (note that "rotten eggs" is not on that list).
- The number of sulfur atoms in a molecule was correlated with the odor quality descriptors "garlic," "fish," and "decayed."
- Large and structurally complex molecules were perceived as more pleasant.
- Vanillin (and ethyl vanillin) was the most likely to be rated pleasant.
- Vanillin was also likely to be called "edible" and "bakery."
- Vanillin acetate was rated the "warmest."
- (−)-Carvone and various esters rounded out the most pleasant odors.
- Methyl thiobutyrate was the least pleasant and also the most intense.
- Isovaleric acid received the highest ratings for both "musky" and "sweaty."
- The other least pleasant compounds were sulfur-containing (4 in total) and carboxylic acids (4 in total).
- Benzenethiol, 3-pentanone, and androstadienone showed the most variable intensity perception.
- The most commonly used descriptor was "chemical."
- When describing smells in their own words, women did so more often than men.
- Commercial names and trade names (like Vicks VapoRub) were used a lot.
- From the authors: "Only descriptors with an unambiguous reference odorant can be predicted based on molecular features." (For example, "garlic" means something pretty specific, but "chemical" is as ambiguous as it gets.)
The winning team made the same point in their own paper:
"The large differences may result from the relative ambiguity of the
word “warm” to describe odor." (Li et al. 2018)
This is one of the most important conclusions to come out of
this study, because it shows us how olfaction and language really work
together. You can't name smells you've never smelled before, and you need very
specific references to develop a useful lexicon. This has a lot to do with why
commercial products are used in these cases (as in the 2016 World Coffee Research
Sensory Lexicon). It's better to say McDonald's Chicken McNuggets or
Vicks VapoRub or Hasbro's Play-Doh, because they are highly controlled
substances (in terms of quality, not illegality!) and so they are exactly the
same every time.
It also suggests that any universal odor lexicon needs to
have an ambiguity rating next to each word.
(FYI: Play-Doh is one of the only branded scents ever,
because you can't have copyright protection for smells, and Mama Celeste's
microwave pizza is the World Coffee Research reference standard for
"cardboard" aroma. Poor Mama!)
This last one is great, for me at least, because it echoes
many ideas already posted on this weblog. Here, taken from the authors:
However, we also found marked differences in how descriptors were used by our
untrained subjects and experts. For example, subjects used “musky” to describe
unpleasant body odors. In contrast, experts use “musky” to describe compounds
naturally sourced from animal glands or their synthetic analogues. These are
often used as base notes in perfumery, and experts associate musks with
pleasant descriptors such as “sweet,” “powdery,” and “creamy.” However for our
subjects, “musky” had a negative correlation with pleasantness, and was instead
correlated with the descriptor “sweaty.”
"The molecule rated as most “musky” in this study was
isovaleric acid, which experts do not rate as “musky” (Dravnieks). The five
molecules that Dravnieks lists as representative of the “musk” descriptor are
also rated “fragrant” and “perfumery” by experts (Dravnieks)."
Therefore, the word “musky” has a colloquial meaning that is
different from its technical meaning in perfumery.
Here is a link to my previous
post on the topic, from 2017.
The winners were from the University of Michigan and used a
random-forest-style machine learning algorithm. It took 1st place for predicting
individual responses and 2nd place for predicting population responses.
Right off the bat, one of the important things they do is to
combine the (stable) population average with the (highly varied) individual
responses. This is a big deal because there is so much variety to individual
responses, as explained above, because of either culture or genetics. We all
smell things differently, and we all use different words to refer to those
smells. And that difference is large enough to make the data messy as heck. So
this winning team introduced a weighting parameter, alpha (α), to balance the
two, and it works like this:
"When α equals 0, only population ratings are
considered. Conversely, when α equals 1, only individual ratings are used (see
the “Methods”). Surprisingly, a small α = 0.2 achieves the largest Pearson's
correlation coefficient (Fig. 3B). Without population information (α
= 1.0), the correlation of predicting the 19 semantic descriptors is the
lowest. This reveals that population perceptions play a crucial role when
individual responses display large fluctuations."
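As a sketch of that weighting scheme (my reading of the quote, not the exact formulation in Li et al.), α blends the stable population average against the noisy individual rating:

```python
def blended_rating(population, individual, alpha=0.2):
    """Blend population-average and individual ratings.

    alpha = 0 -> population only; alpha = 1 -> individual only.
    A sketch of the weighting described in the quote, not the paper's
    exact formula.
    """
    return (1 - alpha) * population + alpha * individual

# The population finds a molecule moderately pleasant (60/100),
# while one idiosyncratic subject rates it 20/100.
print(blended_rating(60.0, 20.0))        # ≈ 52 -> leans on the population
print(blended_rating(60.0, 20.0, 1.0))   # 20.0 -> individual only
```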
The next thing to note about the winning team's algorithm is
that it behaves exactly the way you'd expect modern machine learning to behave:
the results seem to make more sense to a machine than to a person.
For anyone familiar with recent examples of machine
learning, you hear a lot of two things: 1. it's like a black box, and we can't
see what it's doing to make its decisions; or 2. it's like the adversarial
image hack, where what looks like absolutely nothing is done to the image, and
yet the network reads it as something wildly different from what it is.
In this case, the most "obvious" patterns the algorithm found
in the chemical features did not correspond to things we already know
about chemicals. Sure, sulfur atoms correlate with bad smells, but the 2nd and
3rd most correlated features had nothing to do with properties we would
associate with odor.
I guess one of the main reasons for this disconnect is the
idea of degeneracy, which refers to the fact that many molecules have identical
or similar values for simple features. If the point of the algorithm is to
predict a smell from chemical information alone, but a feature belongs to more
than one smell group, then that feature won't help you predict which smell
group a molecule falls in. Say a chemical has an oxygen atom: lots of
different-smelling chemicals have oxygen in them, so we can't use that simple
feature as a way to organize.
After all this work, one more point should be noted: the
top 5 features achieve performance similar to a random forest using all 4884
features for almost all olfactory qualities (with the exception of "intensity,"
for which the top 15 features are adequate).
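That finding is easy to probe with a random forest's built-in feature importances. A hedged sketch using scikit-learn on synthetic data (not the DREAM dataset), where only 5 of 50 features truly drive the target:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data: 200 "molecules" x 50 features,
# where only features 0-4 actually influence the "pleasantness" target.
X = rng.normal(size=(200, 50))
y = X[:, :5] @ np.array([3.0, 2.0, 1.5, 1.0, 0.5]) + 0.1 * rng.normal(size=200)

forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Keep only the 5 most important features and refit a smaller forest.
top5 = np.argsort(forest.feature_importances_)[::-1][:5]
small = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[:, top5], y)

# With this synthetic signal, the top-ranked features typically recover
# the five informative ones, and the small model fits nearly as well.
print(sorted(top5.tolist()))
print(round(small.score(X[:, top5], y), 2))
```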
Accurate prediction of
personalized olfactory perception from large-scale chemoinformatic features.
Hongyang Li, Bharat Panwar, Gilbert S Omenn, and Yuanfang Guan. Gigascience.
2018 Feb; 7(2): 1–11.
Dravnieks A. Atlas of odor
character profiles. Philadelphia: ASTM; 1985.
Arctander S. Perfume and
flavor chemicals (aroma chemicals). Montclair, NJ: Author; 1969.
Keller A, Vosshall LB.
Olfactory perception of chemically diverse molecules. BMC Neurosci. 2016 Aug 8.
Here is a link to my original post on this DREAM Challenge, from 2017.