In olfactory research, there's a test called a pairwise
similarity test that's used to measure smells, allowing researchers to
construct a map of odor perception.
It's hard to make sense of smell; it doesn't work like
the rest of our sensory system. For a bunch of reasons, it has proven quite
difficult to produce a model that predicts how a molecule will be perceived.
The field isn't broken beyond repair, but it is frustrating because
we can never seem to get an airtight model that works for all smells and for
all people. With hundreds of different receptors, varying over thousands of
alleles, scientists often look somewhere else for the organizing principles:
they look for patterns in the words we use to describe smells.
In a 2015 study, Kiela, Bulat and Clark use distributional
semantics to create an odor map. They say it's the first attempt to do
so. This technique rests on the theory that words occurring in similar contexts
are in fact similar. Some of you might remember this as "context
clues": if you come across a new word while you're reading, you use the
surrounding context to help you guess what the word means.
So instead of trying to make a map of molecular features and
receptor actuation potentials, they make a map of the words themselves. They
use large text datasets, i.e., really big books, one of which was the
Sigma-Aldrich Flavors and Fragrances catalog, then score words based on their
co-occurrences in the text.
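If you want to see the co-occurrence idea in miniature, here is a rough
sketch. It is not the study's actual pipeline: the four-line corpus, the
window size, and the function names (cooccurrence_vectors, cosine) are all
made up for illustration. Each word gets a vector of counts of the words
that appear near it, and two words are scored by the cosine of their vectors.

```python
from collections import Counter, defaultdict
import math


def cooccurrence_vectors(sentences, window=2):
    """For each word, count the words that appear within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * v[context] for context, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus, invented for this example (not from the study).
corpus = [
    "warm yeasty bread crust",
    "warm yeasty bakery air",
    "green mown grass blades",
    "green mown lawn clippings",
]

vectors = cooccurrence_vectors(corpus)
print("bakery-bread:", round(cosine(vectors["bakery"], vectors["bread"]), 2))
print("bakery-lawn: ", round(cosine(vectors["bakery"], vectors["lawn"]), 2))
```

In this toy corpus "bakery" and "bread" share neighbors ("warm", "yeasty"),
so they score higher than "bakery" and "lawn", which share none. A real
corpus would need far more text and some handling of very common words.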
I started this post just so I could paste these lists of
words, so let's get on with it. On a scale of 0-1, how likely is it that these
words can be interchanged?
Similarity Test:
bakery-bread 0.96
grass-lawn 0.96
dog-terrier 0.90
bacon-meat 0.88
oak-wood 0.84
daisy-violet 0.76
daffodil-rose 0.74
Nearest Neighbor Test:
apple - pear, banana, melon, apricot, pineapple
bacon - smoky, roasted, coffee, mesquite, mossy
brandy - rum, whiskey, wine-like, grape, fleshy
cashew - hazelnut, peanut, almond, hawthorne, jam
chocolate - cocoa, sweet, coffee, licorice, roasted
lemon - geranium, grapefruit, tart, floral
cheese - grassy, butter, oily, creamy, coconut
caramel - nutty, roasted, maple, butterscotch, coffee
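A nearest neighbor list like the ones above can be produced by ranking every
other word by its similarity to the target word. Here is a minimal sketch,
assuming cosine similarity as the score. The three-number vectors are
invented for illustration (loosely "fruity, smoky, sweet") and have nothing
to do with the study's data; nearest_neighbors is a hypothetical helper name.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity between two equal-length lists of numbers."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbors(word, vectors, k=3):
    """Return the k words whose vectors are most similar to `word`'s vector."""
    target = vectors[word]
    scored = (
        (cosine(target, vec), other)
        for other, vec in vectors.items()
        if other != word
    )
    return [other for _, other in heapq.nlargest(k, scored)]


# Made-up vectors for illustration only: [fruity, smoky, sweet]
toy_vectors = {
    "apple": [0.9, 0.0, 0.6],
    "pear": [0.8, 0.0, 0.7],
    "banana": [0.7, 0.0, 0.9],
    "bacon": [0.0, 0.9, 0.1],
    "coffee": [0.1, 0.8, 0.2],
}

print("apple -", ", ".join(nearest_neighbors("apple", toy_vectors, k=2)))
print("bacon -", ", ".join(nearest_neighbors("bacon", toy_vectors, k=1)))
```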
Notes:
Kiela, D., Bulat, L. & Clark, S. Grounding semantics in
olfactory perception. Assoc. Comput. Linguist. 231–236 (2015).
Distributional Semantics – represents the meanings of words
as vectors in a “semantic space”, relying on the distributional hypothesis: the
idea that words that occur in similar contexts tend to have similar meanings.