Tuesday, May 5, 2020

Semantic Bingo

In olfactory research, there's a test called a pairwise similarity test that's used to measure how alike two smells seem, which lets researchers construct a map of odor perception.

It's hard to make sense out of smells; olfaction doesn't work like the rest of our sensory system. For a bunch of reasons it's proven quite difficult to produce a model which predicts how a molecule will be perceived.

It's not broken beyond repair, but it is frustrating because we can never seem to get an airtight model that works for all smells and for all people. With hundreds of different receptors, varying over thousands of alleles, scientists often look somewhere else for the organizing principles -- they look for patterns in the words themselves.

In a study from 2015, the authors use distributional semantics to create an odor map. They say it's the first attempt to do so. This technique rests on the theory that words occurring in similar contexts tend to have similar meanings. Some of you might remember this as "context clues": if you come across a new word while you're reading, use the surrounding context to help you guess what the word means.

So instead of trying to make a map of molecular features and receptor actuation potentials, they make a map of the words themselves. They use large text datasets, i.e., really big books, one of which was the Sigma-Aldrich Flavors and Fragrances catalog, then score words based on their co-occurrences in the text.
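
To make the co-occurrence idea concrete, here's a minimal sketch in Python. It is not the paper's pipeline: the toy sentences, the function name `cooccurrence_vectors`, and the choice to treat each whole sentence as the "context" are all assumptions I made up for illustration.

```python
from collections import Counter, defaultdict

# Toy corpus, invented for illustration (not from the study's datasets).
TOY_CORPUS = [
    "fresh bread from the bakery smells warm",
    "the bakery sells warm bread",
    "cut grass on the lawn smells green",
    "the lawn is covered in green grass",
]

def cooccurrence_vectors(sentences):
    """Map each word to a Counter of the other words sharing its sentences."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            for j, other in enumerate(words):
                if i != j:  # a word is not its own context
                    vectors[word][other] += 1
    return vectors

vectors = cooccurrence_vectors(TOY_CORPUS)
print(vectors["bread"].most_common(3))
```

Each word ends up with a bag of counts describing the company it keeps. Real systems use far more text and usually reweight the raw counts, but the shape of the idea is the same.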

I started this post just so I could paste these lists of words, so let's get on with it. On a scale of 0-1, how likely is it that these words can be interchanged?

Similarity Test:
bakery-bread    0.96
grass-lawn      0.96
dog-terrier     0.90
bacon-meat      0.88
oak-wood        0.84
daisy-violet    0.76
daffodil-rose   0.74
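
When a model answers that 0-1 question, one common way to do it is cosine similarity between two word vectors. Here's a minimal sketch, assuming the vectors are stored as dicts of counts. The numbers are made up, and I'm not claiming this is how the scores above were produced.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_a * norm_b)

# Hypothetical co-occurrence counts, invented for illustration.
bread = {"warm": 2, "bakery": 2, "fresh": 1}
bakery = {"warm": 2, "bread": 2, "fresh": 1}
lawn = {"green": 2, "grass": 2}

print(round(cosine_similarity(bread, bakery), 2))
print(round(cosine_similarity(bread, lawn), 2))
```

Because counts are never negative, the result stays between 0 (no shared context at all) and 1 (identical context profile).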

Nearest Neighbor Test:
apple - pear, banana, melon, apricot, pineapple
bacon - smoky, roasted, coffee, mesquite, mossy
brandy - rum, whiskey, wine-like, grape, fleshy
cashew - hazelnut, peanut, almond, hawthorne, jam
chocolate - cocoa, sweet, coffee, licorice, roasted
lemon - geranium, grapefruit, tart, floral
cheese - grassy, butter, oily, creamy, coconut
caramel - nutty, roasted, maple, butterscotch, coffee
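
A nearest neighbor list like the ones above comes from ranking every other word by its similarity to the target and keeping the top few. Here's a rough sketch, assuming cosine similarity as the measure. The three dimensions (fruity, smoky, sweet) and every number in `VECTORS` are hypothetical values I picked so the example is easy to follow.

```python
import math

# Hypothetical vectors over three made-up dimensions: (fruity, smoky, sweet).
VECTORS = {
    "apple":  (0.9, 0.0, 0.6),
    "pear":   (0.8, 0.0, 0.7),
    "banana": (0.7, 0.0, 0.9),
    "bacon":  (0.0, 0.9, 0.1),
    "coffee": (0.1, 0.8, 0.2),
}

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def nearest_neighbors(word, vectors, k=2):
    """Return the k words whose vectors are most similar to `word`."""
    target = vectors[word]
    others = [w for w in vectors if w != word]
    ranked = sorted(others, key=lambda w: cosine(target, vectors[w]), reverse=True)
    return ranked[:k]

print(nearest_neighbors("apple", VECTORS))  # ['pear', 'banana']
print(nearest_neighbors("bacon", VECTORS))  # ['coffee', 'banana']
```

Notice that the second neighbor for bacon is a stretch. With a small vocabulary the ranking has to return something, which may be part of why the tail end of the lists above gets weird.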

Kiela, D., Bulat, L. & Clark, S. Grounding semantics in olfactory perception. Assoc. Comput. Linguist. 231–236 (2015).

Distributional Semantics – represents the meanings of words as vectors in a “semantic space”, relying on the distributional hypothesis: the idea that words that occur in similar contexts tend to have similar meanings.