Tuesday, May 5, 2020

Semantic Bingo



In olfactory research, there's a test called a pairwise similarity test that's used to measure smells, allowing researchers to construct a map of odor perception.

It's hard to make sense out of smells; it doesn't work like the rest of our sensory system. For a bunch of reasons it's proven quite difficult to produce a model which predicts how a molecule will be perceived.

It's not broken beyond repair, but it is frustrating because we can never seem to get an airtight model that works for all smells and for all people. With hundreds of different receptors, varying over thousands of alleles, scientists often look somewhere else for the organizing principles -- they look for patterns in the words themselves.

In a study from 2015, distributional semantics is used to create an odor map. The authors say it's the first attempt to do so. This technique rests on the theory that words occurring in similar contexts are in fact similar. Some of you might remember this as "context clues": if you come across a new word while you're reading, use the surrounding context to help you guess what the word means.

So instead of trying to make a map of molecular features and receptor actuation potentials, they make a map of the words themselves. They use large text datasets, i.e., really big books, one of which was the Sigma-Aldrich Flavors and Fragrances catalog, then score words based on their co-occurrences in the text.
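To make the co-occurrence idea concrete, here is a minimal sketch, not the study's actual pipeline. It counts which words appear near each other in a tiny made-up corpus and compares two words by the cosine of their count vectors. The corpus, the window size, and the function names (`cooccurrence_vectors`, `cosine`) are all assumptions for illustration.

```python
from collections import Counter, defaultdict
import math


def cooccurrence_vectors(sentences, window=2):
    """For each word, count the words that appear within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Made-up toy corpus, NOT text from the catalog used in the study.
toy_corpus = [
    "fresh bread smells warm and yeasty",
    "the bakery smells warm and yeasty",
    "cut grass smells green and sharp",
    "the lawn smells green and sharp",
]

vectors = cooccurrence_vectors(toy_corpus)
bread_bakery = cosine(vectors["bread"], vectors["bakery"])
bread_grass = cosine(vectors["bread"], vectors["grass"])
print(f"bread-bakery: {bread_bakery:.2f}")
print(f"bread-grass:  {bread_grass:.2f}")
```

In this toy corpus "bread" and "bakery" share more neighboring words than "bread" and "grass" do, so they score higher. Real systems use far larger corpora and usually reweight the raw counts.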

I started this post just so I could paste these lists of words, so let's get on with it. On a scale of 0-1, how likely is it that these words can be interchanged?

Similarity Test:
bakery-bread    0.96
grass-lawn      0.96
dog-terrier     0.90
bacon-meat      0.88
oak-wood        0.84
daisy-violet    0.76
daffodil-rose   0.74

Nearest Neighbor Test:
apple - pear, banana, melon, apricot, pineapple
bacon - smoky, roasted, coffee, mesquite, mossy
brandy - rum, whiskey, wine-like, grape, fleshy
cashew - hazelnut, peanut, almond, hawthorne, jam
chocolate - cocoa, sweet, coffee, licorice, roasted
lemon - geranium, grapefruit, tart, floral
cheese - grassy, butter, oily, creamy, coconut
caramel - nutty, roasted, maple, butterscotch, coffee
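A nearest neighbor list like the ones above can be produced by ranking every other word by its similarity to the target word. The sketch below assumes each word already has a vector. The three-dimensional vectors here are invented for illustration and are not values from the paper, and `nearest_neighbors` is a hypothetical helper name.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length lists of numbers."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(target, vectors, k=3):
    """Return the k words whose vectors are most similar to the target's."""
    scored = [
        (cosine(vectors[target], vec), word)
        for word, vec in vectors.items()
        if word != target
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [word for _, word in scored[:k]]


# Hypothetical vectors, invented for this example only.
toy_vectors = {
    "apple": [0.9, 0.1, 0.0],
    "pear": [0.8, 0.2, 0.0],
    "banana": [0.7, 0.3, 0.1],
    "bacon": [0.0, 0.2, 0.9],
    "smoky": [0.1, 0.1, 0.9],
}

apple_neighbors = nearest_neighbors("apple", toy_vectors, k=2)
bacon_neighbors = nearest_neighbors("bacon", toy_vectors, k=1)
print(apple_neighbors)
print(bacon_neighbors)
```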

Notes:
Kiela, D., Bulat, L. & Clark, S. Grounding semantics in olfactory perception. Assoc. Comput. Linguist. 231–236 (2015).

Distributional Semantics – represents the meanings of words as vectors in a “semantic space”, relying on the distributional hypothesis: the idea that words that occur in similar contexts tend to have similar meanings.

Thursday, April 23, 2020

Normosmia Has No Name



I have really met my match. In the world of smell, where language is a game more than a utility, there is one group of researchers who have finally said f*** it. They took all the words out, smashed all the molecules together, presented a bunch of people with their sniff panel, and recorded the responses. (This study is from 2013, but still worth writing about, since this is a pretty important point in smell science.)

And it worked. They found that we don't smell molecules; we smell mixtures of molecules. In their words: "The algorithm that worked best was one that treats the odor-mixture as a single value, rather than a bunch of values reflecting each of its components."

They also found that "Pleasantness is the primary odor dimension in human olfactory perception," but we already knew that.

The Study
They used mixtures of 1 to 43 different components, making 191 mixture-pairs in total, with each component described by 1433 physicochemical descriptors (via the Dragon dataset), and gave them to 48 people.

The Findings: Pairwise Distance Model for Predicting Odorant-Mixture Similarity
They pit the molecular mixtures against each other and have people rate their similarity.
Again, "We found that the mean pairwise Euclidean distance over all the descriptors of all mono-molecular components comprising any two mixtures was a poor predictor of perceptual similarity between the two mixtures." But it gets even better, because apparently they're saying that the weak predictive capacity is because the data is screwed up by the monomolecules' comparisons to themselves! Maybe I'm wrong here, but I think they're saying people gave different ratings for the same pairs of molecules at different times in the test, and those ratings changed so much from time to time that they make the model no good.
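For a sense of what "mean pairwise Euclidean distance" means mechanically, here is a rough sketch. It treats each molecule as a list of descriptor values, measures the distance between every component of one mixture and every component of the other, and averages the result. The two-descriptor molecules and the function names are assumptions for illustration, and the paper's exact preprocessing is not reproduced here.

```python
import math
from itertools import product


def euclidean(u, v):
    """Straight-line distance between two descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mean_pairwise_distance(mixture_a, mixture_b):
    """Average distance over every (component of A, component of B) pair."""
    pairs = list(product(mixture_a, mixture_b))
    return sum(euclidean(u, v) for u, v in pairs) / len(pairs)


# Hypothetical molecules with only two descriptors each.
mix_a = [[0.0, 0.0], [3.0, 4.0]]
mix_b = [[0.0, 0.0]]

distance_ab = mean_pairwise_distance(mix_a, mix_b)
distance_aa = mean_pairwise_distance(mix_a, mix_a)
print(distance_ab)
print(distance_aa)
```

One property of this sketch worth noticing: a multi-component mixture compared with itself does not come out at zero, because its different components are still some distance from each other.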

The Findings: Angle Distance Model for Predicting Odorant-Mixture Similarity
This is the meat of the study. They came up with a statistical regime to turn the odor mixtures into a perceptual whole, so that each mixture could be manipulated as if it were an individual odor.

Just about every smell-science experiment like this will use odors that have been very carefully isolated – single molecules with single names (let's not kid anyone here, any particular molecule can have a dozen different names, from the local vernacular to the formalized IUPAC designation). The point is that molecules are typically isolated. Because science likes that. Lumping molecules together is messy. But that's also how we interface with odors in the real world.

So they made this study more like the real world. They take the physicochemical descriptor vectors of all the molecules in a mixture, add them together, and divide by the norm. That treats the mixture-odor as if it were a single odor, with a single vector (a single point in multi-dimensional odor-space). And this was the model that worked.
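The add-then-normalize step described above can be sketched in a few lines. This is a simplified illustration under my own assumptions (toy two-descriptor molecules, hypothetical function names), not the authors' code. Each mixture becomes one unit-length vector, and two mixtures are compared by the angle between their vectors.

```python
import math


def mixture_vector(components):
    """Sum the component descriptor vectors, then divide by the norm."""
    summed = [sum(column) for column in zip(*components)]
    norm = math.sqrt(sum(x * x for x in summed))
    return [x / norm for x in summed] if norm else summed


def angle_degrees(u, v):
    """Angle between two vectors, in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cos))


# Hypothetical molecules with only two descriptors each.
mix_a = [[1.0, 0.0], [0.0, 1.0]]
mix_b = [[1.0, 0.0]]

vec_a = mixture_vector(mix_a)
vec_b = mixture_vector(mix_b)
angle_ab = angle_degrees(vec_a, vec_b)
print(f"{angle_ab:.1f} degrees")
```

Unlike averaging distances between individual components, this gives each mixture a single location, so a mixture compared with itself sits at an angle of essentially zero.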

If you take its higher logical plateau, you end up with Olfactory White, one of the most mindbending osmological facts you'll ever comprehend: if you add enough molecules together, it doesn't make "brown paint" like colors do, it makes the mixture smell like nothing. Sure enough, these researchers found that the more components you add to the mixture, the closer the mixture gets to every other mixture (approaching 30 components).
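The convergence can be seen in a toy simulation. This is only a sketch under loud assumptions: the "molecules" are random vectors of 25 non-negative descriptor values drawn uniformly, which is nothing like real chemistry, and `mean_angle` is a hypothetical helper. It shows the geometric point only: as more components are summed, any two mixtures point in more nearly the same direction.

```python
import math
import random


def summed_vector(components):
    """Add up the descriptor vectors of all components in a mixture."""
    return [sum(column) for column in zip(*components)]


def angle_degrees(u, v):
    """Angle between two vectors, in degrees (independent of their lengths)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos))


def mean_angle(n_components, n_descriptors=25, trials=200, seed=0):
    """Average angle between pairs of random mixtures of a given size."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable

    def random_mixture():
        # ASSUMPTION: uniform random descriptors, not real molecular data.
        return [
            [rng.random() for _ in range(n_descriptors)]
            for _ in range(n_components)
        ]

    total = 0.0
    for _ in range(trials):
        total += angle_degrees(
            summed_vector(random_mixture()), summed_vector(random_mixture())
        )
    return total / trials


angles = {n: mean_angle(n) for n in (1, 4, 30)}
for n, angle in angles.items():
    print(f"{n:>2} components: mean angle {angle:.1f} degrees")
```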

A very important note here is that these odor mixtures were made to be all the same intensity. (This was done in the Olfactory White study as well.) Odors can have very different perceived intensities, and it's more than unlikely that equal intensity would ever happen in the wild. I can't help but get into some quick industrial hygiene here: the odor detection threshold for ammonia is 50 ppm; acetone, 100 ppm; trimethylamine (rotten fish), 0.0002 ppm; hydrogen sulfide (rotten eggs), 0.005 ppm.

Another extension of this study is maybe not so logical, but certainly an important point in smell science: we can't identify individual components of a mixture of only 4 components. You think you know what peanut butter smells like. And pineapples, and cinnamon. But if you add one more thing to that mixture, they all fall away, losing their identity on your great epithelial equalizer.

Conclusion
"The olfactory system treats odorant-mixtures as unitary synthetic objects, and not as an analytical combination of components."

Limitations
The authors name three clear limitations, so they should be included here:

1. The mixtures were intensity-normalized. This is not natural, because perceived odor-intensity changes drastically across odors. See mention above.

2. The odorants represent only a limited portion of olfactory perceptual space (not much we can do about that, since it's kind of infinite).

3. Many physicochemical features such as boiling point or vapor pressure remain unrepresented (they narrowed down the features from ~4,000 to 25).

Post Script
If you don’t know what the word steganography means, you do know: camouflage. Think about it – you don’t like broccoli? Blend it with enough other smells and you won’t even notice!

Olfactory White also has its own name, and it’s called Laurax.

Notes
Semantic free approach to structure-odor prediction, general perceptual primaries rather than individual odorant primaries:
Predicting odor perceptual similarity from odor structure.
Snitz K, Yablonka A, Weiss T, Frumin I, Khan RM, Sobel N
PLoS Comput Biol. 2013; 9(9):e1003184.

Olfactory White:
Weiss T, Snitz K, Yablonka A, Khan RM, Gafsou D, et al. (2012) Perceptual convergence of multi-component mixtures in olfaction implies an olfactory white. Proc Natl Acad Sci USA 109: 19959–19964.

Odor Thresholds:
Gregory Leonardos, David Kendall & Nancy Barnard (1969) Odor Threshold Determinations of 53 Odorant Chemicals, Journal of the Air Pollution Control Association, 19:2, 91-95, DOI: 10.1080/00022470.1969.10466465

Tuesday, April 14, 2020

On the Power of Words


 

I thought this article about "Keyword Signaling" would be a good one to put here, because it shows us the power of words in today's world. (And the only thing more interesting to Limbic Signal than smells is words.)

With the omni-depository that is the Internet, and the text-based search engines we use to interface with it, words have taken on a new meaning in our world.

Each word you type into a search box will tailor your online experience to such a degree of specificity that no two forays will be the same.

(Super sidenote, back in the day when Google's predictive search algorithm, as well as our collective psyche, was naked for all to see, you could type "Why does Daddy..." and watch a whole lot of sociology populate your search field; in order, the predictions were "...hit Mommy," "...wear a dress," and...I forget because the first two were enough to make me realize what the Internet had really become. This was circa 2010 maybe. But if you changed the text ever so slightly to say "Why does my Dad..." which suggests an older person conducting the search, both because of the dropping of the diminutive -y but also the adding of the possession "my" which shows that the person is aware they have their own Dad vs other Dads, you would get different predictions, such as "...drink so much." The difference in search terms-results is subtle, but it's baked into the interface.)

The work done by Data and Society Institute's Francesca Tripodi shows how this all works and especially how it's being used to manipulate the datasphere.

First you find a data void. That's a topic, or rather a term used as a pointer to a topic, that brings up no search results.

There are no results because nobody is using the term, not necessarily because nobody is talking about the topic. But you come up with that new, unused term, and you create content to go along with it. Like, fake content, conspiracy theory content, propaganda content, salacious content, whatever, and when you slap your new word on there, you now own the search results for that content.

Let's say I want to steer people away from the actual facts about a mass murder at an elementary school, so I create a new term - "crisis actor" - then I generate all kinds of content about people who pretend they were in a mass shooting, and I slap my keyword (crisis actor) all over the content, and THEN I make sure to spread the keyword around as much as possible, so other people start saying it out loud, and then other people will start searching for it, which will weight the results of my keyword.

Guess where they'll end up? They'll end up at my site, reading my content. They won't be swayed by different views of the situation, because the whole idea of the crisis actor, the keyword and the content, was fabricated, it was artifice, and it exists in a vacuum, unconnected to the rest of the world.

When I start with a data void, I can control everything that goes into it, and I can make sure that you never see the other side of the story, because there is no other side. Only my side. It doesn't exist within the actual ecosystem of information. It is a Frankenstein of an info-ecosystem, engineered by me to send you in a very specific direction.

The fact that our global repository of information is accessed by typing words into a searchbox-algorithm leaves it susceptible to such workings. It also leaves out an entire dimension of human sensation, that being olfaction.

How do misinformation and nefarious SEO engineering relate to olfaction? Because smells don't correlate to words as a matter of fact, only as a matter of opinion. Everything you can ever read about smells is already, from the start, "alternative facts," because there were no real facts to begin with. The two - language and smell - they just don't go together. The keyword and the content, they have to be artificial, by the very nature of the sense of smell. The entire human experience with regard to olfaction is one big data void, filled by poets and marketing slogans. That is, until the search box can be filled with emotions and autobiographies.

Notes:
"The problem is, whether or not we’re aware, the key words we search are coded with political biases. My research demonstrates that it’s possible to position ideological searches to maximize the exposure of their content."
—Data & Society Affiliate Francesca Tripodi, WIRED

May 2019, Francesca Tripodi for Data and Society