Friday, April 3, 2020

Categorgonzola



A perennial topic on this weblog is the categorization of smells. Today I'm looking at a study from 2011 that looks at common features that group smells together. One of the common denominators is hedonics, or pleasantness versus unpleasantness.

It always makes me pause to think about this, because people can never really agree on what makes a smell good or bad, and yet the hedonic dimension is the one that keeps coming back as the primary distinction between odors. I guess that's just the law of large numbers at work, a law that runs counter to natural human intuition.

If you include enough people in your study, the differences between us cancel out and you're left with a fuzzy but recognizable picture of a smell map, which is seen above.

The other common denominator (it’s not a denominator if there’s two, right?) is a dimension the researchers call natural/chemical.

This map is organized as follows: Whereas the pleasantness of an odor can be predicted from the number of carbon atoms per molecule (related to how fast it evaporates), the natural/chemical dimension is predicted by the polarity of the molecules, or how attracted they are to water.

Why? I'm not so sure. Mention is made of the difference in the olfactory receptors themselves - some date from when we were fish and some from when we became land animals, so the two types may have a different relationship with water (polarity).

For example, odorants disperse more slowly in water. Also, the molecules a fish can smell don't have to be volatile organic compounds, because for a fish, the surrounding medium is already a liquid. So fish detect water-soluble molecules, whereas humans detect airborne ones.

Actually, now that I look at the ‘natural’ part of the map, I realize that none of those things exist underwater, right? Burnt? Nope. Moldy? Although mold is always associated with moisture, it doesn’t grow underwater. And Earthy? Kind of the opposite of water.

Natural - Burnt, Smoky, Nutty, Woody, Resinous, Musty, Earthy, Moldy, Almond, Popcorn, Peanut Butter, Oily, Fatty, Warm, Dry, Powdery

Chemical - Etherish, Anaesthetic, Chemical, Medicinal, Disinfectant, Carbolic, Sharp, Pungent, Acid, Gasoline, Solvent, Cook, Cooling, Cleaning Fluid, Paint, Camphor

Good - Fragrant, Sweet, Perfumery, Floral, Light, Aromatic, Cool, Cooling, Fruity, Citrus, Rose

Bad - Sharp, Pungent, Acid, Heavy, Musty, Earthy, Moldy, Burnt, Smoky, Oily, Fatty, Sour, Vinegar

-image source: link

Notes:
In search of the structure of human olfactory space. A. A. Koulakov, B. E. Kolterman, A. G. Enikolopov, D. Rinberg. Front. Syst. Neurosci. 5, 65 (2011).

Thursday, March 26, 2020

Colexify My Insides


 
Comparison of universal colexification networks of emotion concepts with Austronesian and Indo-European language families. Credit: T. H. Henry



How do you know that a 12-inch ruler is in fact 12 inches long? You don't. You trust. I don't know who you trust, if it's the ruler manufacturer, or the society you live in, or who else. But you don't actually know how long that ruler really is.

How do ruler manufacturers know how long 12 inches is? They use a ruler, of course. And where does that ruler come from?

I work in a field where we have to take very precise and accurate measurements of environmental conditions, such as nanogram-concentrations of mercury vapor in the air. If your equipment thinks it's pulling 0.2 liters per minute of air instead of 0.3, then what happens after 8 hours worth of minutes? You get a very distorted sense of how much mercury is in the air (96 vs 144 liters to begin with).
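To put numbers on that flow-rate error, here is a minimal sketch; the trapped mercury mass is invented for illustration, since the only figures above are the flow rates and the shift length:

```python
# One 8-hour shift of air sampling.
MINUTES = 8 * 60

assumed_flow = 0.2  # L/min -- what the pump thinks it's pulling
actual_flow = 0.3   # L/min -- what it actually pulls

assumed_volume = assumed_flow * MINUTES  # 96 L
actual_volume = actual_flow * MINUTES    # 144 L

# Hypothetical mass of mercury trapped on the sampling medium.
mass_ng = 14.4

# The reported concentration divides by the wrong volume,
# overstating the mercury level by 50%.
reported_conc = mass_ng / assumed_volume  # 0.15 ng/L
true_conc = mass_ng / actual_volume       # 0.10 ng/L

print(assumed_volume, actual_volume)  # 96.0 144.0
```

A 0.1 L/min calibration error compounds into a 48-liter discrepancy over one shift, which is exactly why the pump gets checked against a reference flow meter.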

This is why we calibrate our equipment, using another piece of equipment to make sure ours is doing what it says it does. Sure, we could talk about The Kilogram, which until last year was used to calibrate every other kilogram-measuring thing ever, and was protected in multiple nested glass encasements in a vault in the basement of a nondescript building outside Paris.

But instead, we're going to talk about language. Because there's no Kilogram for language.

***
In the same way that we don't know how long any particular ruler is if we don't have an ur-ruler, how do I know that your meaning of a word is the same as mine? This is like asking if the red you see is the same red I see. Or if the pain you feel is the same pain I feel. Language, like feelings in general, is subjective. How can we calibrate something that has no universal standard?

Language, unlike feelings, does offer a metric by which we can compare and even measure its meaning to different people. That's no surprise; words are the units by which we measure language. But not until now, with the era of Big Data fully upon us, could we put all the words in the world into one database and compare their meanings across all languages, using the database itself as the closest thing to a universal measuring rod that we can get.

This is called colexification: we draw lines between all the words in that database and find common denominators and groupings of words. The goal is to create a universal structure of emotional language that can be used to calibrate and understand these words, and especially the people who use them. The resulting graphs are called "emotion colexification networks," and they show us, for example, how Austronesian languages associate "surprise" with "fear," whereas Tai-Kadai languages associate "surprise" with the concepts "hope" and "want." (Take a look at the top image in this post.)

In other words, we can now see that if you say you're surprised, and you're saying it in an Austronesian language, then you're probably not so happy, whereas in English the word surprise represents something closer to happiness.
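The counting behind such a network can be sketched in a few lines. All the lexicons below are invented for illustration; the real CLICS data is vastly larger:

```python
from collections import Counter
from itertools import combinations

# Toy lexicons (invented): each word form maps to the set of
# concepts it can express in that language.
lexicons = {
    "lang_A": {"takot": {"surprise", "fear"}},
    "lang_B": {"gulat": {"surprise", "fear"}},
    "lang_C": {"wang": {"surprise", "hope", "want"}},
}

# An edge between two concepts gains one unit of weight for every
# language that has a word covering both concepts.
edges = Counter()
for words in lexicons.values():
    colexified = set()  # count each pair at most once per language
    for concepts in words.values():
        colexified.update(combinations(sorted(concepts), 2))
    edges.update(colexified)

print(edges[("fear", "surprise")])  # 2
print(edges[("hope", "surprise")])  # 1
```

Heavily weighted edges are concept pairs that many unrelated languages lump into one word, which is what makes the network a candidate for a universal measuring rod.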

The researchers working with this ultimate cross-lingual lexicon found significant variation in the positioning of words in the network – the meaning of words changes a lot as you go from one language to another, even when those words are treated as translation equivalents.

***
In closing, this is interesting research for the world of olfaction, which is another severely subjective phenomenon. In fact, the researchers in this study use the same two dimensions that olfactory studies do: valence and intensity. That shouldn't be a surprise, because the limbic system is the common denominator between the two; it is the domain of both our emotions and our olfactory experience.

Post-Script
As with the most recent olfactory research, this study was made possible by an advance in the underlying database. CLICS is a database of colexifications covering 2,474 languages from around the world; only a few years ago it held only about 300 languages.

Notes
J. C. Jackson et al., Science (2019).

Dec 2019, phys.org



Thursday, March 19, 2020

Hyper Dimensions in Olfactory Space


Artwork by Alex Grey

The title of this post comes from an article about categorizing smells, although it would work just as well as the title of a work of science fiction. The article is from August 2018, so it's old news by now; but that title isn't getting old anytime soon.

Probing the interconnectedness of odors, and sketching a map of an omnicategorical odor network, the article starts out with a basic premise.

Let's say the olfactory system is designed to warn us of poisons in the environment. But a poison could be many chemicals, or a chemical we've never encountered before. So it would be necessary for the odors of those chemicals to be classified not by features intrinsic to the chemicals themselves, but by the likelihood of their co-occurrence with other chemicals. You can't be born with a database of chemicals to recognize and avoid. So instead, the hypothesis here is that an odor is only identified in its relationship to the other odors it's with.
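A toy version of that co-occurrence idea can make it concrete. The chemical names are real odorants, but the groupings into "environments" are invented purely for illustration:

```python
from collections import Counter
from itertools import combinations

# Invented "environments" and the odorous chemicals found together in each.
samples = [
    {"limonene", "linalool", "geraniol"},     # citrus and flowers
    {"limonene", "linalool"},                 # ripe fruit
    {"cadaverine", "putrescine", "skatole"},  # decay
    {"putrescine", "skatole"},                # more decay
]

# Count how often each pair of chemicals shows up in the same sample.
cooccur = Counter()
for s in samples:
    cooccur.update(combinations(sorted(s), 2))

# Under the co-occurrence hypothesis, these counts -- not molecular
# structure -- are what make two odors smell "related."
print(cooccur[("limonene", "linalool")])    # 2
print(cooccur[("limonene", "putrescine")])  # 0
```

Nothing about limonene's structure puts it near linalool here; they cluster only because the world keeps presenting them together, which is the premise the paper builds on.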

This idea, at least to my ears, sounds really similar to the way statistical text analysis can determine whether a piece of writing was written by a robot or not. (An approach also called visual forensics.)

It's way easier to visualize, so I'm taking these three images from the paper itself:




In the three images above, the first is a chunk of text written by a robot (most of the words are green, with a few yellows sprinkled in); the second is a real New York Times article (only half is green, the rest yellow, with some red and a sprinkle of purple); and the third is a clip from "the most unpredictable human text ever written," James Joyce's Finnegans Wake (green, yellow, red, and purple are all evenly distributed about the page).

Green marks words the model found highly predictable given the word before; yellow marks words that were less likely to follow; red and purple mark words the model absolutely did not expect.

Because today's text-writing algorithms use statistical models built from a compendium of written language (so they know which words typically occur together, and can therefore sound more like a normal person), the output of such algos tends to look like the topmost image: all green words, very predictable. The algos can't think for themselves, they can't "come up with" new stuff, and they can't be unpredictable. The whole point of writing an algorithm to do this is to prescribe what it's going to do in advance, i.e., it's predictable.
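A miniature version of that predictability coloring can be built with a toy bigram model standing in for GLTR's language model; the corpus and rank thresholds here are invented for illustration:

```python
from collections import Counter, defaultdict

# Tiny invented corpus to stand in for a real language model's training data.
corpus = ("the cat sat on the mat . the cat ate the fish . "
          "the dog sat on the mat .").split()

# bigrams[prev] counts which words follow `prev`, and how often.
bigrams = defaultdict(Counter)
for prev, word in zip(corpus, corpus[1:]):
    bigrams[prev][word] += 1

def color(prev, word):
    """Bucket a word by its rank among the model's predicted successors."""
    ranked = [w for w, _ in bigrams[prev].most_common()]
    if word not in ranked:
        return "red"  # never seen after `prev`: maximally surprising
    rank = ranked.index(word)
    if rank == 0:
        return "green"   # the model's top guess
    if rank < 3:
        return "yellow"  # plausible, but not the top guess
    return "red"

text = "the cat sat on the mat".split()
print([color(p, w) for p, w in zip(text, text[1:])])
# ['green', 'green', 'green', 'green', 'yellow']
```

GLTR itself ranks each word against a large language model's full next-word distribution rather than bigram counts, but the rank-then-color logic is the same.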

Bringing this back to olfaction, unfortunately there isn't a compendium of odor associations such as a Bible for smells or an encyclopedia for volatile organic compounds in nature. Furthermore, even if there were, we would need to augment it with a companion encyclopedia of the odors in the anthroposphere, because your supermarket isn't "nature" and yet it organizes a whole lot of our daily scentscape. One day though.

Notes:
Hyperbolic geometry of the olfactory space. Yuansheng Zhou, Brian H. Smith, Tatyana O. Sharpee. Sci. Adv. 4, eaaq1458 (2018). DOI: 10.1126/sciadv.aaq1458

Catching a Unicorn with GLTR: A tool to detect automatically generated text.
Hendrik Strobelt and Sebastian Gehrmann. Association for Computational Linguistics: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations. Florence, Italy. July 2019. DOI:10.18653/v1/P19-3019