Thursday, March 26, 2020

Colexify My Insides


 
Comparison of universal colexification networks of emotion concepts with Austronesian and Indo-European language families. Credit: T. H. Henry



How do you know that a 12-inch ruler is in fact 12 inches long? You don't. You trust. I don't know who you trust, if it's the ruler manufacturer, or the society you live in, or who else. But you don't actually know how long that ruler really is.

How do ruler manufacturers know how long 12 inches is? They use a ruler, of course. And where does that ruler come from?

I work in a field where we have to take very precise and accurate measurements of environmental conditions, such as nanogram-concentrations of mercury vapor in the air. If your equipment thinks it's pulling 0.2 liters per minute of air instead of 0.3, then what happens after 8 hours' worth of minutes? You get a very distorted sense of how much mercury is in the air (96 vs 144 liters to begin with).
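To make that arithmetic concrete, here is a minimal sketch in Python. The two flow rates and the 8 hours come from the paragraph above; the collected mass of 1440 nanograms is a made-up number for illustration only, and the function names are mine.

```python
def sampled_volume_liters(flow_l_per_min, hours):
    """Total air pulled through the sampler, in liters."""
    minutes = hours * 60
    return flow_l_per_min * minutes


def concentration_ng_per_liter(mass_ng, volume_liters):
    """Concentration is the collected mass divided by the sampled air volume."""
    return mass_ng / volume_liters


believed_volume = sampled_volume_liters(0.2, 8)  # what the equipment thinks it pulled
actual_volume = sampled_volume_liters(0.3, 8)    # what it really pulled

# Hypothetical mass of mercury found on the sample media (not from any real survey).
collected_mass_ng = 1440.0

reported = concentration_ng_per_liter(collected_mass_ng, believed_volume)
true_value = concentration_ng_per_liter(collected_mass_ng, actual_volume)

print(f"believed volume: {believed_volume:.0f} L, actual volume: {actual_volume:.0f} L")
print(f"reported: {reported:.1f} ng/L, true: {true_value:.1f} ng/L")
print(f"overestimate factor: {reported / true_value:.2f}")
```

Whatever the collected mass is, dividing it by the smaller, wrong volume inflates the reported concentration by the ratio of the two volumes.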

This is why we calibrate our equipment, using another piece of equipment to make sure ours is doing what it says it does. Sure, we could talk about The Kilogram, which until last year was used to calibrate every other kilogram-measuring thing ever, and was protected in multiple nested glass encasements in a vault at the International Bureau of Weights and Measures in Sèvres, just outside Paris.

But instead, we're going to talk about language. Because there's no Kilogram for language.

***
In the same way that we don't know how long any particular ruler is if we don't have an ur-ruler, how do I know that your meaning of a word is the same as mine? This is like asking if the red you see is the same red I see. Or if the pain you feel is the same pain I feel. Language, like feelings in general, is subjective. How can we calibrate something that has no universal standard?

Language, unlike feelings, does offer a metric by which we can compare and even measure its meaning to different people. It's not a surprise; words are the way we measure language. But not until now, with the era of Big Data fully upon us, can we put all the words in the world into one database and compare their meanings across all languages, using the database itself as the closest thing to a universal measuring rod that we can get.

This is built on colexification, which is when a language uses a single word for more than one concept. Draw a line between every pair of concepts that share a word somewhere in that database, and you can find common denominators and groupings. The goal is to create a universal structure of emotional language that can be used to calibrate and understand these words and especially the people who use them. These are called "emotion colexification networks," and they show us, for example, how in Austronesian languages "surprise" is associated with "fear," whereas Tai-Kadai languages associate "surprise" with the concepts "hope" and "want." (Take a look at the top image in this post.)
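Here is a rough sketch of how such a network could be tallied, assuming a toy lexicon where each word is listed with the concepts it covers. The languages, words, and counts below are invented for illustration; this is not the CLICS data or the researchers' actual pipeline.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy lexicon: language -> word form -> concepts that one word covers.
TOY_LEXICON = {
    "language_a": {"word_a1": {"surprise", "fear"}, "word_a2": {"hope"}},
    "language_b": {"word_b1": {"surprise", "hope", "want"}},
    "language_c": {"word_c1": {"surprise", "fear"}, "word_c2": {"hope", "want"}},
}


def colexification_edges(lexicon):
    """Count, for each pair of concepts, how many languages express both with one word."""
    edges = Counter()
    for words in lexicon.values():
        pairs_in_language = set()
        for concepts in words.values():
            # Every pair of concepts sharing this word is one colexification.
            for pair in combinations(sorted(concepts), 2):
                pairs_in_language.add(pair)
        # Each language contributes at most one vote per concept pair.
        edges.update(pairs_in_language)
    return edges


edges = colexification_edges(TOY_LEXICON)
for (concept_1, concept_2), n_languages in edges.most_common():
    print(f"{concept_1} -- {concept_2}: {n_languages} language(s)")
```

The edge weights are the network: the more languages that fold two concepts into one word, the thicker the line between them. Comparing those weights across language families is what lets you say "surprise" leans toward "fear" in one family and toward "hope" in another.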

In other words, we can now see that if you say you're surprised, but you're saying that in an Austronesian language, then you're probably not so happy, although in English, the word surprise represents something more like happiness.

The researchers working with this ultimate cross-lingual lexicon found significant variations in the positioning of words in the network: the meaning of words changes a lot as you go from one language to another, even if those words are translated as equivalents of each other.

***
In closing, this is interesting research for the world of olfaction, which is another one of those severely subjective phenomena. In fact, the researchers in this study use the same two dimensions that olfactory studies use, those being valence and intensity. That should be no surprise, because the limbic system is the common denominator between the two. The limbic system is the domain of our emotions and of olfactory experience.

Post-Script
As with the very recent olfactory research, this study is made possible because of an advance in the database used. CLICS is a database of colexifications involving 2474 languages from around the world; a few years ago this database had only 300 languages in it.

Notes
J.C. Jackson et al. Science (2019).

Dec 2019, phys.org



Thursday, March 19, 2020

Hyper Dimensions in Olfactory Space


Artwork by Alex Grey

The title of this post is taken from an article about categorizing smells, although it would work just as well as the title of a work of science fiction. It's from August 2018, so it's old news by now; but that title isn't getting old anytime soon.

Probing the interconnected-ness of odors, and sketching a map of an omnicategorical odor network, the article starts out with a basic premise.

Let's say the olfactory system is designed to warn us of poisons in the environment. But a poison could be many chemicals, or a chemical we've never encountered before. So it would be necessary for the odors of those chemicals to be classified not by features intrinsic to the chemicals themselves, but by the likelihood of their co-occurrence with other chemicals. You can't be born with a database of chemicals to recognize and avoid. So instead, the hypothesis here is that an odor is only identified in its relationship to the other odors it's with.

This idea, at least to my ears, sounds really similar to the way statistical correlation text analysis can determine whether a piece of writing was written by a robot or not. (Also called visual forensics.)

It's way easier to visualize, so I'm taking these three images from the paper itself:




In the above 3 images, the first is a chunk of text written by a robot (most of the words are green, with a few yellows sprinkled in), the second is a real New York Times article (only half is green, the rest is yellow, with some red, and a sprinkle of purple) and the third picture is a clip from "the most unpredictable human text ever written", James Joyce's Finnegans Wake (the colors green, yellow, red, purple are all evenly distributed about the page).

Green words are very predictably the next word. Yellow words are less likely to show up after the word they show up after. And red and purple are for when the next word is something you absolutely did not expect.

Because text-writing algorithms today use a statistical correlation program based on a compendium of written language (so they know what words typically occur together, and can therefore sound more like a normal person) the output of such algos will tend to look like the topmost image with all green words. Very predictable. The algos can't think for themselves, they can't "come up with" new stuff, and they can't be unpredictable. The whole point of writing an algorithm to do this is to prescribe what it's going to do in advance, i.e., it's predictable.
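For a feel of how the coloring works, here is a toy sketch. It swaps in a simple word-pair counter for the neural language model that the real tool relies on, and the training sentence and rank cutoffs (1, 3) are my own small-scale assumptions; as I understand it, the actual tool uses much larger cutoffs (top 10, top 100, top 1000).

```python
from collections import Counter, defaultdict

# Tiny made-up "compendium of written language" for the toy model to learn from.
TRAINING_TEXT = (
    "the cat sat on the mat the cat ate the fish the dog sat on the rug"
)


def train_bigrams(text):
    """Count which words follow which in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        following[prev][nxt] += 1
    return following


def rank_of_next(following, prev, nxt):
    """1-based rank of nxt among words seen after prev; None if never seen."""
    ranked = [word for word, _ in following.get(prev, Counter()).most_common()]
    return ranked.index(nxt) + 1 if nxt in ranked else None


def color_for_rank(rank, green_max=1, yellow_max=3):
    """Map a rank to a color; the cutoffs are illustrative, scaled down for a toy vocabulary."""
    if rank is None:
        return "purple"  # the model never saw this word in this position
    if rank <= green_max:
        return "green"   # the most expected next word
    if rank <= yellow_max:
        return "yellow"  # plausible, but not the top guess
    return "red"         # seen before, but far down the list


def colorize(following, text):
    """Color every word after the first by how expected it was given the previous word."""
    words = text.lower().split()
    return [
        (nxt, color_for_rank(rank_of_next(following, prev, nxt)))
        for prev, nxt in zip(words, words[1:])
    ]


model = train_bigrams(TRAINING_TEXT)
for word, color in colorize(model, "the cat sat on the zebra"):
    print(f"{word}: {color}")
```

Text that keeps picking the top-ranked word comes out mostly green, which is the signature of the robot-written sample; a word the model has never seen in that spot, like "zebra" here, goes purple.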

Bringing this back to olfaction, unfortunately there isn't a compendium of odor associations such as a Bible for smells or an encyclopedia for volatile organic compounds in nature. Furthermore, even if there were, we would need to augment it with a companion encyclopedia of the odors in the anthroposphere, because your supermarket isn't "nature" and yet it organizes a whole lot of our daily scentscape. One day though.

Notes:
Hyperbolic geometry of the olfactory space.
Yuansheng Zhou, Brian H. Smith, Tatyana O. Sharpee
Science Advances  29 Aug 2018: Vol. 4, no. 8, eaaq1458
DOI: 10.1126/sciadv.aaq1458

Catching a Unicorn with GLTR: A tool to detect automatically generated text.
Hendrik Strobelt and Sebastian Gehrmann. Association for Computational Linguistics: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations. Florence, Italy. July 2019. DOI:10.18653/v1/P19-3019

Thursday, March 12, 2020

Personal Biodata Protection



I've been catching up on the more recent advances in olfactory research, and got lost in a paper from 2003, where they make an olfactory perception database out of co-occurring semantic descriptors from the Sigma-Aldrich catalog.

The standard olfactory perception database has grown substantially in the past several years, so we won't go into their results, but I did want to pull an interesting point from their conclusion.

Their data is organized not by chemistry, but by metabolism, and they describe the olfactory system as being able to "recognize metabolism." Our sense of smell is capable of identifying metabolic processes in biological systems. This is one of those things that makes smell so taboo to talk about. Me and you are biological systems.

Throughout the entire book I wrote about the language of smells, not once did the topic of privacy or intimacy come up. But it could have; a major reason why we don't talk about smells is that they remind us of the power held over us by those who get close enough to smell us.

Your scent carries with it information that you wouldn't exactly want to advertise to everyone. The biologic, metabolic activity taking place behind your skin, inside your digestive system, throughout your endocrine network: that information is pretty personal. But I can find out about those things, if I get close enough. A general policy of just not talking about smells is probably a good idea all around.

Post Script:
And this focus on metabolism is why I would like to see the metabolome tied into other olfactory perception databases.

Human Metabolome Database (HMDB) - 40,000 different metabolite entries


Notes:
"Descriptors used to classify molecules containing nitrogen or sulfur were clearly segregated in the odor perception maps. Because these molecules are key atoms in different metabolic cycles, it was proposed that human olfactory perception reflected the organization of animal and plant metabolism."

Quantifying olfactory perception: mapping olfactory perception space by using multidimensional scaling and self-organizing maps. Mamlouk AM, Chee-Ruiter C, Hofmann UG et al. Neurocomputing 2003;52:591–7.

The biological sense of smell: olfactory search behavior and a metabolic view for olfactory perception. C.W.J. Chee-Ruiter. Ph.D. Thesis, California Institute of Technology, Pasadena, CA, 2000.