Artwork by Alex Grey
This post takes its title from an article about categorizing smells, although it would work just as well as the title of a work of science fiction. The article is from August 2018, so it's old news by now; but that title isn't getting old anytime soon.
Probing the interconnectedness of odors, and sketching a map of an omnicategorical odor network, the article starts out with a basic premise.
Let's say the olfactory system is designed to warn us of poisons in the environment. But a poison could be many chemicals, or a chemical we've never encountered before. So it would be necessary for the odors of those chemicals to be classified not by features intrinsic to the chemicals themselves, but by the likelihood of their co-occurrence with other chemicals. You can't be born with a database of chemicals to recognize and avoid. So instead, the hypothesis here is that an odor is only identified in its relationship to the other odors it's with.
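To make the co-occurrence idea concrete, here is a minimal Python sketch. The samples and the odorants listed in each are invented for illustration (they are not data from the paper), and Jaccard overlap is a stand-in similarity measure chosen for simplicity, not necessarily the statistic the authors used.

```python
from collections import Counter
from itertools import combinations

# Hypothetical samples: each set lists odorants detected together in one source.
SAMPLES = [
    {"limonene", "citral", "linalool"},      # made-up "citrus peel"
    {"limonene", "citral"},                  # made-up "lemon juice"
    {"putrescine", "cadaverine"},            # made-up "spoiled meat"
    {"putrescine", "cadaverine", "indole"},  # made-up "rotting fish"
    {"linalool", "indole"},                  # made-up "flower"
]

def cooccurrence_similarity(samples):
    """Jaccard similarity per odorant pair: samples with both / samples with either."""
    appears = Counter()   # how many samples contain each odorant
    together = Counter()  # how many samples contain both odorants of a pair
    for sample in samples:
        appears.update(sample)
        for a, b in combinations(sorted(sample), 2):
            together[(a, b)] += 1
    similarity = {}
    for a, b in combinations(sorted(appears), 2):
        both = together[(a, b)]
        either = appears[a] + appears[b] - both
        similarity[(a, b)] = both / either
    return similarity

sim = cooccurrence_similarity(SAMPLES)
for pair in [("citral", "limonene"), ("cadaverine", "putrescine"), ("citral", "putrescine")]:
    print(pair, round(sim[pair], 2))
```

Nothing in this sketch looks at the chemistry of the molecules. In the toy data, citral and limonene end up "close" only because they keep showing up together, and citral and putrescine end up "far" only because they never do.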
This idea, at least to my ears, sounds really similar to the way a statistical analysis of text can flag whether a piece of writing was likely written by a robot or not. (Also called visual forensics.)
It's way easier to visualize, so I'm taking these three images from the paper itself:
In the above three images, the first is a chunk of text written by a robot (most of the words are green, with a few yellows sprinkled in), the second is a real New York Times article (only half is green, the rest is yellow, with some red and a sprinkle of purple), and the third is a clip from "the most unpredictable human text ever written", James Joyce's Finnegans Wake (green, yellow, red, and purple are all evenly distributed about the page).
Green words are the ones that were very predictable as the next word. Yellow words are less likely to follow the word they come after. And red and purple are for when the next word is something you absolutely did not expect.
Because text-writing algorithms today use a statistical correlation program based on a compendium of written language (so they know what words typically occur together, and can therefore sound more like a normal person), the output of such algos will tend to look like the topmost image, with all green words. Very predictable. The algos can't think for themselves, can't "come up with" new stuff, and can't be unpredictable. The whole point of writing an algorithm to do this is to prescribe what it's going to do in advance; in other words, it's predictable.
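The coloring can be sketched in a few lines of Python. This is a toy under stated assumptions, not the GLTR implementation: the real tool ranks words with a large language model, while this sketch uses a hypothetical bigram counter trained on one sentence. The default cutoffs of 10, 100 and 1000 reflect how the GLTR demo is understood to bucket ranks, and should be treated as an assumption too.

```python
from collections import Counter, defaultdict

def color_for_rank(rank, cutoffs=(10, 100, 1000)):
    """Map the 1-based rank of the actual word among the model's guesses to a color."""
    if rank is None:  # the model never saw this word in this position
        return "purple"
    for limit, color in zip(cutoffs, ("green", "yellow", "red")):
        if rank <= limit:
            return color
    return "purple"

def train_bigrams(corpus):
    """Count which word follows which in a whitespace-tokenized corpus."""
    following = defaultdict(Counter)
    words = corpus.lower().split()
    for prev, nxt in zip(words, words[1:]):
        following[prev][nxt] += 1
    return following

def colorize(text, following, cutoffs=(10, 100, 1000)):
    """Color every word after the first by how predictable it was given the previous word."""
    words = text.lower().split()
    colored = []
    for prev, actual in zip(words, words[1:]):
        # Candidate next words, most frequent first.
        ranked = [w for w, _ in following.get(prev, Counter()).most_common()]
        rank = ranked.index(actual) + 1 if actual in ranked else None
        colored.append((actual, color_for_rank(rank, cutoffs)))
    return colored

# Tiny hypothetical corpus; cutoffs are shrunk because the vocabulary is tiny.
model = train_bigrams("the cat sat on the mat the cat ate the fish")
result = colorize("the cat sat on the piano", model, cutoffs=(1, 2, 3))
print(result)
```

With this toy corpus, every word in "the cat sat on the" comes out green, and "piano" comes out purple because the counter has never seen it follow "the".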
Bringing this back to olfaction: unfortunately, there isn't a compendium of odor associations, such as a Bible for smells or an encyclopedia of volatile organic compounds in nature. Furthermore, even if there were, we would need to augment it with a companion encyclopedia of the odors in the anthroposphere, because your supermarket isn't "nature" and yet it organizes a whole lot of our daily scentscape. One day, though.
Hyperbolic geometry of the olfactory space.
Yuansheng Zhou, Brian H. Smith, Tatyana O. Sharpee
Science Advances 29 Aug 2018: Vol. 4, no. 8, eaaq1458
Catching a Unicorn with GLTR: A tool to detect automatically generated text.
Hendrik Strobelt and Sebastian Gehrmann
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations. Florence, Italy, July 2019. DOI:10.18653/v1/P19-3019