Monday, January 23, 2017

Bad Information vs Good Information

Deep Learning, An MIT Press book by Ian Goodfellow, Yoshua Bengio, Aaron Courville, 2016

I'm not sure exactly how they got this image, but it sure looks like it came from the Google Deep Dream project, where a deep learning network was asked to 'dream' about images and produce 'overperceived' images, which look a lot like hallucinating on psychoactive mycotoxins.

Is there such a thing as Bad Information? If so, what is the difference between Good and Bad? How do we know that difference?

Artificial intelligence, and information theory more generally, is a common theme in Hidden Scents. How can you not write about it these days? We are computers. At least, we are becoming computers. Or they us. At least, that's what they say. Do we know anything aside from the analogies we use? (We should probably be asking Douglas Hofstadter about that one.)

Back when pneumatics was the technology du jour, we thought the nervous system worked according to pressure in the nerve fibers. That was correct for the circulatory system, but the utility of that analogy ended there. Eventually, the computer analogy will run out, but until then, we are computers. And these days, specifically, we are computers learning to recognize patterns in our environment using forward-feedbacked layers of feature detection.

This brings us to the premiere of a new infotech textbook.

“The Deep Learning textbook is a resource intended to help students and practitioners enter the field of machine learning in general and deep learning in particular.”

Deep Learning is a (new) textbook, so it's too technical for the interested layperson. But there is some good introductory material that could help straighten things out for people who want to know what it is, but don't have the context-specific knowledge to digest the whole thing.

Here's an excerpt from the chapter on Information Theory:

“Likely events should have low information content, and in the extreme case, events that are guaranteed to happen should have no information content whatsoever.

“Less likely events should have higher information content.

“Independent events should have additive information. For example, finding out that a tossed coin has come up as heads twice should convey twice as much information as finding out that a tossed coin has come up as heads once.”

The text then goes on to translate these maxims into mathematical formulae.
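For the curious, here is a minimal sketch of what those three maxims look like once they become arithmetic. It assumes the standard definition of self-information, I(x) = -log P(x). The function name and the choice of base 2 (which gives answers in bits) are mine, for illustration only, and are not taken from the book.

```python
import math

def self_information(p):
    """Information content, in bits, of an event that occurs with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be a probability in (0, 1]")
    # log2(1/p) is the same quantity as -log2(p); writing it this way
    # avoids returning negative zero when p == 1.
    return math.log2(1.0 / p)

certain = self_information(1.0)          # an event guaranteed to happen
one_head = self_information(0.5)         # one fair coin toss comes up heads
two_heads = self_information(0.5 * 0.5)  # two independent tosses both come up heads

print(certain, one_head, two_heads)
```

With a fair coin this prints 0.0 1.0 2.0: the sure thing tells you nothing, and two heads carry exactly twice the information of one.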

Sometimes someone says something and I'm like, wow, that was really stupid. But then later on, when I try to think about why it was stupid, I find it difficult to articulate. In the text quoted above, we have a good rationale for explaining why a particular statement is 'stupid' or not: It depends on how much information it has. And this is how we measure that information.

In layman's terms, we would call this the Captain Obvious principle. If you just said something that everyone already knows or should expect, but you said it like it's got good information value (as if nobody knows or expects it), then that would come across as stupid.

There we go again, turning a branch of applied mathematics into a magnifying glass for human behavior; probably not what the authors of this text intended to be done with their work.

Anyway, back to the text. I like their word “hard-coding.” They use it to describe the 'older' way of writing knowledge about the world directly into a program (instead of 'letting the program figure it out for itself,' as these newer deep learning programs do).

They point out in the introduction that "A person's everyday life requires an immense amount of knowledge about the world. Much of this knowledge is subjective and intuitive, and therefore difficult to articulate in a formal way. Computers need to capture this same knowledge in order to behave in an intelligent way. One of the key challenges in artificial intelligence is how to get this informal knowledge into a computer." When computers instead acquire their own knowledge, by extracting patterns from raw data, this is known as machine learning. Deep learning is a type of machine learning.
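To make the contrast concrete, here is a toy sketch of my own, not anything from the book. It is nowhere near deep learning. It only shows the difference between a rule the programmer writes in and a rule the program works out from labeled examples. The heights, labels, and function names are all invented for illustration.

```python
def hard_coded_is_tall(height_cm):
    # Hard-coding: the programmer writes the knowledge in directly.
    return height_cm >= 180

def learn_threshold(examples):
    """Pick the cut-off height that misclassifies the fewest labeled examples."""
    candidates = sorted(height for height, _ in examples)

    def errors(threshold):
        # Count examples where the rule "height >= threshold" disagrees with the label.
        return sum((height >= threshold) != label for height, label in examples)

    return min(candidates, key=errors)

# Invented toy data: (height in cm, whether someone called that person tall).
examples = [(150, False), (160, False), (170, False),
            (175, True), (185, True), (190, True)]

threshold = learn_threshold(examples)
print(threshold)
```

Nobody told the second function where the line is. It found a cut-off that fits the examples it was given, and it would find a different one if the examples changed.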

Still, figuring out which details are valuable and which are inconsequential is the hardest part. Disentangling is a word emphasized by the authors. That's a favorite word in Hidden Scents as well. So is inextricable, the information-opposite of disentangle. So is disambiguate, the big brother of disentangle.

If you're into this stuff, and a bit more on the application side than the theoretical side, you might want to check this book out. And if you're just into machine-generated hallucinations, or if you've ever tripped on psilocybin mushrooms and want to see something reminiscent – very reminiscent – unnervingly reminiscent – check out the front cover.

Analogy as the Core of Cognition, Douglas Hofstadter, Stanford lecture, 2006
