Deep Learning, an MIT Press book by Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016
I'm not sure exactly how they got this image, but it sure
looks like it came from the Google Deep
Dream project, where a deep learning network was asked to 'dream' about
images and produce 'overperceived' images, which look a lot like hallucinating
on psychoactive mycotoxins.
Is there such a thing as Bad Information? If so, what is
the difference between Good and Bad? How do we know that difference?
Artificial intelligence, and information theory in
general, is a common theme in Hidden Scents. How can
you not write about it these days? We are computers. At least, we are becoming
computers. Or they us. At least, that's what they say. Do we know anything
aside from the analogies we use? (We should probably be asking Douglas Hofstadter about
that one.)
Back when pneumatics was the technology du jour, we
thought the nervous system worked according to pressure in the nerve fibers.
That was correct for the circulatory system, but the utility of the analogy
ended there. Eventually, the computer analogy will run out too, but until then, we
are computers. And these days, specifically, we are computers learning to
recognize patterns in our environment using feedforward layers of feature
detectors.
This brings us to the premiere of a new infotech textbook.
“The Deep Learning textbook is a resource intended to
help students and practitioners enter the field of machine learning in general
and deep learning in particular.”
Deep Learning
is a (new) textbook, so it's too technical for the interested layperson. But
there is some good introductory material that could help straighten things out
for people who want to know what deep learning is but don't have the
context-specific knowledge to digest the whole thing.
Here's a passage from the chapter on information theory:
“Likely events should have low information content, and
in the extreme case, events that are guaranteed to happen should have no
information content whatsoever.
“Less likely events should have higher information
content.
“Independent events should have additive information. For
example, finding out that a tossed coin has come up as heads twice should
convey twice as much information as finding out that a tossed coin has come up
as heads once.”
The text then goes on to translate these maxims into
mathematical formulae.
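For reference, the formula those maxims resolve into is Shannon's self-information (this is the standard definition; the book writes it with the natural log, measuring in nats, though base 2 gives the more familiar bits):

    I(x) = -\log P(x)

The additivity maxim then falls out for free: for independent events x and y,

    I(x, y) = -\log\big( P(x)\,P(y) \big) = I(x) + I(y)

So one fair coin landing heads carries -\log_2(1/2) = 1 bit, and two independent heads carry exactly 2 bits, just as the coin example demands.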
***
Sometimes someone says something and I think, wow, that
was really stupid. But later, when I try to articulate why it was stupid, I find it difficult.
The text quoted above gives us a good rationale for
why a particular statement is 'stupid' or not: it depends on how
much information it carries. And this is how we measure that information.
In layman's terms, we would call this the Captain Obvious
principle. If you just said something that everyone already knows or should
expect, but you said it as though it had high information value (as if nobody
knew or expected it), then it would come across as stupid.
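If you want to play with the Captain Obvious principle numerically, here's a toy sketch of my own (not from the book) that scores a statement by its self-information in bits, given the probability you'd have assigned it before hearing it:

    import math

    def surprise_bits(probability):
        """Self-information in bits: -log2(P). Near-certain statements score near zero."""
        return -math.log2(probability)

    print(surprise_bits(0.999))  # "The sun rose this morning" -> ~0.001 bits: Captain Obvious
    print(surprise_bits(0.5))    # "A fair coin came up heads" -> 1 bit
    print(surprise_bits(0.001))  # "It snowed here in July"    -> ~10 bits: actual news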
There we go again, turning a branch of applied
mathematics into a magnifying glass for human behavior; probably not what the
authors of this text intended to be done with their work.
***
Anyway, back to the text. I like their word “hard-coding.”
They use it to describe the 'older' way of writing knowledge about the world
directly into a program (instead of 'letting the program figure it out for itself,'
as the newer deep learning programs do).
They point out in the introduction that "A person's
everyday life requires an immense amount of knowledge about the world. Much of
this knowledge is subjective and intuitive, and therefore difficult to
articulate in a formal way. Computers need to capture this same knowledge in
order to behave in an intelligent way. One of the key challenges in artificial
intelligence is how to get this informal knowledge into a computer."
When computers instead acquire their own knowledge, by extracting patterns from
raw data, that is known as machine learning. Deep learning is a type of machine
learning.
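Here's a minimal sketch of that distinction, using a toy example of my own (not the book's): the same task, deciding whether a day is 'hot,' solved once by hard-coding the knowledge and once by letting the program extract it from raw data:

    # Hard-coding: the programmer writes the knowledge directly into the program.
    def is_hot_hardcoded(temp_f):
        return temp_f > 80

    # Machine learning: the program extracts the rule from labeled examples.
    def learn_threshold(examples):
        """examples: list of (temperature, is_hot) pairs. Returns the midpoint
        between the warmest 'cool' day and the coolest 'hot' day -- about the
        simplest learned decision boundary there is."""
        hot = [t for t, label in examples if label]
        cool = [t for t, label in examples if not label]
        return (max(cool) + min(hot)) / 2

    data = [(60, False), (72, False), (85, True), (95, True)]
    threshold = learn_threshold(data)  # 78.5 -- learned from the data, not hard-coded
    def is_hot_learned(temp_f):
        return temp_f > threshold

Deep learning goes further still, learning not just the rule but the features themselves, layer by layer.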
Still, figuring out which details are valuable and which
are inconsequential is the hardest part. Disentangling
is a word emphasized by the authors. That's a favorite word in Hidden Scents as
well. So is inextricable, the
information-opposite of disentangle. So is disambiguate,
the big brother of disentangle.
If you're into this stuff, and a bit more on the
application side than the theoretical side, you might want to check this book
out. And if you're just into machine-generated hallucinations, or if you've
ever tripped on psilocybin mushrooms and want to see something reminiscent –
very reminiscent – unnervingly reminiscent – check out the front cover.
notes:
Analogy as the Core of Cognition, Douglas Hofstadter,
Stanford lecture, 2006