Saturday, January 19, 2019
A New Approach to Understanding How Machines Think
Neural networks are famously incomprehensible — a computer can come up with a good answer, but not be able to explain what led to the conclusion. Been Kim is developing a “translator for humans” so that we can understand when artificial intelligence breaks down.

Been Kim, a research scientist at Google Brain, is developing a way to ask a machine learning system how much a specific, high-level concept went into its decision-making process.
If a doctor told you that you needed surgery, you would want to know why — and
you’d expect the explanation to make sense to you, even if you’d never
gone to medical school. Been Kim, a research scientist at Google Brain,
believes that we should expect nothing less from artificial
intelligence. As a specialist in “interpretable” machine learning, she
wants to build AI software that can explain itself to anyone.
Since its ascendance roughly a decade ago, the neural-network
technology behind artificial intelligence has transformed everything
from email to drug discovery with its increasingly powerful ability to
learn from and identify patterns in data. But that power has come with
an uncanny caveat: The very complexity that lets modern deep-learning
networks successfully teach themselves how to drive cars and spot
insurance fraud also makes their inner workings nearly impossible to
make sense of, even by AI experts. If a neural network is trained to
identify patients at risk for conditions like liver cancer and
schizophrenia — as a system called “Deep Patient” was in 2015, at Mount Sinai Hospital in New York
— there’s no way to discern exactly which features in the data the
network is paying attention to. That “knowledge” is smeared across many
layers of artificial neurons, each with hundreds or thousands of
connections.
As ever more industries attempt to automate or enhance their
decision-making with AI, this so-called black box problem seems less
like a technological quirk than a fundamental flaw. DARPA’s “XAI”
project (for “explainable AI”) is actively researching the problem, and
interpretability has moved from the fringes of machine-learning research
to its center. “AI is in this critical moment where humankind is trying
to decide whether this technology is good for us or not,” Kim says. “If
we don’t solve this problem of interpretability, I don’t think we’re
going to move forward with this technology. We might just drop it.”
Kim and her colleagues at Google Brain recently developed a system
called “Testing with Concept Activation Vectors” (TCAV), which she
describes as a “translator for humans” that allows a user to ask a black
box AI how much a specific, high-level concept has played into its
reasoning. For example, if a machine-learning system has been trained to
identify zebras in images, a person could use TCAV to determine how
much weight the system gives to the concept of “stripes” when making a
decision.
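To make the “stripes” example concrete, here is a minimal sketch of how a concept activation vector (CAV) can be built, following the general recipe described in the TCAV paper rather than any particular released API: collect a network layer’s activations for images that show the concept and for random counterexamples, train a linear classifier to separate the two, and take the direction orthogonal to its decision boundary as the concept vector. The input arrays and the function name are illustrative assumptions.

```python
# Illustrative sketch of building a concept activation vector (CAV).
# Inputs are layer activations (one row per image) for concept images
# (e.g. "stripes") and for random images; both arrays are assumed to exist.
import numpy as np
from sklearn.linear_model import LogisticRegression

def compute_cav(concept_acts: np.ndarray, random_acts: np.ndarray) -> np.ndarray:
    """Train a linear classifier separating concept activations from
    random activations and return the unit vector normal to its
    decision boundary, i.e. the direction that encodes the concept."""
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    cav = clf.coef_.ravel()
    return cav / np.linalg.norm(cav)
```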
TCAV was originally tested on machine-learning models trained to
recognize images, but it also works with models trained on text and
certain kinds of data visualizations, like EEG waveforms. “It’s generic
and simple — you can plug it into many different models,” Kim says.
Quanta Magazine spoke with Kim about what interpretability
means, who it’s for, and why it matters. An edited and condensed version
of the interview follows.
You’ve focused your career on “interpretability” for machine learning. But what does that term mean, exactly?
There are two branches of interpretability. One branch is
interpretability for science: If you consider a neural network as an
object of study, then you can conduct scientific experiments to really
understand the gory details about the model, how it reacts, and that
sort of thing.
The second branch of interpretability, which I’ve been mostly focused
on, is interpretability for responsible AI. You don’t have to
understand every single thing about the model. But as long as you can
understand just enough to safely use the tool, then that’s our goal.
But how can you have confidence in a system that you don’t fully understand the workings of?
I’ll give you an analogy. Let’s say I have a tree in my backyard that
I want to cut down. I might have a chain saw to do the job. Now, I
don’t fully understand how the chain saw works. But the manual says,
“These are the things you need to be careful of, so as to not cut your
finger.” So, given this manual, I’d much rather use the chain saw than a
handsaw, which is easier to understand, but would make me spend five
hours cutting down the tree.
You understand what “cutting” is, even if you don’t exactly know everything about how the mechanism accomplishes that.
Yes. The goal of the second branch of interpretability is: Can we
understand a tool enough so that we can safely use it? And we can create
that understanding by confirming that useful human knowledge is
reflected in the tool.
How does “reflecting human knowledge” make something like a black box AI more understandable?
Here’s another example. If a doctor is using a machine-learning model
to make a cancer diagnosis, the doctor will want to know that the model
isn’t picking up on some random correlation in the data that we don’t
want it to pick up. One way to make sure of that is to confirm that the
machine-learning model is doing something that the doctor would have
done. In other words, to show that the doctor’s own diagnostic knowledge
is reflected in the model.
So if doctors were looking at a cell specimen to diagnose cancer,
they might look for something called “fused glands” in the specimen.
They might also consider the age of the patient, as well as whether the
patient has had chemotherapy in the past. These are factors or concepts
that the doctors trying to diagnose cancer would care about. If we can
show that the machine-learning model is also paying attention to these
factors, the model is more understandable, because it reflects the human
knowledge of the doctors.
Google Brain’s Been Kim is building ways to let us interrogate the decisions made by machine learning systems.
Is this what TCAV does — reveal which high-level concepts a machine-learning model is using to make its decisions?
Yes. Prior to this, interpretability methods only explained what
neural networks were doing in terms of “input features.” What do I mean
by that? If you have an image, every single pixel is an input feature.
In fact, Yann LeCun [an early pioneer in deep learning and currently the
director of AI research at Facebook] has said that he believes these
models are already superinterpretable because you can look at every
single node in the neural network and see numerical values for each of
these input features. That’s fine for computers, but humans don’t think
that way. I don’t tell you, “Oh, look at pixels 100 to 200, the RGB
values are 0.2 and 0.3.” I say, “There’s a picture of a dog with really
puffy hair.” That’s how humans communicate — with concepts.
How does TCAV perform this translation between input features and concepts?
Let’s return to the example of a doctor using a machine-learning
model that has already been trained to classify images of cell specimens
as potentially cancerous. You, as the doctor, may want to know how much
the concept of “fused glands” mattered to the model in making positive
predictions of cancer. First you collect some images — say, 20 — that
have examples of fused glands. Now you plug those labeled examples into
the model.
Then what TCAV does internally is called “sensitivity testing.” When
we add in these labeled pictures of fused glands, how much does the
probability of a positive prediction for cancer increase? You can output
that as a number between zero and one. And that’s it. That’s your TCAV
score. If the probability increased, it was an important concept to the
model. If it didn’t, it’s not an important concept.
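The sensitivity test can be sketched in a few lines. In the published TCAV formulation, the score for a concept is the fraction of examples of the class whose prediction moves up when their layer activations are nudged a small step along the concept direction; that is the number between zero and one Kim refers to. The hooks `layer_activations` and `class_logit_from_activations`, and the step size, are assumptions for illustration, not part of any released API.

```python
# Illustrative sketch of the TCAV sensitivity test.
import numpy as np

def tcav_score(inputs, cav, layer_activations, class_logit_from_activations,
               eps: float = 1e-2) -> float:
    """Return the fraction of inputs whose class score (e.g. the "cancer"
    logit) increases when their activations at the chosen layer are moved
    a small step along the concept direction. Near 1.0: the concept pushes
    predictions toward the class; near 0.0: away; near 0.5: no consistent effect."""
    positive = 0
    for x in inputs:
        acts = layer_activations(x)                        # activations at the chosen layer
        base = class_logit_from_activations(acts)          # class score from those activations
        nudged = class_logit_from_activations(acts + eps * cav)
        if nudged > base:                                   # finite-difference directional derivative > 0
            positive += 1
    return positive / len(inputs)
```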
“Concept” is a fuzzy term. Are there any that won’t work with TCAV?
If you can’t express your concept using some subset of your
[dataset’s] medium, then it won’t work. If your machine-learning model
is trained on images, then the concept has to be visually expressible.
Let’s say I want to visually express the concept of “love.” That’s
really hard.
We also carefully validate the concept. We have a statistical testing
procedure that rejects the concept vector if it has the same effect on
the model as a random vector. If your concept doesn’t pass this test,
then TCAV will say, “I don’t know. This concept doesn’t look like
something that was important to the model.”
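That validation step can be sketched as follows, under the assumption that TCAV has been run several times with different random negative sets: compare the scores produced by concept-trained vectors against those produced by purely random vectors, and only keep the concept if the two distributions differ significantly. The significance threshold and the helper name are illustrative.

```python
# Illustrative sketch of the statistical check on a concept vector:
# reject the concept if its TCAV scores are indistinguishable from
# scores obtained with random vectors.
from scipy.stats import ttest_ind

def concept_is_meaningful(concept_scores, random_scores, alpha: float = 0.05) -> bool:
    """concept_scores and random_scores are lists of TCAV scores from
    repeated runs. If a two-sided t-test cannot separate them, treat
    the concept as not meaningful to the model."""
    _, p_value = ttest_ind(concept_scores, random_scores)
    return p_value < alpha
```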
Is TCAV essentially about creating trust in AI, rather than a genuine understanding of it?
It is not — and I’ll explain why, because it’s a fine distinction to make.
We know from repeated studies in cognitive science and psychology
that humans are very gullible. What that means is that it’s actually
pretty easy to fool a person into trusting something. The goal of
interpretability for machine learning is the opposite of this. It is to
tell you if a system is not safe to use. It’s about revealing the truth. So “trust” isn’t the right word.
So the point of interpretability is to reveal potential flaws in an AI’s reasoning?
Yes, exactly.
How can it expose flaws?
You can use TCAV to ask a trained model about irrelevant concepts. To
return to the example of doctors using AI to make cancer predictions,
the doctors might suddenly think, “It looks like the machine is giving
positive predictions of cancer for a lot of images that have a kind of
bluish color artifact. We don’t think that factor should be taken into
account.” So if they get a high TCAV score for “blue,” they’ve just
identified a problem in their machine-learning model.
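Continuing the hypothetical sketches above, the doctors’ check might look like this: build a CAV for the suspect “bluish artifact” concept, score it against the model’s cancer predictions, and flag it if the score is high. The file names, the threshold, and the reuse of the earlier helper functions are all illustrative assumptions.

```python
# Illustrative usage of the sketches above to flag an irrelevant concept.
import numpy as np

blue_artifact_acts = np.load("blue_artifact_activations.npy")  # assumed saved layer activations
random_acts = np.load("random_activations.npy")                # assumed
cancer_inputs = np.load("cancer_positive_inputs.npy")          # assumed inputs predicted as cancer

blue_cav = compute_cav(blue_artifact_acts, random_acts)
score = tcav_score(cancer_inputs, blue_cav,
                   layer_activations, class_logit_from_activations)
if score > 0.7:  # threshold chosen only for illustration
    print("The model may be relying on the bluish color artifact.")
```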
TCAV is designed to bolt on to existing AI systems that aren’t
interpretable. Why not make the systems interpretable from the
beginning, rather than black boxes?
There is a branch of interpretability research that focuses on
building inherently interpretable models that reflect how humans reason.
But my take is this: Right now you have AI models everywhere that are
already built, and are already being used for important purposes,
without having considered interpretability from the beginning. It’s just
the truth. We have a lot of them at Google! You could say,
“Interpretability is so useful, let me build you another model to
replace the one you already have.” Well, good luck with that.
So then what do you do? We still need to get through this critical
moment of deciding whether this technology is good for us or not. That’s
why I work on “post-training” interpretability methods. If you have a
model that someone gave to you and that you can’t change, how do you go
about generating explanations for its behavior so that you can use it
safely? That’s what the TCAV work is about.
TCAV lets humans ask an AI if certain concepts matter to it. But
what if we don’t know what to ask — what if we want the AI system to
explain itself?
We have work that we’re writing up right now that can automatically
discover concepts for you. We call it DTCAV — discovery TCAV. But I
actually think that having humans in the loop, and enabling the
conversation between machines and humans, is the crux of
interpretability.
A lot of times in high-stakes applications, domain experts already
have a list of concepts that they care about. We see this over and
over again in our medical applications at Google Brain. They don’t
want to be given a set of concepts — they want to tell the model the
concepts that they are interested in. We worked with a doctor who treats
diabetic retinopathy, which is an eye disease, and when we told her
about TCAV, she was so excited because she already had many, many
hypotheses about what this model might be doing, and now she can test
those exact questions. It’s actually a huge plus, and a very
user-centric way of doing collaborative machine learning.
You believe that without interpretability, humankind might just give
up on AI technology. Given how powerful it is, do you really think
that’s a realistic possibility?
Yes, I do. That’s what happened with expert systems. [In the 1980s]
we established that they were cheaper than human operators at certain
tasks. But who is using expert systems now? Nobody. And after
that we entered an AI winter.
Right now it doesn’t seem likely, because of all the hype and money
in AI. But in the long run, I think that humankind might decide —
perhaps out of fear, perhaps out of lack of evidence — that this
technology is not for us. It’s possible.