Cognitive Bias in Artificial Intelligence

I believe that artificial intelligence will suffer from cognitive biases, just as humans do. They might be altogether different kinds of bias; I won’t speculate about the details. I came to this conclusion by reading “Thinking, Fast and Slow” by psychologist Daniel Kahneman, which proposes that the brain has two modes of analysis: a “snap judgement” or “first impression” system and a more methodical, calculating system. Often we engage the quick system out of computational laziness. Why wouldn’t a machine do the same?

Researchers in machine learning already take careful steps to avoid many biases: data collection bias, overfitting, bias in the network’s initial weights, and so on. But I haven’t yet heard of anyone addressing computational biases in the resulting neural net. I think precursors of biased behavior have already been observed, but they were explained away as artifacts of the input data, of the reward function used during training, or of some other statistical inadequacy.

Let me give a simplified example (admittedly a poor one for my argument) of a cognitive bias present in humans, and reflect on why it would be difficult to filter out such a bias in a machine learning algorithm.

Consider the Muller-Lyer illusion, which consists of a pair of arrows with fins pointing either toward or away from the center. The two shafts have the same length, but one appears longer. As a human familiar with this illusion, I will report that the shafts have equal length. Yet, subjectively, I do indeed perceive them as different. My familiarity with the illusion allows me to report accurate information while, in effect, lying about my subjective experience.

Now suppose that we train a neural net to gauge linear distances, and that we have a way of asking it whether the lines in the Muller-Lyer diagram have the same length. What will it report? Well, that depends. Being a machine, it might have a better mechanism for measuring lines directly in pixels and thus be immune to the extraneous information presented by the fins at the ends of those lines. Then again, humans ought to have that capability at the cellular sensory level as well, yet we don’t. And even if the Muller-Lyer illusion doesn’t fool the neural net, does some other picture confuse it? So far, yes, such things happen: the model categorizes incorrectly where a human wouldn’t. We tend to interpret this as a one-off “mistake” rather than a “bias”. But here the researchers succumb to an evidence bias of their own: they have only one example of incorrect categorization, and they don’t perform a follow-up investigation into whether that example represents a whole class of failures, which would demonstrate a cognitive bias in the neural net. A sketch of how one might probe for this follows below.
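As a rough illustration, here is a minimal sketch of such a probe, written in Python with Pillow and NumPy (my choice of tools, not necessarily what any researcher uses). The stimulus generator is just geometry; `estimate_length` is a purely hypothetical stand-in for a trained length-estimation net.

```python
# Sketch: generate Muller-Lyer stimuli and probe a length estimator.
# Assumes Pillow and NumPy; `estimate_length` is a stand-in for a trained
# model that maps an image to an estimated shaft length in pixels.
from PIL import Image, ImageDraw
import numpy as np


def muller_lyer(shaft_px=200, fin_px=30, inward=True, size=(300, 100)):
    """Draw one Muller-Lyer arrow: a horizontal shaft with fins at each end.

    inward=True  -> fins slant toward the center (shaft tends to look shorter)
    inward=False -> fins slant away from the center (shaft tends to look longer)
    """
    img = Image.new("L", size, color=255)
    draw = ImageDraw.Draw(img)
    y = size[1] // 2
    x0 = (size[0] - shaft_px) // 2
    x1 = x0 + shaft_px
    draw.line([(x0, y), (x1, y)], fill=0, width=2)

    sign = 1 if inward else -1
    for x, end_dir in ((x0, -1), (x1, 1)):   # end_dir: away from center along x
        dx = sign * end_dir * fin_px
        draw.line([(x, y), (x - dx, y - fin_px)], fill=0, width=2)
        draw.line([(x, y), (x - dx, y + fin_px)], fill=0, width=2)
    return np.asarray(img, dtype=np.float32) / 255.0


def estimate_length(image):
    """Hypothetical trained model: image -> estimated shaft length in pixels."""
    raise NotImplementedError("stand-in for the real length-estimation net")


if __name__ == "__main__":
    fins_in = muller_lyer(inward=True)
    fins_out = muller_lyer(inward=False)
    # Both shafts are exactly 200 px; a systematic gap between the two
    # estimates would be evidence of a Muller-Lyer-like bias in the net.
    try:
        print(estimate_length(fins_in), estimate_length(fins_out))
    except NotImplementedError:
        print("plug in a real model to run the probe")
```

The interesting measurement is not whether a single estimate is wrong, but whether the inward-fin and outward-fin estimates differ systematically across many shaft lengths — that is what would turn a one-off “mistake” into a bias.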

Now suppose the researchers do perform the necessary diligence and discover a cognitive bias. They generate new examples and retrain the net. Now it categorizes those examples correctly. Have they really removed the bias at a fundamental level? Or does the net now have a corrective layer, like I do? I presume the answer depends on the computational capacity of the net: simple nets will have been genuinely retrained, while more complex ones might only have learned a fixer circuit that identifies the image as a specific kind of illusion. Thus, the more capable the neural net, the more it starts looking like a human: a first impression followed by a second guess. The sketch below illustrates the two outcomes.
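To make the distinction concrete, here is a minimal sketch assuming PyTorch; `base_net`, `augmented_data`, `illusion_data`, and the training details are all hypothetical. In option A every weight is free to move; in option B the original circuitry is frozen and only a small corrective head is trained on top of the first impression.

```python
# Sketch: two ways a bias can "disappear" after remediation.
# Assumes PyTorch; base_net and the datasets are hypothetical stand-ins.
import torch
from torch import nn


def retrain_everything(base_net, augmented_data, epochs=5, lr=1e-3):
    """Option A: all weights move; the bias may be removed at a fundamental level."""
    opt = torch.optim.Adam(base_net.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in augmented_data:
            opt.zero_grad()
            loss_fn(base_net(x), y).backward()
            opt.step()
    return base_net


class FixerCircuit(nn.Module):
    """Option B: the original net is frozen; a small head learns to recognize
    'this is that illusion again' and correct the output."""

    def __init__(self, base_net, num_classes):
        super().__init__()
        self.base = base_net
        for p in self.base.parameters():
            p.requires_grad = False                      # first impression stays intact
        self.correction = nn.Linear(num_classes, num_classes)  # second guess

    def forward(self, x):
        first_impression = self.base(x)
        return first_impression + self.correction(first_impression)


def train_fixer(base_net, illusion_data, num_classes, epochs=5, lr=1e-3):
    model = FixerCircuit(base_net, num_classes)
    opt = torch.optim.Adam(model.correction.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in illusion_data:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model
```

Both models can end up scoring perfectly on the illusion examples; only the second one visibly preserves the original “first impression” underneath its correction.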

How ought research approach this problem? Should the biases be identified one at a time and subsequently removed with additional training? Given the large number of biases (cf. all of Less Wrong, or this list of cognitive biases), I don’t think that approach scales. Especially since biases result from cognitive architecture, and trained neural nets differ from human brains, I expect the biases in machine learning to be new to us. Those should be exciting discoveries! I propose training with multiple adversarial nets, each trying to confuse the categorizer; a sketch of that loop follows below. This approach contains an architectural symmetry, so it probably won’t work for biases that result from differences between wet-ware and hard-ware computation. Those should be even more interesting discoveries!
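For what it’s worth, here is a minimal sketch of the kind of training loop I have in mind, again assuming PyTorch; the `categorizer` and `adversaries` networks, the perturbation bound `eps`, and the data loader are all hypothetical.

```python
# Sketch: training a categorizer against an ensemble of adversary nets,
# each of which learns to produce perturbations that confuse it.
# Assumes PyTorch; categorizer, adversaries, and data are hypothetical.
import torch
from torch import nn


def adversarial_round(categorizer, adversaries, data, eps=0.03, lr=1e-3):
    cat_opt = torch.optim.Adam(categorizer.parameters(), lr=lr)
    adv_opts = [torch.optim.Adam(a.parameters(), lr=lr) for a in adversaries]
    loss_fn = nn.CrossEntropyLoss()

    for x, y in data:
        # 1. Each adversary proposes a bounded perturbation of the input and
        #    is rewarded (negated loss) for making the categorizer wrong.
        perturbed = []
        for adv, opt in zip(adversaries, adv_opts):
            delta = eps * torch.tanh(adv(x))        # keep the attack small
            x_adv = torch.clamp(x + delta, 0.0, 1.0)
            opt.zero_grad()
            (-loss_fn(categorizer(x_adv), y)).backward()
            opt.step()
            perturbed.append(x_adv.detach())

        # 2. The categorizer trains on the clean batch plus every attack.
        cat_opt.zero_grad()
        loss = loss_fn(categorizer(x), y)
        for x_adv in perturbed:
            loss = loss + loss_fn(categorizer(x_adv), y)
        loss.backward()
        cat_opt.step()
```

Using several adversaries, ideally with different architectures, is meant to surface different classes of confusing inputs rather than letting the categorizer merely memorize one attacker’s quirks.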

Humans clearly rely heavily on contextual clues, and the whole point of investing in machine learning is to capture and replicate that level of cognition. But contextual clues can mislead as easily as they help. So machine learning ought to exhibit cognitive bias, as humans do, though very likely of different kinds. Efforts to train out that bias might even be met with repulsion. Humans feel comfort with the familiar, so cognition with our biases removed should feel viscerally unwelcome. For example, robots that lack the biases associated with empathy will be perceived as sociopathic.