One interesting thing here: it reminds me of a gripe I used to have about the Loebner Prize tests. When I last looked at that stuff, about 5 years ago, the testers often set about "hard abusing" the "black box replier" with semi-bogus questions, knowing they were in a Loebner test.
I got grumpy because it seemed few or none of the entrants had put in "anti-troll" code to deal with stuff like that. To me, anti-troll code should be fairly easy to write, because the bogus questions are often obviously bogus. So you could "truncate low", i.e. bail out early on low-plausibility input, with a defensive sweep like "scan the nouns and compare their classes: why are a cake and the Queen in the same sentence?"
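A toy sketch of that noun-class sweep, just to show the shape of it. Everything here is a made-up assumption: the tiny hand-written lexicon `NOUN_CLASSES`, the `PLAUSIBLE_PAIRS` table, and the helper names are hypothetical. A real entrant would use a proper part-of-speech tagger and an ontology rather than a dict.

```python
import re
from itertools import combinations

# Hypothetical toy lexicon mapping nouns to coarse classes.
# A real system would tag nouns properly and look them up in an ontology.
NOUN_CLASSES = {
    "cake": "food",
    "bread": "food",
    "queen": "person",
    "king": "person",
    "turtle": "animal",
    "rifle": "weapon",
}

# Hypothetical table of class pairs considered plausible in one sentence.
PLAUSIBLE_PAIRS = {
    frozenset({"person", "animal"}),
    frozenset({"food", "animal"}),
}


def noun_classes(sentence: str) -> set[str]:
    """Return the classes of all known nouns appearing in the sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return {NOUN_CLASSES[w] for w in words if w in NOUN_CLASSES}


def looks_like_troll(sentence: str) -> bool:
    """Flag the sentence if any two of its noun classes form an implausible pair."""
    classes = noun_classes(sentence)
    return any(
        frozenset(pair) not in PLAUSIBLE_PAIRS
        for pair in combinations(sorted(classes), 2)
    )
```

With this lexicon, `looks_like_troll("Why is a cake bigger than the Queen?")` flags the food/person clash, while "The king saw a turtle" passes. Obviously "Did the Queen eat cake?" would get flagged too, so the pair table is the part you'd actually have to tune.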
Same idea here. Unlike those animorph pics that go around as joke memes, to the human eye this is "clearly a turtle", so maybe run 3 different scan algorithms: they should all converge on the answer, and if they don't, kick it to a decider module.
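Roughly what I mean by "converge or kick it to a decider", as a minimal sketch. The scanners are assumed to be any callables that return a label string, and `classify_with_consensus` and `cautious_decider` are hypothetical names; the decider here just refuses to guess, but it could equally be a slower, more expensive model.

```python
from typing import Callable, Sequence

# Assumption: a "scanner" is any function that maps an image to a label string.
Scanner = Callable[[object], str]
Decider = Callable[[object, list[str]], str]


def classify_with_consensus(image: object, scanners: Sequence[Scanner], decider: Decider) -> str:
    """Return the shared label if every scanner agrees; otherwise defer to the decider."""
    if not scanners:
        raise ValueError("need at least one scanner")
    votes = [scan(image) for scan in scanners]
    if len(set(votes)) == 1:
        # All scanners converged on the same answer.
        return votes[0]
    # Disagreement: hand the image and all the votes to the decider module.
    return decider(image, votes)


def cautious_decider(image: object, votes: list[str]) -> str:
    """Hypothetical decider that flags disagreement instead of picking a winner."""
    return "uncertain"
```

So three scanners all saying "turtle" give "turtle", but a turtle/turtle/rifle split goes to the decider (here, "uncertain") instead of being decided by majority vote. The design point is that an adversarial example has to fool all the scanners the same way at once, which is presumably harder than fooling one.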