Measuring the `I don't know' Problem through the Lens of Gricean Quantity
We consider the intrinsic evaluation of neural generative dialog models through the lens of Grices Maxims of Conversation (1975). Based on the maxim of Quantity (be informative), we propose Relative Utterance Quantity (RUQ) to diagnose the `I don't know' problem. The RUQ diagnostic compares the model score of a generic response to that of the reference response. We find that for reasonable baseline models, `I don't know' is preferred over the reference more than half the time, but this can be mitigated with hyperparameter tuning.
READ FULL TEXT