Evaluation

Curious Case of Language Generation Evaluation Metrics: A Cautionary Tale

Automatic evaluation of language generation systems is a well-studied problem in Natural Language Processing. While novel metrics are proposed every year, a few popular metrics remain as the de facto metrics to evaluate tasks such as image captioning …