Testing Untestable Neural Machine Translation: An Industrial Case

Abstract

Neural Machine Translation (NMT) has shown great advantages and is becoming increasingly popular. However, in practice, NMT often produces unexpected translation failures in its translations. While reference-based black-box system testing has been a common practice for NMT quality assurance during development, an increasingly critical industrial practice, named in-vivo testing, exposes unseen types or instances of translation failures when real users are using a deployed industrial NMT system. To fill the gap of lacking test oracles for in-vivo testing of NMT systems, we propose a new methodology for automatically identifying translation failures without reference translations. Our evaluation conducted on real-world datasets shows that our methodology effectively detects several targeted types of translation failures. Our experiences on deploying our methodology in both production and development environments of WeChat (a messenger app with over one billion monthly active users) demonstrate high effectiveness of our methodology along with high industry impact.

Publication
In the 41st International Conference on Software Engineering, Poster, Montréal, QC, Canada.
Date
Links