WebEvo: Taming Web Application Evolution via Detecting Semantic Structure Change


To prevent information retrieval (IR) and robotic process automation (RPA) tools from functioning improperly due to website evolution, it is important to develop web monitoring tools that track changes in a website and report them to developers and testers. Existing monitoring tools commonly rely on DOM-tree based similarity and visual analysis between different versions of web pages. However, DOM-tree based similarity is prone to false positives, since it cannot identify content-based changes (i.e., contents refreshed every time a web page is retrieved) or GUI widget evolution (e.g., moving a button). Such imprecision adversely affects IR tools and test scripts. To address this problem, we propose an approach, WebEvo, that first performs DOM-based change detection and then leverages historic pages to identify the regions that represent content-based changes, which can be safely ignored. Further, to identify refactoring changes that preserve the semantics and appearance of GUI widgets, WebEvo adapts computer vision (CV) techniques to map GUI widgets from the old web page to the new web page on an element-by-element basis. Empirical evaluations on 13 real-world websites from 9 popular categories demonstrate the superiority of WebEvo over existing work that relies on DOM-tree based detection or whole-page visual comparison, while also being faster in visual analysis.
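The first two stages described above (DOM-based change detection, then filtering out content-based changes using historic pages) can be illustrated with a minimal sketch. It is not WebEvo's implementation: it assumes each page version has already been flattened into a hypothetical mapping from element path to text content, and it treats any path whose content varied across historic retrievals as a content-based region. The names `Snapshot`, `dom_changes`, `volatile_paths`, and `semantic_candidates` are illustrative only.

```python
from typing import Dict, List, Set

# Hypothetical representation: element path -> text content of that element.
Snapshot = Dict[str, str]


def dom_changes(old: Snapshot, new: Snapshot) -> Set[str]:
    """Paths that were added, removed, or whose text differs between two versions."""
    paths = set(old) | set(new)
    return {p for p in paths if old.get(p) != new.get(p)}


def volatile_paths(history: List[Snapshot]) -> Set[str]:
    """Paths whose content differed between any two consecutive historic retrievals.

    Assumption: such regions hold content that refreshes on every retrieval.
    """
    volatile: Set[str] = set()
    for prev, curr in zip(history, history[1:]):
        volatile |= dom_changes(prev, curr)
    return volatile


def semantic_candidates(old: Snapshot, new: Snapshot, history: List[Snapshot]) -> Set[str]:
    """DOM changes between old and new, minus regions known to be content-based."""
    return dom_changes(old, new) - volatile_paths(history)


# Two historic retrievals of the same page: only the headline differs.
history = [
    {"/html/body/h1": "News", "/html/body/div[1]": "Headline A", "/html/body/button": "Login"},
    {"/html/body/h1": "News", "/html/body/div[1]": "Headline B", "/html/body/button": "Login"},
]
old_page = history[-1]
new_page = {"/html/body/h1": "News", "/html/body/div[1]": "Headline C", "/html/body/button": "Sign in"}

remaining = semantic_candidates(old_page, new_page, history)
```

In this toy example the headline region is ignored because it also changed between historic retrievals, so only the button's path is left as a candidate change. The widget-level visual mapping stage is not covered by this sketch.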

In The ACM SIGSOFT International Symposium on Software Testing and Analysis.