As mobile apps become increasingly data-driven, they collect large amounts of app usage data to deliver their promised utilities and enhance the user experience. Unfortunately, some highly sensitive information in this data contributes little or nothing to delivering the apps’ utilities. For instance, an app whose purpose is to show video game trailers has no need to request its users’ phone numbers and contact lists and send them to a remote server. There is a strong need for a framework that helps protect users’ app usage data while retaining the app’s utility efficacy (e.g., the number of enabled features).
There are three main challenges in realizing such a framework. First, it is difficult to correctly identify security-sensitive information in app usage data. For instance, user input text (such as “My password is 12345”) can contain sensitive information, and the framework needs to understand the semantic meaning of such text to determine whether sensitive information is present. Second, because the utilities of apps vary dramatically, a generically applicable program analysis is needed to measure the impact of information anonymization on the level of utility efficacy. Third, balancing privacy preservation against utility efficacy requires fine-grained analysis of both the privacy specification (such as a privacy policy declared by the app’s developers) and the app itself. To address these challenges, we propose a privacy framework that enables a mobile app’s developers to determine what sensitive information can be anonymized while maintaining a desirable level of utility efficacy.