Computer Science ETDs

Publication Date

Spring 1-31-2018


What is forbidden to discuss on Chinese apps? Companies operating in China face a complex array of regulations and are liable for content posted on their platforms. Previous work studying Chinese censorship either (1) tests samples of content or (2) measures content deletion; however, these techniques produce an incomplete picture biased toward (1) the tested samples or (2) whichever topics were trending.

In this dissertation, I use reverse engineering to study the code that applications use to determine whether to censor content. In doing so, I can provide a more complete and unbiased view of Chinese Internet censorship. I reverse engineer applications across three Chinese industry segments: instant messaging, live streaming, and gaming. Together, this work reveals over 100,000 unique blacklisted keywords drawn from blacklists spanning hundreds of different companies.

A common assumption in Chinese censorship research is that observed censorship is the result of a monolithic motive. Using this more complete and unbiased view of Chinese Internet censorship, I test three hypotheses: (1) there is little overlap between the keyword lists used by different companies, (2) there is no China-wide list of banned words or topics largely determining what Chinese companies censor, and (3) province-wide lists of banned words or topics do not largely determine what companies censor. These hypotheses suggest that the burden of choosing which topics to censor falls largely on Chinese companies themselves.
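
As a rough illustration of how hypothesis (1) might be quantified, the sketch below computes pairwise overlap between keyword lists. The Jaccard index is an assumed measure chosen for illustration, not necessarily the one used in the dissertation, and the function names, app names, and keywords are all hypothetical placeholders.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Keywords shared by both lists, as a fraction of keywords in either list."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(blacklists: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Jaccard overlap for every unordered pair of named keyword lists."""
    return {
        (x, y): jaccard(blacklists[x], blacklists[y])
        for x, y in combinations(sorted(blacklists), 2)
    }


# Hypothetical placeholder data; real lists would come from reverse-engineered apps.
blacklists = {
    "app_a": {"k1", "k2", "k3", "k4"},
    "app_b": {"k3", "k4", "k5"},
    "app_c": {"k6"},
}

overlaps = pairwise_overlap(blacklists)

# Keywords present in every list; an empty set would be consistent with
# hypothesis (2), that no single list is shared by all companies.
shared_by_all = set.intersection(*blacklists.values())

for pair, score in overlaps.items():
    print(pair, round(score, 2))
print("shared by all:", shared_by_all)
```

Low pairwise scores across many companies would be consistent with little overlap between lists; a large keyword set common to all lists would instead point toward a centrally distributed list.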




Keywords

China, censorship, keyword filtering, social media, Internet, Great Firewall

Document Type


Degree Name

Computer Science

Level of Degree


Department Name

Department of Computer Science

First Committee Member (Chair)

Jedidiah Crandall

Second Committee Member

Ronald Deibert

Third Committee Member

Stephanie Forrest

Fourth Committee Member

Jared Saia