Computer Science ETDs
Publication Date
Spring 3-11-2020
Abstract
Widely used Chinese social media applications such as Sina Weibo (the Chinese counterpart of Twitter), the most popular social network in China, are well known for monitoring and deleting posts to conform to Chinese government requirements. Censorship of Chinese social media is a complex process that involves many factors. There are multiple stakeholders with many different interests (economic, political, legal, personal, etc.), which means that there is no single strategy dictated by a single government authority. Moreover, Chinese social media companies sometimes do not follow government directives, out of concern that they are censoring more strictly than their competitors.
One crucial question in this context is: what kinds of features make a given post likely to be censored? Previous work on this question (1) ignores the multi-modal nature of social networks and focuses only on text content, and (2) relies on narrow datasets collected by tracking a small number of users over a few months rather than years. As a result, these approaches produce results that are limited and biased toward whatever was trending during the collection period.
My thesis: censors pay more attention to three factors than to any others: the user who posted the content, the number of reposts, and the sentiment of the text content, with the first factor being the strongest. I attempt to support this thesis by using data mining techniques to uncover censors' policies and priorities in Chinese social networks, specifically Sina Weibo. I take a multi-modal approach that takes text content, image content, metadata, and other factors such as sentiment into account. The goals of my thesis are to: (1) investigate how different factors such as text, image, and metadata correlate with censorship, and how consistently and quickly different topics are censored, (2) determine to what extent censorship is based on the person being posted about, (3) determine to what extent censorship is based on the person who wrote the post, and (4) predict censorship by considering all available information.
Language
English
Keywords
Social networks, censorship, machine learning, deep learning, NLP
Document Type
Dissertation
Degree Name
Computer Science
Level of Degree
Doctoral
Department Name
Department of Computer Science
First Committee Member (Chair)
Jedidiah R. Crandall
Second Committee Member
Abdullah Mueen
Third Committee Member
Marina Kogan
Fourth Committee Member
Michael Tschantz
Recommended Citation
Navaki Arefi, Meisam. "Data Mining of Chinese Social Networks: Factors That Indicate Post Deletion." (2020). https://digitalrepository.unm.edu/cs_etds/106