Smriti Singh spends a lot of time on social media. But not in the way most students do.

As a computer science graduate student and Natural Language Processing (NLP) researcher, she’s interested in applying her knowledge to the social and ethical dimensions of online discourse. During her undergraduate studies, she worked on an algorithm to identify sexist threats on Twitter. In her graduate course on grounded NLP, Smriti began examining how effective traditional language and vision models are at picking up misogynistic undertones in online memes.

Turns out, they aren’t very effective.

To help with her research, she recruited fellow second-year CS graduate student Amritha Haridasan. Like Smriti, Amritha had noticed, and was disturbed by, the disproportionate amount of hate that women on the internet, particularly public figures, receive.

“We don’t realize how much we interact with it every day,” Amritha states.

Amritha, who has experience working with computer vision models, hypothesized that a multimodal model, one incorporating both language and visual cues, would be more effective at identifying misogyny in memes. The existing body of research on multimodal models for misogyny detection is limited. Although separate language-based and vision-based models have been used to moderate online content for quite some time, Smriti and Amritha found that they are no longer sufficient for the multimedia content that now dominates social media platforms.

“Memes are very complex to interpret for language-only models or even vision models,” Smriti explains. “There are a lot of linguistic cues, but there are also a lot of visual cues that go into making these sort of subtly hateful or misogynistic memes.”

Under the supervision of their professor, Dr. Raymond Mooney, Smriti and Amritha incorporated domain-specific pretraining to improve the accuracy of their multimodal model. The technique first exposes a model to an existing, related dataset, so that it learns to detect subtle visual and linguistic nuances before it ever sees the target data. Here, they pretrained on the Hateful Memes dataset from Facebook AI, then trained their models on the Multimedia Automatic Misogyny Identification (MAMI) dataset, comprising 12,000 memes labeled for misogyny. Each misogynistic meme carries one or more category labels: shaming, objectification, violence, and stereotyping.
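In rough outline, this is standard transfer learning: train the classifier first on the Hateful Memes labels, then continue training the same weights on MAMI’s multi-label categories. The PyTorch sketch below illustrates that two-stage loop; it is not the authors’ code, and `model`, the data loaders, and the epoch counts are hypothetical placeholders.

```python
# A hedged sketch of the two-stage training idea described above (not the
# authors' pipeline). Assumes `model` maps (text, image) batches to label
# logits, and that each loader yields dicts of tensors plus a float
# `labels` tensor. All names here are illustrative placeholders.
import torch
import torch.nn as nn

def train_stage(model, loader, epochs, lr=2e-5):
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()  # handles binary and multi-label alike
    model.train()
    for _ in range(epochs):
        for batch in loader:
            optimizer.zero_grad()
            logits = model(batch["input_ids"], batch["attention_mask"],
                           batch["pixel_values"])
            loss = loss_fn(logits, batch["labels"])
            loss.backward()
            optimizer.step()

# Stage 1: domain-specific pretraining on Hateful Memes (hateful vs. not).
train_stage(model, hateful_memes_loader, epochs=3)

# Stage 2: fine-tune the same weights on MAMI, where one meme can carry
# several labels at once (shaming, objectification, violence, stereotyping).
train_stage(model, mami_loader, epochs=5)
```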

As they sorted through the MAMI dataset for a qualitative analysis of model performance, they were shocked by the sheer explicitness of the language and imagery in the memes. Smriti could only describe the experience as “positively tragic.”

They then tested combinations of language and vision models on the dataset. The pairing of the language model BERT with the vision model ViT proved the most effective once pretrained: the pretrained BERT+ViT model was markedly better at detecting misogynistic memes, and pretraining also sharpened its ability to classify the type of misogyny. In one striking example, after pretraining, BERT correctly identified objectification in a meme from language cues alone.
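The article doesn’t spell out how the two encoders were combined, but one common pattern is late fusion: concatenate BERT’s pooled text embedding with ViT’s pooled image embedding and feed the result to a small classification head. The sketch below shows that pattern using Hugging Face transformers; the fusion scheme and model checkpoints are assumptions chosen for illustration, not the authors’ published architecture.

```python
# A minimal late-fusion BERT+ViT meme classifier, sketched as one plausible
# way to combine the two encoders. Checkpoints and fusion details below are
# assumptions, not the authors' published architecture.
import torch
import torch.nn as nn
from transformers import BertModel, ViTModel

class MemeClassifier(nn.Module):
    def __init__(self, num_labels: int):
        super().__init__()
        self.text_encoder = BertModel.from_pretrained("bert-base-uncased")
        self.image_encoder = ViTModel.from_pretrained(
            "google/vit-base-patch16-224-in21k")
        fused_dim = (self.text_encoder.config.hidden_size
                     + self.image_encoder.config.hidden_size)
        self.head = nn.Linear(fused_dim, num_labels)

    def forward(self, input_ids, attention_mask, pixel_values):
        # Pooled [CLS] representations from each modality, concatenated
        # so that both text and image inform a single prediction.
        text = self.text_encoder(input_ids=input_ids,
                                 attention_mask=attention_mask).pooler_output
        image = self.image_encoder(pixel_values=pixel_values).pooler_output
        return self.head(torch.cat([text, image], dim=-1))
```

Concatenating pooled embeddings is the simplest fusion strategy; more elaborate schemes such as cross-attention exist, but a late-fusion head is enough to illustrate how linguistic and visual cues jointly drive the classification.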

Smriti and Amritha presented their work at the Workshop on Online Abuse and Harms at ACL 2023, and their enthusiasm for further research in this field remains palpable. In fact, a recent trend on TikTok has fueled their determination to dig deeper into this area.

“There's this real concept called ‘girl math’, where illogical math statements are being made. And it's women themselves, calling it ‘girl math’. I feel like that is a low sense of humor and a bit problematic,” Smriti explains.

Expanding the project to video would take significantly more computing resources, since every frame would need to be run through the detection model. In the meantime, Smriti and Amritha are focusing their efforts on applying multimodal detection to multiple languages. Their work not only sheds light on the prevalence of online misogyny but also breaks ground on a new method for identifying and addressing it.
