Deep Learning for Identifying Language Features that Differentiate Mental Health Communities on Reddit
Abstract: In suicide research and clinical practice, it is important to distinguish among people who are depressed, people who are capable of self-harm, and people who are at high risk of attempting suicide. In this work, we use deep learning to identify the language features that differentiate these modes of suicidality. Specifically, we treat the Reddit communities r/depression, r/StopSelfHarm, and r/SuicideWatch as language proxies for people who are depressed, capable of self-harm, and suicidal, respectively. We train a range of deep learning models on a single-label, multi-class classification task: predicting which subreddit a post belongs to. We then conduct a feature importance study to identify the language features that were most useful for these predictions, which we interpret as the features that differentiate the three modes of suicidality.
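The abstract does not specify the model architectures or the feature importance method, so the following is only a rough illustration of the two stages it describes: single-label, three-class subreddit prediction, followed by scoring which words drive a prediction. It is a minimal pure-Python sketch, assuming a bag-of-words softmax classifier as a stand-in for the deep models and leave-one-word-out occlusion as one possible importance measure. The names `BowSoftmaxClassifier`, `predict`, and `occlusion_importance`, and the placeholder tokens (`dep_word`, `harm_word`, `sw_word`, `shared`), are hypothetical and are not drawn from the project or its data.

```python
import math
import random
from collections import Counter

# Hypothetical class order mirroring r/depression, r/StopSelfHarm, r/SuicideWatch.
LABELS = ["depression", "StopSelfHarm", "SuicideWatch"]


def tokenize(text):
    # Simplifying assumption: lowercase and split on whitespace.
    return text.lower().split()


def softmax(scores):
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class BowSoftmaxClassifier:
    """Single-layer softmax classifier over bag-of-words counts (stand-in model)."""

    def __init__(self, vocab, n_classes, seed=0):
        rng = random.Random(seed)
        self.index = {w: i for i, w in enumerate(vocab)}
        self.W = [[rng.uniform(-0.01, 0.01) for _ in vocab] for _ in range(n_classes)]
        self.b = [0.0] * n_classes

    def features(self, tokens):
        # Sparse count vector: vocabulary index -> count; unknown words are ignored.
        return Counter(self.index[t] for t in tokens if t in self.index)

    def predict_proba(self, tokens):
        x = self.features(tokens)
        scores = [bk + sum(Wk[j] * c for j, c in x.items())
                  for Wk, bk in zip(self.W, self.b)]
        return softmax(scores)

    def fit(self, docs, labels, epochs=200, lr=0.5):
        # Plain SGD on softmax cross-entropy; each post has exactly one label.
        for _ in range(epochs):
            for tokens, y in zip(docs, labels):
                x = self.features(tokens)
                p = self.predict_proba(tokens)
                for k in range(len(self.W)):
                    # d(loss)/d(score_k) = p_k - 1[k == y]
                    g = p[k] - (1.0 if k == y else 0.0)
                    self.b[k] -= lr * g
                    for j, c in x.items():
                        self.W[k][j] -= lr * g * c


def predict(model, tokens):
    # Single-label prediction: the class with the highest probability.
    p = model.predict_proba(tokens)
    return max(range(len(p)), key=p.__getitem__)


def occlusion_importance(model, tokens, label):
    # Importance of a word = drop in P(label) when every occurrence of it is removed.
    base = model.predict_proba(tokens)[label]
    scores = {}
    for w in set(tokens):
        reduced = [t for t in tokens if t != w]
        scores[w] = base - model.predict_proba(reduced)[label]
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy, made-up posts with placeholder tokens (not real Reddit data).
raw = ["dep_word shared", "dep_word dep_word",
       "harm_word shared", "harm_word harm_word",
       "sw_word shared", "sw_word sw_word"]
labels = [0, 0, 1, 1, 2, 2]
docs = [tokenize(r) for r in raw]
vocab = sorted({t for d in docs for t in d})

model = BowSoftmaxClassifier(vocab, n_classes=len(LABELS))
model.fit(docs, labels)

print([LABELS[predict(model, d)] for d in docs])
ranking = occlusion_importance(model, tokenize("sw_word shared"), label=2)
print(ranking)
```

Occlusion scores a word by how much removing it lowers the model's probability for the class of interest, so a word shared by all three communities should score near zero while a class-specific word scores high. Gradient-based attributions are a common alternative for deep models; neither is confirmed here as the method used in this project.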
This is a team project for the Georgia Tech CS 4644 Deep Learning course, and our team was named a finalist out of 56 teams! We will share our paper once we clean it up :)