
Reddit Confessions Through a Temporal Lens

A small NLP project on roughly 50,000 r/confession posts, built around a question I found hard to shake: does when people confess change what they confess?

This project started from a question that felt oddly revealing for how small it was.

That question stuck with me because confession is one of those behaviors that obviously depends on context, but we usually talk about that context in broad social terms — anonymity, shame, guilt, audience, distance. Time is a quieter variable. Still, it seemed plausible that it mattered. Someone posting at 2 PM on a Tuesday is probably in a different mental and social setting than someone posting late at night, alone with their thoughts and a throwaway Reddit account.

So I took a dataset of roughly 50,000 posts from r/confession and tried to see whether the clock leaves a visible mark on what people say, how they say it, and what kinds of emotional worlds those posts seem to come from.

It turns out it does.

How I approached it

The raw data gave me post IDs, scores, titles, body text, authors, subreddit labels, and UTC timestamps. The first thing I had to get right was time itself.

Reddit stores timestamps in UTC, but that is not the time that psychologically matters. A post stamped 3 AM UTC may have been written at 10 PM Eastern or 7 PM Pacific; the local clock, not the universal one, is what shapes the moment of confession. So the pipeline does DST-aware conversion from UTC to US/Eastern, then derives features like time of day and weekday versus weekend from the localized timestamps. I also filtered out [removed] and [deleted] bodies, along with very short posts that were unlikely to carry much signal.
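The localization and filtering step can be sketched in pandas roughly like this. The column names, the time-of-day bins, and the minimum-length cutoff here are illustrative stand-ins, not the project's exact values:

```python
import pandas as pd

# Toy stand-in for the raw dump: UTC epoch seconds plus body text.
posts = pd.DataFrame({
    "created_utc": [1609459200, 1609502400, 1609545600],
    "selftext": ["I lied to my boss about...", "[removed]", "short"],
})

# Parse epoch seconds as UTC, then convert with DST awareness.
ts = pd.to_datetime(posts["created_utc"], unit="s", utc=True)
posts["local_time"] = ts.dt.tz_convert("US/Eastern")

# Derive the temporal features used downstream.
posts["hour"] = posts["local_time"].dt.hour
posts["is_weekend"] = posts["local_time"].dt.dayofweek >= 5
posts["time_of_day"] = pd.cut(
    posts["hour"],
    bins=[0, 6, 12, 18, 24],          # illustrative bin edges
    labels=["late_night", "morning", "afternoon", "evening"],
    right=False,
)

# Drop removed/deleted bodies and very short posts.
mask = (
    ~posts["selftext"].isin(["[removed]", "[deleted]"])
    & (posts["selftext"].str.len() >= 20)  # illustrative cutoff
)
posts = posts[mask]
```

The important detail is `tz_convert` on a UTC-aware series rather than naive offset arithmetic, so daylight-saving transitions are handled correctly.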

After that, the project split into two modes.

The first was distant reading: looking across the whole corpus for broad temporal patterns. I used TF-IDF over unigrams, bigrams, and trigrams to surface which terms were unusually associated with different time windows. I trained Word2Vec embeddings on the corpus and built simple bias axes to measure directional semantic shifts between categories. I also used t-SNE projections to visualize how vocabulary clustered across temporal slices.

The second was close reading. Once the quantitative layer highlighted terms with strong temporal distinctiveness, I used those to pull out actual posts and read them. That mattered a lot. I did not want this to become a project where the statistics felt clever but detached from what people were actually writing. The computational pass was there to tell me where to look; the real interpretation still had to come from reading the confessions themselves.
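The pull itself is simple: filter the frame to a time slice and a distinctive term, then read. Column names here are illustrative:

```python
import pandas as pd

# Toy stand-in for the localized, filtered corpus.
posts = pd.DataFrame({
    "time_of_day": ["late_night", "afternoon", "late_night"],
    "selftext": [
        "I betrayed the people I loved most.",
        "I lied to my boss about being sick.",
        "I keep thinking about my loved ones.",
    ],
})

# Surface late-night posts containing a term the TF-IDF pass flagged.
term = "betrayed"
hits = posts[
    (posts["time_of_day"] == "late_night")
    & posts["selftext"].str.contains(term, case=False)
]
for text in hits["selftext"]:
    print(text)
```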

What came out of it

The biggest surprise was not that there were differences, but that they were so legible.

Daytime posts skewed more institutional. Their vocabulary leaned toward school, work, obligations, and public-facing social life. A lot of the confessions felt tied to roles people perform while moving through the visible structure of everyday life: lying to a boss, cheating on something, saying the wrong thing in a family or school setting, getting trapped inside some bureaucratic or social expectation.

Nighttime posts felt more intimate and heavier. Terms like "betrayed" and "loved ones" surfaced more strongly, and the posts themselves often read as if the person had been sitting with the confession for a while before finally writing it down. They were more likely to circle guilt, grief, regret, relationships, and the kinds of thoughts that get louder when the day stops imposing structure.

The weekday/weekend split was interesting in a different way. Weekdays carried a strange mix of mundane and reflective language — practical terms living next to unexpectedly expansive ones. Weekend posts, by contrast, were more social, more immediate, and often more tied to what had just happened: parties, dating, nights out, interpersonal fallout.

None of this proves some grand theory of human nature. But it does suggest that time acts as a kind of filter on confession. People are not just posting randomly throughout the week. The hour and the day seem to correlate with what part of life is pressing on them strongly enough to make them say something they otherwise would not.

What I found most interesting

What stayed with me was how much you can recover from text with methods that are, at this point, not remotely fashionable.

There is nothing especially exotic here. TF-IDF is old. Word2Vec is old. t-SNE is old. But if the question is clean enough, and if you partition the data in a way that actually lines up with human experience, simple tools still do real work. They can point to structure that is not obvious from casual browsing and give you a much better sense of where qualitative reading should start.

That was probably the main methodological lesson I took from the project. A lot of the time, the bottleneck is not that you need a fancier model. It is that you need a sharper question.

I also liked that this project sat in a nice middle ground between computational analysis and actual interpretation. Pure text mining can get sterile fast. Pure close reading does not scale. Here, the two were doing what they are best at: the statistics narrowed the field, and the reading supplied the meaning.

Why I still like this project

I like this one because it was small enough to finish but still ended up saying something real.

It started with a question that could have easily produced noise, and instead it gave me a pretty clear picture of how temporal setting shapes self-disclosure online. More than that, it reminded me that a lot of good analysis comes from noticing an ordinary variable everyone else is willing to treat as background.

Sometimes the interesting thing is not hidden deep inside the model. Sometimes it is just the clock.

Tech stack

  • Python
  • pandas / NumPy
  • scikit-learn
  • gensim
  • matplotlib
  • Bokeh