Two sides of the same coin: a very wonky sentiment analysis of private accounts

The week before midterms, the writing prompt was “what is your favourite online identity?”. Of course, I’ll act somewhat differently online. When you have a whole profile to set up that represents you, that’s a chance to display the most interesting, marketable or likeable parts about yourself. I’m not going to pretend to be holy and say I don’t act differently on social media, because I definitely do try to portray myself a certain way.

Since I already know that my “social media personality” is different from how I behave in real life, simply talking about that would be a very short post. Instead, I decided to think about different roles I might take on within social media. Do I have the same persona across the board? While it’s an interesting question, it’s also a very subjective one. How do you measure it so you can compare? Well, we are “scientists” after all, so I figured I’d trust our good old friend: data! Numbers can tell us a lot, which led me to try my hand at a (very unprofessional) sentiment analysis of some of my own social media accounts. Let me explain!

Priv account only accepting mutuals soz πŸ”πŸ™…β€β™€οΈπŸ’€

Comparing data from different social media platforms might be difficult, especially if they support different media types. But what if we have data to compare from the same platform?

Before the days of Circles and close friends stories and whatnot, people found other ways to post to specific audiences only. The main way to do so was simply to make a new account and only let friends follow it. A good example is the ‘finsta’ (friend insta): separate private accounts to spam pictures and post more personal photos and stories for friends, which makes it a more freeing experience. If your posts aren’t going to be visible to your grandma, that one annoying classmate from year 8, your crush’s best friend and your employer, you don’t have to think so hard about how your profile looks.

Instagram data for someone I know. Fewer followers, more freedom to post.

Having one account for public use and one for more personal content just makes sense nowadays. This phenomenon is also rather common on Twitter, the good thing about which is that Twitter is much more text-based than, say, Instagram. What this results in is that some of us have (nearly) a decade of data from accounts that serve different purposes. I am one of those someones, and I figured it would be fun to compare the ‘feel’ of these accounts. Am I much more negative on one account than another? How much do I use my main account to talk about the personality aspects I want to show the world?

How I compared the data

I’m choosing accounts that I still use actively for the most accurate data. The two accounts I’m working with are as follows: one that has been regularly in use since 2016 with around 12.000 tweets and around 600 photos/videos, and one from 2014 that approaches 10.000 tweets and around 500 photos/videos. They’re similar enough in those respects, since I created them around the same time (2014) and have similar stats when viewed objectively. How does the content compare, though?

Since one account is on ‘private’ mode, I couldn’t simply use web crawlers, since they require either public access to the data or the login details. Now, the website doesn’t like that, and it does everything to prevent you from using web crawlers. Therefore I simply did not bother with alternative crawlers. Instead, Twitter offers an ‘archive’ option, where you can download all of your Tweets and some other useful data from your account. This used to be a simple .csv with only the tweet content and the amount of interactions, but they have now resorted to a package of different filetypes that allow you to view an offline mirror of your own account. Maybe very useful for the average user, but not if you want to analyse anything.

Twitter makes it easy to download all your Twitter content into one handy zip file.

I took the JSON file that contains all the tweets, and used Excel to turn only the ‘tweet text’ of every item into one big csv file. Now I had a file for each account that contained all my tweets from 6-8 years! Nifty.

Of course, it won’t do much to manually go through everything. To turn all these tweets into useful data, I (and my partner who occasionally took over when I got frustrated, thank you!) used Python. First, we used a few steps to separate the data into a huge list of every individual word. Another function or two stripped out any stop words like pronouns, articles and conjunctions. Then we ran a counter on it to get some frequency data. Now, there are some tools out there to automatically run a sentiment analysis, but I think that’s 1. too expensive or complicated, and 2. no fun. So I’ll simply go through the most used words and show you some of the results.

So… how different is my private account?

Now, the data might not be perfect, but they are very interesting.

To begin with, I got confronted with my potty mouth:

Which also already sets the scene for the rest of the analysis. I’m nearing the word limit, so I’ll let the results speak for themselves. Keep in mind that these are taken out of context and that only exact matches of words are displayed, but it gives a good general overview. It seems that my private account contains much more negative and personal content, while my main account shows more about my interests and positive experiences. My conclusion: yes, I am very different people across these accounts. I don’t have a favourite, though!

Highlights from the main account frequency data: much more varied and hobby-related
Highlights from the private account frequency data: generally more negative and personal overall