Twitter Sentiment Mod 4 Project

Phillipojo
2 min read · Feb 2, 2021

Business Problem

Twitter has over 300 million monthly active users, which allows businesses to reach a broad audience and connect with customers without traditional marketing techniques. On the downside, there’s so much information that it’s hard for brands to quickly detect negative social mentions that could harm their business.

That’s why sentiment analysis, which involves monitoring emotions in conversations on social media platforms, has become a key strategy in social media marketing. Listening to how customers feel on Twitter allows companies to understand their audience, keep on top of what’s being said about their brand and their competitors, and discover new trends in the industry.

The Data

For this post I am using a data set taken from the Data.World website. The data set consists of 9,000 tweets. The task is to identify whether the sentiment of each tweet is Negative, Positive, or Neutral.

Text Cleaning

Most text data needs some processing before the chosen machine learning algorithm can perform well. In this case each text document is a tweet, so it contains many characters that are not meaningful to any machine learning algorithm. I used a set of text cleaning regular expressions to preprocess this data set. Some libraries come with a built-in “Tweet Cleaner”, but for this data set it wasn’t enough. This part is an iterative process, because some text data sets require more cleaning than others.
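To make the cleaning step concrete, here is a minimal sketch of what a regex-based tweet cleaner could look like. The function name `clean_tweet`, the specific patterns, and the sample tweet are my own assumptions for illustration, not the exact rules used in this project. Digits are kept on purpose in this sketch.

```python
import re

# Illustrative patterns only (assumptions, not the project's original rules).
URL_RE = re.compile(r"https?://\S+|www\.\S+")   # links
MENTION_RE = re.compile(r"@\w+")                # @user handles
HTML_ENTITY_RE = re.compile(r"&\w+;")           # leftovers such as &amp;
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")       # punctuation and symbols
WHITESPACE_RE = re.compile(r"\s+")              # runs of spaces/newlines


def clean_tweet(text: str) -> str:
    """Lowercase a tweet and strip URLs, mentions, HTML entities, and punctuation."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = HTML_ENTITY_RE.sub(" ", text)
    text = text.replace("#", " ")   # keep the hashtag word, drop the symbol
    text = NON_ALNUM_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()


# Hypothetical sample tweet, made up for this example.
sample = "@mention Loving the new #iPad 2!! &amp; the line is huge http://t.co/abc"
print(clean_tweet(sample))
```

The order of the substitutions matters: URLs and mentions are removed before punctuation, otherwise stripping `:` and `/` would leave link fragments behind as fake words.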

Here’s the output after running the cleaning function.

We still have some misspelled words and some numbers, so the text would need more cleaning before we vectorize.
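As a rough sketch of those next steps, the snippet below strips the remaining numbers and then builds simple word-count vectors. The helpers `remove_numbers` and `bag_of_words` and the two example strings are hypothetical, and the hand-rolled bag-of-words is only a stand-in for whatever library vectorizer you would actually use. Spelling correction is left out, since it would need a dictionary or a dedicated library.

```python
import re
from collections import Counter

DIGITS_RE = re.compile(r"\d+")


def remove_numbers(text: str) -> str:
    """Drop digit runs and collapse the whitespace they leave behind."""
    return " ".join(DIGITS_RE.sub(" ", text).split())


def bag_of_words(docs):
    """Return (sorted vocabulary, count vectors) for a list of cleaned strings."""
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        # One count per vocabulary word, in vocabulary order.
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors


# Hypothetical cleaned tweets, made up for this example.
docs = [remove_numbers(d) for d in ["loving the new ipad 2", "the line is huge"]]
vocab, vectors = bag_of_words(docs)
print(vocab)
print(vectors)
```

Each tweet becomes a row of counts over a shared vocabulary, which is the kind of numeric input a classifier expects.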
