TY - GEN
T1 - Fake News Detection on Reddit Utilising CountVectorizer and Term Frequency-Inverse Document Frequency with Logistic Regression, MultinominalNB and Support Vector Machine
AU - Patel, Ankitkumar
AU - Meehan, Kevin
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2021/6/10
Y1 - 2021/6/10
N2 - The distribution of misleading information, or fake news, has become a problem for society in recent times. On social media, where anyone can share opinions and beliefs and present them as fact, fake news becomes a threat to the reputation of companies and of people. The 2016 USA Presidential election drew greater attention to the generation of fake news articles, leading a large number of researchers and scientists to explore this Natural Language Processing research area with a sense of urgency and keen interest. However, investigation of what people consume from social media is at an early stage, and efforts are in progress to explore how people can separate disinformation from truthful content. The primary challenge in fake news detection is determining how to detect it. Supervised learning methods help to detect these stories by using labelled data to determine whether a text is real or fake. This research aims to develop and compare supervised learning models using Logistic Regression, MultinominalNB, and Support Vector Machine with CountVectorizer and Term Frequency-Inverse Document Frequency methods on Reddit data. The research concludes that the CountVectorizer and MultinominalNB model achieved the highest accuracy on the Reddit dataset.
AB - The distribution of misleading information, or fake news, has become a problem for society in recent times. On social media, where anyone can share opinions and beliefs and present them as fact, fake news becomes a threat to the reputation of companies and of people. The 2016 USA Presidential election drew greater attention to the generation of fake news articles, leading a large number of researchers and scientists to explore this Natural Language Processing research area with a sense of urgency and keen interest. However, investigation of what people consume from social media is at an early stage, and efforts are in progress to explore how people can separate disinformation from truthful content. The primary challenge in fake news detection is determining how to detect it. Supervised learning methods help to detect these stories by using labelled data to determine whether a text is real or fake. This research aims to develop and compare supervised learning models using Logistic Regression, MultinominalNB, and Support Vector Machine with CountVectorizer and Term Frequency-Inverse Document Frequency methods on Reddit data. The research concludes that the CountVectorizer and MultinominalNB model achieved the highest accuracy on the Reddit dataset.
KW - CountVectorizer
KW - Fake news detection
KW - Logistic Regression
KW - MultinominalNB
KW - Supervised Learning Methods
KW - Support Vector Machine
UR - http://www.scopus.com/inward/record.url?scp=85114422700&partnerID=8YFLogxK
U2 - 10.1109/ISSC52156.2021.9467842
DO - 10.1109/ISSC52156.2021.9467842
M3 - Conference contribution
AN - SCOPUS:85114422700
T3 - 2021 32nd Irish Signals and Systems Conference, ISSC 2021
BT - 2021 32nd Irish Signals and Systems Conference, ISSC 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 32nd Irish Signals and Systems Conference, ISSC 2021
Y2 - 10 June 2021 through 11 June 2021
ER -