Probabilistic_Classification_Model

Brief description of the data:

The data consists of tweets (text data). Each tweet belongs to one of two classes (the target): it is either about a real disaster or not. Along with the text, each tweet also has a location and a keyword.

Data attributes used for classification:

Text: the text of the tweet.
Target: the class label, either disaster or not disaster.

Classification task:

Predict whether a given tweet is about a real disaster or not. We applied probabilistic, linear, and non-linear classification techniques, along with probability measures over the terms in the tweets, to this task.

List of steps:

1. Probabilities and Zipf’s Law

   a) Rank, Frequency, and Probability Distribution

   b) Probability vs. Rank Plot

   c) Regression Line Fit

2. Text Vectorization

3. Terms and Conditional Probability Distributions

4. Classification

   a) Probabilistic Naive Bayes Model

   b) Linear Model

   c) Non-linear Classification

5. Conclusions
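The Zipf’s law step (rank, frequency, and probability of each word, then a regression line on the log-log probability vs. rank plot) can be sketched in plain Python as follows. This is a minimal sketch, not the repository's code: the regex tokenizer and the three example tweets are assumptions made for illustration, and the plot itself (for instance with matplotlib's `loglog`) is left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Assumed tokenizer: lowercase, keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def zipf_table(texts):
    """Return (rank, word, frequency, probability) rows, most frequent word first."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    total = sum(counts.values())
    # Descending frequency; ties broken alphabetically so ranks are reproducible.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, freq, freq / total)
            for rank, (word, freq) in enumerate(ranked, start=1)]


def fit_loglog(rows):
    """Least-squares fit of log10(probability) = intercept + slope * log10(rank).

    Zipf's law predicts a roughly straight line with a slope close to -1.
    """
    xs = [math.log10(rank) for rank, _, _, _ in rows]
    ys = [math.log10(prob) for _, _, _, prob in rows]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up example tweets, not taken from the dataset.
tweets = [
    "Forest fire near La Ronge Sask Canada",
    "The fire spread fast and the smoke was everywhere",
    "I love the smell of the fire in the fireplace",
]
rows = zipf_table(tweets)
intercept, slope = fit_loglog(rows)
print(rows[:2])
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

On a three-tweet corpus the fitted slope is only a rough indication; the Zipf pattern typically becomes visible on a large vocabulary such as the full tweet collection.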
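For the text vectorization step, one common representation is a bag-of-words count vector. The sketch below assumes that representation (the repository may use another, such as TF-IDF); `build_vocabulary` and `vectorize` are hypothetical helper names, and the tokenizer is the same simple assumed one.

```python
import re
from collections import Counter


def tokenize(text):
    """Assumed tokenizer: lowercase, keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(texts):
    """Map each distinct token to a column index (sorted for reproducibility)."""
    vocab = sorted({tok for text in texts for tok in tokenize(text)})
    return {tok: i for i, tok in enumerate(vocab)}


def vectorize(text, vocab):
    """Bag-of-words count vector; tokens outside the vocabulary are ignored."""
    vec = [0] * len(vocab)
    for tok, n in Counter(tokenize(text)).items():
        if tok in vocab:
            vec[vocab[tok]] = n
    return vec


# Made-up training tweets, not taken from the dataset.
train = ["Forest fire near La Ronge", "I love this fire emoji"]
vocab = build_vocabulary(train)
X = [vectorize(text, vocab) for text in train]
print(vocab)
print(X)
```

In practice, scikit-learn's `CountVectorizer` and `TfidfVectorizer` perform the same kind of transformation with more options (n-grams, stop words, sparse output).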
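The terms and conditional probabilities step estimates how likely each term is within each class, P(term | class). A minimal sketch, assuming add-alpha (Laplace) smoothing, P(t | c) = (count(t, c) + alpha) / (total terms in c + alpha * |V|), so that a term never seen in a class still gets a non-zero probability. The function name, the label encoding (1 = disaster, 0 = not disaster), and the toy tweets are illustrative assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Assumed tokenizer: lowercase, keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def conditional_probabilities(texts, labels, alpha=1.0):
    """Estimate P(term | class) with add-alpha smoothing.

    Returns {class: {term: probability}} over the shared vocabulary V.
    """
    counts = {c: Counter() for c in set(labels)}
    for text, label in zip(texts, labels):
        counts[label].update(tokenize(text))
    vocab = set().union(*counts.values())  # iterating a Counter yields its keys
    probs = {}
    for c, cnt in counts.items():
        denom = sum(cnt.values()) + alpha * len(vocab)
        probs[c] = {t: (cnt[t] + alpha) / denom for t in vocab}
    return probs


# Made-up tweets; 1 = disaster, 0 = not disaster (assumed encoding).
texts = ["forest fire near town", "fire crews evacuate town",
         "this song is fire", "love this song"]
labels = [1, 1, 0, 0]
probs = conditional_probabilities(texts, labels)
print(probs[1]["fire"], probs[0]["fire"])
```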
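The probabilistic Naive Bayes model combines a class prior with per-class term probabilities and picks the class with the highest score log P(c) + sum of log P(t | c) over the tweet's terms. The sketch below is a from-scratch multinomial Naive Bayes written under that assumption; the class name, the smoothing parameter `alpha`, and the toy tweets are hypothetical, not taken from the repository.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Assumed tokenizer: lowercase, keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over term counts, with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        n = len(labels)
        # Class priors P(c), stored as logs to avoid underflow.
        self.log_prior = {c: math.log(labels.count(c) / n) for c in self.classes}
        counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            counts[label].update(tokenize(text))
        self.vocab = set().union(*counts.values())
        # Smoothed log P(t | c) for every term in the vocabulary.
        self.log_likelihood = {}
        for c in self.classes:
            denom = sum(counts[c].values()) + self.alpha * len(self.vocab)
            self.log_likelihood[c] = {
                t: math.log((counts[c][t] + self.alpha) / denom) for t in self.vocab
            }
        return self

    def predict(self, text):
        """Return the class maximising log P(c) + sum of log P(t | c)."""
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for tok in tokenize(text):
                if tok in self.vocab:  # unseen terms carry no evidence
                    score += self.log_likelihood[c][tok]
            scores[c] = score
        return max(scores, key=scores.get)


# Made-up tweets; 1 = disaster, 0 = not disaster (assumed encoding).
texts = ["forest fire near town", "fire crews evacuate town", "flood warning for town",
         "this song is fire", "love this song", "my day is a disaster lol"]
labels = [1, 1, 1, 0, 0, 0]
model = MultinomialNaiveBayes().fit(texts, labels)
print(model.predict("fire near the town"), model.predict("love this song"))
```

scikit-learn's `MultinomialNB` implements the same model on count vectors, if a library version is preferred.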
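The source does not say which linear model was used; logistic regression is assumed here as a representative example. The sketch trains it with plain batch gradient descent on binary bag-of-words features. The learning rate, the number of epochs, the helper names, and the toy tweets are illustrative choices, not values from the repository.

```python
import math
import re


def tokenize(text):
    """Assumed tokenizer: lowercase, keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def features(text, vocab):
    """Binary bag-of-words vector: 1.0 if the term occurs in the tweet, else 0.0."""
    toks = set(tokenize(text))
    return [1.0 if t in toks else 0.0 for t in vocab]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, lr=0.5, epochs=200):
    """Batch gradient descent on the mean (unregularised) log-loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(x, w, b):
    """Class 1 if the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Made-up tweets; 1 = disaster, 0 = not disaster (assumed encoding).
texts = ["forest fire near town", "fire crews evacuate town", "flood warning for town",
         "this song is fire", "love this song", "my day is a disaster lol"]
labels = [1, 1, 1, 0, 0, 0]
vocab = sorted({tok for text in texts for tok in tokenize(text)})
X = [features(text, vocab) for text in texts]
w, b = train_logistic_regression(X, labels)
print(predict(features("flood near town", vocab), w, b))
```

A linear model scores a tweet with a weighted sum of its term indicators, so its decision boundary is a single hyperplane in the feature space.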
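The non-linear classifier is likewise not named in the source. As one hypothetical example, k-nearest neighbours with cosine similarity on bag-of-words counts gives a decision boundary that is not a single hyperplane. The value k = 3, the helper names, and the toy tweets are assumptions for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Assumed tokenizer: lowercase, keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def knn_predict(text, train_texts, train_labels, k=3):
    """Majority vote among the k training tweets most similar to `text`."""
    query = Counter(tokenize(text))
    sims = sorted(
        ((cosine(query, Counter(tokenize(t))), y)
         for t, y in zip(train_texts, train_labels)),
        reverse=True,
    )
    votes = Counter(y for _, y in sims[:k])
    return votes.most_common(1)[0][0]


# Made-up tweets; 1 = disaster, 0 = not disaster (assumed encoding).
texts = ["forest fire near town", "fire crews evacuate town", "flood warning for town",
         "this song is fire", "love this song", "my day is a disaster lol"]
labels = [1, 1, 1, 0, 0, 0]
print(knn_predict("fire near the town", texts, labels))
print(knn_predict("love this song", texts, labels))
```

With an odd k and two classes the vote can never tie; k would normally be tuned on held-out tweets rather than fixed.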