Discussion about Probabilities, Classification and Zipf's law
The data consists of tweets (text data), each labeled with one of two classes in the target attribute: a real disaster or not a disaster. Along with the text, each tweet has a location and a keyword.
Data attributes used for classification:
Text: the text of the tweet.
Target: the class label, either disaster or not disaster.
The task is to predict whether a given tweet is about a real disaster or not. We have applied various classification techniques (probabilistic, linear, and non-linear) and probability measures to this task.
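As an illustration of how probability measures drive the classification task, here is a minimal sketch of a multinomial Naive Bayes classifier written from scratch. It is not necessarily the exact model used in this notebook: the function names train_nb and predict_nb, the toy tweets, the label coding (1 = disaster, 0 = not disaster) and the lower-case whitespace tokenizer are all assumptions for illustration. It estimates the term conditional probabilities P(w | class) with add-alpha (Laplace) smoothing and picks the class that maximizes log P(class) + Σ log P(w | class).

```python
import math
from collections import Counter, defaultdict


def train_nb(texts, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with add-alpha (Laplace) smoothing.

    Returns (log_prior, log_cond, vocab), where log_cond[c][w] = log P(w | c).
    Assumption: lower-casing plus whitespace splitting is the tokenizer.
    """
    vocab = set()
    term_counts = defaultdict(Counter)  # class label -> term frequencies
    class_counts = Counter(labels)
    for text, y in zip(texts, labels):
        tokens = text.lower().split()
        term_counts[y].update(tokens)
        vocab.update(tokens)

    n_docs = len(labels)
    log_prior = {c: math.log(k / n_docs) for c, k in class_counts.items()}
    log_cond = {}
    for c in class_counts:
        # P(w | c) = (count(w, c) + alpha) / (terms in class c + alpha * |V|)
        denom = sum(term_counts[c].values()) + alpha * len(vocab)
        log_cond[c] = {w: math.log((term_counts[c][w] + alpha) / denom) for w in vocab}
    return log_prior, log_cond, vocab


def predict_nb(model, text):
    """Return the class maximising log P(c) + sum of log P(w | c)."""
    log_prior, log_cond, vocab = model
    tokens = [w for w in text.lower().split() if w in vocab]  # skip unseen words
    scores = {c: log_prior[c] + sum(log_cond[c][w] for w in tokens) for c in log_prior}
    return max(scores, key=scores.get)


# Hypothetical toy tweets; 1 = disaster, 0 = not disaster (assumed coding).
texts = [
    "forest fire near the town",
    "earthquake destroyed buildings",
    "flood warning evacuate now",
    "i love this song",
    "what a great day",
    "this movie is fire",
]
labels = [1, 1, 1, 0, 0, 0]
model = train_nb(texts, labels)
print(predict_nb(model, "fire and flood near the coast"))  # → 1
print(predict_nb(model, "i love this great song"))  # → 0
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and smoothing keeps a word that never appeared in one class from forcing that class's probability to zero.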
Probabilities and Zipf’s Law
a) Rank, Frequency, and Probability distribution
b) Probability vs. Rank Plot
c) Regression line fit
Text Vectorization
Terms and Conditional Probability Distribution
Classification
a) Probabilistic Naive Bayes Model
b) Linear Model
c) Non-linear Classification
Conclusions
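For the Probabilities and Zipf's Law part of the outline (rank, frequency and probability distribution, then a regression line fit), a minimal sketch follows. The helper names zipf_table and loglog_fit, the toy tweets and the lower-case whitespace tokenizer are assumptions for illustration, not the notebook's own code. Terms are ranked by frequency, each count is turned into a probability P(w) = count(w) / N, and an ordinary least-squares line is fitted to log10(probability) against log10(rank); under Zipf's law the slope should be close to -1.

```python
import math
from collections import Counter


def zipf_table(texts):
    """Return (rank, term, frequency, probability) rows, most frequent first.

    Assumption: lower-casing plus whitespace splitting is the tokenizer.
    """
    counts = Counter(tok for text in texts for tok in text.lower().split())
    total = sum(counts.values())
    # Descending frequency; ties broken alphabetically so ranks are stable.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, term, freq, freq / total)
            for rank, (term, freq) in enumerate(ordered, start=1)]


def loglog_fit(rows):
    """Least-squares line through (log10 rank, log10 probability).

    Returns (slope, intercept); Zipf's law predicts a slope close to -1.
    Needs at least two rows.
    """
    xs = [math.log10(row[0]) for row in rows]
    ys = [math.log10(row[3]) for row in rows]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


tweets = [  # hypothetical examples, not taken from the dataset
    "forest fire near the town",
    "the fire spread to the town",
    "i love the fire in this song",
]
rows = zipf_table(tweets)
print(rows[:3])  # top-ranked terms: 'the' (4 of 18 tokens), 'fire' (3), 'town' (2)
print(loglog_fit(rows))
```

The (log rank, log probability) pairs are the same points a probability vs. rank plot shows on log-log axes, so the fitted line can be drawn directly over that plot. On a toy corpus this small the slope is only indicative; the law is meant to hold on the full tweet vocabulary.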