Spam filter built on the SpamAssassin dataset using preprocessing, feature extraction (vectorization), and cross-validation
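
The three stages named above can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not the project's actual code: the six emails are made-up toy data rather than messages from the SpamAssassin corpus, the classifier (a multinomial Naive Bayes with add-one smoothing) is an assumption because the description names no model, and the names `preprocess`, `vectorize`, `NaiveBayes`, and `cross_validate` are hypothetical.

```python
import math
import re
from collections import Counter


def preprocess(text):
    """Lowercase, replace URLs and numbers with placeholder tokens, keep words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " URL ", text)
    text = re.sub(r"\d+", " NUMBER ", text)
    return re.findall(r"[a-zA-Z]+", text)


def vectorize(tokens):
    """Bag-of-words vector: maps each token to its count in the email."""
    return Counter(tokens)


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (assumed model)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc)
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        n_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)

        def score(c):
            log_prob = math.log(self.doc_counts[c] / n_docs)
            for word, count in doc.items():
                if word in self.vocab:  # ignore words never seen in training
                    likelihood = (self.word_counts[c][word] + 1) / (self.totals[c] + vocab_size)
                    log_prob += count * math.log(likelihood)
            return log_prob

        return max(self.classes, key=score)


def cross_validate(docs, labels, k=3):
    """k-fold cross-validation; fold i holds out every k-th item starting at i."""
    scores = []
    for fold in range(k):
        test_idx = set(range(fold, len(docs), k))
        pairs = list(enumerate(zip(docs, labels)))
        train = [(d, y) for i, (d, y) in pairs if i not in test_idx]
        test = [(d, y) for i, (d, y) in pairs if i in test_idx]
        model = NaiveBayes().fit([d for d, _ in train], [y for _, y in train])
        correct = sum(model.predict(d) == y for d, y in test)
        scores.append(correct / len(test))
    return scores


# Hypothetical toy emails, not taken from the SpamAssassin corpus.
emails = [
    ("Win free money now! Click http://example.com/prize", "spam"),
    ("Meeting moved to 3pm, see the agenda attached", "ham"),
    ("Free prize! Claim your money at http://example.com/win", "spam"),
    ("Please review the attached agenda before the meeting", "ham"),
    ("Click now to win free cash, offer ends in 24 hours", "spam"),
    ("Lunch after the meeting? The agenda looks short", "ham"),
]
docs = [vectorize(preprocess(text)) for text, _ in emails]
labels = [label for _, label in emails]

scores = cross_validate(docs, labels, k=3)
print("fold accuracies:", scores)

model = NaiveBayes().fit(docs, labels)
print(model.predict(vectorize(preprocess("Claim your free money"))))
```

On the real corpus the same shape would apply, with the raw email files parsed into text before `preprocess` and a larger `k`. With scikit-learn available, `CountVectorizer`, `MultinomialNB`, and `cross_val_score` would be natural replacements for the hand-written pieces here.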