PROSAGA码农传奇-Nlp-如何使用scikit learning计算多类案例的精度，召回率，准确性和f1-得分？

如何使用scikit learning计算多类案例的精度，召回率，准确性和f1-得分？

作者: 狗头军师
发布时间: 2024-12-09 03:45:57 (13天前)

我正在研究情绪分析问题，数据看起来像这样：
<pre><code>label instances
 5 1190
 4 838
 3 239
 1 204
 2 127
</code></pre>所以，我的数据是不平衡的，因为1190 instances标有5。对于使用scikit的SVC进行的分类Im 。问题是我不知道如何以正确的方式平衡我的数据，以便准确计算多类案例的精度，查全率，准确性和f1得分。因此，我尝试了以下方法：第一：
<pre><code> wclf = SVC(kernel='linear', C= 1, class_weight={1: 10})
 wclf.fit(X, y)
 weighted_prediction = wclf.predict(X_test)

print 'Accuracy:', accuracy_score(y_test, weighted_prediction)
print 'F1 score:', f1_score(y_test, weighted_prediction,average='weighted')
print 'Recall:', recall_score(y_test, weighted_prediction,
 average='weighted')
print 'Precision:', precision_score(y_test, weighted_prediction,
 average='weighted')
print '\n clasification report:\n', classification_report(y_test, weighted_prediction)
print '\n confussion matrix:\n',confusion_matrix(y_test, weighted_prediction)
</code></pre>第二：
<pre><code>auto_wclf = SVC(kernel='linear', C= 1, class_weight='auto')
auto_wclf.fit(X, y)
auto_weighted_prediction = auto_wclf.predict(X_test)

print 'Accuracy:', accuracy_score(y_test, auto_weighted_prediction)

print 'F1 score:', f1_score(y_test, auto_weighted_prediction,
                            average='weighted')

print 'Recall:', recall_score(y_test, auto_weighted_prediction,
                              average='weighted')

print 'Precision:', precision_score(y_test, auto_weighted_prediction,
                                    average='weighted')

print '\n clasification report:\n', classification_report(y_test,auto_weighted_prediction)

print '\n confussion matrix:\n',confusion_matrix(y_test, auto_weighted_prediction)
</code></pre>第三：
<pre><code>clf = SVC(kernel='linear', C= 1)
clf.fit(X, y)
prediction = clf.predict(X_test)

from sklearn.metrics import precision_score, \
    recall_score, confusion_matrix, classification_report, \
    accuracy_score, f1_score

print 'Accuracy:', accuracy_score(y_test, prediction)
print 'F1 score:', f1_score(y_test, prediction)
print 'Recall:', recall_score(y_test, prediction)
print 'Precision:', precision_score(y_test, prediction)
print '\n clasification report:\n', classification_report(y_test,prediction)
print '\n confussion matrix:\n',confusion_matrix(y_test, prediction)

F1 score:/usr/local/lib/python2.7/site-packages/sklearn/metrics/classification.py:676: DeprecationWarning: The default `weighted` averaging is deprecated, and from version 0.18, use of precision, recall or F-score with multiclass or multilabel data or pos_label=None will result in an exception. Please set an explicit value for `average`, one of (None, 'micro', 'macro', 'weighted', 'samples'). In cross validation use, for instance, scoring="f1_weighted" instead of scoring="f1".
 sample_weight=sample_weight)
/usr/local/lib/python2.7/site-packages/sklearn/metrics/classification.py:1172: DeprecationWarning: The default `weighted` averaging is deprecated, and from version 0.18, use of precision, recall or F-score with multiclass or multilabel data or pos_label=None will result in an exception. Please set an explicit value for `average`, one of (None, 'micro', 'macro', 'weighted', 'samples'). In cross validation use, for instance, scoring="f1_weighted" instead of scoring="f1".
 sample_weight=sample_weight)
/usr/local/lib/python2.7/site-packages/sklearn/metrics/classification.py:1082: DeprecationWarning: The default `weighted` averaging is deprecated, and from version 0.18, use of precision, recall or F-score with multiclass or multilabel data or pos_label=None will result in an exception. Please set an explicit value for `average`, one of (None, 'micro', 'macro', 'weighted', 'samples'). In cross validation use, for instance, scoring="f1_weighted" instead of scoring="f1".
 sample_weight=sample_weight)
 0.930416613529
</code></pre>但是，我收到这样的警告：
<pre><code>/usr/local/lib/python2.7/site-packages/sklearn/metrics/classification.py:1172:
DeprecationWarning: The default `weighted` averaging is deprecated,
and from version 0.18, use of precision, recall or F-score with 
multiclass or multilabel data or pos_label=None will result in an 
exception. Please set an explicit value for `average`, one of (None, 
'micro', 'macro', 'weighted', 'samples'). In cross validation use, for 
instance, scoring="f1_weighted" instead of scoring="f1"
</code></pre>如何正确处理我的不平衡数据，以便以正确的方式计算分类器的指标？