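My accuracy and F1 scores are currently both 1.00. I suspect this is the result of data leakage.
I'm looking for any tips on reducing data leakage as much as possible.
Thanks.
Here is my Python script: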
    import pandas as pd
    import numpy as np
    # Other imports here
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import classification_report
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    dataset = pd.read_csv("weather.csv")
    print(len(dataset))

    dataset = pd.get_dummies(dataset, columns=["Date", "Location", "WindGustDir", "WindDir9am", "WindDir3pm",])
    dataset["RainToday"] = dataset["RainToday"].map({'Yes': 1, 'No': 0})
    dataset["RainTomorrow"] = dataset["RainTomorrow"].map({'Yes': 1, 'No': 0})
    dataset.dropna(inplace=True)
    dataset = dataset.rename_axis(None)

    X = dataset.drop('RainTomorrow', axis=1)
    y = dataset['RainTomorrow']
    X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.20, random_state=216037514)

    classifier = RandomForestRegressor(n_estimators = 200, random_state = 216037514)
    classifier.fit(X_train,y_train)
    y_pred = classifier.predict(X_test)

    print("Report:\n", classification_report(y_test,y_pred))
    print("Accuracy: ", accuracy_score(y_test,y_pred))
Current results:
    142193
    Report:
                   precision    recall  f1-score   support

               0       1.00      1.00      1.00      9026
               1       1.00      1.00      1.00      2592

       micro avg       1.00      1.00      1.00     11618
       macro avg       1.00      1.00      1.00     11618
    weighted avg       1.00      1.00      1.00     11618

    Accuracy:  1.0
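One way to narrow down a suspected leak is to check whether any single column predicts the target almost perfectly by itself. The following is only a minimal sketch of that idea, not a definitive diagnosis. The helper `single_feature_scores` and the synthetic columns (`humidity`, `pressure`, `leaky_amount`) are hypothetical stand-ins and are not taken from `weather.csv`. On the real data, the same helper could be called with `X` and `y` from the script above.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier


def single_feature_scores(X, y, cv=5):
    """Cross-validated accuracy of a shallow tree trained on each column alone.

    Hypothetical helper: a column that scores near 1.0 with no help from
    the others is a candidate for leaking the target.
    """
    scores = {}
    for col in X.columns:
        tree = DecisionTreeClassifier(max_depth=2, random_state=0)
        scores[col] = cross_val_score(
            tree, X[[col]], y, cv=cv, scoring="accuracy"
        ).mean()
    return pd.Series(scores).sort_values(ascending=False)


# Synthetic stand-in data (assumption: these columns are invented for the demo).
rng = np.random.default_rng(0)
n = 1000
# A quantity that is only known after the fact, e.g. the amount of rain tomorrow.
future_amount = rng.exponential(scale=2.0, size=n) * rng.integers(0, 2, size=n)
demo = pd.DataFrame({
    "humidity": rng.uniform(0, 100, size=n),   # unrelated to the target here
    "pressure": rng.normal(1015, 7, size=n),   # unrelated to the target here
    "leaky_amount": future_amount,             # derived from the future value
})
target = (future_amount > 1.0).astype(int)     # label computed from the leaky column

scores = single_feature_scores(demo, target)
print(scores)
```

In this synthetic setup the `leaky_amount` column should rank first with a score close to 1.0, while the unrelated columns stay near the majority-class baseline. A column that stands out like this on the real data would be worth inspecting to see whether it is computed from information that is only available after the prediction date.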