Creating training and test datasets in R

Author: 银角
Posted: 2024-08-14 03:29:00 (24 days ago)

2 replies

0#
甲基蓝 | 2019-08-31 10-32

<div class="post-text" itemprop="text">
<p>One possible solution is to store the sampled indices in a separate, named vector.</p>
<pre><code>train_idx <- sample(1:nrow(mydata), 1000, replace = FALSE)
train <- mydata[train_idx, ]   # select all these rows
test  <- mydata[-train_idx, ]  # select all but these rows
</code></pre>
<p>Alternatively, since the <code>row.names</code> attribute of a <code>data.frame</code> must contain unique values, you could also build the test set by row name, for example:</p>
<pre><code>test <- mydata[!(row.names(mydata) %in% row.names(train)), ]
</code></pre>
<p>However, the second solution is about 2 times slower on <code>mydata <- data.frame(a=1:100000, b=rep(letters, len=100000))</code>, as measured with <code>microbenchmark()</code>.</p>
</div>
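
The two approaches in the answer above can be put side by side in one runnable sketch. This is only an illustration under a few assumptions that are not in the original answer: the fixed seed, the `n_train` variable, and the names `test_by_index` and `test_by_name` are made up for the example. The timing step is optional and only runs if the `microbenchmark` package happens to be installed.

```r
# Example data frame, same shape as the one quoted in the answer.
mydata <- data.frame(a = 1:100000, b = rep(letters, length.out = 100000))

set.seed(42)     # assumption: fixed seed so the split is reproducible
n_train <- 1000  # assumption: training set size, as in the answer

# Store the sampled row indices in their own named vector.
train_idx <- sample(seq_len(nrow(mydata)), n_train, replace = FALSE)
train <- mydata[train_idx, ]

# Approach 1: negative indexing drops the sampled rows.
test_by_index <- mydata[-train_idx, ]

# Approach 2: keep rows whose row name does not appear in the training set.
test_by_name <- mydata[!(row.names(mydata) %in% row.names(train)), ]

# Sanity checks: the split covers every row once and both approaches agree.
stopifnot(
  nrow(train) + nrow(test_by_index) == nrow(mydata),
  length(intersect(row.names(train), row.names(test_by_index))) == 0,
  identical(row.names(test_by_index), row.names(test_by_name))
)

# Optional timing comparison, skipped when microbenchmark is not available.
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    by_index = mydata[-train_idx, ],
    by_name  = mydata[!(row.names(mydata) %in% row.names(train)), ],
    times = 20
  ))
}
```

Keeping `train_idx` around is what makes the negative-index form possible. The row-name form only needs `train` itself, which can be handy if the indices were not saved, at the cost of the extra string matching.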
