Distributing text by word count in Spark Java

Author: 不易青年
Posted: 2024-07-08 03:58:05 (20 days ago)

2 replies

0#
2019-08-31 10-32

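<div class="post-text" itemprop="text">
<p>Welcome to SO!</p>
<p>Here is a solution in Scala that you can easily adapt to Java.</p>
<pre><code>// Needed outside spark-shell: the 'symbol column syntax and the SQL functions.
import spark.implicits._
import org.apache.spark.sql.functions.{size, split}

val df = spark.createDataset(Seq(
  "debt ceiling", "declaration of tax", "decryption", "sweats"
)).toDF("input")

// Split each row on whitespace, count the tokens, then count rows per token count.
df.select(size(split('input, "\\s+")).as("words"))
  .groupBy('words)
  .count
  .orderBy('words)
  .show
</code></pre>
<p>This produces</p>
<pre><code>+-----+-----+
|words|count|
+-----+-----+
|    1|    2|
|    2|    1|
|    3|    1|
+-----+-----+
</code></pre>
</div>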
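If you want to sanity-check the expected table before porting the query to Java, the same split, count and group logic can be sketched with plain Java streams. This is not Spark code and is only a rough local equivalent: the class name `WordCountHistogram` and its two methods are made up for illustration, and the `-1` limit is there on the assumption that Spark's `split` keeps trailing empty tokens.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class WordCountHistogram {

    // Number of whitespace-separated tokens in one input string.
    // Limit -1 keeps trailing empty tokens (assumed to match Spark's split).
    public static int wordCount(String text) {
        return text.split("\\s+", -1).length;
    }

    // Maps "number of words" to "how many inputs have that many words".
    // A TreeMap keeps the keys sorted, like orderBy on the words column.
    public static Map<Integer, Long> histogram(List<String> inputs) {
        return inputs.stream().collect(Collectors.groupingBy(
                WordCountHistogram::wordCount,
                TreeMap::new,
                Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> inputs = List.of(
                "debt ceiling", "declaration of tax", "decryption", "sweats");
        histogram(inputs).forEach(
                (words, count) -> System.out.println(words + " -> " + count));
    }
}
```

For the four sample strings this gives 2 inputs with one word, 1 with two words and 1 with three, which agrees with the table above. In a real Spark Java job you would keep the work inside the DataFrame API rather than collecting rows to the driver.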
