Hive ANALYZE query takes a lot of time

Author: 你抱着孩子@先跑
Posted: 2024-03-16 02:16:45 (4 months ago)

2 replies

0#
离线请留言 | 2019-08-31 10-32

<div class="post-text" itemprop="text">
<p>If you load the table with <code>INSERT OVERWRITE</code>, statistics can be gathered automatically during the insert by setting <code>hive.stats.autogather=true</code>.</p>
<p>If the table is partitioned and partitions are loaded incrementally, you only need to analyze the most recently loaded partitions rather than the whole table:</p>
<pre><code>ANALYZE TABLE [db_name.]tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] COMPUTE STATISTICS [FOR COLUMNS] [NOSCAN];</code></pre>
<p>See the examples here: <a href="https://cwiki.apache.org/confluence/display/Hive/StatsDev" rel="nofollow noreferrer">https://cwiki.apache.org/confluence/display/Hive/StatsDev</a></p>
<p>For ORC files, you can set <code>hive.stats.gather.num.threads</code> to increase parallelism.</p>
<p>The full list of statistics settings is here: <a href="https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Statistics" rel="nofollow noreferrer">https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Statistics</a></p>
</div>
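
As a rough illustration of the advice above, the following HiveQL sketch combines the three suggestions. The table names (`sales`, `sales_staging`), the columns, the partition column `load_date`, and the thread count are all hypothetical and chosen only for the example; adjust them to your own schema and cluster.

```sql
-- Hypothetical schema: sales is partitioned by load_date,
-- sales_staging holds the incoming rows.

-- 1. Let Hive collect basic statistics while the INSERT OVERWRITE runs,
--    so no separate ANALYZE pass is needed for this partition.
SET hive.stats.autogather=true;

INSERT OVERWRITE TABLE sales PARTITION (load_date='2019-08-30')
SELECT order_id, amount
FROM sales_staging
WHERE load_date = '2019-08-30';

-- 2. If statistics still have to be computed explicitly, analyze only the
--    partition that was just loaded instead of the whole table.
ANALYZE TABLE sales PARTITION (load_date='2019-08-30') COMPUTE STATISTICS;

-- 3. Optionally gather column-level statistics for that same partition.
ANALYZE TABLE sales PARTITION (load_date='2019-08-30') COMPUTE STATISTICS FOR COLUMNS;

-- 4. For ORC-backed tables, raise the number of threads used by the
--    NOSCAN form of ANALYZE (20 is an arbitrary example value).
SET hive.stats.gather.num.threads=20;
ANALYZE TABLE sales PARTITION (load_date) COMPUTE STATISTICS NOSCAN;
```

Step 2 is usually the biggest time saver on incrementally loaded tables, because the cost of `ANALYZE` then scales with the size of one partition rather than the full table. The last statement names the partition column without a value, which asks Hive to cover all partitions; whether the thread setting helps noticeably depends on the file format and Hive version.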
