Using Kiba-ETL to transform a table into a hash of sets

Author: v-star*위위
Published: 2024-12-27 03:38:09
Reposted from:

3 replies

0#
CC-f | 2019-08-31 10:32

<div class="post-text" itemprop="text"><p>Your solution will work fine, and in fact the reason Kiba is designed this way (mostly "Plain Old Ruby Objects") is precisely to make it easy to call the components yourself if you need to! (This is also very useful for testing!)</p><p>That said, here are some additional possibilities.</p><p>What you are doing is a form of aggregation, which can be implemented in various ways.</p><h2>Buffering destination</h2><p>Here the buffer would actually be a single row. Use code like this:</p><pre class="lang-ruby prettyprint-override"><code>class MyBufferingDestination
  attr_reader :single_output_row

  def initialize(config:)
    # for each column, the list of all values seen so far
    @single_output_row = Hash.new { |hash, col| hash[col] = [] }
  end

  def write(row)
    row.each do |col, col_val|
      single_output_row[col] += [col_val]
    end
  end

  def close
    # will be called by Kiba at the end of the run
    # here you'd write your output
  end
end
</code></pre><h2>Aggregating in an instance variable + post_process block</h2><pre class="lang-ruby prettyprint-override"><code>pre_process do
  @output_row = {}
end

transform do |row|
  row.each do |col, col_val|
    @output_row = # SNIP
  end
  row
end

post_process do
  # convert @output_row to something
  # you can invoke a destination manually, or do something else
end
</code></pre><h2>Possible soon: using a buffering transform</h2><p>As described <a href="https://github.com/thbar/kiba/issues/53" rel="nofollow noreferrer">here</a>, it will soon be possible to create buffering transforms, to better decouple the aggregation mechanism from the destination itself.</p><p>It will look like this:</p><pre class="lang-ruby prettyprint-override"><code>class MyAggregatingTransform
  def process(row)
    @aggregate += xxx
    nil # remove the row from the pipeline
  end

  def close
    # not yet possible, but soon
    yield @aggregate
  end
end
</code></pre><p>This will be the best design, because then you can reuse existing destinations without modifying them to support buffering, which makes them more generic and reusable:</p><pre class="lang-ruby prettyprint-override"><code>transform MyAggregatingTransform
destination MyJSONDestination, file: "some.json"
</code></pre><p>You could even emit multiple rows to the destination, by detecting boundaries in the input dataset and yielding accordingly.</p><p>I will update this SO answer once this is possible.</p></div>
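A runnable sketch of the buffering-destination idea, collecting the distinct values of each column into a Hash of Sets (the "hash of sets" from the question). It assumes Kiba's destination contract (`write(row)` once per row, `close` once at the end of the run) and drives the object by hand without the kiba gem, which is exactly what the plain-Ruby-object design allows. Using `Set` instead of arrays, and dropping the unused `config:` option, are my own choices for illustration.

```ruby
require "set"

# Buffering destination sketch. Kiba calls `write(row)` for every row and
# `close` once at the end of the run; here both are called by hand.
class MyBufferingDestination
  attr_reader :single_output_row

  def initialize
    # Each column name maps to the Set of distinct values seen so far.
    @single_output_row = Hash.new { |hash, col| hash[col] = Set.new }
  end

  def write(row)
    row.each do |col, col_val|
      single_output_row[col] << col_val
    end
  end

  def close
    # A real destination would write the aggregate out here (file, DB, ...).
    # This sketch only prints it, with Sets shown as Arrays for readability.
    p single_output_row.transform_values(&:to_a)
  end
end

rows = [
  { "color" => "red", "size" => "S" },
  { "color" => "blue", "size" => "S" },
  { "color" => "red", "size" => "M" }
]

dest = MyBufferingDestination.new
rows.each { |row| dest.write(row) }
dest.close
```

Inside a Kiba job the same class would simply be declared with `destination MyBufferingDestination`.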

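A filled-in sketch of the aggregating transform described above. It assumes a Kiba version whose runner calls a transform's `close` at the end of the run and forwards whatever rows it yields (the answer calls this "not yet possible"; check the changelog of the version you use). The aggregation logic replacing `xxx` is my own illustration, and the transform is driven by hand here so it runs without the gem.

```ruby
require "set"

# Aggregating transform sketch: swallows every row and emits one
# aggregated row from `close`.
class MyAggregatingTransform
  def initialize
    @aggregate = Hash.new { |hash, col| hash[col] = Set.new }
  end

  def process(row)
    row.each { |col, col_val| @aggregate[col] << col_val }
    nil # returning nil removes the row from the pipeline
  end

  def close
    # Emit the single aggregated row once all input rows have been seen.
    yield @aggregate
  end
end

# Driving the transform by hand, the way a runner would.
transform = MyAggregatingTransform.new
[{ id: 1, tag: "a" }, { id: 2, tag: "a" }, { id: 3, tag: "b" }].each do |row|
  transform.process(row)
end
emitted = []
transform.close { |row| emitted << row }
```

Because the aggregation lives in the transform, an unmodified destination can receive the yielded row.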
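The boundary-detection idea (several aggregated rows, one per group of input rows) could look roughly like this. It assumes the input is already sorted by the grouping column, that the runner passes a block to `process` so it can `yield` rows, and that `close` flushes the last group. `GroupingAggregateTransform` and its `key:` option are hypothetical names, not part of Kiba.

```ruby
require "set"

# Emits one aggregated row per run of consecutive rows sharing the same key.
# Assumes input sorted by `key`; `key:` is a hypothetical option.
class GroupingAggregateTransform
  def initialize(key:)
    @key = key
    @current_key = nil
    @aggregate = nil
  end

  def process(row)
    row_key = row.fetch(@key)
    # Boundary found: the key changed, so emit the finished group first.
    yield flush if @aggregate && row_key != @current_key
    @current_key = row_key
    @aggregate ||= Hash.new { |hash, col| hash[col] = Set.new }
    row.each { |col, col_val| @aggregate[col] << col_val unless col == @key }
    nil # input rows themselves are not passed down the pipeline
  end

  def close
    # Flush the last group, if any rows were seen at all.
    yield flush if @aggregate
  end

  private

  # Builds the output row for the current group and resets the buffer.
  def flush
    out = { @key => @current_key }.merge(@aggregate)
    @aggregate = nil
    out
  end
end

rows = [
  { user: "ann", tag: "a" },
  { user: "ann", tag: "b" },
  { user: "bob", tag: "a" }
]
emitted = []
grouper = GroupingAggregateTransform.new(key: :user)
rows.each { |row| grouper.process(row) { |out| emitted << out } }
grouper.close { |out| emitted << out }
```

Only one group is buffered at a time, so memory stays bounded by the largest group rather than the whole dataset.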
1#
特狼普 | 2019-08-31 10:32

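<div class="post-text" itemprop="text"><p>OK, so using Kiba in this way does not seem to be what the tool is meant for. I wanted to use Kiba because I had already implemented a lot of related E, T and L code for this project, and the reuse would be huge.</p><p>So, if I have code to reuse but can't use it within the Kiba framework, I can just call it like normal code. This is all thanks to Thibaut's extremely minimalist design!</p><p>Here is how I solved the problem:</p><pre class="lang-ruby prettyprint-override"><code>source = CSVOrXLSXSource.new("data.xlsx", document_config: { some: :settings })
xformer = ColumnSetTransformer.new

source.each do |row|
  xformer.process(row)
end

p xformer.col_set # col_set must be an attr_reader on this class
</code></pre><p>Now I have my data transformed easily :)</p></div>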
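A self-contained version of this approach. `CSVOrXLSXSource` and `ColumnSetTransformer` are the asker's own classes and are not shown, so the transformer below is a hypothetical reconstruction, and the source is replaced by an Array of Hashes (any object whose `each` yields row Hashes would do).

```ruby
require "set"

# Hypothetical reconstruction of ColumnSetTransformer: gathers the distinct
# values of every column into a Hash of Sets.
class ColumnSetTransformer
  attr_reader :col_set

  def initialize
    @col_set = Hash.new { |hash, col| hash[col] = Set.new }
  end

  # Same shape as a Kiba transform: takes a row, returns it unchanged.
  def process(row)
    row.each { |col, col_val| @col_set[col] << col_val }
    row
  end
end

# Stand-in for CSVOrXLSXSource: anything whose `each` yields row Hashes.
source = [
  { "region" => "EU", "product" => "A" },
  { "region" => "US", "product" => "A" },
  { "region" => "EU", "product" => "B" }
]

xformer = ColumnSetTransformer.new
source.each { |row| xformer.process(row) }
p xformer.col_set
```

Since `process` returns the row unchanged, the same class could also be declared with `transform ColumnSetTransformer` inside a Kiba job.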

