我认为您可以将现有逻辑简化为单个查询:
proc sql noprint _method; create table table_c as select a.participant, a.start as start_a, a.end as end_a, b.start_date as start_b, b.end_date as end_b, abs(a.end - b.end_date) as delta from table_a a inner join table_b b on a.participant = b.participant group by a.participant, start_a having delta = min(delta) ; quit;
只要您有足够的内存,日志输出就会确认这会执行散列连接:
NOTE: SQL execution methods chosen are: sqxcrta sqxsumg sqxsort sqxjhsh sqxsrc( WORK.TABLE_B(alias = B) ) sqxsrc( WORK.TABLE_A(alias = A) )
如果生成的表格与您尝试生成的表格不同,请澄清。
当您必须处理关系,最大覆盖范围与水中重叠等时,日期范围连接可能非常复杂...您当然不希望将联合和中介存储在最终解决方案中,尽管它们在调试逻辑时可以提供帮助。
这是一个相关的子查询技术,可以查找匹配A的“最佳”内容范围。如果内容数据不明显,则会出现问题 end_date 内 participant 。
end_date
participant
一切 one row(target)已完成查找。范围重叠逻辑很重要
one
where one.participant = two.participant and two.start_date < one.end and two.end_date > one.start
并允许内容日期范围部分超出目标范围。
data one; input participant start: ddmmyy. end: ddmmyy.; format start end yymmdd10.; datalines; 101 1-1-2010 26-4-2010 101 27-4-2010 2-10-2014 101 3-10-2014 4-1-2015 101 5-1-2015 31-8-2015 101 1-9-2015 12-10-2016 101 13-10-2016 31-12-2018 ; data two; input participant start_date: ddmmyy. end_date: ddmmyy. Content: $; format start_date end_date yymmdd10.; datalines; 101 1-1-2012 31-8-2012 A 101 1-9-2012 31-8-2013 B 101 1-9-2013 31-8-2014 C 101 1-9-2014 2-10-2014 D 101 3-10-2014 31-8-2015 E 101 1-9-2015 31-1-2016 F 101 1-9-2015 31-1-2017 F run; proc sql; create table want as select one.* , ( select min(content) from two where one.participant = two.participant and two.start_date < one.end and two.end_date > one.start group by participant having end_date = max(end_date) ) as content from one order by participant, start ; quit;