将嵌套循环计算转换为Numpy以加速

作者: SHOU宅大可爱
发布时间: 2024-11-11 04:11:19 (6天前)
转自：

3 条回复

0#
回复此人
نسر الصحراء | 2019-08-31 10-32

<div class =“post-text”itemprop =“text”> <P> 如果 <code> gr </code> 是一个浮点数列表，如果你想用NumPy进行矢量化将是转换的第一步 <code> gr </code> 到一个NumPy数组 <a href="http://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html" rel="nofollow"> <code> np.array() </code> </A> 。 </p> <P> 接下来，我假设你有 <code> new_gr </code> 用零形状初始化 <code> (height,width) </code> 。在两个最里面的循环中执行的计算基本上代表 <code> 2D convolution </code> 。所以，你可以使用 <a href="http://docs.scipy.org/doc/scipy-0.16.0/reference/generated/scipy.signal.convolve2d.html" rel="nofollow"> <code> signal.convolve2d </code> </A> 适当的 <code> kernel </code> 。决定 <code> kernel </code> ，我们需要查看缩放因子并制作一个 <code> 3 x 3 </code> 从它们中取出内核并否定它们来模拟我们在每次迭代时所做的计算。因此，您将获得一个矢量化解决方案，其中两个最内层的循环被移除以获得更好的性能，如此 - </p> <pre> <code> import numpy as np from scipy import signal # Get the scaling factors and negate them to get kernel kernel = -np.array([[0,1-2*t,0],[-1,1,0,],[t,0,0]]) # Initialize output array and run 2D convolution and set values into it out = np.zeros((height,width)) out[1:-1,1:-1] = signal.convolve2d(gr, kernel, mode='same')[1:-1,:-2] </code> </pre> <P> 的<strong> 验证输出和运行时测试 </强> </p> <P> 定义功能： </p> <pre> <code> def org_app(gr,t): new_gr = np.zeros((height,width)) for h in xrange(1, height-1): for w in xrange(1, width-1): new_gr[h][w] = gr[h][w] + gr[h][w-1] + gr[h-1][w] + t * gr[h+1][w-1]-2 * (gr[h][w-1] + t * gr[h-1][w]) return new_gr def proposed_app(gr,t): kernel = -np.array([[0,1-2*t,0],[-1,1,0,],[t,0,0]]) out = np.zeros((height,width)) out[1:-1,1:-1] = signal.convolve2d(gr, kernel, mode='same')[1:-1,:-2] return out </code> </pre> <P> 验证 - </p> <pre> <code> In [244]: # Inputs ...: gr = np.random.rand(40,50) ...: height,width = gr.shape ...: t = 1 ...: In [245]: np.allclose(org_app(gr,t),proposed_app(gr,t)) Out[245]: True Timings - In [246]: # Inputs ...: gr = np.random.rand(400,500) ...: height,width = gr.shape ...: t = 1 ...: In [247]: %timeit org_app(gr,t) 1 loops, best of 3: 2.13 s per loop In [248]: %timeit proposed_app(gr,t) 10 loops, best of 3: 19.4 ms per loop </code> </pre> </DIV>

编辑
1#
回复此人
故人 | 2019-08-31 10-32

<div class =“post-text”itemprop =“text”> <P> @Divakar，我尝试了几种变体 <code> org_app </code> 。完全矢量化的版本是： </p> <pre> <code> def org_app4(gr,t): new_gr = np.zeros((height,width)) I = np.arange(1,height-1)[:,None] J = np.arange(1,width-1) new_gr[I,J] = gr[I,J] + gr[I,J-1] + gr[I-1,J] + t * gr[I+1,J-1]-2 * (gr[I,J-1] + t * gr[I-1,J]) return new_gr </code> </pre> <P> 虽然你的速度只有一半 <code> proposed_app </code> ，它的风格更接近原作。因此可以帮助理解嵌套循环如何被矢量化。 </p> <P> 重要的一步是转换 <code> I </code> 进入一个列数组，这样就可以了 <code> I,J </code> 索引一个值块。 </p> </DIV>

编辑

登录后才能参与评论