It is necessary to have a model capable of predicting marginal heatmaps in order to use the prediction strategy outlined in Section. Since pose estimation data is inherently spatial, convolutional layers are a natural foundation for the model.

The calculation performed by each convolutional layer is spatially local. That is, the value of any given output pixel is calculated from input pixels within a fixed spatial neighbourhood. This is appropriate when the input and output images exist in the same coordinate space and the locations of input and output features are correlated. For example, in 2D pose estimation the output heatmaps and the input RGB image both exist in xy coordinate space, and the ground truth target spherical Gaussians align with the joints in the input image.

However, we require our model to output not only an xy heatmap, $\hat{H}_{xy}$, but also heatmaps that have one axis in the z-direction, $\hat{H}_{zy}$ and $\hat{H}_{xz}$. This poses a challenge for convolution-based computation. Consider the case of predicting a heatmap in the zy-plane, $\hat{H}_{zy}$, from an input image in the xy-plane. In general, a location in the z-direction does not correspond to a location in the x-direction. The visual evidence in the input image may therefore be far from the desired prediction location in the output image, which is generally not ideal for convolutional neural networks. For 3D pose estimation, however, the spatial discrepancy is never along both axes at once (Table shows axis correspondences for each of the three heatmaps). It is therefore desirable to preserve spatial locality of computation along the appropriate axes.

Axis permutation. By transposing the intermediate activations, it is possible to swap the axis undergoing spatially local calculations with the axis undergoing densely connected calculations. The model can therefore be built using convolutional layers without depending on spatial correspondence between mismatched axes. This allows the model to aggregate depth cues into feature maps, which will then become pixel values along the z-axis. Figure 5 illustrates the axis permutation operation for $\hat{H}_{zy}$. Note that the permutation operation is simply a fixed manipulation of the activations, and does not add any parameters to the model.

Overall model architecture. Figure 6 illustrates the arrangement of residual blocks we used to produce heatmaps from image features. Residual blocks are constructed as per ResNet using “option C” shortcut connections. For the network paths predicting $\hat{H}_{zy}$ and $\hat{H}_{xz}$, the axis permutation operation is applied mid-way through the stage.

The complete model is assembled according to Figure 4. Features are extracted from 256 × 256 pixel input images using a truncated Inception v4 model. Multiple heatmap prediction stages are stacked together after the feature extractor to increase the capacity of the model. “Adapter” 1 × 1 convolution layers are placed in between the stages to combine the previous heatmap predictions into feature maps, which are added to the previous stage’s input to form a large skip connection. This stacking technique is inspired by the Stacked Hourglass architecture for 2D pose estimation.
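
The axis permutation described above can be illustrated with a minimal sketch. It assumes activations stored as a nested list indexed (channels, height, width), a layout the text does not specify, and the helper names `shape` and `swap_channel_and_width` are hypothetical. Swapping the channel and width axes means that values previously spread across feature maps become pixels along a spatial axis (the z-axis of a zy heatmap), while the old x-axis takes the place of the channel axis, which convolutional layers connect densely.

```python
def shape(t):
    """Return the (d0, d1, d2) shape of a rectangular 3-level nested list."""
    return (len(t), len(t[0]), len(t[0][0]))


def swap_channel_and_width(t):
    """Swap axes 0 and 2 of a (channels, height, width) nested list.

    The result is indexed (width, height, channels): the old channel axis
    now acts as a spatial axis, and the old width axis acts as the channel
    axis. No learnable parameters are involved.
    """
    c, h, w = shape(t)
    return [[[t[k][j][i] for k in range(c)] for j in range(h)] for i in range(w)]


# Hypothetical activations: 4 feature maps over a 2 x 3 (y, x) grid.
# Each value encodes its own index as 100*channel + 10*row + column.
activations = [
    [[100 * k + 10 * j + i for i in range(3)] for j in range(2)]
    for k in range(4)
]

permuted = swap_channel_and_width(activations)

print(shape(activations))  # (4, 2, 3): channel, y, x
print(shape(permuted))     # (3, 2, 4): x (as channels), y, z (former channels)
```

In an array library the same operation is typically a single transpose call, for example `numpy.transpose(x, (2, 1, 0))` for this layout. Applying the swap twice restores the original activations, which is consistent with the permutation being a fixed rearrangement rather than a learned layer.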