SPU本身使用向量操作,每个时钟周期可以执行多达8条浮点指令。
An SPU USES vector operations itself and can thereby execute up to eight floating point instructions per clock cycle.
通过保持数据对齐并针对16字节边界进行填充,向量操作就可以毫不费力地执行了。
By keeping the data aligned and padded to 16-byte boundaries, vector operations can be performed effortlessly.
如果对代码进行向量化,由于您正在将这些值当作字节进行处理,这意味着每条指令都要一次操作 16 个值!
If you vectorize this code, since you are treating the values as bytes, that means that each instruction will operate on 16 values at once!
应用推荐