Enabling ppc64le support #68

pozdneev · 2018-08-17T12:59:58Z

The current version of the library supports ppc64 big-endian, but does not support ppc64 little-endian. I am trying to enable ppc64le support (see https://github.com/pozdneev/Simd/tree/enabling-ppc64le-support).

@ermig1979, Could you please advise on the endianness-sensitive code?

These are the tests that fail:

-fe=AddFeatureDifference \
-fe=BgraToBayer \
-fe=BgraToYuv420p \
-fe=BgraToYuv422p \
-fe=BgraToYuv444p \
-fe=BgrToYuv420p \
-fe=BgrToYuv422p \
-fe=BgrToYuv444p \
-fe=Bgr48pToBgra32 \
-fe=AveragingBinarization \
-fe=ConditionalCount16i \
-fe=ContourMetricsMasked \
-fe=AlphaBlending \
-fe=MeanFilter3x3 \
-fe=GaussianBlur3x3 \
-fe=SobelDx \
-fe=SobelDy \
-fe=ContourMetrics \
-fe=Laplace \
-fe=HistogramMasked \
-fe=HogDirectionHistograms \
-fe=NeuralConvert \
-fe=VectorProduct \
-fe=ReduceGray5x5 \
-fe=ShiftBilinear \
-fe=GetStatistic \
-fe=GetRowSums \
-fe=GetColSums \
-fe=GetAbsDyRowSums \
-fe=GetAbsDxColSums \
-fe=ValueSum \
-fe=SquareSum \
-fe=CorrelationSum \
-fe=TextureBoostedSaturatedGradient \
-fe=TextureBoostedUv \
-fe=Yuv444pToBgr \
-fe=Yuv422pToBgr \
-fe=Yuv420pToBgr \
-fe=Yuv444pToHue \
-fe=Yuv420pToHue

So far, I modified (see master...pozdneev:enabling-ppc64le-support):

src/Simd/SimdLoad.h
src/Simd/SimdMath.h
src/Simd/SimdStore.h
src/Simd/SimdVmxFill.cpp

The text was updated successfully, but these errors were encountered:

ermig1979 · 2018-08-20T05:57:09Z

I write here some notes about you code:

    template <> SIMD_INLINE v128_u8 Load<false>(const uint8_t * p)
    {
#ifdef __LITTLE_ENDIAN__
        return vec_vsx_ld(0, p);
#else
        v128_u8 lo = vec_ld(0, p);
        v128_u8 hi = vec_ld(A, p);
        return vec_perm(lo, hi, vec_lvsl(0, p));
#endif
    }

I tried to use vec_vsx_ld(0, p). I did it for Power7 (maybe in case of Power8 and Powe9 this instruction has another behavier). If input address is not aligned then vec_vsx_ld is performed is very slowly (about in 1000 times). Instruction vec_vsx_st has the same behavier. So I avoid to use them. So I avoid to use them. In your case the loading of unaligned vector is the same for BIG_ENDIAN and LITTLE_ENDIAN.

The main difference between BIG_ENDIAN and LITTLE_ENDIAN is using of constants for bytes manipulation. I see problem in the following constants (the list is not full of course) in file SimdConst.h:

 const v128_u8 K8_PERM_UNPACK_LO_U8 = SIMD_VEC_SETR_EPI8(0x10, 0x00, 0x11, 0x01, 0x12, 0x02, 0x13, 0x03, 0x14, 0x04, 0x15, 0x05, 0x16, 0x06, 0x17, 0x07);
 const v128_u8 K8_PERM_UNPACK_HI_U8 = SIMD_VEC_SETR_EPI8(0x18, 0x08, 0x19, 0x09, 0x1A, 0x0A, 0x1B, 0x0B, 0x1C, 0x0C, 0x1D, 0x0D, 0x1E, 0x0E, 0x1F, 0x0F);

 const v128_u8 K8_PERM_UNPACK_LO_U16 = SIMD_VEC_SETR_EPI8(0x10, 0x11, 0x00, 0x01, 0x12, 0x13, 0x02, 0x03, 0x14, 0x15, 0x04, 0x05, 0x16, 0x17, 0x06, 0x07);
 const v128_u8 K8_PERM_UNPACK_HI_U16 = SIMD_VEC_SETR_EPI8(0x18, 0x19, 0x08, 0x09, 0x1A, 0x1B, 0x0A, 0x0B, 0x1C, 0x1D, 0x0C, 0x0D, 0x1E, 0x1F, 0x0E, 0x0F);

Obviously for LITTLE_ENDIAN these constant must have another values.

pozdneev · 2018-08-27T06:59:01Z

@ermig1979, thank you for your comments!

The non-aligned loads should be much faster on little-endian P8 and P9. Moreover, the usage of vec_lvsl() is deprecated on little-endian. I will put a reference to an official ABI document into the code later.
I adjusted K8_PERM_UNPACK_LO_U8, K8_PERM_UNPACK_HI_U8, K8_PERM_UNPACK_LO_U16, K8_PERM_UNPACK_HI_U16, and K8_PERM_MUL_HI_U16. It helped to fix a lot of tests!

This is the current list of tests that fail:

'BgraToBayer'
'BgraToYuv420p'
'BgraToYuv422p'
'BgraToYuv444p'
'Bgr48pToBgra32'
'AveragingBinarization'
'NeuralConvert'
'ShiftBilinear'
'GetStatistic'
'Yuv444pToBgr'
'Yuv422pToBgr'
'Yuv420pToBgr'
'BgrToYuv420p'
'BgrToYuv422p'
'BgrToYuv444p'
'ReduceGray4x4'
'Yuv444pToHue'
'Yuv420pToHue'

I have not yet checked what is wrong there.

There is also a minor issue in VectorProduct that I overlooked. I will need to roll my commit back. I believe, I know how to fix that.

I will keep you posted. Meanwhile, if you recall any further places in the code that might be endianness-sensitive, please, let me know.

ermig1979 · 2018-08-27T11:32:00Z

BgraToBayer - there are a constants in file SimdVmxBgraToBayer.cpp (I would recommend to search there).
Other functions also have permute constants witch assuming BIG_ENDIAN.

pozdneev · 2018-08-28T07:14:32Z

@ermig1979 , thank you I'll check.
BTW, GetStatistic was failing because of an issue with a compiler. I filed a bug there and switched to another version of a compiler.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enabling ppc64le support #68

Enabling ppc64le support #68

pozdneev commented Aug 17, 2018

ermig1979 commented Aug 20, 2018

pozdneev commented Aug 27, 2018

ermig1979 commented Aug 27, 2018

pozdneev commented Aug 28, 2018

Enabling ppc64le support #68

Enabling ppc64le support #68

Comments

pozdneev commented Aug 17, 2018

ermig1979 commented Aug 20, 2018

pozdneev commented Aug 27, 2018

ermig1979 commented Aug 27, 2018

pozdneev commented Aug 28, 2018