Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enabling ppc64le support #68

Open
pozdneev opened this issue Aug 17, 2018 · 4 comments
Open

Enabling ppc64le support #68

pozdneev opened this issue Aug 17, 2018 · 4 comments

Comments

@pozdneev
Copy link
Contributor

The current version of the library supports ppc64 big-endian, but does not support ppc64 little-endian. I am trying to enable ppc64le support (see https://github.com/pozdneev/Simd/tree/enabling-ppc64le-support).

@ermig1979, Could you please advise on the endianness-sensitive code?

These are the tests that fail:

-fe=AddFeatureDifference \
-fe=BgraToBayer \
-fe=BgraToYuv420p \
-fe=BgraToYuv422p \
-fe=BgraToYuv444p \
-fe=BgrToYuv420p \
-fe=BgrToYuv422p \
-fe=BgrToYuv444p \
-fe=Bgr48pToBgra32 \
-fe=AveragingBinarization \
-fe=ConditionalCount16i \
-fe=ContourMetricsMasked \
-fe=AlphaBlending \
-fe=MeanFilter3x3 \
-fe=GaussianBlur3x3 \
-fe=SobelDx \
-fe=SobelDy \
-fe=ContourMetrics \
-fe=Laplace \
-fe=HistogramMasked \
-fe=HogDirectionHistograms \
-fe=NeuralConvert \
-fe=VectorProduct \
-fe=ReduceGray5x5 \
-fe=ShiftBilinear \
-fe=GetStatistic \
-fe=GetRowSums \
-fe=GetColSums \
-fe=GetAbsDyRowSums \
-fe=GetAbsDxColSums \
-fe=ValueSum \
-fe=SquareSum \
-fe=CorrelationSum \
-fe=TextureBoostedSaturatedGradient \
-fe=TextureBoostedUv \
-fe=Yuv444pToBgr \
-fe=Yuv422pToBgr \
-fe=Yuv420pToBgr \
-fe=Yuv444pToHue \
-fe=Yuv420pToHue

So far, I modified (see master...pozdneev:enabling-ppc64le-support):

src/Simd/SimdLoad.h
src/Simd/SimdMath.h
src/Simd/SimdStore.h
src/Simd/SimdVmxFill.cpp
@ermig1979
Copy link
Owner

I write here some notes about you code:

    template <> SIMD_INLINE v128_u8 Load<false>(const uint8_t * p)
    {
#ifdef __LITTLE_ENDIAN__
        return vec_vsx_ld(0, p);
#else
        v128_u8 lo = vec_ld(0, p);
        v128_u8 hi = vec_ld(A, p);
        return vec_perm(lo, hi, vec_lvsl(0, p));
#endif
    }
  1. I tried to use vec_vsx_ld(0, p). I did it for Power7 (maybe in case of Power8 and Powe9 this instruction has another behavier). If input address is not aligned then vec_vsx_ld is performed is very slowly (about in 1000 times). Instruction vec_vsx_st has the same behavier. So I avoid to use them. So I avoid to use them. In your case the loading of unaligned vector is the same for BIG_ENDIAN and LITTLE_ENDIAN.

  2. The main difference between BIG_ENDIAN and LITTLE_ENDIAN is using of constants for bytes manipulation. I see problem in the following constants (the list is not full of course) in file SimdConst.h:

     const v128_u8 K8_PERM_UNPACK_LO_U8 = SIMD_VEC_SETR_EPI8(0x10, 0x00, 0x11, 0x01, 0x12, 0x02, 0x13, 0x03, 0x14, 0x04, 0x15, 0x05, 0x16, 0x06, 0x17, 0x07);
     const v128_u8 K8_PERM_UNPACK_HI_U8 = SIMD_VEC_SETR_EPI8(0x18, 0x08, 0x19, 0x09, 0x1A, 0x0A, 0x1B, 0x0B, 0x1C, 0x0C, 0x1D, 0x0D, 0x1E, 0x0E, 0x1F, 0x0F);
    
     const v128_u8 K8_PERM_UNPACK_LO_U16 = SIMD_VEC_SETR_EPI8(0x10, 0x11, 0x00, 0x01, 0x12, 0x13, 0x02, 0x03, 0x14, 0x15, 0x04, 0x05, 0x16, 0x17, 0x06, 0x07);
     const v128_u8 K8_PERM_UNPACK_HI_U16 = SIMD_VEC_SETR_EPI8(0x18, 0x19, 0x08, 0x09, 0x1A, 0x1B, 0x0A, 0x0B, 0x1C, 0x1D, 0x0C, 0x0D, 0x1E, 0x1F, 0x0E, 0x0F);
    

Obviously for LITTLE_ENDIAN these constant must have another values.

@pozdneev
Copy link
Contributor Author

@ermig1979, thank you for your comments!

  1. The non-aligned loads should be much faster on little-endian P8 and P9. Moreover, the usage of vec_lvsl() is deprecated on little-endian. I will put a reference to an official ABI document into the code later.
  2. I adjusted K8_PERM_UNPACK_LO_U8, K8_PERM_UNPACK_HI_U8, K8_PERM_UNPACK_LO_U16, K8_PERM_UNPACK_HI_U16, and K8_PERM_MUL_HI_U16. It helped to fix a lot of tests!

This is the current list of tests that fail:

'BgraToBayer'
'BgraToYuv420p'
'BgraToYuv422p'
'BgraToYuv444p'
'Bgr48pToBgra32'
'AveragingBinarization'
'NeuralConvert'
'ShiftBilinear'
'GetStatistic'
'Yuv444pToBgr'
'Yuv422pToBgr'
'Yuv420pToBgr'
'BgrToYuv420p'
'BgrToYuv422p'
'BgrToYuv444p'
'ReduceGray4x4'
'Yuv444pToHue'
'Yuv420pToHue'

I have not yet checked what is wrong there.

There is also a minor issue in VectorProduct that I overlooked. I will need to roll my commit back. I believe, I know how to fix that.

I will keep you posted. Meanwhile, if you recall any further places in the code that might be endianness-sensitive, please, let me know.

@ermig1979
Copy link
Owner

BgraToBayer - there are a constants in file SimdVmxBgraToBayer.cpp (I would recommend to search there).
Other functions also have permute constants witch assuming BIG_ENDIAN.

@pozdneev
Copy link
Contributor Author

@ermig1979 , thank you I'll check.
BTW, GetStatistic was failing because of an issue with a compiler. I filed a bug there and switched to another version of a compiler.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants