Further optimize L3_huffman and L3_imdct36 #35

Open
lieff opened this issue Aug 27, 2018 · 1 comment


lieff commented Aug 27, 2018

ARM instruction profile (total executed instructions: 2164536044):

Function                  Instructions   Share
L3_huffman.isra.2         685600678      31.674%
mp3d_synth                546698880      25.257%
L3_imdct36                251612240      11.624%
L3_dct3_9                 176638976       8.161%
mp3d_DCT_II               165811968       7.660%
mp3d_synth_pair            61793280       2.855%
L3_antialias               48054640       2.220%
L3_change_sign             36160512       1.671%
L3_midside_stereo          27845120       1.286%
get_bits                   27395265       1.266%
memset                     26390774       1.219%
mp3d_scale_pcm             21970944       1.015%
__memcpy_neon              19825566       0.916%
L3_ldexp_q2                17661260       0.816%
L3_read_scalefactors       14160764       0.654%
L3_decode_scalefactors     10988038       0.508%
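
For reference, each percentage in the profile is the function's instruction count divided by the total. A minimal C sketch of that arithmetic, using counts copied from the listing above; the helper name `share_percent` is made up for illustration and is not part of the decoder:

```c
#include <stdint.h>

/* Counts copied from the profile above. */
static const uint64_t TOTAL_INSTRUCTIONS = 2164536044ULL;
static const uint64_t HUFFMAN_COUNT      = 685600678ULL;  /* L3_huffman.isra.2 */
static const uint64_t IMDCT36_COUNT      = 251612240ULL;  /* L3_imdct36 */
static const uint64_t DCT3_9_COUNT       = 176638976ULL;  /* L3_dct3_9 */

/* Hypothetical helper: share of the total, in percent. */
static double share_percent(uint64_t count, uint64_t total)
{
    return 100.0 * (double)count / (double)total;
}

/* Combined share of the two IMDCT-related entries. */
static double imdct_path_percent(void)
{
    return share_percent(IMDCT36_COUNT + DCT3_9_COUNT, TOTAL_INSTRUCTIONS);
}
```

Summing the L3_imdct36 and L3_dct3_9 rows this way gives roughly 19.8% of all executed instructions, second only to L3_huffman among the Layer III routines.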

L3_huffman and L3_imdct36 + L3_dct3_9 need optimization. (Perhaps vectorize two L3_dct3_9 calls together?)
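
One way to read "vectorize two calls together" is to run the same scalar kernel on two independent input blocks in lockstep, one block per SIMD lane. The sketch below illustrates only that general idea in portable C. `toy_kernel` is a made-up 4-point butterfly, not the real L3_dct3_9, and the `f32x2` struct is a hypothetical stand-in for a two-lane vector type such as NEON's `float32x2_t`:

```c
/* Hypothetical stand-in kernel, NOT the real L3_dct3_9:
   an in-place 4-point butterfly on one block. */
static void toy_kernel(float *y)
{
    float a = y[0] + y[3];
    float b = y[1] + y[2];
    float c = y[0] - y[3];
    float d = (y[1] - y[2]) * 0.5f;
    y[0] = a + b;
    y[1] = c + d;
    y[2] = a - b;
    y[3] = c - d;
}

/* Two-lane value: lane 0 belongs to the first block, lane 1 to the second. */
typedef struct { float lane[2]; } f32x2;

static f32x2 load2(const float *p0, const float *p1)
{
    f32x2 r = {{ *p0, *p1 }};
    return r;
}

static void store2(f32x2 v, float *p0, float *p1)
{
    *p0 = v.lane[0];
    *p1 = v.lane[1];
}

static f32x2 add2(f32x2 p, f32x2 q)
{
    f32x2 r = {{ p.lane[0] + q.lane[0], p.lane[1] + q.lane[1] }};
    return r;
}

static f32x2 sub2(f32x2 p, f32x2 q)
{
    f32x2 r = {{ p.lane[0] - q.lane[0], p.lane[1] - q.lane[1] }};
    return r;
}

static f32x2 scale2(f32x2 p, float k)
{
    f32x2 r = {{ p.lane[0] * k, p.lane[1] * k }};
    return r;
}

/* Same arithmetic as toy_kernel, applied to two blocks at once. */
static void toy_kernel_x2(float *y0, float *y1)
{
    f32x2 v0 = load2(&y0[0], &y1[0]);
    f32x2 v1 = load2(&y0[1], &y1[1]);
    f32x2 v2 = load2(&y0[2], &y1[2]);
    f32x2 v3 = load2(&y0[3], &y1[3]);

    f32x2 a = add2(v0, v3);
    f32x2 b = add2(v1, v2);
    f32x2 c = sub2(v0, v3);
    f32x2 d = scale2(sub2(v1, v2), 0.5f);

    store2(add2(a, b), &y0[0], &y1[0]);
    store2(add2(c, d), &y0[1], &y1[1]);
    store2(sub2(a, b), &y0[2], &y1[2]);
    store2(sub2(c, d), &y0[3], &y1[3]);
}
```

Calling `toy_kernel_x2(a, b)` should leave the same values in `a` and `b` as calling `toy_kernel(a)` and `toy_kernel(b)` separately. Whether pairing pays off in the real decoder would depend on how the two inputs are laid out in memory, since gathering lanes from separate buffers costs extra loads.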

lieff self-assigned this Aug 28, 2018
lieff added the enhancement label Aug 28, 2018

lieff commented Sep 4, 2018

After some experiments:

  • There is no ARM compiler good enough to rely completely on intrinsics and avoid assembler.
  • The best compiler found is armcc, followed by clang and gcc 8 (gcc 8 comes close to clang when -flto is used; clang itself crashes with -flto).
  • The main compiler problem found is poor use of post-increment addressing: https://reviews.llvm.org/D39415
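
To illustrate the post-increment point, here is a minimal sketch of the loop shape involved. ARM has post-indexed loads (for example `ldr r3, [r0], #4`) that read memory and advance the pointer in a single instruction. Both functions below compute the same dot product; the function names are hypothetical, and whether a given compiler actually emits post-indexed addressing for either form is exactly the kind of thing that varies between armcc, clang and gcc:

```c
#include <stddef.h>

/* Indexed form: each element is addressed as base + i. */
static float dot_indexed(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}

/* Pointer form: every load is followed by a pointer bump, the pattern
   that a post-indexed load can express in one instruction. */
static float dot_postinc(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    while (n--)
        acc += *a++ * *b++;
    return acc;
}
```

Comparing the generated assembly of the two forms (for example with `-S`) is a quick way to see whether a compiler keeps a separate add per pointer or folds the increment into the load.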
