{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":16143904,"defaultBranch":"master","name":"blis","ownerLogin":"flame","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2014-01-22T15:58:24.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/6494486?v=4","public":true,"private":false,"isOrgOwned":true},"refInfo":{"name":"","listCacheKey":"v0:1714515214.0","currentOid":""},"activityList":{"items":[{"before":null,"after":"7d486312c8c04afb81e2e424daf25aa65f758069","ref":"refs/heads/1.0-rc2","pushedAt":"2024-04-30T22:13:34.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"fgvanzee","name":"Field G. Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"},"commit":{"message":"Use \"-i auto\" by default in test/3 drivers.\n\nDetails:\n- Request default induced method behavior of BLIS via \"-i auto\" when\n running the standalone performance drivers in test/3 via the runme.sh\n script present in that directory. (Previously, the runme.sh script\n would use \"-i native\" by default.) This change was originally intended\n for fd1a7e3.\n- (cherry picked from commit cad51491e8a0b306015a5a02881dc2a9b60dd8d9)","shortMessageHtmlLink":"Use \"-i auto\" by default in test/3 drivers."}},{"before":null,"after":"c5ed72aac20aaf89052f5742769219a4f3efc41e","ref":"refs/heads/stable-oct27-cand5","pushedAt":"2024-04-30T22:04:17.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"fgvanzee","name":"Field G. Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"},"commit":{"message":"Omnibus PR - Oct 2023 (#678)\n\nDetails:\n- This is an \"omnibus\" commit, consisting of multiple medium-sized\n commits that affect non-trivial aspects of BLIS. The major highlights:\n - Relocated the pba, sba pool (from the rntm_t), and mem_t (from the\n cntl_t) to the thrinfo_t object. 
This allows the rntm_t to be\n effectively const (although it is sometimes copied internally and\n modified to reflect different ways of parallelism). Moving the mem_t\n sets the stage for sharing a global control tree amongst all\n threads.\n - De-templatized the macrokernels for gemmt, trmm, and trsm to match\n the macrokernel for gemm, which has been de-templatized since\n 54fa28b.\n - Reimplemented bli_l3_determine_kc() by separating out the logic for\n adjusting KC based on MR/NR for triangular A and/or B into a new\n function, bli_l3_adjust_kc(). For now, this function is still called\n from bli_l3_determine_kc(), but in the future we plan to have it\n called once when constructing the control tree.\n - Refactored the level-3 thread decorator into two parts:\n - One part deals only with launching threads, each one calling a\n generic thread entry function. This code resides in frame/thread\n and constitutes the definition of bli_thread_launch(). Note that\n it is specific to the threading implementation (OpenMP, pthreads,\n single, etc.)\n - The other part deals with passing the matrix operands and related\n information into bli_thread_launch(). This is the \"l3 decorator\"\n and now resides in frame/3. It is agnostic to the threading\n implementation.\n - Modified the \"level\" of the thread control tree passed in at each\n operation. Previously, each operation (e.g. bli_gemm_blk_var1()) was\n passed in a communicator representing the active thread teams which\n would share the available work. Now, the *parent* thread comm is\n passed in. The operation then grabs the child comm and uses it to\n partition the work. The difference is in bli_trsm_blk_var1(), where\n there are now two children nodes for this single operation (i.e. the\n thread control tree is split one level above where the control tree\n is). The sub-prenode is used for the trsm subproblem while the\n normal sub-node is used for the gemm part. 
Importantly, the parent\n comm is used for the barrier between them.\n- Removed cntl_t* arguments from bli_*_front() functions. These will be\n added back in the future when the control tree's creation is moved so\n that it happens much sooner (provided that bli_*_front() have not been\n absorbed into their respective bli_*_ex() functions).\n- Renamed various bli_thread_*() query functions to bli_thrinfo_*(),\n for consistency. This includes _num_threads(), _thread_id(), _n_way(),\n _work_id(), _sba_pool(), _pba(), _mem(), _barrier(), _broadcast(), and\n _am_chief().\n- Removed extraneous barrier from _blk_var3() of gemm and trsm.\n- Fixed a typo in bli_type_defs.h where BLIS_BLAS_INT_TYPE_SIZE was\n misspelled.\n- (cherry picked from commit aeb5f0cc19665456e990a7ffccdb09da2e3f504b)\n\nFixed performance bug caused by redundant packing. (#680)\n\nDetails:\n- Fixed a performance bug whereby multiple threads were redundantly\n packing the same (rather than separate) micropanels. This bug was\n caused by different parts of the code using the num_threads/thread_id\n field of the thrinfo_t vs. the n_way/work_id fields. The fix was to\n standardize on the latter and provide a \"fake\" thrinfo_t sub-prenode\n in the thrinfo tree which consists of single-member thread teams. The\n single team with multiple threads node is still required since it and\n only it can be used to perform barriers and broadcasts (e.g. of the\n packed buffer pointer).\n- (cherry picked from commit 29f79f030e939969d4f3876c4fdaac7b0c5daa63)\n\nFixed random segfault in test/3 drivers. (#788)\n\nDetails:\n- Fixed a segfault in the non-gemm test drivers in test/3 that was the\n result of sometimes leaving either .n_str or .k_str fields of the\n params_t struct uninitialized, depending on the operation in question.\n For example, in test_hemm.c, init_def_params() would only initialize\n the .m_str and .n_str fields, but not the .k_str field. 
Even though\n hemm doesn't use a 'k' dimension, the proc_params() function (called\n via parse_cl_params()) universally attempts to convert all three into\n integers via sscanf(), which was understandably failing when one of\n those strings was a NULL pointer. I'm not sure how this code ever\n worked to begin with. Special thanks to Leick Robinson for finding and\n reporting this bug.\n- (cherry picked from commit 1236ddab455ef3a6293ab394ff06b3a19c2913d9)\n\nFixed staleness in kernels/zen/3/bli_gemm_small.c.\n\nDetails:\n- Added missing 'const' keyword in function prototypes for\n bli_gemm_small() and friends.\n- Updated pba usage to reflect new APIs.\n- Fixed syntax typo in 'export GOMP_CPU_AFFINITY' line in ul2128\n conditional of test/3/runme.sh.\n- Thanks to Jeff Diamond for reporting these issues.","shortMessageHtmlLink":"Omnibus PR - Oct 2023 (#678)"}},{"before":null,"after":"110430f337b1db0b0d5737dc9dfc2e9004b3b2d6","ref":"refs/heads/stable-oct26-cand1","pushedAt":"2024-04-30T22:03:09.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"fgvanzee","name":"Field G. 
Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"},"commit":{"message":"Add check to disable armsve on Apple M1.\n\n- (cherry picked from commit c803b03e52a7a6997a8d304a8cfa9acf7c1c555b)\n\nFix auto-detection of firestorm (Apple M1).\n\n- (cherry picked from commit 2dd692b710b6a9889f7ebdd7934a2108be5c5530)\n\nAdded Discord documentation (#677)\n\nDetails:\n- Added a docs/Discord.md markdown document that walks the reader\n through creating a Discord account, obtaining the invite link, and\n using the link to join the BLIS Discord server.\n- Updated README.md to reference the new Discord.md document in multiple\n places, including via the official Discord logo (used with explicit\n permission from representatives at Discord Inc.).\n- (cherry picked from commit 88105dbecf0f9dfbfa30215743346e8bd6afb971)\n\nShuffled checked properties in bli_l3_check.c. (#676)\n\nDetails:\n- Added certain checks for matrix structure to the level-3 operations'\n _check() functions, and slightly reorganized existing checks.\n- (cherry picked from commit 23f5b8df3e802a27bacd92571184ec57bbdfa646)\n\nCREDITS file update.\n\nDetails:\n- This attribution was intended to go in PR #647.\n- (cherry picked from commit 9453e0f163503f64a290256b4be53d8882224863)\n\nReinstate sanity check in bli_pool_finalize. (#671)\n\nDetails:\n- Added a reinit argument to bli_pool_finalize(). This bool will signal\n whether or not the function is being called from bli_pool_reinit(). If\n it is not being called from _reinit(), we can safely check to confirm\n that .top_index == 0 (i.e., all blocks have been checked in). 
But if\n it *is* being called from _reinit(), then that check will be skipped\n since one of the predicted use cases for bli_pool_reinit() anticipates\n that some blocks are (probably) checked out when the pool_t is\n reinitialized.\n- Updated existing invocations of bli_pool_finalize() to pass in either\n FALSE (from bli_apool_free_block() or bli_pba_finalize_pools()) or\n TRUE (from bli_pool_reinit()) for the new reinit argument.\n- (cherry picked from commit 76a23bd8c33e161221891935a489df9a9fb9c8c0)\n\nFix some bugs in bli_pool.c (#670)\n\nDetails:\n- Add a check for premature pool exhaustion when checking in blocks via\n bli_pool_checkin_block(). This detects \"double-free\" and other bad\n conditions that don't necessarily result in a segfault.\n- Make sure to copy all block pointers when growing the pool size.\n Previously, checked-out block pointers (which are guaranteed to be set\n to NULL) were not being copied, leading to the presence of\n uninitialized data.\n- (cherry picked from commit 63470b49e3b9b15e00a8f666e86ccd70c6005fe9)\n\nAdd AddressSanitizer (-fsanitize=address) option. (#669)\n\nDetails:\n- Added support for AddressSanitizer (ASan), a compiler-integrated\n memory error detector. The option (disabled by default) enables\n compiling and linking with the -fsanitize=address flag supported by\n clang, gcc, and probably others. This flag is employed during\n compilation of all BLIS source files *except* for optimized kernels,\n which are exempted because ASan usually requires an extra register,\n which violates the constraints for many gemm microkernels.\n- Minor whitespace, comment, ordering, and configure help text updates.\n- (cherry picked from commit 42d0e66318b186d25eeb215b40ce26115401ed8b)\n\nAdd consistent NaN/Inf handling in sumsqv. 
(#668)\n\nDetails:\n- Changed sumsqv implementation as follows:\n - If there is a NaN (either real or imaginary), then return a sum of\n NaN and unit scale.\n - Else, if there is an Inf (either real or imaginary), then return a\n sum of +Inf and unit scale.\n - Otherwise behave as normal.\n- (cherry picked from commit b861c71b50c6d48cb07282f44aa9dddffc1f1b3f)\n\nParameterized test/3 drivers via command line args. (#667)\n\nDetails:\n- Rewrote the drivers in test/3, the Makefile, and the runme.sh script\n so that most of the important parameters, including parameter combo,\n datatype, storage combo, induced method, problem size range, dimension\n bindings, number of repeats, and alpha/beta values can be passed in\n via command line arguments. (Previously, most of these parameters were\n hard-coded into the driver source, except a few that were hard-coded\n into the Makefile.) If no argument is given for any particular option,\n it will be assigned a sane default. Either way, the values employed at\n runtime will be printed to stdout before the performance data in a\n section that is commented out with '%' characters (which is used by\n matlab and octave for comments), unless the -q option is given, in\n which case the driver will proceed quietly and output only performance\n data. Each driver also provides extensive help via the -h option, with\n the help text tailored for the operation in question (e.g. gemm, hemm,\n herk, etc.). In this help text, the driver reminds the user which\n implementation it was linked to (e.g. 
blis, openblas, vendor, eigen).\n Thanks to Jeff Diamond for suggesting this CLI-based reimagining of\n the test/3 drivers.\n- In the test/3 drivers: converted cpp macro string constants, as well\n as two string literals (for the opname and pc_str) used in each test\n driver, to global (or static) const char* strings, and replaced the\n use of strncpy() for storing the results of the command line argument\n parsing with pointer copies from the corresponding strings in argv.\n This works because the argv array is guaranteed by the C99 standard\n to persist throughout the life of the program. This new approach uses\n less storage and executes faster. Thanks to Minh Quan Ho for\n recommending this change.\n- Renamed the IMP_STR cpp macro that gets defined on the command line,\n via the test/3/Makefile, to IMPL_STR.\n- Updated runme.sh to set the problem size ranges for single-threaded\n and multithreaded execution independently from one another, as well as\n on a per-system basis.\n- Added a 'quiet' variable to runme.sh that can easily toggle quiet mode\n for the test drivers' output.\n- Very minor typecast fix in call to bli_getopt() in bli_utils.c.\n- In bli_getopt(), changed the nextchar variable from being a local\n static variable to a field of the getopt_t state struct. (Not sure why\n it was ever declared static to begin with.)\n- Other minor changes to bli_getopt() to accommodate the rewritten test\n drivers' command line parsing needs.\n- (cherry picked from commit ee81efc7887374c974a78bfb3e0865776b2f97a8)\n\nAllow test/3 drivers to use default ind_t method. (#804)\n\nDetails:\n- Previously, the standalone performance drivers in test/3 were written\n under the assumption that the user would want to explicitly test\n either native execution *or* 1m. 
But because the accompanying runme.sh\n script defaults to passing \"native\" in for the -i command line option\n (which explicitly sets the induced method type), running the script\n without modification causes the test drivers to use slow reference\n microkernels on systems where native complex-domain microkernels are\n not registered -- which will yield poor performance for complex-domain\n level-3 operations. Furthermore, even if a user was aware of this, the\n test drivers did not support any single value for the -i option that\n would test BLIS using the library's default behavior -- that is, using\n 1m on systems where it is needed and native execution on systems that\n have native microkernels implemented and registered.\n- This commit addresses the aforementioned issue by supporting a new\n value for the -i option: \"auto\". The \"auto\" value causes the driver\n to avoid explicitly setting the induced method altogether, leaving\n BLIS's default behavior in place. This \"auto\" option is also now the\n default setting within the runme.sh script. Thanks to Leick Robinson\n for finding and reporting this issue.\n- Also added support for \"nat\" as a shorthand for \"native\", which\n the help text already (erroneously) claimed was supported.\n- (cherry picked from commit fd1a7e3ca9547718aa61c806848099705216182b)\n\nUse \"-i auto\" by default in test/3 drivers.\n\nDetails:\n- Request default induced method behavior of BLIS via \"-i auto\" when\n running the standalone performance drivers in test/3 via the runme.sh\n script present in that directory. (Previously, the runme.sh script\n would use \"-i native\" by default.) 
This change was originally intended\n for fd1a7e3.\n- (cherry picked from commit cad51491e8a0b306015a5a02881dc2a9b60dd8d9)","shortMessageHtmlLink":"Add check to disable armsve on Apple M1."}},{"before":"fd1a7e3ca9547718aa61c806848099705216182b","after":"cad51491e8a0b306015a5a02881dc2a9b60dd8d9","ref":"refs/heads/master","pushedAt":"2024-04-30T21:51:08.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"fgvanzee","name":"Field G. Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"},"commit":{"message":"Use \"-i auto\" by default in test/3 drivers.\n\nDetails:\n- Request default induced method behavior of BLIS via \"-i auto\" when\n running the standalone performance drivers in test/3 via the runme.sh\n script present in that directory. (Previously, the runme.sh script\n would use \"-i native\" by default.) This change was originally intended\n for fd1a7e3.","shortMessageHtmlLink":"Use \"-i auto\" by default in test/3 drivers."}},{"before":"685dcb53e2d04ba9879360f2a2da831c88274ee6","after":"3b0e244e2e98e90f9aa68a4c987fd3b0d66d6b0b","ref":"refs/heads/adat","pushedAt":"2024-04-27T05:16:12.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"devinamatthews","name":"Devin Matthews","path":"/devinamatthews","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5246113?s=80&v=4"},"commit":{"message":"Implement A*D*B (gemdm), A*D*A^{T,H} (syrkd/herkd), and A*D*B^{T,H}+B*D*A^{T,H} (syrk2d/her2kd) operations (and gemdmt for good measure). 
Some rough edges still:\n\n- Complex herk2 and her2kd will not work.\n- No mixed-type/mixed-domain or 1m.\n- Not integrated into testsuite yet.","shortMessageHtmlLink":"Implement A*D*B (gemdm), A*D*A^{T,H} (syrkd/herkd), and A*D*B^{T,H}+B…"}},{"before":"06829d0fdbfa442741585c9ce215ac466b417acd","after":"129ce041808a49855934427109bda3b02dbd8b8e","ref":"refs/heads/skew-blas","pushedAt":"2024-04-26T23:10:37.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"devinamatthews","name":"Devin Matthews","path":"/devinamatthews","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5246113?s=80&v=4"},"commit":{"message":"Trigger CI build","shortMessageHtmlLink":"Trigger CI build"}},{"before":null,"after":"06829d0fdbfa442741585c9ce215ac466b417acd","ref":"refs/heads/skew-blas","pushedAt":"2024-04-26T22:56:17.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"devinamatthews","name":"Devin Matthews","path":"/devinamatthews","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5246113?s=80&v=4"},"commit":{"message":"Typo fix in `configure`\n\n[ci skip]","shortMessageHtmlLink":"Typo fix in configure"}},{"before":"f51d4739d7bfeb253eb044fdaae658e274c47f10","after":"453c60b6ab59af65b0306d75a9b904ab9be362a6","ref":"refs/heads/sk","pushedAt":"2024-04-26T22:55:51.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"devinamatthews","name":"Devin Matthews","path":"/devinamatthews","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5246113?s=80&v=4"},"commit":{"message":"Update `gemmt` test\n\nThe `gemmt` test in the testsuite currently doesn't detect improper modification of the unstored region. 
This commit attempts to correct this problem.","shortMessageHtmlLink":"Update gemmt test"}},{"before":"005dcce97b96114183d409e1b7e0301510b0e0c6","after":"f51d4739d7bfeb253eb044fdaae658e274c47f10","ref":"refs/heads/sk","pushedAt":"2024-04-26T19:16:50.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"devinamatthews","name":"Devin Matthews","path":"/devinamatthews","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5246113?s=80&v=4"},"commit":{"message":"Update `gemmt` test\n\nThe `gemmt` test in the testsuite currently doesn't detect improper modification of the unstored region. This commit attempts to correct this problem.","shortMessageHtmlLink":"Update gemmt test"}},{"before":null,"after":"4cf2a99832c7e2c572493d358d972ed3da3b0f4e","ref":"refs/heads/stable-oct27-cand4","pushedAt":"2024-04-25T20:14:17.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"fgvanzee","name":"Field G. Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"},"commit":{"message":"Omnibus PR - Oct 2023 (#678)\n\nDetails:\n- This is an \"omnibus\" commit, consisting of multiple medium-sized\n commits that affect non-trivial aspects of BLIS. The major highlights:\n - Relocated the pba, sba pool (from the rntm_t), and mem_t (from the\n cntl_t) to the thrinfo_t object. This allows the rntm_t to be\n effectively const (although it is sometimes copied internally and\n modified to reflect different ways of parallelism). Moving the mem_t\n sets the stage for sharing a global control tree amongst all\n threads.\n - De-templatized the macrokernels for gemmt, trmm, and trsm to match\n the macrokernel for gemm, which has been de-templatized since\n 54fa28b.\n - Reimplemented bli_l3_determine_kc() by separating out the logic for\n adjusting KC based on MR/NR for triangular A and/or B into a new\n function, bli_l3_adjust_kc(). 
For now, this function is still called\n from bli_l3_determine_kc(), but in the future we plan to have it\n called once when constructing the control tree.\n - Refactored the level-3 thread decorator into two parts:\n - One part deals only with launching threads, each one calling a\n generic thread entry function. This code resides in frame/thread\n and constitutes the definition of bli_thread_launch(). Note that\n it is specific to the threading implementation (OpenMP, pthreads,\n single, etc.)\n - The other part deals with passing the matrix operands and related\n information into bli_thread_launch(). This is the \"l3 decorator\"\n and now resides in frame/3. It is agnostic to the threading\n implementation.\n - Modified the \"level\" of the thread control tree passed in at each\n operation. Previously, each operation (e.g. bli_gemm_blk_var1()) was\n passed in a communicator representing the active thread teams which\n would share the available work. Now, the *parent* thread comm is\n passed in. The operation then grabs the child comm and uses it to\n partition the work. The difference is in bli_trsm_blk_var1(), where\n there are now two children nodes for this single operation (i.e. the\n thread control tree is split one level above where the control tree\n is). The sub-prenode is used for the trsm subproblem while the\n normal sub-node is used for the gemm part. Importantly, the parent\n comm is used for the barrier between them.\n- Removed cntl_t* arguments from bli_*_front() functions. These will be\n added back in the future when the control tree's creation is moved so\n that it happens much sooner (provided that bli_*_front() have not been\n absorbed into their respective bli_*_ex() functions).\n- Renamed various bli_thread_*() query functions to bli_thrinfo_*(),\n for consistency. 
This includes _num_threads(), _thread_id(), _n_way(),\n _work_id(), _sba_pool(), _pba(), _mem(), _barrier(), _broadcast(), and\n _am_chief().\n- Removed extraneous barrier from _blk_var3() of gemm and trsm.\n- Fixed a typo in bli_type_defs.h where BLIS_BLAS_INT_TYPE_SIZE was\n misspelled.\n- (cherry picked from commit aeb5f0cc19665456e990a7ffccdb09da2e3f504b)\n\nFixed performance bug caused by redundant packing. (#680)\n\nDetails:\n- Fixed a performance bug whereby multiple threads were redundantly\n packing the same (rather than separate) micropanels. This bug was\n caused by different parts of the code using the num_threads/thread_id\n field of the thrinfo_t vs. the n_way/work_id fields. The fix was to\n standardize on the latter and provide a \"fake\" thrinfo_t sub-prenode\n in the thrinfo tree which consists of single-member thread teams. The\n single team with multiple threads node is still required since it and\n only it can be used to perform barriers and broadcasts (e.g. of the\n packed buffer pointer).\n- (cherry picked from commit 29f79f030e939969d4f3876c4fdaac7b0c5daa63)\n\nFixed random segfault in test/3 drivers. (#788)\n\nDetails:\n- Fixed a segfault in the non-gemm test drivers in test/3 that was the\n result of sometimes leaving either .n_str or .k_str fields of the\n params_t struct uninitialized, depending on the operation in question.\n For example, in test_hemm.c, init_def_params() would only initialize\n the .m_str and .n_str fields, but not the .k_str field. Even though\n hemm doesn't use a 'k' dimension, the proc_params() function (called\n via parse_cl_params()) universally attempts to convert all three into\n integers via sscanf(), which was understandably failing when one of\n those strings was a NULL pointer. I'm not sure how this code ever\n worked to begin with. 
Special thanks to Leick Robinson for finding and\n reporting this bug.\n- (cherry picked from commit 1236ddab455ef3a6293ab394ff06b3a19c2913d9)\n\nFixed staleness in kernels/zen/3/bli_gemm_small.c.\n\nDetails:\n- Added missing 'const' keyword in function prototypes for\n bli_gemm_small() and friends.\n- Updated pba usage to reflect new APIs.\n- Fixed syntax typo in 'export GOMP_CPU_AFFINITY' line in ul2128\n conditional of test/3/runme.sh.\n- Thanks to Jeff Diamond for reporting these issues.\n\nAllow test/3 drivers to use default ind_t method. (#804)\n\nDetails:\n- Previously, the standalone performance drivers in test/3 were written\n under the assumption that the user would want to explicitly test\n either native execution *or* 1m. But because the accompanying runme.sh\n script defaults to passing \"native\" in for the -i command line option\n (which explicitly sets the induced method type), running the script\n without modification causes the test drivers to use slow reference\n microkernels on systems where native complex-domain microkernels are\n not registered -- which will yield poor performance for complex-domain\n level-3 operations. Furthermore, even if a user was aware of this, the\n test drivers did not support any single value for the -i option that\n would test BLIS using the library's default behavior -- that is, using\n 1m on systems where it is needed and native execution on systems that\n have native microkernels implemented and registered.\n- This commit addresses the aforementioned issue by supporting a new\n value for the -i option: \"auto\". The \"auto\" value causes the driver\n to avoid explicitly setting the induced method altogether, leaving\n BLIS's default behavior in place. This \"auto\" option is also now the\n default setting within the runme.sh script. 
Thanks to Leick Robinson\n for finding and reporting this issue.\n- Also added support for \"nat\" as a shorthand for \"native\", which\n the help text already (erroneously) claimed was supported.\n- (cherry picked from commit fd1a7e3ca9547718aa61c806848099705216182b)","shortMessageHtmlLink":"Omnibus PR - Oct 2023 (#678)"}},{"before":"2eb98b0764a92730e5d97545f03e4e963a845ffc","after":null,"ref":"refs/heads/test3-default-ind-fix","pushedAt":"2024-04-25T20:01:03.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"fgvanzee","name":"Field G. Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"}},{"before":"a49238e6141c96a41aa3c2a4adb0b0663d0b4968","after":"fd1a7e3ca9547718aa61c806848099705216182b","ref":"refs/heads/master","pushedAt":"2024-04-25T20:00:59.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"fgvanzee","name":"Field G. Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"},"commit":{"message":"Allow test/3 drivers to use default ind_t method. (#804)\n\nDetails:\r\n- Previously, the standalone performance drivers in test/3 were written\r\n under the assumption that the user would want to explicitly test\r\n either native execution *or* 1m. But because the accompanying runme.sh\r\n script defaults to passing \"native\" in for the -i command line option\r\n (which explicitly sets the induced method type), running the script\r\n without modification causes the test drivers to use slow reference\r\n microkernels on systems where native complex-domain microkernels are \r\n not registered -- which will yield poor performance for complex-domain\r\n level-3 operations. 
Furthermore, even if a user was aware of this, the \r\n test drivers did not support any single value for the -i option that \r\n would test BLIS using the library's default behavior -- that is, using \r\n 1m on systems where it is needed and native execution on systems that \r\n have native microkernels implemented and registered.\r\n- This commit addresses the aforementioned issue by supporting a new\r\n value for the -i option: \"auto\". The \"auto\" value causes the driver\r\n to avoid explicitly setting the induced method altogether, leaving\r\n BLIS's default behavior in place. This \"auto\" option is also now the\r\n default setting within the runme.sh script. Thanks to Leick Robinson\r\n for finding and reporting this issue.\r\n- Also added support for \"nat\" as a shorthand for \"native\", which\r\n the help text already (erroneously) claimed was supported.","shortMessageHtmlLink":"Allow test/3 drivers to use default ind_t method. (#804)"}},{"before":"a18252d28de85b4bf738aee6e60c4adcb66cf9cc","after":null,"ref":"refs/heads/new_control_trees","pushedAt":"2024-04-25T19:56:42.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"fgvanzee","name":"Field G. Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"}},{"before":null,"after":"2eb98b0764a92730e5d97545f03e4e963a845ffc","ref":"refs/heads/test3-default-ind-fix","pushedAt":"2024-04-24T22:12:55.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"fgvanzee","name":"Field G. Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"},"commit":{"message":"Allow test/3 drivers to use default ind_t method.\n\nDetails:\n- Previously, the standalone performance drivers in test/3 were written\n under the assumption that the user would want to explicitly test\n either native execution *or* 1m. 
But because the accompanying runme.sh\n script defaults to passing \"native\" in for the -i command line option\n (which explicitly sets the induced method type), using it without\n modification causes the test drivers to use reference microkernels on\n systems where native complex-domain microkernels are not registered.\n Furthermore, even if a user was aware of this, the test drivers did\n not support any single value for the -i option that would test BLIS\n using the library's default behavior -- that is, using 1m on systems\n where it is needed and native execution on systems that have native\n microkernels.\n- This commit addresses the aforementioned issue by supporting a new\n value for the -i option: \"auto\". The \"auto\" value causes the driver\n to avoid explicitly setting the induced method altogether, leaving\n BLIS's default behavior in place. This \"auto\" option is now the\n default setting within the runme.sh script. Thanks to Leick Robinson\n for finding and reporting this issue.\n- Also added support for \"nat\" as a shorthand for \"native\", which\n the help text already (erroneously) claimed was supported.","shortMessageHtmlLink":"Allow test/3 drivers to use default ind_t method."}},{"before":"a316d2c6c33fc1f8f7c58c4210ab203f48349041","after":"a49238e6141c96a41aa3c2a4adb0b0663d0b4968","ref":"refs/heads/master","pushedAt":"2024-04-24T20:07:18.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"fgvanzee","name":"Field G. Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"},"commit":{"message":"Refactor the control tree and other infrastructure (#710)\n\nDetails:\r\n1. 
A \"plugin\" architecture.\r\n- Users are now able to register new kernels, kernel preferences, and\r\n blocksizes at runtime, directly from user applications.\r\n- Plugins can be created, configured, and built using only an installed\r\n version of BLIS -- no source or source code changes required.\r\n- Plugins support both reference and optimized kernels, as well as\r\n custom configuration-to-kernel-set mappings.\r\n- Building plugins (including reference and relevant optimized kernels)\r\n for enabled architectures or architecture families is automated, as is\r\n linking into the final library.\r\n- The configure script is now installed as 'configure-plugin'. In this\r\n mode, it can be used to initialize a plugin from a template including\r\n optional example code, and prepare a build system for compiling the\r\n plugin into a shared or static library.\r\n- Additional configuration files, templates, and build system components\r\n are also installed to '%prefix%/share/blis'.\r\n- The cntx_t struct now has extensible data structures for holding\r\n kernels, preferences, and blocksizes. These are based on a \"stack\"\r\n structure which contains a list of fixed-size data blocks. Adding a\r\n new entry (which may require allocating a new block or reallocating\r\n the block pointer array) requires locking, but looking up entries is\r\n lock-free and takes O(1) time.\r\n- Kernels can depend on either 1 or 2 type parameters (e.g.\r\n mixed-precision packing requires 2). The func2_t struct supports\r\n the latter, but can be implicitly cast to func_t if only \"diagonal\"\r\n entries are needed. The number of type parameters can be inferred from\r\n the kernel ID for type safety.\r\n- Functions have been added to register new kernels, preferences, and\r\n blocksizes with the global kernel structure (gks). This creates\r\n corresponding entries in each allocated context and returns the next\r\n available ID. 
Plugins use this API to register user kernels, although\r\n the user is responsible for tracking the returned IDs for later\r\n lookup. Setting newly-registered reference kernels, as well as\r\n overriding these with optimized kernels is done in exactly the same\r\n manner as in bli_cntx_init_ref() and bli_cntx_init_().\r\n\r\n2. Restructuring of the control and thread control trees.\r\n- The control tree has been substantially restructured to support more\r\n flexibility.\r\n- The \"default\" control trees for gemm (also used for\r\n hemm/symm/herk/her2k/syrk/syr2k/trmm/trmm3) and trsm are now\r\n represented as a single structure containing all necessary control\r\n tree nodes and parameters.\r\n- An API has been added to modify the default gemm/trsm control trees.\r\n- This same API is used by the framework and packm/gemm/trsm variants\r\n to access specific control tree nodes.\r\n- Users can alternatively create a custom control tree from scratch.\r\n- The blocksizes are now encoded directly in the control tree, rather\r\n than via loop IDs. The logic for adjusting blocksizes for certain\r\n operations has been moved to the control tree initialization.\r\n- Type information is encoded in the control tree to drive proper\r\n selection of packing and computational kernels provided by the user.\r\n- The packing microkernel now receives an opaque \"params\" struct which\r\n is user-definable and can be used to pass additional information\r\n through the call stack.\r\n- The auxinfo_t struct has been updated with a .params field for\r\n opaque user data as well as the global offsets of the current\r\n microtile.\r\n- The packm and gemm variants can be overridden by the user, and also\r\n receive an opaque params struct via the associated control tree\r\n node.\r\n- The structure-aware packing kernel bli_packm_struc_cxk() is no longer\r\n hard-coded to be called from the default packm variant, but can be\r\n overridden by the user. 
It also supports mixed-precision/mixed-domain\r\n natively now.\r\n- The thread control tree (thrinfo_t) is now created entirely up-front\r\n by inspecting the control tree. The required number of threads at each\r\n level is encoded in the control tree via loop IDs (actually a bitfield\r\n of loop IDs), although the ordering and number of such IDs is\r\n arbitrary. The logic for adjusting the number of threads at each level\r\n based on operation type (e.g. trmm) is now in the control tree\r\n initialization and expressed by combining loop IDs from multiple\r\n levels into a single level.\r\n- The mem_t object containing the pack buffer pointer has been moved\r\n from the control tree to the thread control tree. NOTE: **The control\r\n tree is now strictly const throughout the operation, and only a\r\n single copy is shared by all threads.**\r\n- The thread control tree node for packing has been changed so that\r\n there is no longer a \"fake\" node indicating a team of single threads.\r\n Instead, the number of threads and thread IDs in the \"normal\" thread\r\n control tree node are used. This change has also been made to the\r\n gemmsup thread control tree and packing variants, as well as to the\r\n gemmlike sandbox.\r\n- Parameters controlling packing (e.g. inversion of the diagonal,\r\n direction, schema) are not stored directly in the control tree but in\r\n the opaque params struct. The packing control tree node and its\r\n default params struct are stored together in the \"combined\"\r\n gemm/trsm control tree structure and initialized as a unit. Users can\r\n update these parameters individually or substitute a custom packm\r\n variant and params struct.\r\n- The \"target\" and \"execution\" datatypes have been removed from the obj_t\r\n struct and replaced by type information in the control tree.\r\n- The \"sub-node\" and \"sub-prenode\" of a control tree node have been\r\n replaced by an arbitrary number of sub-nodes accessed by index.
There\r\n is a hard cap on the number of sub-nodes (currently 2). Sub-nodes are\r\n added during control tree initialization, *after*\r\n creation/initialization of the parent node through an updated API.\r\n- The level-3 thread decorator has been significantly simplified and\r\n directly calls bli_l3_int(). The control tree is created externally,\r\n and it is no longer necessary to alias matrices or set object pack\r\n schemas. Also, the rntm_t passed in may be NULL. Finally, family\r\n and scalar information is no longer needed here.\r\n- bli_l3_int() is now a simple inline function which extracts the next\r\n control tree node and variant and calls it.\r\n- bli_*_front() have been removed and inlined into the expert object\r\n API with significant simplification.\r\n- 1m (or other induced method) no longer uses an alternative cntx_t.\r\n- The .pack_fn/.ker_fn pointers and associated params fields on the\r\n obj_t were removed in favor of the present solution.\r\n\r\n3. Overhaul of variable substitution in configure script.\r\n- The configure script has been somewhat re-written to use a\r\n centralized mechanism for substituting variables into build system and\r\n other configuration files.\r\n- All substitution variables go through the same pathway now, which\r\n necessitated some variable naming changes for variables which were\r\n named the same in e.g. Makefile and bli_config.h but with\r\n different definitions.\r\n- CC and CXX variables can now contain spaces, e.g. 'g++ -std=c++17'.\r\n This provides better support for integration with build tooling such\r\n as autotools.\r\n\r\n4. Overhaul of packing kernels.\r\n- Previously there were two packing kernels referenced in the cntx_t\r\n structure for MRxk and NRxk shaped micropanels, respectively. 
These\r\n have now been merged into one kernel which is responsible for packing\r\n any dense rectangular portion of either A or B.\r\n- The packing kernel now receives information about the register\r\n blocksize (cdim_max) and duplication factor (the \"broadcast-B\"\r\n format, although this can also apply to the A matrix).\r\n- The structure-aware packing kernel (bli_packm_struc_cxk(), which is\r\n now user-overridable) also receives global offsets of the current\r\n micropanel within A or B.\r\n- Explicit kernels for packing the diagonal blocks of\r\n triangular/symmetric/Hermitian matrices have been added to the\r\n cntx_t. This means that the bli_packm_struc_cxk() \"kernel\" no longer\r\n needs to directly touch data (except to zero out some regions).\r\n- bli_packm_struc_cxk() has also been updated to work only in terms of\r\n fundamental elements (i.e., real datatypes) when computing offsets and\r\n when zeroing data, which greatly simplifies mixed-domain/1m packing.\r\n- bli_packm_scalar() has been updated to better support complex scalars\r\n in mixed-domain operations.\r\n- Pack schemas for PACKED_ROW_PANELS* and PACKED_COL_PANELS* have\r\n been merged into simply PACKED_PANELS*. This reflects the merging of\r\n the packing kernels into a single generic kernel. There were only a\r\n very few places which needed the row/column information and this is\r\n now supplied by alternative means.\r\n- Packing variants always behave \"as if\" the A matrix were being packed\r\n (i.e. the code assumes packing column-stored row panels). Packing of B\r\n is handled by applying an implicit or explicit transpose before\r\n packing. This change also applies to gemmsup.\r\n\r\n5.
Improved MD/MP support.\r\n- All level-3 operations (except trsm) now support full\r\n mixed-domain/mixed-precision operation.\r\n- Explicit 1m packing kernels have been added in the cntx_t.\r\n- An explicit 1m microkernel wrapper has been added to the cntx_t.\r\n- An extra packing kernel for the \"ro\" format has been added, along with\r\n the pack_t enumeration value. This supports the packing for\r\n real*complex -> real, including potential scaling by a complex alpha,\r\n support for structured matrices, etc.\r\n- Extra microkernel wrappers for mixed-domain operations have been added\r\n to support the 'ccr' (and by extension, 'crc'), 'rcc', and 'crr'\r\n cases. Notably this includes full support for general stride storage\r\n and complex alpha/beta.\r\n- Packing kernels and gemm microkernels are now \"templated\" based on two\r\n type parameters rather than one. For packing this allows direct\r\n optimization of mixed-precision kernels, and for gemm microkernels\r\n this allows direct optimization of mixed-precision without writing to\r\n a temporary buffer. Reference packing kernels are directly\r\n instantiated for all mixes of precisions, while by default\r\n mixed-precision gemm microkernels are supported via a microkernel\r\n wrapper. 
The \"old\" way of specifying optimized kernels using a single\r\n type parameter works unchanged.\r\n- alpha and beta are typecast appropriately to the computational or\r\n output datatype, respectively, and **always** to the complex domain.\r\n Scalar typecasting has also been added to gemmsup for safety.\r\n- The gemm macrokernel doesn't have to do any typecasting anymore, as a\r\n microkernel wrapper or optimized mixed-precision/mixed-domain kernel\r\n now handles this.\r\n- 1m and mixed-domain operations now always use a microkernel wrapper,\r\n rather than adjusting parameters in the gemm macrokernel.\r\n- The gemmt macrokernel **does** still have to handle explicit\r\n write-back of microtiles which intersect the diagonal, although\r\n typecasting has already been performed.\r\n- The gemmt_x_ker_var2(), trmm_xx_ker_var2(), and trsm_xx_ker_var2()\r\n functions have been removed. The appropriate macrokernel pointer is\r\n selected during control tree initialization.\r\n- Real domain MR/NR are checked for even-ness based on the gemm\r\n microkernel's row preference in order to guarantee proper 1m and\r\n mixed-domain operation.\r\n- Full range of mixed-domain/mixed-precision functionality tested in the\r\n testsuite ('input.*.mixed').\r\n\r\n6. Other changes:\r\n- The build system has been updated to support C++ source files\r\n throughout the framework. While the intent is not to add such files to\r\n BLIS itself, this supports plugins written in C++.\r\n- Many instances of configuration-specific code have been simplified by\r\n introducing an INSERT_GENTCONF macro which instantiates a block of\r\n code for each enabled sub-configuration. The ConfigurationHowTo.md\r\n document has been updated accordingly.\r\n- PASTEMAC?/PASTECH?/PASTEF77? 
have been removed in favor of\r\n variadic macros which accept any number of arguments (up to a\r\n reasonable limit).\r\n- The INSERT_GENTFUNC* macros have been updated to clean up\r\n mixed-precision and mixed-domain instantiations.\r\n- bli_align_dim_to_mult() has been updated to support rounding either up\r\n or down based on a flag.\r\n- Checking for empty matrices and other early exits (level-3 only) has\r\n been consolidated into a single utility function.\r\n- The auxinfo_t struct is always passed as const.\r\n- The new function bli_obj_alias_submatrix() aliases a matrix while also\r\n resetting the root to NULL, offsets to zero (while adjusting the\r\n buffer), and applying any implicit transpose.\r\n- Level-3 pruning functions now only check matrix structure to see what\r\n to do, not the operation family.\r\n- gemmsup packing has been updated to use the \"normal\" pack buffer\r\n allocation routines.\r\n- Remove duplicate checks for early return from gemmsup handler.\r\n- bli_determine_blocksize() has been significantly simplified.\r\n- Partitioning packed panels is no longer allowed.\r\n- Added bli_xxsame macros.\r\n- Automated the calculation of info bit shifts and masks based on\r\n predefined bit sizes for various flags. 
This greatly simplifies\r\n reordering, adding, or removing flags from the info/info2 bitfields.\r\n- Moved more BLIS_NUM_* macros into the corresponding enums as the\r\n last entry so that the value is automatically computed.\r\n- Better const-correctness in some level0 scalar macros.\r\n- Better mixed-precision support in some level0 scalar macros.\r\n- Added a bli_axpbys_mxn() macro.\r\n- bli_thread_range_sub() takes explicit thread ID and number of threads\r\n rather than a thrinfo_t node.\r\n- \"De-templated\" BLIS gemmlike sandbox (specifically, bls_gemm_bp_var1()\r\n and bls_packm_var1()).\r\n- Combined bls_l3_packm_[ab]() into one function with thin wrappers.\r\n- Deleted bls_packm_var[23]().\r\n- Add a \"termination tag\" to the testsuite output so that\r\n 'make check-blis' can accurately check for successful completion.\r\n- Add a new function to centrally compute FLOPs for level-3 operations\r\n in the testsuite.","shortMessageHtmlLink":"Refactor the control tree and other infrastructure (#710)"}},{"before":"1aac15a4a08e73b71d0328219f9c53a220a0c7cd","after":"a18252d28de85b4bf738aee6e60c4adcb66cf9cc","ref":"refs/heads/new_control_trees","pushedAt":"2024-04-09T17:49:17.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"devinamatthews","name":"Devin Matthews","path":"/devinamatthews","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5246113?s=80&v=4"},"commit":{"message":"Revert \"Merge branch 'master' into new_control_trees\" and fix extraneous '=' by hand.\n\nThis reverts commit 1aac15a4a08e73b71d0328219f9c53a220a0c7cd, reversing\nchanges made to d02de7672cce965fc3ef167805a28a8a038555a1.","shortMessageHtmlLink":"Revert \"Merge branch 'master' into new_control_trees\" and fix 
extrane…"}},{"before":"d02de7672cce965fc3ef167805a28a8a038555a1","after":"1aac15a4a08e73b71d0328219f9c53a220a0c7cd","ref":"refs/heads/new_control_trees","pushedAt":"2024-04-04T15:19:53.000Z","pushType":"push","commitsCount":3,"pusher":{"login":"devinamatthews","name":"Devin Matthews","path":"/devinamatthews","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5246113?s=80&v=4"},"commit":{"message":"Merge branch 'master' into new_control_trees\n\n# Conflicts:\n#\tframe/base/bli_rntm.h\n#\tframe/include/bli_type_defs.h","shortMessageHtmlLink":"Merge branch 'master' into new_control_trees"}},{"before":"664cc6bc3ea610b4ecea63d78c6024c48f045635","after":"a316d2c6c33fc1f8f7c58c4210ab203f48349041","ref":"refs/heads/master","pushedAt":"2024-03-28T17:52:01.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"devinamatthews","name":"Devin Matthews","path":"/devinamatthews","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5246113?s=80&v=4"},"commit":{"message":"Fix incorrect commenting of `BLIS_RNTM_INITIALIZER` and `BLIS_OBJECT_INITIALIZER`.","shortMessageHtmlLink":"Fix incorrect commenting of BLIS_RNTM_INITIALIZER and `BLIS_OBJECT_…"}},{"before":"9bdfd94744f6323adca519721ed4b27c7ec7ac1a","after":null,"ref":"refs/heads/no-designated-initializers","pushedAt":"2024-03-26T21:25:21.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"fgvanzee","name":"Field G. Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"}},{"before":"1a8c8180b32cf5988bf9eb5d2f0f8111a729993a","after":"664cc6bc3ea610b4ecea63d78c6024c48f045635","ref":"refs/heads/master","pushedAt":"2024-03-26T21:25:17.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"fgvanzee","name":"Field G. Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"},"commit":{"message":"Update BLIS_*_INITIALIZER macros for C++ compatibility. 
(#802)\n\nDetails:\r\n- Remove designated initializer syntax. This isn't officially supported \r\n until C++20.\r\n- Arrange initializers in the order in which they are defined in the \r\n struct. Even with standard or extension support for designated \r\n initializers, initializing non-static members out-of-order is an \r\n error in C++.\r\n- Remove the conditional code which uses '-1' as the default value of \r\n the 'pack_buf' member of 'mem_t' in C, but 'BLIS_BUFFER_FOR_GEN_USE' \r\n in C++. Simply use the latter as a common-sense default.","shortMessageHtmlLink":"Update BLIS_*_INITIALIZER macros for C++ compatibility. (#802)"}},{"before":"01bfc69397365cd4624ff7f42567fd5d538bb44c","after":"d02de7672cce965fc3ef167805a28a8a038555a1","ref":"refs/heads/new_control_trees","pushedAt":"2024-03-26T21:24:02.000Z","pushType":"push","commitsCount":2,"pusher":{"login":"devinamatthews","name":"Devin Matthews","path":"/devinamatthews","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5246113?s=80&v=4"},"commit":{"message":"Merge branch 'no-designated-initializers' into new_control_trees\n\n# Conflicts:\n#\tframe/include/bli_type_defs.h","shortMessageHtmlLink":"Merge branch 'no-designated-initializers' into new_control_trees"}},{"before":"f63af13d8790578ac376dfbe3e2dca6bfe43a0e5","after":"9bdfd94744f6323adca519721ed4b27c7ec7ac1a","ref":"refs/heads/no-designated-initializers","pushedAt":"2024-03-26T21:17:47.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"devinamatthews","name":"Devin Matthews","path":"/devinamatthews","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5246113?s=80&v=4"},"commit":{"message":"Add comments [ci skip].","shortMessageHtmlLink":"Add comments [ci skip]."}},{"before":"d7c32a286be98a512ff65fada3f8dcf0a4573219","after":null,"ref":"refs/heads/stable-oct27-cand1","pushedAt":"2024-03-26T21:01:11.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"fgvanzee","name":"Field G. 
Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"}},{"before":null,"after":"a72680080dc446ec4f948a9b6be114f77d5ed8b1","ref":"refs/heads/stable-oct27-cand3","pushedAt":"2024-03-26T20:58:59.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"fgvanzee","name":"Field G. Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"},"commit":{"message":"Omnibus PR - Oct 2023 (#678)\n\nDetails:\n- This is an \"omnibus\" commit, consisting of multiple medium-sized\n commits that affect non-trivial aspects of BLIS. The major highlights:\n - Relocated the pba, sba pool (from the rntm_t), and mem_t (from the\n cntl_t) to the thrinfo_t object. This allows the rntm_t to be\n effectively const (although it is sometimes copied internally and\n modified to reflect different ways of parallelism). Moving the mem_t\n sets the stage for sharing a global control tree amongst all\n threads.\n - De-templatized the macrokernels for gemmt, trmm, and trsm to match\n the macrokernel for gemm, which has been de-templatized since\n 54fa28b.\n - Reimplemented bli_l3_determine_kc() by separating out the logic for\n adjusting KC based on MR/NR for triangular A and/or B into a new\n function, bli_l3_adjust_kc(). For now, this function is still called\n from bli_l3_determine_kc(), but in the future we plan to have it\n called once when constructing the control tree.\n - Refactored the level-3 thread decorator into two parts:\n - One part deals only with launching threads, each one calling a\n generic thread entry function. This code resides in frame/thread\n and constitutes the definition of bli_thread_launch(). Note that\n it is specific to the threading implementation (OpenMP, pthreads,\n single, etc.)\n - The other part deals with passing the matrix operands and related\n information into bli_thread_launch(). This is the \"l3 decorator\"\n and now resides in frame/3. 
It is agnostic to the threading\n implementation.\n - Modified the \"level\" of the thread control tree passed in at each\n operation. Previously, each operation (e.g. bli_gemm_blk_var1()) was\n passed in a communicator representing the active thread teams which\n would share the available work. Now, the *parent* thread comm is\n passed in. The operation then grabs the child comm and uses it to\n partition the work. The difference is in bli_trsm_blk_var1(), where\n there are now two children nodes for this single operation (i.e. the\n thread control tree is split one level above where the control tree\n is). The sub-prenode is used for the trsm subproblem while the\n normal sub-node is used for the gemm part. Importantly, the parent\n comm is used for the barrier between them.\n- Removed cntl_t* arguments from bli_*_front() functions. These will be\n added back in the future when the control tree's creation is moved so\n that it happens much sooner (provided that bli_*_front() have not been\n absorbed into their respective bli_*_ex() functions).\n- Renamed various bli_thread_*() query functions to bli_thrinfo_*(),\n for consistency. This includes _num_threads(), _thread_id(), _n_way(),\n _work_id(), _sba_pool(), _pba(), _mem(), _barrier(), _broadcast(), and\n _am_chief().\n- Removed extraneous barrier from _blk_var3() of gemm and trsm.\n- Fixed a typo in bli_type_defs.h where BLIS_BLAS_INT_TYPE_SIZE was\n misspelled.\n- (cherry picked from commit aeb5f0cc19665456e990a7ffccdb09da2e3f504b)\n\nFixed performance bug caused by redundant packing. (#680)\n\nDetails:\n- Fixed a performance bug whereby multiple threads were redundantly\n packing the same (rather than separate) micropanels. This bug was\n caused by different parts of the code using the num_threads/thread_id\n field of the thrinfo_t vs. the n_way/work_id fields. 
The fix was to\n standardize on the latter and provide a \"fake\" thrinfo_t sub-prenode\n in the thrinfo tree which consists of single-member thread teams. The\n single team with multiple threads node is still required since it and\n only it can be used to perform barriers and broadcasts (e.g. of the\n packed buffer pointer).\n- (cherry picked from commit 29f79f030e939969d4f3876c4fdaac7b0c5daa63)\n\nFixed random segfault in test/3 drivers. (#788)\n\nDetails:\n- Fixed a segfault in the non-gemm test drivers in test/3 that was the\n result of sometimes leaving either .n_str or .k_str fields of the\n params_t struct uninitialized, depending on the operation in question.\n For example, in test_hemm.c, init_def_params() would only initialize\n the .m_str and .n_str fields, but not the .k_str field. Even though\n hemm doesn't use a 'k' dimension, the proc_params() function (called\n via parse_cl_params()) universally attempts to convert all three into\n integers via sscanf(), which was understandably failing when one of\n those strings was a NULL pointer. I'm not sure how this code ever\n worked to begin with. 
Special thanks to Leick Robinson for finding and\n reporting this bug.\n- (cherry picked from commit 1236ddab455ef3a6293ab394ff06b3a19c2913d9)\n\nFixed staleness in kernels/zen/3/bli_gemm_small.c.\n\nDetails:\n- Added missing 'const' keyword in function prototypes for\n bli_gemm_small() and friends.\n- Updated pba usage to reflect new APIs.\n- Fixed syntax typo in 'export GOMP_CPU_AFFINITY' line in ul2128\n conditional of test/3/runme.sh.\n- Thanks to Jeff Diamond for reporting these issues.","shortMessageHtmlLink":"Omnibus PR - Oct 2023 (#678)"}},{"before":"5a602acedae8bd29c534f78841a89df79ab8c8b1","after":"01bfc69397365cd4624ff7f42567fd5d538bb44c","ref":"refs/heads/new_control_trees","pushedAt":"2024-03-26T04:33:22.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"devinamatthews","name":"Devin Matthews","path":"/devinamatthews","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5246113?s=80&v=4"},"commit":{"message":"The `rcc` and `ccr` reference microkernel wrapper files were named incorrectly [ci skip].","shortMessageHtmlLink":"The rcc and ccr reference microkernel wrapper files were named in…"}},{"before":"ecbeab65da4a2d86df078173179598541d9eb1c9","after":"5a602acedae8bd29c534f78841a89df79ab8c8b1","ref":"refs/heads/new_control_trees","pushedAt":"2024-03-25T02:54:16.000Z","pushType":"push","commitsCount":3,"pusher":{"login":"devinamatthews","name":"Devin Matthews","path":"/devinamatthews","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5246113?s=80&v=4"},"commit":{"message":"Merge branch 'no-designated-initializers' into new_control_trees\n\n# Conflicts:\n#\tframe/include/bli_type_defs.h","shortMessageHtmlLink":"Merge branch 'no-designated-initializers' into new_control_trees"}},{"before":null,"after":"f63af13d8790578ac376dfbe3e2dca6bfe43a0e5","ref":"refs/heads/no-designated-initializers","pushedAt":"2024-03-25T02:49:52.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"devinamatthews","name":"Devin 
Matthews","path":"/devinamatthews","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5246113?s=80&v=4"},"commit":{"message":"Update BLIS_*_INITIALIZER macros for C++ compatibility.\n\n- Remove designated initializer syntax. This isn't officially supported until C++20.\n- Put initializers in the order in which they are defined in the struct. Even with standard or extension support for designated initializers, initializing non-static members out-of-order is an error in C++.\n- Remove the conditional code which uses `-1` as the default value of the `pack_buf` member of `mem_t` in C, but `BLIS_BUFFER_FOR_GEN_USE` in C++. Simply use the latter as a common-sense default.","shortMessageHtmlLink":"Update BLIS_*_INITIALIZER macros for C++ compatibility."}},{"before":"27e19903fa2ac0e3fa5454cef5e38bff6359ff08","after":"ecbeab65da4a2d86df078173179598541d9eb1c9","ref":"refs/heads/new_control_trees","pushedAt":"2024-03-20T21:38:02.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"devinamatthews","name":"Devin Matthews","path":"/devinamatthews","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5246113?s=80&v=4"},"commit":{"message":"Fix problem with overriding Makefile rules [ci skip].","shortMessageHtmlLink":"Fix problem with overriding Makefile rules [ci skip]."}},{"before":"d88dbd47e57fc0faffc403f3eb6bce298f611552","after":"27e19903fa2ac0e3fa5454cef5e38bff6359ff08","ref":"refs/heads/new_control_trees","pushedAt":"2024-03-19T23:45:00.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"devinamatthews","name":"Devin Matthews","path":"/devinamatthews","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5246113?s=80&v=4"},"commit":{"message":"Improve the plugin interface.\n\n- Add support for configuring and building pre-initialized plugins (configure-plugin --build and make) out of the source tree.\n- Fix various issues with C++-based plugins such as premature inclusion of blis.h, C++ language flags, predefined CXX variables with 
spaces, etc.","shortMessageHtmlLink":"Improve the plugin interface."}},{"before":"0c6d47e1cb04dda48b5a0c85a96893dab208f399","after":"d88dbd47e57fc0faffc403f3eb6bce298f611552","ref":"refs/heads/new_control_trees","pushedAt":"2024-03-05T19:42:12.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"fgvanzee","name":"Field G. Van Zee","path":"/fgvanzee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/5487570?s=80&v=4"},"commit":{"message":"Fix typos and clarify.","shortMessageHtmlLink":"Fix typos and clarify."}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAAEPnbOogA","startCursor":null,"endCursor":null}},"title":"Activity · flame/blis"}