mirrored from git://gcc.gnu.org/git/gcc.git
Releases/gcc 12 #65
Open: jacopobrusini wants to merge 2,185 commits into master from releases/gcc-12
This is an unofficial mirror that has nothing to do with the GCC project, so submitting pull requests here is a waste of time. Also, I have no idea what this pull request is trying to do but it would never be accepted even if it was submitted to the right place.
…(LWG 2195) This was approved in Issaquah last month. libstdc++-v3/ChangeLog: * include/bits/regex.h (match_results): Add allocator-extended copy and move constructors, as per LWG 2195. * testsuite/28_regex/match_results/ctors/char/alloc.cc: New test. (cherry picked from commit 9ae1108)
As explained in LWG 3600, we never implemented a C++0x change that made the copy constructor of std::istream_iterator defined as defaulted. That would be an ABI break, so the resolution of LWG 3600 is to not require it to be trivial, but just constexpr and conditionally noexcept. This applies that resolution. libstdc++-v3/ChangeLog: * include/bits/stream_iterator.h (istream_iterator): Add constexpr to copy constructor, as per LWG 3600. * testsuite/24_iterators/istream_iterator/cons/constexpr.cc: Check copy construction. (cherry picked from commit ad0b9cf)
This avoids overwriting tail padding when algorithms like std::copy are used to write a single value through a pointer to a base subobject. The pointer arithmetic on a Base* is valid for N==1, but the copy/move operation needs to be done using assignment, not a memmove or memcpy of sizeof(Base) bytes. Instead of putting a check for N==1 in all of copy, copy_n, move etc. this adds it to the __copy_move and __copy_move_backward partial specializations used for trivially copyable types. When N==1 those partial specializations dispatch to new static member functions of the partial specializations for non-trivial types, so that a copy/move assignment is done appropriately for the _IsMove constant. libstdc++-v3/ChangeLog: PR libstdc++/108846 * include/bits/stl_algobase.h (__copy_move<false, false, RA>) Add __assign_one static member function. (__copy_move<true, false, RA>): Likewise. (__copy_move<IsMove, true, RA>): Do not use memmove for a single value. (__copy_move_backward<IsMove, true, RA>): Likewise. * testsuite/25_algorithms/copy/108846.cc: New test. * testsuite/25_algorithms/copy_backward/108846.cc: New test. * testsuite/25_algorithms/copy_n/108846.cc: New test. * testsuite/25_algorithms/move/108846.cc: New test. * testsuite/25_algorithms/move_backward/108846.cc: New test. (cherry picked from commit 822a11a)
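The dispatch described above can be sketched outside the library. This is a minimal illustration, not the libstdc++ code: `copy_trivial`, `Base` and `Derived` are hypothetical names, and the layout comment assumes an ABI (such as the Itanium C++ ABI) that reuses the tail padding of a base class for derived members.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical helper: copy n trivially copyable objects.
// A single object is copied by assignment, which only touches the members
// of T; larger counts use memmove over n * sizeof(T) bytes.
template <typename T>
T* copy_trivial(const T* first, std::size_t n, T* out) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "sketch only handles trivially copyable types");
    if (n == 1) {
        *out = *first;  // does not write the tail padding of *out
    } else if (n > 1) {
        std::memmove(out, first, n * sizeof(T));
    }
    return out + n;
}

// Base has tail padding after j on typical targets. Because it has a
// user-provided constructor, some ABIs place Derived::x in that padding.
struct Base {
    Base(int i, short j) : i(i), j(j) {}
    int i;
    short j;
};

struct Derived : Base {
    Derived(int i, short j, short x) : Base(i, j), x(x) {}
    short x;
};

// Copies one Base into the base subobject of a Derived and reports
// whether the derived member kept its value.
bool derived_member_survives() {
    Base src(1, 2);
    Derived dst(3, 4, 5);
    copy_trivial(&src, 1, static_cast<Base*>(&dst));
    return dst.i == 1 && dst.j == 2 && dst.x == 5;
}
```

A `memmove` of `sizeof(Base)` bytes in the single-object case could overwrite `dst.x` on such an ABI, which is the failure the commit message describes.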
…mpare_three_way [PR113960] The change in r11-2981-g2f983fa69005b6 meant that std::lexicographical_compare_three_way started to use memcmp for unsigned integers on big endian targets, but for that to be valid we need the two value types to have the same size and we need to use that size to compute the length passed to memcmp. I already defined a __is_memcmp_ordered_with trait that does the right checks, std::lexicographical_compare_three_way just needs to use it. libstdc++-v3/ChangeLog: PR libstdc++/113960 * include/bits/stl_algobase.h (__is_byte_iter): Replace with ... (__memcmp_ordered_with): New concept. (lexicographical_compare_three_way): Use __memcmp_ordered_with instead of __is_byte_iter. Use correct length for memcmp. * testsuite/25_algorithms/lexicographical_compare_three_way/113960.cc: New test. (cherry picked from commit f5cdda8)
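The two conditions named above (equal element sizes, and a byte length computed from that size) can be illustrated with a small sketch. The names `is_big_endian`, `memcmp_ordered_with` and `compare_three_way` are hypothetical stand-ins, not the libstdc++ trait or algorithm, and the result is a plain `int` rather than a comparison category so that the sketch does not need C++20.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical runtime probe for the byte order of the target.
inline bool is_big_endian() {
    const std::uint16_t probe = 0x0102;
    unsigned char first_byte;
    std::memcpy(&first_byte, &probe, 1);
    return first_byte == 0x01;
}

// memcmp gives the same order as element-wise comparison only when both
// types are unsigned integers of the same size, and either the elements
// are single bytes or the target is big endian.
template <typename T, typename U>
bool memcmp_ordered_with() {
    return std::is_unsigned<T>::value && std::is_unsigned<U>::value
        && !std::is_same<T, bool>::value && !std::is_same<U, bool>::value
        && sizeof(T) == sizeof(U)
        && (sizeof(T) == 1 || is_big_endian());
}

// Returns -1, 0 or 1 for the lexicographical order of the two ranges.
template <typename T, typename U>
int compare_three_way(const T* a, std::size_t na, const U* b, std::size_t nb) {
    const std::size_t common = na < nb ? na : nb;
    int prefix = 0;
    if (memcmp_ordered_with<T, U>()) {
        // The length is in bytes: element count times the shared size.
        if (common != 0)
            prefix = std::memcmp(a, b, common * sizeof(T));
    } else {
        for (std::size_t i = 0; i < common && prefix == 0; ++i) {
            if (a[i] < b[i]) prefix = -1;
            else if (b[i] < a[i]) prefix = 1;
        }
    }
    if (prefix != 0)
        return prefix < 0 ? -1 : 1;
    // Equal common prefix: the shorter range orders first.
    return na < nb ? -1 : (nb < na ? 1 : 0);
}
```

Mixed-size inputs, such as `unsigned char` against `unsigned short`, take the element-wise loop, which is the case the original `__is_byte_iter` check did not guard against.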
The incorrect errc constant here looks like a copy&paste error. libstdc++-v3/ChangeLog: PR libstdc++/112089 * include/std/shared_mutex (shared_lock::unlock): Change errc constant to operation_not_permitted. * testsuite/30_threads/shared_lock/locking/112089.cc: New test. (cherry picked from commit 0c305f3)
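A short sketch of how the affected path can be observed from user code. The helper name `unlock_without_ownership_throws` is hypothetical. The sketch only checks that an exception is thrown and hands back the error code for inspection, since the exact code depends on whether the library in use contains this fix.

```cpp
#include <cassert>
#include <shared_mutex>
#include <system_error>

// Calls unlock() on a shared_lock that owns no mutex and reports whether
// std::system_error was thrown. The error code is returned through ec so
// the caller can compare it with std::errc::operation_not_permitted.
bool unlock_without_ownership_throws(std::error_code& ec) {
    std::shared_lock<std::shared_mutex> lock;  // default-constructed, owns nothing
    try {
        lock.unlock();
    } catch (const std::system_error& e) {
        ec = e.code();
        return true;
    }
    return false;
}
```

With the fix applied, `ec == std::errc::operation_not_permitted` is expected to hold after the call.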
…R111172] A void template argument would cause a substitution failure when trying to form a reference for the return type, so the function body would never be instantiated. libstdc++-v3/ChangeLog: PR libstdc++/111172 * include/std/variant (get<T>): Remove !is_void static assertions. (cherry picked from commit d19bdf8)
C++20 allows class types as non-type template parameters, but std::integer_sequence explicitly disallows them. Enforce that. libstdc++-v3/ChangeLog: PR libstdc++/112473 * include/bits/utility.h (integer_sequence): Add static_assert. * testsuite/20_util/integer_sequence/112473.cc: New test. (cherry picked from commit 0953497)
The standard says that the implicit copy assignment operator is deprecated for classes that have a user-provided copy constructor, and vice versa. libstdc++-v3/ChangeLog: * include/bits/new_allocator.h (__new_allocator): Define copy assignment operator as defaulted. * include/std/complex (complex<float>, complex<double>) (complex<long double>): Define copy constructor as defaulted. (cherry picked from commit 008e439)
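The pattern applied here can be shown on a small user-defined class. `Counter` is a hypothetical type used only for illustration: it has a user-provided copy constructor, so its copy assignment operator is declared as defaulted explicitly instead of relying on the deprecated implicit one.

```cpp
#include <cassert>

// Hypothetical example type, not taken from libstdc++.
struct Counter {
    Counter() = default;

    // User-provided copy constructor: records that a copy was made.
    Counter(const Counter& other)
        : value(other.value), copies(other.copies + 1) {}

    // Explicitly defaulted, so no deprecated implicit definition is used.
    Counter& operator=(const Counter&) = default;

    int value = 0;
    int copies = 0;
};

// Copy-constructs and then copy-assigns, returning the final object.
Counter copy_then_assign(int value) {
    Counter original;
    original.value = value;
    Counter copied(original);  // copies == 1
    Counter assigned;
    assigned = copied;         // memberwise assignment, copies stays 1
    return assigned;           // returning may add one more copy
}
```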
The filesystem code was using these functions without checking for their existence, assuming that any UNIX-like libc with <unistd.h> would always provide them. That's not true for some newlib targets like arm-eabi. libstdc++-v3/ChangeLog: * acinclude.m4 (GLIBCXX_CHECK_FILESYSTEM_DEPS): Check for mkdir, chmod, chdir, and getcwd. * config.h.in: Regenerate. * configure: Regenerate. * src/c++17/fs_ops.cc (create_dir): Use USE_MKDIR macro. (fs::current_path): Use USE_GETCWD and USE_CHDIR macros. (fs::permissions): Use USE_CHMOD macro. * src/filesystem/ops-common.h [FILESYSTEM_IS_WINDOWS] (chmod, mkdir, getcwd, chdir): Define new macros. [FILESYSTEM_IS_WINDOWS] (chmod, mkdir, getcwd, chdir): Use new macros. * src/filesystem/ops.cc (create_dir): Use USE_MKDIR macro. (fs::current_path): Use USE_GETCWD and USE_CHDIR macros. (fs::permissions): Use USE_CHMOD macro. (cherry picked from commit 5435449)
Unlike the new str()&& members in <sstream>, there is no real difficulty in supporting the new view() members for the old std::string ABI. Enabling it fixes errors in <chrono> where std::ostringstream::view() is used by ostream insertion operators for calendar types. We just need to use [[gnu::always_inline]] on the view() members for the old ABI, because the library doesn't contain instantiations of them for the old ABI. Making them always inline avoids needing to add those instantiations and export them. libstdc++-v3/ChangeLog: * include/std/sstream (basic_stringbuf::view): Define for old std::string ABI. (basic_istringstream::view, basic_stringstream::view) (basic_stringstream::view): Likewise. * testsuite/27_io/basic_istringstream/view/char/1.cc: Remove { dg-require-effective-target cxx11_abi }. * testsuite/27_io/basic_istringstream/view/wchar_t/1.cc: Likewise. * testsuite/27_io/basic_ostringstream/view/char/1.cc: Likewise. * testsuite/27_io/basic_ostringstream/view/wchar_t/1.cc: Likewise. * testsuite/27_io/basic_stringbuf/view/char/1.cc: Likewise. * testsuite/27_io/basic_stringbuf/view/wchar_t/1.cc: Likewise. * testsuite/27_io/basic_stringstream/view/char/1.cc: Likewise. * testsuite/27_io/basic_stringstream/view/wchar_t/1.cc: Likewise. (cherry picked from commit 331b4f1)
The PR points out that we assume the match_results allocator is default constructible, which might not be true. We also have a related issue with unwanted propagation from an object that might have an unequal allocator. Ideally we use the same allocator type for _State_info::_M_match_queue but that would be an ABI change now. We should investigate if that can be done without breaking anything, which might be possible because the _Executor object is short-lived and never leaks out of the regex_match, regex_search, and regex_replace algorithms. If we change the mangled name for _Executor then there would be no ODR violations when mixing old and new definitions. This commit does not attempt that. libstdc++-v3/ChangeLog: PR libstdc++/107376 * include/bits/regex_executor.h (_Executor::_Executor): Use same allocator for _M_cur_results and _M_results. * include/bits/regex_executor.tcc (_Executor::_M_main_dispatch): Prevent possibly incorrect allocator propagating to _M_cur_results. * testsuite/28_regex/algorithms/regex_match/107376.cc: New test. (cherry picked from commit 988dd22)
egrep has been deprecated in favor of grep -E for a long time, and the next grep release (3.8 or 4.0) will print a warning if egrep is used. Stop using egrep so we won't see the warning. The grep implementations from GNU, BSD (including Mac OS X), AIX, and BusyBox all support -E and -F. Solaris grep doesn't support -E, but extract_symvers.in already contains a special case for Solaris and doxygen documentation generation is already broken on non-GNU. libstdc++-v3/ChangeLog: * scripts/extract_symvers.in: Use grep -E instead of egrep. * scripts/run_doxygen: Likewise. (cherry picked from commit fa4e979)
libstdc++-v3/ChangeLog: * include/bits/ptr_traits.h: Add some doxygen comments. (cherry picked from commit 757146f)
Add a simpler definition of std::__detected_or using concepts. This also replaces the __detector::value_t member which should have been using a reserved name. Use __detected_or in pointer_traits. libstdc++-v3/ChangeLog: * include/bits/alloc_traits.h (allocator_traits::is_always_equal): Only instantiate is_empty if needed. * include/bits/ptr_traits.h (__ptr_traits_impl::difference_type) (__ptr_traits_impl::rebind): Use __detected_or. * include/experimental/type_traits (is_same_v): Add a partial specialization instead of instantiating the std::is_same class template. (detected_t): Redefine in terms of detected_or_t. (is_detected, is_detected_v): Redefine in terms of detected_t. * include/std/type_traits [__cpp_concepts] (__detected_or): Add new definition using concepts. (__detector::value_t): Rename to __is_detected. * testsuite/17_intro/names.cc: Check value_t isn't used. (cherry picked from commit 2b667be)
It was pointed out in recent LWG 3545 discussion that having a constrained partial specialization of std::pointer_traits can cause ambiguities with program-defined specializations. For example, the addition to the testcase has:

    template<typename P> requires std::derived_from<P, base_type>
    struct std::pointer_traits<P>;

This would be ambiguous with the library's own constrained partial specialization:

    template<typename Ptr> requires requires { typename Ptr::element_type; }
    struct std::pointer_traits<Ptr>;

Neither specialization is more specialized than the other for a type that is derived from base_type and also has an element_type member. The solution is to remove the library's partial specialization, and do the check for Ptr::element_type in the __ptr_traits_elem helper (which is what we already do for !__cpp_concepts anyway). libstdc++-v3/ChangeLog: * include/bits/ptr_traits.h (__ptr_traits_elem) [__cpp_concepts]: Also define the __ptr_traits_elem class template for the concepts case. (pointer_traits<Ptr>): Remove constrained partial specialization. * testsuite/20_util/pointer_traits/lwg3545.cc: Check for ambiguity with program-defined partial specialization. (cherry picked from commit 03cb9ed)
libstdc++-v3/ChangeLog: * include/bits/fs_path.h (__is_path_iter_src): Replace class template with variable template. (cherry picked from commit 6177f60)
Before Doxygen version 1.9.2 this option is broken (see doxygen/doxygen#8638 for more details) and classes are not added to the correct groups by @ingroup and @addtogroup. Also remove the obsolete CLASS_DIAGRAMS option that causes a warning. libstdc++-v3/ChangeLog: * doc/doxygen/user.cfg.in (GROUP_NESTED_COMPOUNDS): Set to NO. (CLASS_DIAGRAMS): Remove obsolete option. (cherry picked from commit 9c3a8fe)
Use macros to open and close the inline namespace _V2 that is used for ABI versioning of individual components such as chrono::system_clock. This allows the namespace to be hidden in the docs generated by Doxygen, so that we document std::foo instead of std::_V2::foo. This also makes it easy to remove that namespace entirely for the gnu-versioned-namespace build, where everything is already versioned as std::__8 and there are no backwards compatibility guarantees. libstdc++-v3/ChangeLog: * doc/doxygen/user.cfg.in (PREDEFINED): Expand new macros to nothing. * include/bits/c++config (_GLIBCXX_BEGIN_INLINE_ABI_NAMESPACE) (_GLIBCXX_END_INLINE_ABI_NAMESPACE): Define new macros. * include/bits/algorithmfwd.h (_V2::__rotate): Use new macros for the namespace. * include/bits/chrono.h (chrono::_V2::system_clock): Likewise. * include/bits/stl_algo.h (_V2::__rotate): Likewise. * include/std/condition_variable (_V2::condition_variable_any): Likewise. * include/std/system_error (_V2::error_category): Likewise. (cherry picked from commit e4905f1)
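The macro technique can be sketched in a few lines. The `DEMO_*` macro names, the `demo` namespace, the `v2` namespace name and the `DEMO_NO_ABI_NAMESPACE` switch are all hypothetical. The commit names the real macros `_GLIBCXX_BEGIN_INLINE_ABI_NAMESPACE` and `_GLIBCXX_END_INLINE_ABI_NAMESPACE`, but the parameterised form used below is an assumption for illustration.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical macros: open and close an inline namespace, or expand to
// nothing when the versioned inner namespace is not wanted (for example
// when generating documentation or building a fully versioned library).
#ifdef DEMO_NO_ABI_NAMESPACE
# define DEMO_BEGIN_INLINE_ABI_NAMESPACE(name)
# define DEMO_END_INLINE_ABI_NAMESPACE(name)
#else
# define DEMO_BEGIN_INLINE_ABI_NAMESPACE(name) inline namespace name {
# define DEMO_END_INLINE_ABI_NAMESPACE(name) }
#endif

namespace demo {
DEMO_BEGIN_INLINE_ABI_NAMESPACE(v2)

// Users write demo::system_clock; the mangled name includes v2 unless
// the macros expand to nothing.
struct system_clock {
    static int abi_version() { return 2; }
};

DEMO_END_INLINE_ABI_NAMESPACE(v2)
}  // namespace demo
```

Because the namespace is inline, `demo::system_clock` and `demo::v2::system_clock` name the same type in the default configuration, so user code does not change when the macros are redefined.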
The src/c++11/compatibility*-c++0x.cc files define symbols that need to be exported for ancient versions of libstdc++.so.6 due to changes between C++0x and the final C++11 standard. Those symbols are not needed in the libstdc++.so.8 library, and we can skip building them entirely. This also fixes the build failure I introduced last week when making the versioned namespace config not use the _V2 namespace for compat symbols. libstdc++-v3/ChangeLog: * src/Makefile.am [ENABLE_SYMVERS_GNU_NAMESPACE] (cxx11_sources): Do not build the compatibility*-c++0x.cc objects. * src/Makefile.in: Regenerate. * src/c++11/compatibility-c++0x.cc [_GLIBCXX_INLINE_VERSION]: Refuse to build for the versioned namespace. * src/c++11/compatibility-chrono.cc: Likewise. * src/c++11/compatibility-condvar.cc: Likewise. * src/c++11/compatibility-thread-c++0x.cc: Likewise. * src/c++11/chrono.cc (system_clock, steady_clock): Use macros to define in inline namespace _V2, matching the declarations in <system_error>. * src/c++11/system_error.cc (system_category, generic_category): Likewise. (cherry picked from commit 357d6fc)
libstdc++-v3/ChangeLog: * include/std/atomic: Suppress doxygen docs for implementation details. * include/bits/atomic_base.h: Likewise. * include/bits/shared_ptr_atomic.h: Use markdown. Fix grouping so that std::atomic is not added to the pointer abstractions group. (cherry picked from commit 1566ca0)
Add @headerfile and @since tags. Improve grouping of non-member functions via @relates tags. Mark the std::pair base class of std::sub_match as undocumented, so that the docs don't show all the related non-member functions as part of the sub_match API. Use a new macro to re-add the data members for Doxygen only. libstdc++-v3/ChangeLog: * doc/doxygen/user.cfg.in (PREDEFINED): Define macro _GLIBCXX_DOXYGEN_ONLY to expand its argument. * include/bits/c++config (_GLIBCXX_DOXYGEN_ONLY): Define. * include/bits/regex.h: Improve doxygen docs. * include/bits/regex_constants.h: Likewise. * include/bits/regex_error.h: Likewise. (cherry picked from commit 1b01963)
libstdc++-v3/ChangeLog: * doc/doxygen/user.cfg.in (PREDEFINED): Define __allocator_base so that Doxygen shows the right base-class for std::allocator. * include/bits/alloc_traits.h: Improve doxygen docs. * include/bits/allocator.h: Likewise. * include/bits/new_allocator.h: Likewise. * include/ext/new_allocator.h: Likewise. (cherry picked from commit 171f41f)
libstdc++-v3/ChangeLog: * include/bits/ostream_insert.h: Mark helper functions as undocumented by Doxygen. * include/bits/stl_algo.h: Use markdown for formatting and mark helper functions as undocumented. * include/bits/stl_numeric.h: Likewise. * include/bits/stl_pair.h (pair): Add @headerfile. (cherry picked from commit e614925)
libstdc++-v3/ChangeLog: * doc/doxygen/user.cfg.in (PREDEFINED): Define _GLIBCXX23_CONSTEXPR macro. * include/backward/auto_ptr.h (auto_ptr): Use @deprecated. * include/bits/unique_ptr.h (default_delete): Use @since and @headerfile. * include/std/scoped_allocator: Remove @ingroup from @file block. (cherry picked from commit a278402)
libstdc++-v3/ChangeLog: * doc/doxygen/user.cfg.in (PREDEFINED): Define _GTHREAD_USE_MUTEX_TIMEDLOCK macro. * include/bits/std_mutex.h (mutex, lock_guard): Use @since and @headerfile. * include/bits/unique_lock.h (unique_lock): Likewise. * include/std/mutex (recursive_mutex, timed_mutex) (recursive_timed_mutex, scoped_lock): Likewise. (cherry picked from commit b584cbd)
Fix some problems noticed with -Wsystem-headers. libstdc++-v3/ChangeLog: * include/bits/stl_tempbuf.h (_Temporary_buffer): Disable warnings about get_temporary_buffer being deprecated. * include/ext/functional (mem_fun1, mem_fun1_ref): Disable warnings about mem_fun1_t, const_mem_fun1_t, mem_fun1_ref_t and const_mem_fun1_ref_t being deprecated. * include/std/spanstream (basic_spanbuf::setbuf): Add assertion and adjust to avoid narrowing warning. * libsupc++/exception_ptr.h [!__cpp_rtti && !__cpp_exceptions] (make_exception_ptr): Add missing inline specifier. (cherry picked from commit 8f6d25f)
libstdc++-v3/ChangeLog: * testsuite/20_util/headers/memory/synopsis.cc: Add declarations from C++11 and later. (cherry picked from commit 980aa91)
libstdc++-v3/ChangeLog: * testsuite/18_support/new_nothrow.cc: Add missing noexcept to operator delete replacements. * testsuite/20_util/any/cons/92156.cc: Disable -Winit-list-lifetime warnings from instantiating invalid specialization of manager function. * testsuite/20_util/any/modifiers/92156.cc: Likewise. * testsuite/20_util/default_delete/void_neg.cc: Prune additional diagnostics. * testsuite/20_util/headers/memory/synopsis.cc: Add missing noexcept. * testsuite/20_util/shared_ptr/cons/void_neg.cc: Prune additional diagnostic. * testsuite/20_util/unique_ptr/creation/for_overwrite.cc: Add missing noexcept to operator delete replacements. * testsuite/21_strings/basic_string/cons/char/103919.cc: Likewise. * testsuite/23_containers/map/modifiers/emplace/92300.cc: Likewise. * testsuite/23_containers/map/modifiers/insert/92300.cc: Likewise. * testsuite/24_iterators/headers/iterator/range_access_c++11.cc: Add missing noexcept to synopsis declarations. * testsuite/24_iterators/headers/iterator/range_access_c++14.cc: Likewise. * testsuite/24_iterators/headers/iterator/range_access_c++17.cc: Likewise. (cherry picked from commit bbcb84b)
libstdc++-v3/ChangeLog: * include/std/tuple: Add better Doxygen comments. (cherry picked from commit 94f7baf)
We were failing to handle ANNOTATE_EXPR in tsubst_copy_and_build, leading to problems with substitution of any wrapped expressions. Let's also not tell users that lambda templates are available in C++14. PR c++/111529 gcc/cp/ChangeLog: * parser.cc (cp_parser_lambda_declarator_opt): Don't suggest -std=c++14 for lambda templates. * pt.cc (tsubst_expr): Move ANNOTATE_EXPR handling... (tsubst_copy_and_build): ...here. gcc/testsuite/ChangeLog: * g++.dg/ext/unroll-4.C: New test. (cherry picked from commit 9c62af1)
r12-10468-g19827831516023 added the ANNOTATE_EXPR in the wrong place, leading to ICEs on several testcases. gcc/cp/ChangeLog: * pt.cc (tsubst_copy_and_build): Move ANNOTATE_EXPR out of fallthrough path.
The requirement that a type argument be complete is excessive in the case of direct reference binding to the same type, which does not rely on any properties of the type. This is LWG 2939. PR c++/100667 gcc/cp/ChangeLog: * semantics.cc (same_type_ref_bind_p): New. (finish_trait_expr): Use it. gcc/testsuite/ChangeLog: * g++.dg/ext/is_constructible8.C: New test. (cherry picked from commit 8bb3ef3)
This is a manual backport of r14-9840-g1162861439fd3c from master. Manual because the bits and value range representation in jump functions have changed during the gcc 14 development cycle. In PR 113907 comment #58, Honza found a case where ICF thinks bodies of functions are equivalent but because of difference in aliases in a memory access, different aggregate jump functions are associated with supposedly equivalent call statements. This patch adds a way to compare jump functions and plugs it into ICF to avoid the issue. gcc/ChangeLog: 2024-05-14 Martin Jambor <mjambor@suse.cz> PR ipa/113907 * ipa-prop.h (ipa_jump_functions_equivalent_p): Declare. (values_equal_for_ipcp_p): Likewise. * ipa-prop.cc (ipa_agg_pass_through_jf_equivalent_p): New function. (ipa_agg_jump_functions_equivalent_p): Likewise. (ipa_jump_functions_equivalent_p): Likewise. * ipa-cp.cc (values_equal_for_ipcp_p): Make function public. * ipa-icf-gimple.cc: Include alloc-pool.h, symbol-summary.h, sreal.h, ipa-cp.h and ipa-prop.h. (func_checker::compare_gimple_call): Compare jump functions. gcc/testsuite/ChangeLog: 2024-05-10 Martin Jambor <mjambor@suse.cz> PR ipa/113907 * gcc.dg/lto/pr113907_0.c: New. * gcc.dg/lto/pr113907_1.c: Likewise. * gcc.dg/lto/pr113907_2.c: Likewise. (cherry picked from commit 1db45e8)
PR fortran/115150 gcc/fortran/ChangeLog: * trans-intrinsic.cc (gfc_conv_intrinsic_bound): Fix SHAPE for zero-size arrays gcc/testsuite/ChangeLog: * gfortran.dg/shape_12.f90: New test. (cherry picked from commit b701306)
…tization [PR115172] The following testcase is miscompiled, because -fsanitize=bool,enum creates a MEM_REF without propagating the address space qualifiers to it, so what should normally be loaded using, say, a %gs: or %fs: segment prefix isn't. Together with asan it then causes that load to be sanitized. 2024-05-22 Jakub Jelinek <jakub@redhat.com> PR sanitizer/115172 * ubsan.cc (instrument_bool_enum_load): If rhs is not in generic address space, use qualified version of utype with the right address space. Formatting fix. * gcc.dg/asan/pr115172.c: New test. (cherry picked from commit d3c506e)
PR target/84790.

The gp init sequence

    li $2,%hi(_gp_disp)
    addiu $3,$pc,%lo(_gp_disp)
    sll $2,16
    addu $2,$3

is generated directly in `mips_output_function_prologue`, and does not appear in the RTL. The IRA/IPA passes are therefore not aware that $2/$3 have been clobbered, so they may use these registers across a (local) function call.

Let's mark $2/$3 as clobbered both:
- Just after the UNSPEC_GP RTL of a function;
- Just after a function call.

Reported-by: Matthias Schiffer <mschiffer@universe-factory.net>
Origin-Patch-by: Felix Fietkau <nbd@nbd.name>.

gcc * config/mips/mips.cc (mips16_gp_pseudo_reg): Mark MIPS16_PIC_TEMP and MIPS_PROLOGUE_TEMP clobbered. (mips_emit_call_insn): Mark MIPS16_PIC_TEMP and MIPS_PROLOGUE_TEMP clobbered if MIPS16 and CALL_CLOBBERED_GP. (cherry picked from commit 915440e)
Link to the docs for GCC trunk instead. For the release branches, the link should be to the docs for appropriate release branch. Also replace the incomplete/outdated list of explicit -std options with a single entry for the -std option. libstdc++-v3/ChangeLog: PR libstdc++/115269 * doc/xml/manual/using.xml: Replace link to gcc-4.3.2 docs. Replace list of -std=... options with a single entry for -std. * doc/html/manual/using.html: Regenerate. (cherry picked from commit b460ede)
any_divmod instructions are modelled with invalid RTX:

    [(set (match_operand:DI 0 "register_operand" "=c")
          (sign_extend:DI (match_operator:SI 3 "divmod_operator"
            [(match_operand:DI 1 "register_operand" "a")
             (match_operand:DI 2 "register_operand" "b")])))
     (clobber (reg:DI 23))
     (clobber (reg:DI 28))]

where SImode divmod_operator (div,mod,udiv,umod) has DImode operands.

Wrap input operand with truncate:SI to make machine modes consistent. PR target/115297 gcc/ChangeLog: * config/alpha/alpha.md (<any_divmod:code>si3): Wrap DImode operands 3 and 4 with truncate:SI RTX. (*divmodsi_internal_er): Ditto for operands 1 and 2. (*divmodsi_internal_er_1): Ditto. (*divmodsi_internal): Ditto. * config/alpha/constraints.md ("b"): Correct register number in the description. gcc/testsuite/ChangeLog: * gcc.target/alpha/pr115297.c: New test. (cherry picked from commit 0ac8020)
create_intersect_range_checks checks whether two access ranges a and b are alias-free using something equivalent to:

    end_a <= start_b || end_b <= start_a

It has two ways of doing this: a "vanilla" way that calculates the exact exclusive end pointers, and another way that uses the last inclusive aligned pointers (and changes the comparisons accordingly). The comment for the latter is:

    /* Calculate the minimum alignment shared by all four pointers,
       then arrange for this alignment to be subtracted from the
       exclusive maximum values to get inclusive maximum values.
       This "- min_align" is cumulative with a "+ access_size"
       in the calculation of the maximum values. In the best
       (and common) case, the two cancel each other out, leaving
       us with an inclusive bound based only on seg_len. In the
       worst case we're simply adding a smaller number than before. */

The problem is that the associated code implicitly assumed that the access size was a multiple of the pointer alignment, and so the alignment could be carried over to the exclusive end pointer. The testcase started failing after g:9fa5b473b5b8e289b6542 because that commit improved the alignment information for the accesses. gcc/ PR tree-optimization/115192 * tree-data-ref.cc (create_intersect_range_checks): Take the alignment of the access sizes into account. gcc/testsuite/ PR tree-optimization/115192 * gcc.dg/vect/pr115192.c: New test. (cherry picked from commit a0fe4fb)
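The "vanilla" form of the check, with exact exclusive end pointers, can be written directly. This is a minimal sketch with a hypothetical function name, `ranges_alias_free`. It does not reproduce the aligned, inclusive-bound variant that the fix concerns, and it assumes that adding the length to the start address does not wrap around.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true when the byte ranges [a, a + len_a) and [b, b + len_b)
// do not overlap, using exclusive end addresses:
//   end_a <= start_b || end_b <= start_a
bool ranges_alias_free(const void* a, std::size_t len_a,
                       const void* b, std::size_t len_b) {
    const std::uintptr_t start_a = reinterpret_cast<std::uintptr_t>(a);
    const std::uintptr_t start_b = reinterpret_cast<std::uintptr_t>(b);
    const std::uintptr_t end_a = start_a + len_a;  // one past the last byte of a
    const std::uintptr_t end_b = start_b + len_b;  // one past the last byte of b
    return end_a <= start_b || end_b <= start_a;
}
```

For an `int buf[8]`, the first four elements and the last four elements are alias-free, while the first five and the last four are not.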
This was another PR caused by the way that vect_determine_precisions_from_range handles shifts. We tried to narrow 32768 >> x to a 16-bit shift based on range information for the inputs and outputs, with vect_recog_over_widening_pattern (after PR110828) adjusting the shift amount. But this doesn't work for the case where x is in [16, 31], since then 32-bit 32768 >> x is a well-defined zero, whereas no well-defined 16-bit 32768 >> y will produce 0. We could perhaps generate x < 16 ? 32768 >> x : 0 instead, but since vect_determine_precisions_from_range was never really supposed to rely on fix-ups, it seems better to fix that instead. The patch also makes the code more selective about which codes can be narrowed based on input and output ranges. This showed that vect_truncatable_operation_p was missing cases for BIT_NOT_EXPR (equivalent to BIT_XOR_EXPR of -1) and NEGATE_EXPR (equivalent to BIT_NOT_EXPR followed by a PLUS_EXPR of 1). pr113281-1.c is the original testcase. pr113281-[23].c failed before the patch due to overly optimistic narrowing. pr113281-[45].c previously passed and are meant to protect against accidental optimisation regressions. gcc/ PR target/113281 * tree-vect-patterns.cc (vect_recog_over_widening_pattern): Remove workaround for right shifts. (vect_truncatable_operation_p): Handle NEGATE_EXPR and BIT_NOT_EXPR. (vect_determine_precisions_from_range): Be more selective about which codes can be narrowed based on their input and output ranges. For shifts, require at least one more bit of precision than the maximum shift amount. gcc/testsuite/ PR target/113281 * gcc.dg/vect/pr113281-1.c: New test. * gcc.dg/vect/pr113281-2.c: Likewise. * gcc.dg/vect/pr113281-3.c: Likewise. * gcc.dg/vect/pr113281-4.c: Likewise. * gcc.dg/vect/pr113281-5.c: Likewise. (cherry picked from commit 1a8261e)
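The shift problem can be checked in scalar code. The sketch below shows the guarded form `x < 16 ? 32768 >> x : 0` that the message mentions as a possible alternative, not what the patch implements. The function names are hypothetical, and the 16-bit case is modelled with an explicit guard because a 16-bit vector lane has no well-defined shift by 16 or more.

```cpp
#include <cassert>
#include <cstdint>

// 32-bit shift: well defined for every x in [0, 31], and zero for x >= 16.
std::uint32_t wide_shift(unsigned x) {
    return UINT32_C(32768) >> x;
}

// 16-bit result with an explicit guard for shift amounts that a 16-bit
// operation could not express.
std::uint16_t narrow_shift_guarded(unsigned x) {
    return static_cast<std::uint16_t>(x < 16 ? (32768u >> x) : 0u);
}

// Compares both forms over the full range of valid 32-bit shift amounts.
bool narrow_matches_wide() {
    for (unsigned x = 0; x < 32; ++x) {
        if (wide_shift(x) != narrow_shift_guarded(x))
            return false;
    }
    return true;
}
```

Without the guard, no 16-bit shift amount y in [0, 15] makes `32768 >> y` equal to zero, which is why the narrowed operation cannot stand in for the wide one when x is in [16, 31].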
For the testcase in PR113910 we spend a lot of time in PTA comparing bitmaps for looking up equivalence class members. This points to the very weak bitmap_hash function which effectively hashes set and a subset of not set bits. The major problem with it is that it simply truncates the BITMAP_WORD sized intermediate hash to hashval_t which is unsigned int, effectively not hashing half of the bits. This reduces the compile-time for the testcase from tens of minutes to 42 seconds and PTA time from 99% to 46%. PR tree-optimization/113910 * bitmap.cc (bitmap_hash): Mix the full element "hash" to the hashval_t hash. (cherry picked from commit ad7a365)
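The effect of truncating a wide intermediate hash can be seen with two words that differ only in their upper half. This sketch assumes a 64-bit word and a 32-bit hash value. The type alias and function names are hypothetical, and the folding step is only one simple way to mix in the upper bits, not necessarily the mixing the patch uses.

```cpp
#include <cassert>
#include <cstdint>

using hashval = std::uint32_t;  // stands in for a 32-bit hashval_t

// Truncation keeps only the low 32 bits, so the upper half of the word
// never influences the hash.
hashval truncating_hash(std::uint64_t word) {
    return static_cast<hashval>(word);
}

// Folding the upper half into the lower half before truncating lets
// every bit of the word affect the result.
hashval folding_hash(std::uint64_t word) {
    return static_cast<hashval>(word ^ (word >> 32));
}
```

Words such as `0x0000000100000005` and `0x0000000200000005` collide under truncation but hash differently once the upper half is folded in, which reduces the number of full bitmap comparisons needed on lookup.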
…uctions The following fixes a bug that manifests itself during fold-left reduction transform in picking not the last scalar def to replace and thus double-counting some elements. But the underlying issue is that we merge a load permutation into the in-order reduction which is of course wrong. Now, reduction analysis has not yet been performed when optimizing permutations so we have to resort to checking that ourselves. PR tree-optimization/110381 * tree-vect-slp.cc (vect_optimize_slp_pass::start_choosing_layouts): Materialize permutes before fold-left reductions. * gcc.dg/vect/pr110381.c: New testcase. (cherry picked from commit 53d6f57)
The following fixes a stray TYPE_ALIAS_SET in a type variant built by build_opaque_vector_type which is diagnosed by type checking enabled with -flto. PR middle-end/112732 * tree.cc (build_opaque_vector_type): Reset TYPE_ALIAS_SET of the newly built type. (cherry picked from commit f26d68d)
This testcase was fixed by r14-5934-gf26d68d5d128c8 but we should add one to make sure it does not regress again. Committed as obvious after a quick test on the testcase. PR c++/97990 gcc/testsuite/ChangeLog: * g++.dg/torture/vector-struct-1.C: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com> (cherry picked from commit 5f1438d)
This patch disables use of FMA in the matrix multiplication loop for generic (for x86-64-v3) and zen4. I tested this on zen4 and a Xeon Gold 6212U.

For Intel this is neutral both on the matrix multiplication microbenchmark (attached) and spec2k17, where the difference was within noise for Core.

On Core the micro-benchmark runs as follows:

With FMA:

    578,500,241 cycles:u # 3.645 GHz ( +- 0.12% )
    753,318,477 instructions:u # 1.30 insn per cycle ( +- 0.00% )
    125,417,701 branches:u # 790.227 M/sec ( +- 0.00% )
    0.159146 +- 0.000363 seconds time elapsed ( +- 0.23% )

No FMA:

    577,573,960 cycles:u # 3.514 GHz ( +- 0.15% )
    878,318,479 instructions:u # 1.52 insn per cycle ( +- 0.00% )
    125,417,702 branches:u # 763.035 M/sec ( +- 0.00% )
    0.164734 +- 0.000321 seconds time elapsed ( +- 0.19% )

So the cycle count is unchanged and a discrete multiply+add takes the same time as FMA.

While on zen:

With FMA:

    484875179 cycles:u # 3.599 GHz ( +- 0.05% ) (82.11%)
    752031517 instructions:u # 1.55 insn per cycle
    125106525 branches:u # 928.712 M/sec ( +- 0.03% ) (85.09%)
    128356 branch-misses:u # 0.10% of all branches ( +- 0.06% ) (83.58%)

No FMA:

    375875209 cycles:u # 3.592 GHz ( +- 0.08% ) (80.74%)
    875725341 instructions:u # 2.33 insn per cycle
    124903825 branches:u # 1.194 G/sec ( +- 0.04% ) (84.59%)
    0.105203 +- 0.000188 seconds time elapsed ( +- 0.18% )

The difference is that Core CPUs understand that fmadd does not need all three parameters to start computation, while Zen cores don't. Since this seems a noticeable win on zen and no loss on Core, it seems like a good default for generic.

    #include <stdio.h>
    #include <time.h>

    /* Matrix dimension. The value is missing from this copy of the
       message; 1000 is an assumption. */
    #define SIZE 1000

    float a[SIZE][SIZE];
    float b[SIZE][SIZE];
    float c[SIZE][SIZE];

    /* Fill the inputs with simple values and clear the result. */
    void init(void)
    {
      int i, j;
      for(i=0; i<SIZE; ++i)
        {
          for(j=0; j<SIZE; ++j)
            {
              a[i][j] = (float)i + j;
              b[i][j] = (float)i - j;
              c[i][j] = 0.0f;
            }
        }
    }

    /* Naive triple loop; the inner statement is the multiply+add chain. */
    void mult(void)
    {
      int i, j, k;
      for(i=0; i<SIZE; ++i)
        {
          for(j=0; j<SIZE; ++j)
            {
              for(k=0; k<SIZE; ++k)
                {
                  c[i][j] += a[i][k] * b[k][j];
                }
            }
        }
    }

    int main(void)
    {
      clock_t s, e;
      init();
      s=clock();
      mult();
      e=clock();
      printf(" mult took %10d clocks\n", (int)(e-s));
      return 0;
    }

gcc/ChangeLog: * config/i386/x86-tune.def (X86_TUNE_AVOID_128FMA_CHAINS, X86_TUNE_AVOID_256FMA_CHAINS): Enable for znver4 and Core. (cherry picked from commit 467cc39)
Support for Apple Silicon!!!