Releases/gcc 12 #65

Open
wants to merge 2,185 commits into
base: master

jacopobrusini

Support for Apple Silicon!!!

@jwakely
Contributor

jwakely commented Feb 21, 2024

This is an unofficial mirror that has nothing to do with the GCC project, so submitting pull requests here is a waste of time.

Also, I have no idea what this pull request is trying to do but it would never be accepted even if it was submitted to the right place.

jwakely and others added 29 commits March 18, 2024 13:50
…(LWG 2195)

This was approved in Issaquah last month.

libstdc++-v3/ChangeLog:

	* include/bits/regex.h (match_results): Add allocator-extended
	copy and move constructors, as per LWG 2195.
	* testsuite/28_regex/match_results/ctors/char/alloc.cc: New test.

(cherry picked from commit 9ae1108)
As explained in LWG 3600, we never implemented a C++0x change that made
the copy constructor of std::istream_iterator defined as defaulted. That
would be an ABI break, so the resolution of LWG 3600 is to not require
it to be trivial, but just constexpr and conditionally noexcept. This
applies that resolution.

libstdc++-v3/ChangeLog:

	* include/bits/stream_iterator.h (istream_iterator): Add
	constexpr to copy constructor, as per LWG 3600.
	* testsuite/24_iterators/istream_iterator/cons/constexpr.cc:
	Check copy construction.

(cherry picked from commit ad0b9cf)
This avoids overwriting tail padding when algorithms like std::copy are
used to write a single value through a pointer to a base subobject.

The pointer arithmetic on a Base* is valid for N==1, but the copy/move
operation needs to be done using assignment, not a memmove or memcpy of
sizeof(Base) bytes.

Instead of putting a check for N==1 in all of copy, copy_n, move etc.
this adds it to the __copy_move and __copy_move_backward partial
specializations used for trivially copyable types. When N==1 those
partial specializations dispatch to new static member functions of the
partial specializations for non-trivial types, so that a copy/move
assignment is done appropriately for the _IsMove constant.

libstdc++-v3/ChangeLog:

	PR libstdc++/108846
	* include/bits/stl_algobase.h (__copy_move<false, false, RA>)
	Add __assign_one static member function.
	(__copy_move<true, false, RA>): Likewise.
	(__copy_move<IsMove, true, RA>): Do not use memmove for a single
	value.
	(__copy_move_backward<IsMove, true, RA>): Likewise.
	* testsuite/25_algorithms/copy/108846.cc: New test.
	* testsuite/25_algorithms/copy_backward/108846.cc: New test.
	* testsuite/25_algorithms/copy_n/108846.cc: New test.
	* testsuite/25_algorithms/move/108846.cc: New test.
	* testsuite/25_algorithms/move_backward/108846.cc: New test.

(cherry picked from commit 822a11a)
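
As a rough illustration of the PR 108846 fix described above, the sketch below copies a single value by assignment and only falls back to memmove for longer ranges. The names Base, Derived, copy_trivial and derived_member_preserved are hypothetical and are not the libstdc++ internals; the class layout (a derived member stored in the base's tail padding) is an assumption that holds on Itanium-style ABIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Trivially copyable, but with a user-provided constructor, so some ABIs
// may let a derived class reuse the tail padding after 'j'.
struct Base {
    Base(int i, short j) : i(i), j(j) {}
    int i;
    short j;
};

struct Derived : Base {
    Derived(int i, short j, short x) : Base(i, j), x(x) {}
    short x;  // may be placed in the tail padding of Base
};

// Hypothetical helper (not the libstdc++ implementation): copy n objects
// of a trivially copyable type.
template<typename T>
T* copy_trivial(const T* first, std::size_t n, T* out)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "bulk copy requires a trivially copyable type");
    assert(n == 0 || (first != nullptr && out != nullptr));
    if (n == 1)
        *out = *first;  // assignment, so no sizeof(T)-byte block write
    else if (n > 1)
        std::memmove(static_cast<void*>(out), first, n * sizeof(T));
    return out + n;
}

// Copies one Base into the Base subobject of a Derived and reports
// whether the Derived-only member kept its value.
inline bool derived_member_preserved()
{
    Derived d(1, 2, 3);
    Base b(4, 5);
    copy_trivial(&b, 1, static_cast<Base*>(&d));
    return d.i == 4 && d.j == 5 && d.x == 3;
}
```

Putting the n == 1 check in one shared helper mirrors the idea in the commit message of handling the special case in a single place rather than in every algorithm.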
…mpare_three_way [PR113960]

The change in r11-2981-g2f983fa69005b6 meant that
std::lexicographical_compare_three_way started to use memcmp for
unsigned integers on big endian targets, but for that to be valid we
need the two value types to have the same size and we need to use that
size to compute the length passed to memcmp.

I already defined a __is_memcmp_ordered_with trait that does the right
checks, std::lexicographical_compare_three_way just needs to use it.

libstdc++-v3/ChangeLog:

	PR libstdc++/113960
	* include/bits/stl_algobase.h (__is_byte_iter): Replace with ...
	(__memcmp_ordered_with): New concept.
	(lexicographical_compare_three_way): Use __memcmp_ordered_with
	instead of __is_byte_iter. Use correct length for memcmp.
	* testsuite/25_algorithms/lexicographical_compare_three_way/113960.cc:
	New test.

(cherry picked from commit f5cdda8)
The incorrect errc constant here looks like a copy&paste error.

libstdc++-v3/ChangeLog:

	PR libstdc++/112089
	* include/std/shared_mutex (shared_lock::unlock): Change errc
	constant to operation_not_permitted.
	* testsuite/30_threads/shared_lock/locking/112089.cc: New test.

(cherry picked from commit 0c305f3)
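
To see which error the PR 112089 change affects, the following sketch calls unlock() on a shared_lock that owns nothing and returns the reported error code. The function name unlock_without_ownership is hypothetical. With the fix applied the code is expected to compare equal to std::errc::operation_not_permitted; older library versions may report a different constant.

```cpp
#include <cassert>
#include <shared_mutex>
#include <system_error>

// Returns the error code carried by the exception thrown when unlock()
// is called on a shared_lock that does not own a lock.
inline std::error_code unlock_without_ownership()
{
    std::shared_lock<std::shared_timed_mutex> lock;  // no associated mutex
    try {
        lock.unlock();
    } catch (const std::system_error& e) {
        return e.code();
    }
    return std::error_code();  // only reached if unlock() did not throw
}
```

A caller could compare the result against std::errc::operation_not_permitted to check whether the library in use contains the fix.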
…R111172]

A void template argument would cause a substitution failure when trying
to form a reference for the return type, so the function body would
never be instantiated.

libstdc++-v3/ChangeLog:

	PR libstdc++/111172
	* include/std/variant (get<T>): Remove !is_void static
	assertions.

(cherry picked from commit d19bdf8)
C++20 allows class types as non-type template parameters, but
std::integer_sequence explicitly disallows them. Enforce that.

libstdc++-v3/ChangeLog:

	PR libstdc++/112473
	* include/bits/utility.h (integer_sequence): Add static_assert.
	* testsuite/20_util/integer_sequence/112473.cc: New test.

(cherry picked from commit 0953497)
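
For context on the PR 112473 change above, here is a small sketch of valid std::integer_sequence use with an integer type, with the rejected class-type case shown only as a comment. The helper sum and the Wrapper type are hypothetical names used for illustration.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Adds up the values carried by an integer_sequence (C++14 constexpr).
template<typename T, T... Values>
constexpr T sum(std::integer_sequence<T, Values...>)
{
    T total = 0;
    const T values[] = { Values..., T(0) };  // trailing 0 keeps the array non-empty
    for (T v : values)
        total += v;
    return total;
}

using Indices = std::integer_sequence<unsigned, 0, 1, 2>;
static_assert(Indices::size() == 3, "three values in the sequence");
static_assert(std::is_same<Indices::value_type, unsigned>::value,
              "value_type is the integer type");

// Not valid: the first template argument must be an integer type.
// struct Wrapper { int v; };
// using Bad = std::integer_sequence<Wrapper, Wrapper{1}>;
```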
The standard says that the implicit copy assignment operator is
deprecated for classes that have a user-provided copy constructor, and
vice versa.

libstdc++-v3/ChangeLog:

	* include/bits/new_allocator.h (__new_allocator): Define copy
	assignment operator as defaulted.
	* include/std/complex (complex<float>, complex<double>)
	(complex<long double>): Define copy constructor as defaulted.

(cherry picked from commit 008e439)
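
The pattern applied in the commit above can be sketched with a hypothetical class: when the copy constructor is user-provided, declaring the copy assignment operator as defaulted avoids relying on the deprecated implicit definition. Counter and assigned_value are illustrative names, not libstdc++ code.

```cpp
#include <cassert>

// Hypothetical class with a user-provided copy constructor.
class Counter {
public:
    explicit Counter(int v) : value_(v) {}
    Counter(const Counter& other) : value_(other.value_) {}  // user-provided
    Counter& operator=(const Counter&) = default;  // explicitly defaulted
    int value() const { return value_; }
private:
    int value_;
};

// Assigns one Counter to another and returns the resulting value.
inline int assigned_value()
{
    Counter a(1);
    Counter b(2);
    b = a;
    return b.value();
}
```

GCC can diagnose the implicit case with -Wdeprecated-copy, which is one way such omissions tend to be noticed.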
The filesystem code was using these functions without checking for their
existence, assuming that any UNIX-like libc with <unistd.h> would always
provide them. That's not true for some newlib targets like arm-eabi.

libstdc++-v3/ChangeLog:

	* acinclude.m4 (GLIBCXX_CHECK_FILESYSTEM_DEPS): Check for mkdir,
	chmod, chdir, and getcwd.
	* config.h.in: Regenerate.
	* configure: Regenerate.
	* src/c++17/fs_ops.cc (create_dir): Use USE_MKDIR macro.
	(fs::current_path): Use USE_GETCWD and USE_CHDIR macros.
	(fs::permissions): Use USE_CHMOD macro.
	* src/filesystem/ops-common.h [FILESYSTEM_IS_WINDOWS]
	(chmod, mkdir, getcwd, chdir): Define new macros.
	[FILESYSTEM_IS_WINDOWS] (chmod, mkdir, getcwd, chdir): Use
	new macros.
	* src/filesystem/ops.cc (create_dir): Use USE_MKDIR macro.
	(fs::current_path): Use USE_GETCWD and USE_CHDIR macros.
	(fs::permissions): Use USE_CHMOD macro.

(cherry picked from commit 5435449)
Unlike the new str()&& members in <sstream>, there is no real difficulty
in supporting the new view() members for the old std::string ABI.
Enabling it fixes errors in <chrono> where std::ostringstream::view() is
used by ostream insertion operators for calendar types.

We just need to use [[gnu::always_inline]] on the view() members for the
old ABI, because the library doesn't contain instantiations of them for
the old ABI. Making them always inline avoids needing to add those
instantiations and export them.

libstdc++-v3/ChangeLog:

	* include/std/sstream  (basic_stringbuf::view): Define for old
	std::string ABI.
	(basic_istringstream::view, basic_ostringstream::view)
	(basic_stringstream::view): Likewise.
	* testsuite/27_io/basic_istringstream/view/char/1.cc: Remove
	{ dg-require-effective-target cxx11_abi }.
	* testsuite/27_io/basic_istringstream/view/wchar_t/1.cc:
	Likewise.
	* testsuite/27_io/basic_ostringstream/view/char/1.cc: Likewise.
	* testsuite/27_io/basic_ostringstream/view/wchar_t/1.cc:
	Likewise.
	* testsuite/27_io/basic_stringbuf/view/char/1.cc: Likewise.
	* testsuite/27_io/basic_stringbuf/view/wchar_t/1.cc: Likewise.
	* testsuite/27_io/basic_stringstream/view/char/1.cc: Likewise.
	* testsuite/27_io/basic_stringstream/view/wchar_t/1.cc:
	Likewise.

(cherry picked from commit 331b4f1)
The PR points out that we assume the match_results allocator is default
constructible, which might not be true. We also have a related issue with
unwanted propagation from an object that might have an unequal
allocator.

Ideally we use the same allocator type for _State_info::_M_match_queue
but that would be an ABI change now. We should investigate if that can
be done without breaking anything, which might be possible because the
_Executor object is short-lived and never leaks out of the regex_match,
regex_search, and regex_replace algorithms. If we change the mangled
name for _Executor then there would be no ODR violations when mixing old
and new definitions. This commit does not attempt that.

libstdc++-v3/ChangeLog:

	PR libstdc++/107376
	* include/bits/regex_executor.h (_Executor::_Executor): Use same
	allocator for _M_cur_results and _M_results.
	* include/bits/regex_executor.tcc (_Executor::_M_main_dispatch):
	Prevent possibly incorrect allocator propagating to
	_M_cur_results.
	* testsuite/28_regex/algorithms/regex_match/107376.cc: New test.

(cherry picked from commit 988dd22)
egrep has been deprecated in favor of grep -E for a long time, and the
next grep release (3.8 or 4.0) will print a warning if egrep is used.
Stop using egrep so we won't see the warning.

grep's from GNU, BSD (including Mac OS X), AIX, BusyBox all support -E
and -F.  Solaris grep doesn't support -E, but extract_symvers.in already
contains a special case for Solaris and doxygen documentation generation
is already broken on non-GNU.

libstdc++-v3/ChangeLog:

	* scripts/extract_symvers.in: Use grep -E instead of egrep.
	* scripts/run_doxygen: Likewise.

(cherry picked from commit fa4e979)
libstdc++-v3/ChangeLog:

	* include/bits/ptr_traits.h: Add some doxygen comments.

(cherry picked from commit 757146f)
Add a simpler definition of std::__detected_or using concepts.  This
also replaces the __detector::value_t member which should have been using
a reserved name.

Use __detected_or in pointer_traits.

libstdc++-v3/ChangeLog:

	* include/bits/alloc_traits.h (allocator_traits::is_always_equal):
	Only instantiate is_empty if needed.
	* include/bits/ptr_traits.h (__ptr_traits_impl::difference_type)
	(__ptr_traits_impl::rebind): Use __detected_or.
	* include/experimental/type_traits (is_same_v): Add a partial
	specialization instead of instantiating the std::is_same class
	template.
	(detected_t): Redefine in terms of detected_or_t.
	(is_detected, is_detected_v): Redefine in terms of detected_t.
	* include/std/type_traits [__cpp_concepts] (__detected_or): Add
	new definition using concepts.
	(__detector::value_t): Rename to __is_detected.
	* testsuite/17_intro/names.cc: Check value_t isn't used.

(cherry picked from commit 2b667be)
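
The detection idiom behind __detected_or can be sketched without concepts as follows. This is a minimal pre-C++20 version under assumed names (sketch::detector, sketch::detected_or_t, difference_type_of); it is not the libstdc++ definition, which per the commit message above also has a simpler concepts-based form.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

namespace sketch {

template<typename...> struct make_void { using type = void; };
template<typename... Ts> using void_t = typename make_void<Ts...>::type;

// Primary template: Op<Args...> is not valid, so fall back to Default.
template<typename Default, typename Void,
         template<typename...> class Op, typename... Args>
struct detector { using type = Default; };

// Chosen when Op<Args...> is a valid type.
template<typename Default, template<typename...> class Op, typename... Args>
struct detector<Default, void_t<Op<Args...>>, Op, Args...>
{ using type = Op<Args...>; };

template<typename Default, template<typename...> class Op, typename... Args>
using detected_or_t = typename detector<Default, void, Op, Args...>::type;

// The operation to detect: a nested difference_type member.
template<typename T> using difference_type_of = typename T::difference_type;

struct WithDiff { using difference_type = short; };
struct WithoutDiff {};

constexpr bool uses_member = std::is_same<
    detected_or_t<std::ptrdiff_t, difference_type_of, WithDiff>,
    short>::value;
constexpr bool uses_default = std::is_same<
    detected_or_t<std::ptrdiff_t, difference_type_of, WithoutDiff>,
    std::ptrdiff_t>::value;

} // namespace sketch
```

This is the same shape of lookup that pointer_traits needs for difference_type: use the member if the pointer-like type provides one, otherwise use a default.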
It was pointed out in recent LWG 3545 discussion that having a
constrained partial specialization of std::pointer_traits can cause
ambiguities with program-defined specializations. For example, the
addition to the testcase has:

template<typename P> requires std::derived_from<P, base_type>
struct std::pointer_traits<P>;

This would be ambiguous with the library's own constrained partial
specialization:

template<typename Ptr> requires requires { typename Ptr::element_type; }
struct std::pointer_traits<Ptr>;

Neither specialization is more specialized than the other for a type
that is derived from base_type and also has an element_type member.

The solution is to remove the library's partial specialization, and do
the check for Ptr::element_type in the __ptr_traits_elem helper (which
is what we already do for !__cpp_concepts anyway).

libstdc++-v3/ChangeLog:

	* include/bits/ptr_traits.h (__ptr_traits_elem) [__cpp_concepts]:
	Also define the __ptr_traits_elem class template for the
	concepts case.
	(pointer_traits<Ptr>): Remove constrained partial
	specialization.
	* testsuite/20_util/pointer_traits/lwg3545.cc: Check for
	ambiguity with program-defined partial specialization.

(cherry picked from commit 03cb9ed)
libstdc++-v3/ChangeLog:

	* include/bits/fs_path.h (__is_path_iter_src): Replace class
	template with variable template.

(cherry picked from commit 6177f60)
Before Doxygen version 1.9.2 this option is broken (see
doxygen/doxygen#8638 for more details) and
classes are not added to the correct groups by @ingroup and @addtogroup.

Also remove the obsolete CLASS_DIAGRAMS option that causes a warning.

libstdc++-v3/ChangeLog:

	* doc/doxygen/user.cfg.in (GROUP_NESTED_COMPOUNDS): Set to NO.
	(CLASS_DIAGRAMS): Remove obsolete option.

(cherry picked from commit 9c3a8fe)
Use macros to open and close the inline namespace _V2 that is used for
ABI versioning of individual components such as chrono::system_clock.

This allows the namespace to be hidden in the docs generated by Doxygen,
so that we document std::foo instead of std::_V2::foo.

This also makes it easy to remove that namespace entirely for the
gnu-versioned-namespace build, where everything is already versioned as
std::__8 and there are no backwards compatibility guarantees.

libstdc++-v3/ChangeLog:

	* doc/doxygen/user.cfg.in (PREDEFINED): Expand new macros to
	nothing.
	* include/bits/c++config (_GLIBCXX_BEGIN_INLINE_ABI_NAMESPACE)
	(_GLIBCXX_END_INLINE_ABI_NAMESPACE): Define new macros.
	* include/bits/algorithmfwd.h (_V2::__rotate): Use new macros
	for the namespace.
	* include/bits/chrono.h (chrono::_V2::system_clock): Likewise.
	* include/bits/stl_algo.h (_V2::__rotate): Likewise.
	* include/std/condition_variable (_V2::condition_variable_any):
	Likewise.
	* include/std/system_error (_V2::error_category): Likewise.

(cherry picked from commit e4905f1)
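
The macro technique described above can be sketched in isolation. The macro names DEMO_BEGIN_INLINE_ABI_NAMESPACE and DEMO_END_INLINE_ABI_NAMESPACE, the switch DEMO_NO_ABI_NAMESPACE, and the namespace demo::v2 are all hypothetical stand-ins for the _GLIBCXX_* macros and _V2; the real definitions in c++config are not shown in this page.

```cpp
#include <cassert>

// When DEMO_NO_ABI_NAMESPACE is defined (for example by a documentation
// tool or a fully versioned build), the macros expand to nothing and the
// inline namespace disappears.
#ifdef DEMO_NO_ABI_NAMESPACE
# define DEMO_BEGIN_INLINE_ABI_NAMESPACE(X)
# define DEMO_END_INLINE_ABI_NAMESPACE(X)
#else
# define DEMO_BEGIN_INLINE_ABI_NAMESPACE(X) inline namespace X {
# define DEMO_END_INLINE_ABI_NAMESPACE(X) }
#endif

namespace demo {
DEMO_BEGIN_INLINE_ABI_NAMESPACE(v2)
struct system_clock {
    static int abi_version() { return 2; }
};
DEMO_END_INLINE_ABI_NAMESPACE(v2)
} // namespace demo

// Users name the type without the inline namespace either way.
inline int clock_abi_version()
{
    return demo::system_clock::abi_version();
}
```

Because the namespace is inline, demo::system_clock resolves the same way whether or not the macros expand to anything, which is what lets the documentation show the shorter name.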
The src/c++11/compatibility*-c++0x.cc files define symbols that need to
be exported for ancient versions of libstdc++.so.6 due to changes
between C++0x and the final C++11 standard. Those symbols are not needed
in the libstdc++.so.8 library, and we can skip building them entirely.

This also fixes the build failure I introduced last week when making the
versioned namespace config not use the _V2 namespace for compat symbols.

libstdc++-v3/ChangeLog:

	* src/Makefile.am [ENABLE_SYMVERS_GNU_NAMESPACE] (cxx11_sources):
	Do not build the compatibility*-c++0x.cc objects.
	* src/Makefile.in: Regenerate.
	* src/c++11/compatibility-c++0x.cc [_GLIBCXX_INLINE_VERSION]:
	Refuse to build for the versioned namespace.
	* src/c++11/compatibility-chrono.cc: Likewise.
	* src/c++11/compatibility-condvar.cc: Likewise.
	* src/c++11/compatibility-thread-c++0x.cc: Likewise.
	* src/c++11/chrono.cc (system_clock, steady_clock):
	Use macros to define in inline namespace _V2, matching the
	declarations in <system_error>.
	* src/c++11/system_error.cc (system_category, generic_category):
	Likewise.

(cherry picked from commit 357d6fc)
libstdc++-v3/ChangeLog:

	* include/std/atomic: Suppress doxygen docs for
	implementation details.
	* include/bits/atomic_base.h: Likewise.
	* include/bits/shared_ptr_atomic.h: Use markdown. Fix grouping
	so that std::atomic is not added to the pointer abstractions
	group.

(cherry picked from commit 1566ca0)
Add @headerfile and @since tags. Improve grouping of non-member
functions via @relates tags.

Mark the std::pair base class of std::sub_match as undocumented, so that
the docs don't show all the related non-member functions as part of the
sub_match API. Use a new macro to re-add the data members for Doxygen
only.

libstdc++-v3/ChangeLog:

	* doc/doxygen/user.cfg.in (PREDEFINED): Define macro
	_GLIBCXX_DOXYGEN_ONLY to expand its argument.
	* include/bits/c++config (_GLIBCXX_DOXYGEN_ONLY): Define.
	* include/bits/regex.h: Improve doxygen docs.
	* include/bits/regex_constants.h: Likewise.
	* include/bits/regex_error.h: Likewise.

(cherry picked from commit 1b01963)
libstdc++-v3/ChangeLog:

	* doc/doxygen/user.cfg.in (PREDEFINED): Define __allocator_base
	so that Doxygen shows the right base-class for std::allocator.
	* include/bits/alloc_traits.h: Improve doxygen docs.
	* include/bits/allocator.h: Likewise.
	* include/bits/new_allocator.h: Likewise.
	* include/ext/new_allocator.h: Likewise.

(cherry picked from commit 171f41f)
libstdc++-v3/ChangeLog:

	* include/bits/ostream_insert.h: Mark helper functions as
	undocumented by Doxygen.
	* include/bits/stl_algo.h: Use markdown for formatting and mark
	helper functions as undocumented.
	* include/bits/stl_numeric.h:  Likewise.
	* include/bits/stl_pair.h (pair): Add @headerfile.

(cherry picked from commit e614925)
libstdc++-v3/ChangeLog:

	* doc/doxygen/user.cfg.in (PREDEFINED): Define
	_GLIBCXX23_CONSTEXPR macro.
	* include/backward/auto_ptr.h (auto_ptr): Use @deprecated.
	* include/bits/unique_ptr.h (default_delete): Use @since and
	@headerfile.
	* include/std/scoped_allocator: Remove @ingroup from @file
	block.

(cherry picked from commit a278402)
libstdc++-v3/ChangeLog:

	* doc/doxygen/user.cfg.in (PREDEFINED): Define
	_GTHREAD_USE_MUTEX_TIMEDLOCK macro.
	* include/bits/std_mutex.h (mutex, lock_guard): Use @since and
	@headerfile.
	* include/bits/unique_lock.h (unique_lock): Likewise.
	* include/std/mutex (recursive_mutex, timed_mutex)
	(recursive_timed_mutex, scoped_lock): Likewise.

(cherry picked from commit b584cbd)
Fix some problems noticed with -Wsystem-headers.

libstdc++-v3/ChangeLog:

	* include/bits/stl_tempbuf.h (_Temporary_buffer): Disable
	warnings about get_temporary_buffer being deprecated.
	* include/ext/functional (mem_fun1, mem_fun1_ref): Disable
	warnings about mem_fun1_t, const_mem_fun1_t, mem_fun1_ref_t and
	const_mem_fun1_ref_t being deprecated.
	* include/std/spanstream (basic_spanbuf::setbuf): Add assertion
	and adjust to avoid narrowing warning.
	* libsupc++/exception_ptr.h [!__cpp_rtti && !__cpp_exceptions]
	(make_exception_ptr): Add missing inline specifier.

(cherry picked from commit 8f6d25f)
libstdc++-v3/ChangeLog:

	* testsuite/20_util/headers/memory/synopsis.cc: Add declarations
	from C++11 and later.

(cherry picked from commit 980aa91)
libstdc++-v3/ChangeLog:

	* testsuite/18_support/new_nothrow.cc: Add missing noexcept
	to operator delete replacements.
	* testsuite/20_util/any/cons/92156.cc: Disable
	-Winit-list-lifetime warnings from instantiating invalid
	specialization of manager function.
	* testsuite/20_util/any/modifiers/92156.cc: Likewise.
	* testsuite/20_util/default_delete/void_neg.cc: Prune additional
	diagnostics.
	* testsuite/20_util/headers/memory/synopsis.cc: Add missing
	noexcept.
	* testsuite/20_util/shared_ptr/cons/void_neg.cc: Prune
	additional diagnostic.
	* testsuite/20_util/unique_ptr/creation/for_overwrite.cc: Add
	missing noexcept to operator delete replacements.
	* testsuite/21_strings/basic_string/cons/char/103919.cc:
	Likewise.
	* testsuite/23_containers/map/modifiers/emplace/92300.cc:
	Likewise.
	* testsuite/23_containers/map/modifiers/insert/92300.cc:
	Likewise.
	* testsuite/24_iterators/headers/iterator/range_access_c++11.cc:
	Add missing noexcept to synopsis declarations.
	* testsuite/24_iterators/headers/iterator/range_access_c++14.cc:
	Likewise.
	* testsuite/24_iterators/headers/iterator/range_access_c++17.cc:
	Likewise.

(cherry picked from commit bbcb84b)
libstdc++-v3/ChangeLog:

	* include/std/tuple: Add better Doxygen comments.

(cherry picked from commit 94f7baf)
jicama and others added 30 commits May 24, 2024 09:26
We were failing to handle ANNOTATE_EXPR in tsubst_copy_and_build, leading to
problems with substitution of any wrapped expressions.

Let's also not tell users that lambda templates are available in C++14.

	PR c++/111529

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_lambda_declarator_opt): Don't suggest
	-std=c++14 for lambda templates.
	* pt.cc (tsubst_expr): Move ANNOTATE_EXPR handling...
	(tsubst_copy_and_build): ...here.

gcc/testsuite/ChangeLog:

	* g++.dg/ext/unroll-4.C: New test.

(cherry picked from commit 9c62af1)
r12-10468-g19827831516023 added the ANNOTATE_EXPR in the wrong place,
leading to ICEs on several testcases.

gcc/cp/ChangeLog:

	* pt.cc (tsubst_copy_and_build): Move ANNOTATE_EXPR out of
	fallthrough path.
The requirement that a type argument be complete is excessive in the case of
direct reference binding to the same type, which does not rely on any
properties of the type.  This is LWG 2939.

	PR c++/100667

gcc/cp/ChangeLog:

	* semantics.cc (same_type_ref_bind_p): New.
	(finish_trait_expr): Use it.

gcc/testsuite/ChangeLog:

	* g++.dg/ext/is_constructible8.C: New test.

(cherry picked from commit 8bb3ef3)
This is a manual backport of r14-9840-g1162861439fd3c from master.
Manual because the bits and value range representation in jump
functions have changed during the gcc 14 development cycle.

In PR 113907 comment #58, Honza found a case where ICF thinks bodies
of functions are equivalent but because of a difference in aliases in a
memory access, different aggregate jump functions are associated with
supposedly equivalent call statements.  This patch adds a way to
compare jump functions and plugs it into ICF to avoid the issue.

gcc/ChangeLog:

2024-05-14  Martin Jambor  <mjambor@suse.cz>

	PR ipa/113907
	* ipa-prop.h (ipa_jump_functions_equivalent_p): Declare.
	(values_equal_for_ipcp_p): Likewise.
	* ipa-prop.cc (ipa_agg_pass_through_jf_equivalent_p): New function.
	(ipa_agg_jump_functions_equivalent_p): Likewise.
	(ipa_jump_functions_equivalent_p): Likewise.
	* ipa-cp.cc (values_equal_for_ipcp_p): Make function public.
	* ipa-icf-gimple.cc: Include alloc-pool.h, symbol-summary.h, sreal.h,
	ipa-cp.h and ipa-prop.h.
	(func_checker::compare_gimple_call): Compare jump functions.

gcc/testsuite/ChangeLog:

2024-05-10  Martin Jambor  <mjambor@suse.cz>

	PR ipa/113907
	* gcc.dg/lto/pr113907_0.c: New.
	* gcc.dg/lto/pr113907_1.c: Likewise.
	* gcc.dg/lto/pr113907_2.c: Likewise.

(cherry picked from commit 1db45e8)
	PR fortran/115150

gcc/fortran/ChangeLog:

	* trans-intrinsic.cc (gfc_conv_intrinsic_bound): Fix SHAPE
	for zero-size arrays

gcc/testsuite/ChangeLog:

	* gfortran.dg/shape_12.f90: New test.

(cherry picked from commit b701306)
…tization [PR115172]

The following testcase is miscompiled, because -fsanitize=bool,enum
creates a MEM_REF without propagating the address space qualifiers,
so what should be normally loaded using say %gs:/%fs: segment prefix
isn't.  Together with asan it then causes that load to be sanitized.

2024-05-22  Jakub Jelinek  <jakub@redhat.com>

	PR sanitizer/115172
	* ubsan.cc (instrument_bool_enum_load): If rhs is not in generic
	address space, use qualified version of utype with the right
	address space.  Formatting fix.

	* gcc.dg/asan/pr115172.c: New test.

(cherry picked from commit d3c506e)
PR Target/84790.
The gp init sequence
        li      $2,%hi(_gp_disp)
        addiu   $3,$pc,%lo(_gp_disp)
        sll     $2,16
        addu    $2,$3
is generated directly in `mips_output_function_prologue`, and does
not appear in the RTL.

So the IRA/IPA passes are not aware that $2/$3 have been clobbered,
so they may be used for cross (local) function call.

Let's mark $2/$3 clobber both:
  - Just after the UNSPEC_GP RTL of a function;
  - Just after a function call.

Reported-by: Matthias Schiffer <mschiffer@universe-factory.net>
Origin-Patch-by: Felix Fietkau <nbd@nbd.name>.

gcc
	* config/mips/mips.cc (mips16_gp_pseudo_reg): Mark
	MIPS16_PIC_TEMP and MIPS_PROLOGUE_TEMP clobbered.
	(mips_emit_call_insn): Mark MIPS16_PIC_TEMP and
	MIPS_PROLOGUE_TEMP clobbered if MIPS16 and CALL_CLOBBERED_GP.

(cherry picked from commit 915440e)
Link to the docs for GCC trunk instead. For the release branches, the
link should be to the docs for appropriate release branch.

Also replace the incomplete/outdated list of explicit -std options with
a single entry for the -std option.

libstdc++-v3/ChangeLog:

	PR libstdc++/115269
	* doc/xml/manual/using.xml: Replace link to gcc-4.3.2 docs.
	Replace list of -std=... options with a single entry for -std.
	* doc/html/manual/using.html: Regenerate.

(cherry picked from commit b460ede)
any_divmod instructions are modelled with invalid RTX:

  [(set (match_operand:DI 0 "register_operand" "=c")
        (sign_extend:DI (match_operator:SI 3 "divmod_operator"
                        [(match_operand:DI 1 "register_operand" "a")
                         (match_operand:DI 2 "register_operand" "b")])))
   (clobber (reg:DI 23))
   (clobber (reg:DI 28))]

where SImode divmod_operator (div,mod,udiv,umod) has DImode operands.

Wrap input operand with truncate:SI to make machine modes consistent.

	PR target/115297

gcc/ChangeLog:

	* config/alpha/alpha.md (<any_divmod:code>si3): Wrap DImode
	operands 3 and 4 with truncate:SI RTX.
	(*divmodsi_internal_er): Ditto for operands 1 and 2.
	(*divmodsi_internal_er_1): Ditto.
	(*divmodsi_internal): Ditto.
	* config/alpha/constraints.md ("b"): Correct register
	number in the description.

gcc/testsuite/ChangeLog:

	* gcc.target/alpha/pr115297.c: New test.

(cherry picked from commit 0ac8020)
create_intersect_range_checks checks whether two access ranges
a and b are alias-free using something equivalent to:

  end_a <= start_b || end_b <= start_a

It has two ways of doing this: a "vanilla" way that calculates
the exact exclusive end pointers, and another way that uses the
last inclusive aligned pointers (and changes the comparisons
accordingly).  The comment for the latter is:

      /* Calculate the minimum alignment shared by all four pointers,
	 then arrange for this alignment to be subtracted from the
	 exclusive maximum values to get inclusive maximum values.
	 This "- min_align" is cumulative with a "+ access_size"
	 in the calculation of the maximum values.  In the best
	 (and common) case, the two cancel each other out, leaving
	 us with an inclusive bound based only on seg_len.  In the
	 worst case we're simply adding a smaller number than before.

The problem is that the associated code implicitly assumed that the
access size was a multiple of the pointer alignment, and so the
alignment could be carried over to the exclusive end pointer.

The testcase started failing after g:9fa5b473b5b8e289b6542
because that commit improved the alignment information for
the accesses.

gcc/
	PR tree-optimization/115192
	* tree-data-ref.cc (create_intersect_range_checks): Take the
	alignment of the access sizes into account.

gcc/testsuite/
	PR tree-optimization/115192
	* gcc.dg/vect/pr115192.c: New test.

(cherry picked from commit a0fe4fb)
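
The "vanilla" form of the check quoted above, using exact exclusive end pointers, can be written as a small sketch. AccessRange and alias_free are hypothetical names, and addresses are modelled as plain integers; this does not reproduce the alignment-based variant or the code in tree-data-ref.cc.

```cpp
#include <cassert>
#include <cstdint>

// Half-open byte range [start, end) touched by one access sequence.
struct AccessRange {
    std::uintptr_t start;
    std::uintptr_t end;  // exclusive
};

// True when the two ranges cannot overlap:
//   end_a <= start_b || end_b <= start_a
constexpr bool alias_free(AccessRange a, AccessRange b)
{
    return a.end <= b.start || b.end <= a.start;
}
```

With exclusive ends, two ranges that merely touch (one ends where the other starts) count as alias-free, while a one-byte overlap does not.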
This was another PR caused by the way that
vect_determine_precisions_from_range handles shifts.  We tried to
narrow 32768 >> x to a 16-bit shift based on range information for
the inputs and outputs, with vect_recog_over_widening_pattern
(after PR110828) adjusting the shift amount.  But this doesn't
work for the case where x is in [16, 31], since then 32-bit
32768 >> x is a well-defined zero, whereas no well-defined
16-bit 32768 >> y will produce 0.

We could perhaps generate x < 16 ? 32768 >> x : 0 instead,
but since vect_determine_precisions_from_range was never really
supposed to rely on fix-ups, it seems better to fix that instead.

The patch also makes the code more selective about which codes
can be narrowed based on input and output ranges.  This showed
that vect_truncatable_operation_p was missing cases for
BIT_NOT_EXPR (equivalent to BIT_XOR_EXPR of -1) and NEGATE_EXPR
(equivalent to BIT_NOT_EXPR followed by a PLUS_EXPR of 1).

pr113281-1.c is the original testcase.  pr113281-[23].c failed
before the patch due to overly optimistic narrowing.  pr113281-[45].c
previously passed and are meant to protect against accidental
optimisation regressions.

gcc/
	PR target/113281
	* tree-vect-patterns.cc (vect_recog_over_widening_pattern): Remove
	workaround for right shifts.
	(vect_truncatable_operation_p): Handle NEGATE_EXPR and BIT_NOT_EXPR.
	(vect_determine_precisions_from_range): Be more selective about
	which codes can be narrowed based on their input and output ranges.
	For shifts, require at least one more bit of precision than the
	maximum shift amount.

gcc/testsuite/
	PR target/113281
	* gcc.dg/vect/pr113281-1.c: New test.
	* gcc.dg/vect/pr113281-2.c: Likewise.
	* gcc.dg/vect/pr113281-3.c: Likewise.
	* gcc.dg/vect/pr113281-4.c: Likewise.
	* gcc.dg/vect/pr113281-5.c: Likewise.

(cherry picked from commit 1a8261e)
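
The shift problem described above can be checked with a scalar sketch. The function names are hypothetical, and the 16-bit operation is only modelled: in C++ a uint16_t operand is promoted to int before shifting, so the guard x < 16 stands in for the restriction that a real 16-bit lane shift has.

```cpp
#include <cassert>
#include <cstdint>

// 32-bit reference: well defined for every shift amount in [0, 31],
// and equal to 0 once x reaches 16.
constexpr std::uint32_t shift_wide(unsigned x)
{
    return UINT32_C(32768) >> x;
}

// Model of the narrowed operation with the guard mentioned in the text:
//   x < 16 ? 32768 >> x : 0
constexpr std::uint16_t shift_narrow_guarded(unsigned x)
{
    return x < 16
        ? static_cast<std::uint16_t>(std::uint16_t(32768) >> x)
        : std::uint16_t(0);
}

// Compares both forms over the whole valid 32-bit shift range.
inline bool narrow_matches_wide()
{
    for (unsigned x = 0; x < 32; ++x)
        if (shift_wide(x) != shift_narrow_guarded(x))
            return false;
    return true;
}
```

The guard is the fix-up that the commit message considers and then rejects in favour of not narrowing such shifts in the first place.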
For the testcase in PR113910 we spend a lot of time in PTA comparing
bitmaps for looking up equivalence class members.  This points to
the very weak bitmap_hash function which effectively hashes set
and a subset of not set bits.

The major problem with it is that it simply truncates the
BITMAP_WORD sized intermediate hash to hashval_t which is
unsigned int, effectively not hashing half of the bits.

This reduces the compile-time for the testcase from tens of minutes
to 42 seconds and PTA time from 99% to 46%.

	PR tree-optimization/113910
	* bitmap.cc (bitmap_hash): Mix the full element "hash" to
	the hashval_t hash.

(cherry picked from commit ad7a365)
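
The truncation problem described above can be shown with a simplified sketch. The type aliases and both functions are hypothetical; in particular the xor-fold used in hash_fold is an assumed mixing step chosen for illustration, not the code in bitmap.cc.

```cpp
#include <cassert>
#include <cstdint>

using word_t = std::uint64_t;  // stands in for a 64-bit BITMAP_WORD
using hash_t = std::uint32_t;  // stands in for hashval_t (unsigned int)

// Plain truncation: the upper 32 bits never influence the result.
constexpr hash_t hash_truncate(word_t h)
{
    return static_cast<hash_t>(h);
}

// Fold the upper half into the lower half before narrowing, so every
// bit of the intermediate value contributes to the hash.
constexpr hash_t hash_fold(word_t h)
{
    return static_cast<hash_t>(h ^ (h >> 32));
}
```

Two intermediate values that differ only in their upper 32 bits collide under truncation but get distinct hashes once the upper half is folded in.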
…uctions

The following fixes a bug that manifests itself during fold-left
reduction transform in picking not the last scalar def to replace
and thus double-counting some elements.  But the underlying issue
is that we merge a load permutation into the in-order reduction
which is of course wrong.

Now, reduction analysis has not yet been performed when optimizing
permutations so we have to resort to checking that ourselves.

	PR tree-optimization/110381
	* tree-vect-slp.cc (vect_optimize_slp_pass::start_choosing_layouts):
	Materialize permutes before fold-left reductions.

	* gcc.dg/vect/pr110381.c: New testcase.

(cherry picked from commit 53d6f57)
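
A scalar sketch can show why an in-order (fold-left) floating-point reduction must not have a permutation merged into it. The function names and input values are hypothetical, and the result assumes IEEE-754 single precision without excess intermediate precision or reassociation (for example, no -ffast-math).

```cpp
#include <cassert>
#include <cstddef>

// Strict left-to-right accumulation, as an in-order reduction requires.
inline float fold_left_sum(const float* v, std::size_t n)
{
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        acc += v[i];
    return acc;
}

// Same four values, original order: the first 1.0f is absorbed by 1e8f.
inline float sum_in_order()
{
    const float v[] = { 1e8f, 1.0f, -1e8f, 1.0f };
    return fold_left_sum(v, 4);
}

// Same four values, permuted: the large terms cancel first.
inline float sum_permuted()
{
    const float v[] = { 1e8f, -1e8f, 1.0f, 1.0f };
    return fold_left_sum(v, 4);
}
```

Since the two orders can give different sums, a vectorizer has to materialize any load permutation before the reduction instead of folding it into the reduction order.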
The following fixes a stray TYPE_ALIAS_SET in a type variant built
by build_opaque_vector_type which is diagnosed by type checking
enabled with -flto.

	PR middle-end/112732
	* tree.cc (build_opaque_vector_type): Reset TYPE_ALIAS_SET
	of the newly built type.

(cherry picked from commit f26d68d)
This testcase was fixed by r14-5934-gf26d68d5d128c8 but we should add
one to make sure it does not regress again.

Committed as obvious after a quick test on the testcase.

	PR c++/97990

gcc/testsuite/ChangeLog:

	* g++.dg/torture/vector-struct-1.C: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
(cherry picked from commit 5f1438d)
This patch disables use of FMA in matrix multiplication loop for generic (for
x86-64-v3) and zen4.  I tested this on zen4 and Xeon Gold 6212U.

For Intel this is neutral both on the matrix multiplication microbenchmark
(attached) and spec2k17 where the difference was within noise for Core.

On core the micro-benchmark runs as follows:

With FMA:

       578,500,241      cycles:u                         #    3.645 GHz                      ( +-  0.12% )
       753,318,477      instructions:u                   #    1.30  insn per cycle           ( +-  0.00% )
       125,417,701      branches:u                       #  790.227 M/sec                    ( +-  0.00% )
          0.159146 +- 0.000363 seconds time elapsed  ( +-  0.23% )

No FMA:

       577,573,960      cycles:u                         #    3.514 GHz                      ( +-  0.15% )
       878,318,479      instructions:u                   #    1.52  insn per cycle           ( +-  0.00% )
       125,417,702      branches:u                       #  763.035 M/sec                    ( +-  0.00% )
          0.164734 +- 0.000321 seconds time elapsed  ( +-  0.19% )

So the cycle count is unchanged and discrete multiply+add takes same time as
FMA.

While on zen:

With FMA:
         484875179      cycles:u                         #    3.599 GHz                      ( +-  0.05% )  (82.11%)
         752031517      instructions:u                   #    1.55  insn per cycle
         125106525      branches:u                       #  928.712 M/sec                    ( +-  0.03% )  (85.09%)
            128356      branch-misses:u                  #    0.10% of all branches          ( +-  0.06% )  (83.58%)

No FMA:
         375875209      cycles:u                         #    3.592 GHz                      ( +-  0.08% )  (80.74%)
         875725341      instructions:u                   #    2.33  insn per cycle
         124903825      branches:u                       #    1.194 G/sec                    ( +-  0.04% )  (84.59%)
          0.105203 +- 0.000188 seconds time elapsed  ( +-  0.18% )

The difference is that Cores understand the fact that fmadd does not need
all three parameters to start computation, while Zen cores don't.

Since this seems a noticeable win on zen and not a loss on Core, it seems like
a good default for generic.

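#include <stdio.h>
#include <time.h>

/* SIZE is not defined in the text as preserved here; 1000 is an assumed value. */
#ifndef SIZE
#define SIZE 1000
#endif

float a[SIZE][SIZE];
float b[SIZE][SIZE];
float c[SIZE][SIZE];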

void init(void)
{
   int i, j, k;
   for(i=0; i<SIZE; ++i)
   {
      for(j=0; j<SIZE; ++j)
      {
         a[i][j] = (float)i + j;
         b[i][j] = (float)i - j;
         c[i][j] = 0.0f;
      }
   }
}

void mult(void)
{
   int i, j, k;

   for(i=0; i<SIZE; ++i)
   {
      for(j=0; j<SIZE; ++j)
      {
         for(k=0; k<SIZE; ++k)
         {
            c[i][j] += a[i][k] * b[k][j];
         }
      }
   }
}

int main(void)
{
   clock_t s, e;

   init();
   s=clock();
   mult();
   e=clock();
   printf("        mult took %10d clocks\n", (int)(e-s));

   return 0;

}

gcc/ChangeLog:

	* config/i386/x86-tune.def (X86_TUNE_AVOID_128FMA_CHAINS,
	X86_TUNE_AVOID_256FMA_CHAINS): Enable for znver4 and Core.

(cherry picked from commit 467cc39)