WIP/RFC make the partial scalars VecElements #563

Open
wants to merge 1 commit into
base: master

KristofferC (Collaborator) commented Nov 22, 2021

Only implemented enough so that the benchmark in #555 can be tested. Putting it up here in case people want to play with it.

Results of the benchmark in #555:

Branch:

julia> include("fdiffbench.jl")
  0.818333 seconds (5.35 M allocations: 261.957 MiB, 14.27% gc time, 99.99% compilation time)
  810.158 ns (0 allocations: 0 bytes)

PR:

julia> include("fdiffbench.jl")
  0.622877 seconds (4.19 M allocations: 216.628 MiB, 7.33% gc time, 99.99% compilation time)
  723.782 ns (0 allocations: 0 bytes)

The number of LLVM instructions after optimization didn't really seem to change. Compile-time seems to improve quite a bit though, likely due to more compact LLVM IR pre-optimization.
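One way to check a claim like this is to dump the LLVM IR before and after optimization and compare sizes. The following is only a minimal sketch, not the benchmark from #555: `scale_tuple` and `scale_vec` are hypothetical stand-ins for partials stored as a plain tuple versus a tuple of VecElements, and line counts are a rough proxy for instruction counts at best.

```julia
using InteractiveUtils  # provides code_llvm outside the REPL

# Hypothetical stand-ins: partials as a plain tuple vs. a tuple of VecElements.
scale_tuple(p::NTuple{8,Float64}, a::Float64) = map(x -> a * x, p)
scale_vec(p::NTuple{8,VecElement{Float64}}, a::Float64) =
    map(x -> VecElement(a * x.value), p)

# Number of LLVM IR lines emitted for `f` called with argument types `types`.
function ir_lines(f, types; optimize::Bool)
    ir = sprint(io -> code_llvm(io, f, types; optimize=optimize, debuginfo=:none))
    return count(==('\n'), ir)
end

for (name, f, T) in (("NTuple", scale_tuple, NTuple{8,Float64}),
                     ("VecElement", scale_vec, NTuple{8,VecElement{Float64}}))
    pre = ir_lines(f, (T, Float64); optimize=false)
    post = ir_lines(f, (T, Float64); optimize=true)
    println(name, ": ", pre, " IR lines before optimization, ", post, " after")
end
```

The actual numbers depend on the Julia and LLVM versions, so treat the output as a relative comparison only.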

This is a bit annoying since VecElements behave differently from numbers in some ways:

julia> VecElement(2.0) == VecElement(2)
false

julia> 2.0 == 2
true
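If VecElements were used for the partials, numeric comparisons would presumably have to go through the wrapped `value` field. A minimal sketch of the difference follows; `vec_isequal` is a hypothetical helper, not something this PR defines.

```julia
# As far as I can tell, Base defines no numeric `==` for VecElement, so
# comparing two wrappers is not the same as comparing the wrapped numbers.
a = VecElement(2.0)   # VecElement{Float64}
b = VecElement(2)     # VecElement{Int}

println(a == b)              # compares the wrappers themselves
println(a.value == b.value)  # compares the wrapped numbers

# Hypothetical helper: compare two VecElements by their wrapped values.
vec_isequal(x::VecElement, y::VecElement) = x.value == y.value
println(vec_isequal(a, b))
```

Unwrapping at every comparison site is the kind of extra plumbing that makes the switch awkward.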

chriselrod (Contributor) commented Nov 22, 2021

The number of LLVM instructions in the end didn't really seem to change.

The LLVM IR for rhs! seems to have changed a lot, but it isn't much shorter (435 lines on master vs 425 with this PR):

Master
define void @"julia_rhs!_4840"({}* nonnull align 16 dereferenceable(40) %0, {}* nonnull align 16 dereferenceable(40) %1, double %2) {
top:
  %3 = bitcast {}* %1 to { double, [1 x [8 x double]] }**
  %4 = load { double, [1 x [8 x double]] }*, { double, [1 x [8 x double]] }** %3, align 8
  %.elt = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 0, i32 0
  %.unpack = load double, double* %.elt, align 8
  %.unpack1661.unpack.elt = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 0, i32 1, i64 0, i64 0
  %5 = bitcast double* %.unpack1661.unpack.elt to <8 x double>*
  %6 = load <8 x double>, <8 x double>* %5, align 8
  %7 = fmul double %.unpack, 3.500000e-01
  %8 = fmul <8 x double> %6, <double 3.500000e-01, double 3.500000e-01, double 3.500000e-01, double 3.500000e-01, double 3.500000e-01, double 3.500000e-01, double 3.500000e-01, double 3.500000e-01>
  %.elt1678 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 1, i32 0
  %.unpack1679 = load double, double* %.elt1678, align 8
  %.unpack1681.unpack.elt = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 1, i32 1, i64 0, i64 0
  %9 = bitcast double* %.unpack1681.unpack.elt to <8 x double>*
  %10 = load <8 x double>, <8 x double>* %9, align 8
  %.elt1698 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 3, i32 0
  %.unpack1699 = load double, double* %.elt1698, align 8
  %.unpack1701.unpack.elt = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 3, i32 1, i64 0, i64 0
  %11 = bitcast double* %.unpack1701.unpack.elt to <8 x double>*
  %12 = load <8 x double>, <8 x double>* %11, align 8
  %13 = fmul double %.unpack1679, 2.660000e+01
  %14 = fmul <8 x double> %10, <double 2.660000e+01, double 2.660000e+01, double 2.660000e+01, double 2.660000e+01, double 2.660000e+01, double 2.660000e+01, double 2.660000e+01, double 2.660000e+01>
  %15 = fmul double %13, %.unpack1699
  %16 = insertelement <8 x double> undef, double %13, i32 0
  %res.i1659 = shufflevector <8 x double> %16, <8 x double> undef, <8 x i32> zeroinitializer
  %17 = fmul <8 x double> %res.i1659, %12
  %18 = insertelement <8 x double> undef, double %.unpack1699, i32 0
  %res.i1658 = shufflevector <8 x double> %18, <8 x double> undef, <8 x i32> zeroinitializer
  %19 = call <8 x double> @llvm.fmuladd.v8f64(<8 x double> %res.i1658, <8 x double> %14, <8 x double> %17)
  %.elt1718 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 4, i32 0
  %.unpack1719 = load double, double* %.elt1718, align 8
  %.unpack1721.unpack.elt = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 4, i32 1, i64 0, i64 0
  %20 = bitcast double* %.unpack1721.unpack.elt to <8 x double>*
  %21 = load <8 x double>, <8 x double>* %20, align 8
  %22 = fmul double %.unpack1719, 1.230000e+04
  %23 = fmul <8 x double> %21, <double 1.230000e+04, double 1.230000e+04, double 1.230000e+04, double 1.230000e+04, double 1.230000e+04, double 1.230000e+04, double 1.230000e+04, double 1.230000e+04>
  %24 = fmul double %22, %.unpack1679
  %25 = insertelement <8 x double> undef, double %22, i32 0
  %res.i1657 = shufflevector <8 x double> %25, <8 x double> undef, <8 x i32> zeroinitializer
  %26 = fmul <8 x double> %res.i1657, %10
  %27 = insertelement <8 x double> undef, double %.unpack1679, i32 0
  %res.i1656 = shufflevector <8 x double> %27, <8 x double> undef, <8 x i32> zeroinitializer
  %28 = call <8 x double> @llvm.fmuladd.v8f64(<8 x double> %res.i1656, <8 x double> %23, <8 x double> %26)
  %.elt1758 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 6, i32 0
  %.unpack1759 = load double, double* %.elt1758, align 8
  %.unpack1761.unpack.elt = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 6, i32 1, i64 0, i64 0
  %29 = bitcast double* %.unpack1761.unpack.elt to <8 x double>*
  %30 = load <8 x double>, <8 x double>* %29, align 8
  %31 = fmul double %.unpack1759, 0x3F4C2E33EFF19503
  %32 = fmul <8 x double> %30, <double 0x3F4C2E33EFF19503, double 0x3F4C2E33EFF19503, double 0x3F4C2E33EFF19503, double 0x3F4C2E33EFF19503, double 0x3F4C2E33EFF19503, double 0x3F4C2E33EFF19503, double 0x3F4C2E33EFF19503, double 0x3F4C2E33EFF19503>
  %33 = fmul double %.unpack1759, 0x3F4ADEA897635E74
  %34 = fmul <8 x double> %30, <double 0x3F4ADEA897635E74, double 0x3F4ADEA897635E74, double 0x3F4ADEA897635E74, double 0x3F4ADEA897635E74, double 0x3F4ADEA897635E74, double 0x3F4ADEA897635E74, double 0x3F4ADEA897635E74, double 0x3F4ADEA897635E74>
  %.elt1818 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 5, i32 0
  %.unpack1819 = load double, double* %.elt1818, align 8
  %.unpack1821.unpack.elt = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 5, i32 1, i64 0, i64 0
  %35 = bitcast double* %.unpack1821.unpack.elt to <8 x double>*
  %36 = load <8 x double>, <8 x double>* %35, align 8
  %37 = fmul double %.unpack1759, 1.500000e+04
  %38 = fmul <8 x double> %30, <double 1.500000e+04, double 1.500000e+04, double 1.500000e+04, double 1.500000e+04, double 1.500000e+04, double 1.500000e+04, double 1.500000e+04, double 1.500000e+04>
  %39 = fmul double %37, %.unpack1819
  %40 = insertelement <8 x double> undef, double %37, i32 0
  %res.i1655 = shufflevector <8 x double> %40, <8 x double> undef, <8 x i32> zeroinitializer
  %41 = fmul <8 x double> %res.i1655, %36
  %42 = insertelement <8 x double> undef, double %.unpack1819, i32 0
  %res.i1654 = shufflevector <8 x double> %42, <8 x double> undef, <8 x i32> zeroinitializer
  %43 = call <8 x double> @llvm.fmuladd.v8f64(<8 x double> %res.i1654, <8 x double> %38, <8 x double> %41)
  %.elt1838 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 8, i32 0
  %.unpack1839 = load double, double* %.elt1838, align 8
  %.unpack1841.unpack.elt = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 8, i32 1, i64 0, i64 0
  %44 = bitcast double* %.unpack1841.unpack.elt to <8 x double>*
  %45 = load <8 x double>, <8 x double>* %44, align 8
  %46 = fmul double %.unpack1839, 1.300000e-04
  %47 = fmul <8 x double> %45, <double 1.300000e-04, double 1.300000e-04, double 1.300000e-04, double 1.300000e-04, double 1.300000e-04, double 1.300000e-04, double 1.300000e-04, double 1.300000e-04>
  %48 = fmul double %.unpack1839, 2.400000e+04
  %49 = fmul <8 x double> %45, <double 2.400000e+04, double 2.400000e+04, double 2.400000e+04, double 2.400000e+04, double 2.400000e+04, double 2.400000e+04, double 2.400000e+04, double 2.400000e+04>
  %50 = fmul double %48, %.unpack1819
  %51 = insertelement <8 x double> undef, double %48, i32 0
  %res.i1653 = shufflevector <8 x double> %51, <8 x double> undef, <8 x i32> zeroinitializer
  %52 = fmul <8 x double> %res.i1653, %36
  %53 = call <8 x double> @llvm.fmuladd.v8f64(<8 x double> %res.i1654, <8 x double> %49, <8 x double> %52)
  %.elt1898 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 10, i32 0
  %.unpack1899 = load double, double* %.elt1898, align 8
  %.unpack1901.unpack.elt = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 10, i32 1, i64 0, i64 0
  %54 = bitcast double* %.unpack1901.unpack.elt to <8 x double>*
  %55 = load <8 x double>, <8 x double>* %54, align 8
  %56 = fmul double %.unpack1899, 1.650000e+04
  %57 = fmul <8 x double> %55, <double 1.650000e+04, double 1.650000e+04, double 1.650000e+04, double 1.650000e+04, double 1.650000e+04, double 1.650000e+04, double 1.650000e+04, double 1.650000e+04>
  %58 = fmul double %56, %.unpack1679
  %59 = insertelement <8 x double> undef, double %56, i32 0
  %res.i1651 = shufflevector <8 x double> %59, <8 x double> undef, <8 x i32> zeroinitializer
  %60 = fmul <8 x double> %res.i1651, %10
  %61 = call <8 x double> @llvm.fmuladd.v8f64(<8 x double> %res.i1656, <8 x double> %57, <8 x double> %60)
  %62 = fmul double %.unpack1899, 9.000000e+03
  %63 = fmul <8 x double> %55, <double 9.000000e+03, double 9.000000e+03, double 9.000000e+03, double 9.000000e+03, double 9.000000e+03, double 9.000000e+03, double 9.000000e+03, double 9.000000e+03>
  %64 = fmul double %62, %.unpack
  %65 = insertelement <8 x double> undef, double %62, i32 0
  %res.i1649 = shufflevector <8 x double> %65, <8 x double> undef, <8 x i32> zeroinitializer
  %66 = fmul <8 x double> %res.i1649, %6
  %67 = insertelement <8 x double> undef, double %.unpack, i32 0
  %res.i1648 = shufflevector <8 x double> %67, <8 x double> undef, <8 x i32> zeroinitializer
  %68 = call <8 x double> @llvm.fmuladd.v8f64(<8 x double> %res.i1648, <8 x double> %63, <8 x double> %66)
  %.elt1978 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 12, i32 0
  %.unpack1979 = load double, double* %.elt1978, align 8
  %.unpack1981.unpack.elt = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 12, i32 1, i64 0, i64 0
  %69 = bitcast double* %.unpack1981.unpack.elt to <8 x double>*
  %70 = load <8 x double>, <8 x double>* %69, align 8
  %71 = fmul double %.unpack1979, 2.200000e-02
  %72 = fmul <8 x double> %70, <double 2.200000e-02, double 2.200000e-02, double 2.200000e-02, double 2.200000e-02, double 2.200000e-02, double 2.200000e-02, double 2.200000e-02, double 2.200000e-02>
  %.elt1998 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 9, i32 0
  %.unpack1999 = load double, double* %.elt1998, align 8
  %.unpack2001.unpack.elt = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 9, i32 1, i64 0, i64 0
  %73 = bitcast double* %.unpack2001.unpack.elt to <8 x double>*
  %74 = load <8 x double>, <8 x double>* %73, align 8
  %75 = fmul double %.unpack1999, 1.200000e+04
  %76 = fmul <8 x double> %74, <double 1.200000e+04, double 1.200000e+04, double 1.200000e+04, double 1.200000e+04, double 1.200000e+04, double 1.200000e+04, double 1.200000e+04, double 1.200000e+04>
  %77 = fmul double %75, %.unpack1679
  %78 = insertelement <8 x double> undef, double %75, i32 0
  %res.i1647 = shufflevector <8 x double> %78, <8 x double> undef, <8 x i32> zeroinitializer
  %79 = fmul <8 x double> %res.i1647, %10
  %80 = call <8 x double> @llvm.fmuladd.v8f64(<8 x double> %res.i1656, <8 x double> %76, <8 x double> %79)
  %.elt2038 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 13, i32 0
  %.unpack2039 = load double, double* %.elt2038, align 8
  %.unpack2041.unpack.elt = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 13, i32 1, i64 0, i64 0
  %81 = bitcast double* %.unpack2041.unpack.elt to <8 x double>*
  %82 = load <8 x double>, <8 x double>* %81, align 8
  %83 = fmul double %.unpack2039, 1.880000e+00
  %84 = fmul <8 x double> %82, <double 1.880000e+00, double 1.880000e+00, double 1.880000e+00, double 1.880000e+00, double 1.880000e+00, double 1.880000e+00, double 1.880000e+00, double 1.880000e+00>
  %85 = fmul double %.unpack, 1.630000e+04
  %86 = fmul <8 x double> %6, <double 1.630000e+04, double 1.630000e+04, double 1.630000e+04, double 1.630000e+04, double 1.630000e+04, double 1.630000e+04, double 1.630000e+04, double 1.630000e+04>
  %87 = fmul double %85, %.unpack1819
  %88 = insertelement <8 x double> undef, double %85, i32 0
  %res.i1645 = shufflevector <8 x double> %88, <8 x double> undef, <8 x i32> zeroinitializer
  %89 = fmul <8 x double> %res.i1645, %36
  %90 = call <8 x double> @llvm.fmuladd.v8f64(<8 x double> %res.i1654, <8 x double> %86, <8 x double> %89)
  %.elt2098 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 2, i32 0
  %.unpack2099 = load double, double* %.elt2098, align 8
  %.unpack2101.unpack.elt = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 2, i32 1, i64 0, i64 0
  %91 = bitcast double* %.unpack2101.unpack.elt to <8 x double>*
  %92 = load <8 x double>, <8 x double>* %91, align 8
  %93 = fmul double %.unpack2099, 4.800000e+06
  %94 = fmul <8 x double> %92, <double 4.800000e+06, double 4.800000e+06, double 4.800000e+06, double 4.800000e+06, double 4.800000e+06, double 4.800000e+06, double 4.800000e+06, double 4.800000e+06>
  %95 = fmul double %.unpack1699, 3.500000e-04
  %96 = fmul <8 x double> %12, <double 3.500000e-04, double 3.500000e-04, double 3.500000e-04, double 3.500000e-04, double 3.500000e-04, double 3.500000e-04, double 3.500000e-04, double 3.500000e-04>
  %97 = fmul double %.unpack1699, 1.750000e-02
  %98 = fmul <8 x double> %12, <double 1.750000e-02, double 1.750000e-02, double 1.750000e-02, double 1.750000e-02, double 1.750000e-02, double 1.750000e-02, double 1.750000e-02, double 1.750000e-02>
  %.elt2158 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 15, i32 0
  %.unpack2159 = load double, double* %.elt2158, align 8
  %.unpack2161.unpack.elt = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 15, i32 1, i64 0, i64 0
  %99 = bitcast double* %.unpack2161.unpack.elt to <8 x double>*
  %100 = load <8 x double>, <8 x double>* %99, align 8
  %101 = fmul double %.unpack2159, 1.000000e+08
  %102 = fmul <8 x double> %100, <double 1.000000e+08, double 1.000000e+08, double 1.000000e+08, double 1.000000e+08, double 1.000000e+08, double 1.000000e+08, double 1.000000e+08, double 1.000000e+08>
  %103 = fmul double %.unpack2159, 4.440000e+11
  %104 = fmul <8 x double> %100, <double 4.440000e+11, double 4.440000e+11, double 4.440000e+11, double 4.440000e+11, double 4.440000e+11, double 4.440000e+11, double 4.440000e+11, double 4.440000e+11>
  %.elt2198 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 16, i32 0
  %.unpack2199 = load double, double* %.elt2198, align 8
  %.unpack2201.unpack.elt = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 16, i32 1, i64 0, i64 0
  %105 = bitcast double* %.unpack2201.unpack.elt to <8 x double>*
  %106 = load <8 x double>, <8 x double>* %105, align 8
  %107 = fmul double %.unpack2199, 1.240000e+03
  %108 = fmul <8 x double> %106, <double 1.240000e+03, double 1.240000e+03, double 1.240000e+03, double 1.240000e+03, double 1.240000e+03, double 1.240000e+03, double 1.240000e+03, double 1.240000e+03>
  %109 = fmul double %107, %.unpack1819
  %110 = insertelement <8 x double> undef, double %107, i32 0
  %res.i1643 = shufflevector <8 x double> %110, <8 x double> undef, <8 x i32> zeroinitializer
  %111 = fmul <8 x double> %res.i1643, %36
  %112 = call <8 x double> @llvm.fmuladd.v8f64(<8 x double> %res.i1654, <8 x double> %108, <8 x double> %111)
  %.elt2238 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 18, i32 0
  %.unpack2239 = load double, double* %.elt2238, align 8
  %.unpack2241.unpack.elt = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 18, i32 1, i64 0, i64 0
  %113 = bitcast double* %.unpack2241.unpack.elt to <8 x double>*
  %114 = load <8 x double>, <8 x double>* %113, align 8
  %115 = fmul double %.unpack2239, 2.100000e+00
  %116 = fmul <8 x double> %114, <double 2.100000e+00, double 2.100000e+00, double 2.100000e+00, double 2.100000e+00, double 2.100000e+00, double 2.100000e+00, double 2.100000e+00, double 2.100000e+00>
  %117 = fmul double %.unpack2239, 5.780000e+00
  %118 = fmul <8 x double> %114, <double 5.780000e+00, double 5.780000e+00, double 5.780000e+00, double 5.780000e+00, double 5.780000e+00, double 5.780000e+00, double 5.780000e+00, double 5.780000e+00>
  %119 = fmul double %.unpack, 4.740000e-02
  %120 = fmul <8 x double> %6, <double 4.740000e-02, double 4.740000e-02, double 4.740000e-02, double 4.740000e-02, double 4.740000e-02, double 4.740000e-02, double 4.740000e-02, double 4.740000e-02>
  %121 = fmul double %119, %.unpack1699
  %122 = insertelement <8 x double> undef, double %119, i32 0
  %res.i1641 = shufflevector <8 x double> %122, <8 x double> undef, <8 x i32> zeroinitializer
  %123 = fmul <8 x double> %res.i1641, %12
  %124 = call <8 x double> @llvm.fmuladd.v8f64(<8 x double> %res.i1658, <8 x double> %120, <8 x double> %123)
  %125 = fmul double %.unpack2239, 1.780000e+03
  %126 = fmul <8 x double> %114, <double 1.780000e+03, double 1.780000e+03, double 1.780000e+03, double 1.780000e+03, double 1.780000e+03, double 1.780000e+03, double 1.780000e+03, double 1.780000e+03>
  %127 = fmul double %125, %.unpack
  %128 = insertelement <8 x double> undef, double %125, i32 0
  %res.i1639 = shufflevector <8 x double> %128, <8 x double> undef, <8 x i32> zeroinitializer
  %129 = fmul <8 x double> %res.i1639, %6
  %130 = call <8 x double> @llvm.fmuladd.v8f64(<8 x double> %res.i1648, <8 x double> %126, <8 x double> %129)
  %.elt2358 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 19, i32 0
  %.unpack2359 = load double, double* %.elt2358, align 8
  %.unpack2361.unpack.elt = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 19, i32 1, i64 0, i64 0
  %131 = bitcast double* %.unpack2361.unpack.elt to <8 x double>*
  %132 = load <8 x double>, <8 x double>* %131, align 8
  %133 = fmul double %.unpack2359, 3.120000e+00
  %134 = fmul <8 x double> %132, <double 3.120000e+00, double 3.120000e+00, double 3.120000e+00, double 3.120000e+00, double 3.120000e+00, double 3.120000e+00, double 3.120000e+00, double 3.120000e+00>
  %135 = fneg double %7
  %136 = fneg <8 x double> %8
  %137 = fsub double %135, %64
  %138 = fsub <8 x double> %136, %68
  %139 = fsub double %137, %87
  %140 = fsub <8 x double> %138, %90
  %141 = fsub double %139, %121
  %142 = fsub <8 x double> %140, %124
  %143 = fsub double %141, %127
  %144 = fsub <8 x double> %142, %130
  %145 = fadd double %15, %143
  %146 = fadd <8 x double> %19, %144
  %147 = fadd double %24, %145
  %148 = fadd <8 x double> %28, %146
  %149 = fadd double %58, %147
  %150 = fadd <8 x double> %61, %148
  %151 = fadd double %71, %149
  %152 = fadd <8 x double> %72, %150
  %153 = fadd double %77, %151
  %154 = fadd <8 x double> %80, %152
  %155 = fadd double %117, %153
  %156 = fadd <8 x double> %118, %154
  %157 = fadd double %155, %133
  %158 = fadd <8 x double> %156, %134
  %159 = bitcast {}* %0 to { double, [1 x [8 x double]] }**
  %160 = load { double, [1 x [8 x double]] }*, { double, [1 x [8 x double]] }** %159, align 8
  %.repack = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %160, i64 0, i32 0
  store double %157, double* %.repack, align 8
  %161 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %160, i64 0, i32 1, i64 0
  %162 = bitcast [8 x double]* %161 to <8 x double>*
  store <8 x double> %158, <8 x double>* %162, align 8
  %163 = fneg double %15
  %164 = fneg <8 x double> %19
  %165 = fsub double %163, %24
  %166 = fsub <8 x double> %164, %28
  %167 = fsub double %165, %58
  %168 = fsub <8 x double> %166, %61
  %169 = fsub double %167, %77
  %170 = fsub <8 x double> %168, %80
  %171 = fadd double %7, %169
  %172 = fadd <8 x double> %8, %170
  %173 = fadd double %171, %115
  %174 = fadd <8 x double> %172, %116
  %.repack2380 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %160, i64 1, i32 0
  store double %173, double* %.repack2380, align 8
  %175 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %160, i64 1, i32 1, i64 0
  %176 = bitcast [8 x double]* %175 to <8 x double>*
  store <8 x double> %174, <8 x double>* %176, align 8
  %177 = fsub double %7, %93
  %178 = fsub <8 x double> %8, %94
  %179 = fadd double %177, %97
  %180 = fadd <8 x double> %178, %98
  %181 = fadd double %179, %103
  %182 = fadd <8 x double> %180, %104
  %183 = fadd double %181, %117
  %184 = fadd <8 x double> %182, %118
  %.repack2383 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %160, i64 2, i32 0
  store double %183, double* %.repack2383, align 8
  %185 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %160, i64 2, i32 1, i64 0
  %186 = bitcast [8 x double]* %185 to <8 x double>*
  store <8 x double> %184, <8 x double>* %186, align 8
  %187 = fsub double %163, %95
  %188 = fsub <8 x double> %164, %96
  %189 = fsub double %187, %97
  %190 = fsub <8 x double> %188, %98
  %191 = fsub double %189, %121
  %192 = fsub <8 x double> %190, %124
  %193 = fadd double %93, %191
  %194 = fadd <8 x double> %94, %192
  %.repack2386 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %160, i64 3, i32 0
  store double %193, double* %.repack2386, align 8
  %195 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %160, i64 3, i32 1, i64 0
  %196 = bitcast [8 x double]* %195 to <8 x double>*
  store <8 x double> %194, <8 x double>* %196, align 8
  %197 = fsub double %31, %24
  %198 = fsub <8 x double> %32, %28
  %199 = fadd double %31, %197
  %200 = fadd <8 x double> %32, %198
  %201 = fadd double %199, %39
  %202 = fadd <8 x double> %200, %43
  %203 = fadd double %201, %46
  %204 = fadd <8 x double> %202, %47
  %205 = fadd double %203, %83
  %206 = fadd <8 x double> %204, %84
  %207 = fadd double %205, %109
  %208 = fadd <8 x double> %206, %112
  %.repack2389 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %160, i64 4, i32 0
  store double %207, double* %.repack2389, align 8
  %209 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %160, i64 4, i32 1, i64 0
  %210 = bitcast [8 x double]* %209 to <8 x double>*
  store <8 x double> %208, <8 x double>* %210, align 8
  %211 = fneg double %39
  %212 = fneg <8 x double> %43
  %213 = fsub double %211, %50
  %214 = fsub <8 x double> %212, %53
  %215 = fsub double %213, %87
  %216 = fsub <8 x double> %214, %90
  %217 = fsub double %215, %109
  %218 = fsub <8 x double> %216, %112
  %219 = fadd double %24, %217
  %220 = fadd <8 x double> %28, %218
  %221 = fadd double %101, %219
  %222 = fadd <8 x double> %102, %220
  %223 = fadd double %101, %221
  %224 = fadd <8 x double> %102, %222
  %.repack2392 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %160, i64 5, i32 0
  store double %223, double* %.repack2392, align 8
  %225 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %160, i64 5, i32 1, i64 0
  %226 = bitcast [8 x double]* %225 to <8 x double>*
  store <8 x double> %224, <8 x double>* %226, align 8
  %227 = fneg double %31
  %228 = fneg <8 x double> %32
  %229 = fsub double %227, %33
  %230 = fsub <8 x double> %228, %34
  %231 = fsub double %229, %39
  %232 = fsub <8 x double> %230, %43
  %233 = fadd double %231, %83
  %234 = fadd <8 x double> %232, %84
  %.repack2395 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %160, i64 6, i32 0
  store double %233, double* %.repack2395, align 8
  %235 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %160, i64 6, i32 1, i64 0
  %236 = bitcast [8 x double]* %235 to <8 x double>*
  store <8 x double> %234, <8 x double>* %236, align 8
  %237 = fadd double %31, %33
  %238 = fadd <8 x double> %32, %34
  %239 = fadd double %237, %39
  %240 = fadd <8 x double> %238, %43
  %241 = fadd double %239, %46
  %242 = fadd <8 x double> %240, %47
  %243 = load { double, [1 x [8 x double]] }*, { double, [1 x [8 x double]] }** %159, align 8
  %.repack2398 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %243, i64 7, i32 0
  store double %241, double* %.repack2398, align 8
  %244 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %243, i64 7, i32 1, i64 0
  %245 = bitcast [8 x double]* %244 to <8 x double>*
  store <8 x double> %242, <8 x double>* %245, align 8
  %246 = fneg double %46
  %247 = fneg <8 x double> %47
  %248 = fsub double %246, %50
  %249 = fsub <8 x double> %247, %53
  %.repack2401 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %243, i64 8, i32 0
  store double %248, double* %.repack2401, align 8
  %250 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %243, i64 8, i32 1, i64 0
  %251 = bitcast [8 x double]* %250 to <8 x double>*
  store <8 x double> %249, <8 x double>* %251, align 8
  %252 = fsub double %46, %77
  %253 = fsub <8 x double> %47, %80
  %254 = fadd double %58, %252
  %255 = fadd <8 x double> %61, %253
  %.repack2404 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %243, i64 9, i32 0
  store double %254, double* %.repack2404, align 8
  %256 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %243, i64 9, i32 1, i64 0
  %257 = bitcast [8 x double]* %256 to <8 x double>*
  store <8 x double> %255, <8 x double>* %257, align 8
  %258 = fneg double %58
  %259 = fneg <8 x double> %61
  %260 = fsub double %258, %64
  %261 = fsub <8 x double> %259, %68
  %262 = fadd double %50, %260
  %263 = fadd <8 x double> %53, %261
  %264 = fadd double %262, %71
  %265 = fadd <8 x double> %263, %72
  %.repack2407 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %243, i64 10, i32 0
  store double %264, double* %.repack2407, align 8
  %266 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %243, i64 10, i32 1, i64 0
  %267 = bitcast [8 x double]* %266 to <8 x double>*
  store <8 x double> %265, <8 x double>* %267, align 8
  %.repack2410 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %243, i64 11, i32 0
  store double %58, double* %.repack2410, align 8
  %268 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %243, i64 11, i32 1, i64 0
  %269 = bitcast [8 x double]* %268 to <8 x double>*
  store <8 x double> %61, <8 x double>* %269, align 8
  %270 = fsub double %64, %71
  %271 = fsub <8 x double> %68, %72
  %.repack2413 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %243, i64 12, i32 0
  store double %270, double* %.repack2413, align 8
  %272 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %243, i64 12, i32 1, i64 0
  %273 = bitcast [8 x double]* %272 to <8 x double>*
  store <8 x double> %271, <8 x double>* %273, align 8
  %274 = fsub double %77, %83
  %275 = fsub <8 x double> %80, %84
  %.repack2416 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %243, i64 13, i32 0
  store double %274, double* %.repack2416, align 8
  %276 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %243, i64 13, i32 1, i64 0
  %277 = bitcast [8 x double]* %276 to <8 x double>*
  store <8 x double> %275, <8 x double>* %277, align 8
  %.repack2419 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %243, i64 14, i32 0
  store double %87, double* %.repack2419, align 8
  %278 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %243, i64 14, i32 1, i64 0
  %279 = bitcast [8 x double]* %278 to <8 x double>*
  store <8 x double> %90, <8 x double>* %279, align 8
  %280 = fneg double %101
  %281 = fneg <8 x double> %102
  %282 = fsub double %280, %103
  %283 = fsub <8 x double> %281, %104
  %284 = fadd double %95, %282
  %285 = fadd <8 x double> %96, %283
  %.repack2422 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %243, i64 15, i32 0
  store double %284, double* %.repack2422, align 8
  %286 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %243, i64 15, i32 1, i64 0
  %287 = bitcast [8 x double]* %286 to <8 x double>*
  store <8 x double> %285, <8 x double>* %287, align 8
  %288 = fneg double %109
  %289 = fneg <8 x double> %112
  %.repack2425 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %243, i64 16, i32 0
  store double %288, double* %.repack2425, align 8
  %290 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %243, i64 16, i32 1, i64 0
  %291 = bitcast [8 x double]* %290 to <8 x double>*
  store <8 x double> %289, <8 x double>* %291, align 8
  %.repack2428 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %243, i64 17, i32 0
  store double %109, double* %.repack2428, align 8
  %292 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %243, i64 17, i32 1, i64 0
  %293 = bitcast [8 x double]* %292 to <8 x double>*
  store <8 x double> %112, <8 x double>* %293, align 8
  %294 = fneg double %115
  %295 = fneg <8 x double> %116
  %296 = fsub double %294, %117
  %297 = fsub <8 x double> %295, %118
  %298 = fsub double %296, %127
  %299 = fsub <8 x double> %297, %130
  %300 = fadd double %121, %298
  %301 = fadd <8 x double> %124, %299
  %302 = fadd double %300, %133
  %303 = fadd <8 x double> %301, %134
  %.repack2431 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %243, i64 18, i32 0
  store double %302, double* %.repack2431, align 8
  %304 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %243, i64 18, i32 1, i64 0
  %305 = bitcast [8 x double]* %304 to <8 x double>*
  store <8 x double> %303, <8 x double>* %305, align 8
  %306 = fsub double %127, %133
  %307 = fsub <8 x double> %130, %134
  %308 = load { double, [1 x [8 x double]] }*, { double, [1 x [8 x double]] }** %159, align 8
  %.repack2434 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %308, i64 19, i32 0
  store double %306, double* %.repack2434, align 8
  %309 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %308, i64 19, i32 1, i64 0
  %310 = bitcast [8 x double]* %309 to <8 x double>*
  store <8 x double> %307, <8 x double>* %310, align 8
  ret void
}
This PR
define void @"julia_rhs!_2397"({}* nonnull align 16 dereferenceable(40) %0, {}* nonnull align 16 dereferenceable(40) %1, double %2) {
top:
  %3 = bitcast {}* %1 to { double, [1 x [8 x double]] }**
  %4 = load { double, [1 x [8 x double]] }*, { double, [1 x [8 x double]] }** %3, align 8
  %.elt = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 0, i32 0
  %.unpack = load double, double* %.elt, align 8
  %.unpack1798.unpack.elt = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 0, i32 1, i64 0, i64 0
  %5 = bitcast double* %.unpack1798.unpack.elt to <8 x double>*
  %6 = load <8 x double>, <8 x double>* %5, align 8
  %7 = fmul double %.unpack, 3.500000e-01
  %res.i = fmul nsz contract <8 x double> %6, <double 3.500000e-01, double 3.500000e-01, double 3.500000e-01, double 3.500000e-01, double 3.500000e-01, double 3.500000e-01, double 3.500000e-01, double 3.500000e-01>
  %.elt1815 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 1, i32 0
  %.unpack1816 = load double, double* %.elt1815, align 8
  %.unpack1818.unpack.elt = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 1, i32 1, i64 0, i64 0
  %8 = bitcast double* %.unpack1818.unpack.elt to <8 x double>*
  %9 = load <8 x double>, <8 x double>* %8, align 8
  %.elt1835 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 3, i32 0
  %.unpack1836 = load double, double* %.elt1835, align 8
  %.unpack1838.unpack.elt = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 3, i32 1, i64 0, i64 0
  %10 = bitcast double* %.unpack1838.unpack.elt to <8 x double>*
  %11 = load <8 x double>, <8 x double>* %10, align 8
  %12 = fmul double %.unpack1816, 2.660000e+01
  %res.i1796 = fmul nsz contract <8 x double> %9, <double 2.660000e+01, double 2.660000e+01, double 2.660000e+01, double 2.660000e+01, double 2.660000e+01, double 2.660000e+01, double 2.660000e+01, double 2.660000e+01>
  %13 = fmul double %12, %.unpack1836
  %el1.i1790 = insertelement <8 x double> undef, double %.unpack1836, i32 0
  %afactor.i1791 = shufflevector <8 x double> %el1.i1790, <8 x double> undef, <8 x i32> zeroinitializer
  %el2.i1792 = insertelement <8 x double> undef, double %12, i32 0
  %bfactor.i1793 = shufflevector <8 x double> %el2.i1792, <8 x double> undef, <8 x i32> zeroinitializer
  %tmp.i1794 = fmul nsz contract <8 x double> %bfactor.i1793, %11
  %res.i1795 = call nsz contract <8 x double> @llvm.fmuladd.v8f64(<8 x double> %res.i1796, <8 x double> %afactor.i1791, <8 x double> %tmp.i1794)
  %.elt1855 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 4, i32 0
  %.unpack1856 = load double, double* %.elt1855, align 8
  %.unpack1858.unpack.elt = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 4, i32 1, i64 0, i64 0
  %14 = bitcast double* %.unpack1858.unpack.elt to <8 x double>*
  %15 = load <8 x double>, <8 x double>* %14, align 8
  %16 = fmul double %.unpack1856, 1.230000e+04
  %res.i1789 = fmul nsz contract <8 x double> %15, <double 1.230000e+04, double 1.230000e+04, double 1.230000e+04, double 1.230000e+04, double 1.230000e+04, double 1.230000e+04, double 1.230000e+04, double 1.230000e+04>
  %17 = fmul double %16, %.unpack1816
  %el1.i1783 = insertelement <8 x double> undef, double %.unpack1816, i32 0
  %afactor.i1784 = shufflevector <8 x double> %el1.i1783, <8 x double> undef, <8 x i32> zeroinitializer
  %el2.i1785 = insertelement <8 x double> undef, double %16, i32 0
  %bfactor.i1786 = shufflevector <8 x double> %el2.i1785, <8 x double> undef, <8 x i32> zeroinitializer
  %tmp.i1787 = fmul nsz contract <8 x double> %bfactor.i1786, %9
  %res.i1788 = call nsz contract <8 x double> @llvm.fmuladd.v8f64(<8 x double> %res.i1789, <8 x double> %afactor.i1784, <8 x double> %tmp.i1787)
  %.elt1895 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 6, i32 0
  %.unpack1896 = load double, double* %.elt1895, align 8
  %.unpack1898.unpack.elt = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 6, i32 1, i64 0, i64 0
  %18 = bitcast double* %.unpack1898.unpack.elt to <8 x double>*
  %19 = load <8 x double>, <8 x double>* %18, align 8
  %20 = fmul double %.unpack1896, 0x3F4C2E33EFF19503
  %res.i1782 = fmul nsz contract <8 x double> %19, <double 0x3F4C2E33EFF19503, double 0x3F4C2E33EFF19503, double 0x3F4C2E33EFF19503, double 0x3F4C2E33EFF19503, double 0x3F4C2E33EFF19503, double 0x3F4C2E33EFF19503, double 0x3F4C2E33EFF19503, double 0x3F4C2E33EFF19503>
  %21 = fmul double %.unpack1896, 0x3F4ADEA897635E74
  %res.i1781 = fmul nsz contract <8 x double> %19, <double 0x3F4ADEA897635E74, double 0x3F4ADEA897635E74, double 0x3F4ADEA897635E74, double 0x3F4ADEA897635E74, double 0x3F4ADEA897635E74, double 0x3F4ADEA897635E74, double 0x3F4ADEA897635E74, double 0x3F4ADEA897635E74>
  %.elt1955 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 5, i32 0
  %.unpack1956 = load double, double* %.elt1955, align 8
  %.unpack1958.unpack.elt = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 5, i32 1, i64 0, i64 0
  %22 = bitcast double* %.unpack1958.unpack.elt to <8 x double>*
  %23 = load <8 x double>, <8 x double>* %22, align 8
  %24 = fmul double %.unpack1896, 1.500000e+04
  %res.i1780 = fmul nsz contract <8 x double> %19, <double 1.500000e+04, double 1.500000e+04, double 1.500000e+04, double 1.500000e+04, double 1.500000e+04, double 1.500000e+04, double 1.500000e+04, double 1.500000e+04>
  %25 = fmul double %24, %.unpack1956
  %el1.i1774 = insertelement <8 x double> undef, double %.unpack1956, i32 0
  %afactor.i1775 = shufflevector <8 x double> %el1.i1774, <8 x double> undef, <8 x i32> zeroinitializer
  %el2.i1776 = insertelement <8 x double> undef, double %24, i32 0
  %bfactor.i1777 = shufflevector <8 x double> %el2.i1776, <8 x double> undef, <8 x i32> zeroinitializer
  %tmp.i1778 = fmul nsz contract <8 x double> %bfactor.i1777, %23
  %res.i1779 = call nsz contract <8 x double> @llvm.fmuladd.v8f64(<8 x double> %res.i1780, <8 x double> %afactor.i1775, <8 x double> %tmp.i1778)
  %.elt1975 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 8, i32 0
  %.unpack1976 = load double, double* %.elt1975, align 8
  %.unpack1978.unpack.elt = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 8, i32 1, i64 0, i64 0
  %26 = bitcast double* %.unpack1978.unpack.elt to <8 x double>*
  %27 = load <8 x double>, <8 x double>* %26, align 8
  %28 = fmul double %.unpack1976, 1.300000e-04
  %res.i1773 = fmul nsz contract <8 x double> %27, <double 1.300000e-04, double 1.300000e-04, double 1.300000e-04, double 1.300000e-04, double 1.300000e-04, double 1.300000e-04, double 1.300000e-04, double 1.300000e-04>
  %29 = fmul double %.unpack1976, 2.400000e+04
  %res.i1772 = fmul nsz contract <8 x double> %27, <double 2.400000e+04, double 2.400000e+04, double 2.400000e+04, double 2.400000e+04, double 2.400000e+04, double 2.400000e+04, double 2.400000e+04, double 2.400000e+04>
  %30 = fmul double %29, %.unpack1956
  %el2.i1768 = insertelement <8 x double> undef, double %29, i32 0
  %bfactor.i1769 = shufflevector <8 x double> %el2.i1768, <8 x double> undef, <8 x i32> zeroinitializer
  %tmp.i1770 = fmul nsz contract <8 x double> %bfactor.i1769, %23
  %res.i1771 = call nsz contract <8 x double> @llvm.fmuladd.v8f64(<8 x double> %res.i1772, <8 x double> %afactor.i1775, <8 x double> %tmp.i1770)
  %.elt2035 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 10, i32 0
  %.unpack2036 = load double, double* %.elt2035, align 8
  %.unpack2038.unpack.elt = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 10, i32 1, i64 0, i64 0
  %31 = bitcast double* %.unpack2038.unpack.elt to <8 x double>*
  %32 = load <8 x double>, <8 x double>* %31, align 8
  %33 = fmul double %.unpack2036, 1.650000e+04
  %res.i1765 = fmul nsz contract <8 x double> %32, <double 1.650000e+04, double 1.650000e+04, double 1.650000e+04, double 1.650000e+04, double 1.650000e+04, double 1.650000e+04, double 1.650000e+04, double 1.650000e+04>
  %34 = fmul double %33, %.unpack1816
  %el2.i1761 = insertelement <8 x double> undef, double %33, i32 0
  %bfactor.i1762 = shufflevector <8 x double> %el2.i1761, <8 x double> undef, <8 x i32> zeroinitializer
  %tmp.i1763 = fmul nsz contract <8 x double> %bfactor.i1762, %9
  %res.i1764 = call nsz contract <8 x double> @llvm.fmuladd.v8f64(<8 x double> %res.i1765, <8 x double> %afactor.i1784, <8 x double> %tmp.i1763)
  %35 = fmul double %.unpack2036, 9.000000e+03
  %res.i1758 = fmul nsz contract <8 x double> %32, <double 9.000000e+03, double 9.000000e+03, double 9.000000e+03, double 9.000000e+03, double 9.000000e+03, double 9.000000e+03, double 9.000000e+03, double 9.000000e+03>
  %36 = fmul double %35, %.unpack
  %el1.i1752 = insertelement <8 x double> undef, double %.unpack, i32 0
  %afactor.i1753 = shufflevector <8 x double> %el1.i1752, <8 x double> undef, <8 x i32> zeroinitializer
  %el2.i1754 = insertelement <8 x double> undef, double %35, i32 0
  %bfactor.i1755 = shufflevector <8 x double> %el2.i1754, <8 x double> undef, <8 x i32> zeroinitializer
  %tmp.i1756 = fmul nsz contract <8 x double> %bfactor.i1755, %6
  %res.i1757 = call nsz contract <8 x double> @llvm.fmuladd.v8f64(<8 x double> %res.i1758, <8 x double> %afactor.i1753, <8 x double> %tmp.i1756)
  %.elt2115 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 12, i32 0
  %.unpack2116 = load double, double* %.elt2115, align 8
  %.unpack2118.unpack.elt = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 12, i32 1, i64 0, i64 0
  %37 = bitcast double* %.unpack2118.unpack.elt to <8 x double>*
  %38 = load <8 x double>, <8 x double>* %37, align 8
  %39 = fmul double %.unpack2116, 2.200000e-02
  %res.i1751 = fmul nsz contract <8 x double> %38, <double 2.200000e-02, double 2.200000e-02, double 2.200000e-02, double 2.200000e-02, double 2.200000e-02, double 2.200000e-02, double 2.200000e-02, double 2.200000e-02>
  %.elt2135 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 9, i32 0
  %.unpack2136 = load double, double* %.elt2135, align 8
  %.unpack2138.unpack.elt = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 9, i32 1, i64 0, i64 0
  %40 = bitcast double* %.unpack2138.unpack.elt to <8 x double>*
  %41 = load <8 x double>, <8 x double>* %40, align 8
  %42 = fmul double %.unpack2136, 1.200000e+04
  %res.i1750 = fmul nsz contract <8 x double> %41, <double 1.200000e+04, double 1.200000e+04, double 1.200000e+04, double 1.200000e+04, double 1.200000e+04, double 1.200000e+04, double 1.200000e+04, double 1.200000e+04>
  %43 = fmul double %42, %.unpack1816
  %el2.i1746 = insertelement <8 x double> undef, double %42, i32 0
  %bfactor.i1747 = shufflevector <8 x double> %el2.i1746, <8 x double> undef, <8 x i32> zeroinitializer
  %tmp.i1748 = fmul nsz contract <8 x double> %bfactor.i1747, %9
  %res.i1749 = call nsz contract <8 x double> @llvm.fmuladd.v8f64(<8 x double> %res.i1750, <8 x double> %afactor.i1784, <8 x double> %tmp.i1748)
  %.elt2175 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 13, i32 0
  %.unpack2176 = load double, double* %.elt2175, align 8
  %.unpack2178.unpack.elt = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 13, i32 1, i64 0, i64 0
  %44 = bitcast double* %.unpack2178.unpack.elt to <8 x double>*
  %45 = load <8 x double>, <8 x double>* %44, align 8
  %46 = fmul double %.unpack2176, 1.880000e+00
  %res.i1743 = fmul nsz contract <8 x double> %45, <double 1.880000e+00, double 1.880000e+00, double 1.880000e+00, double 1.880000e+00, double 1.880000e+00, double 1.880000e+00, double 1.880000e+00, double 1.880000e+00>
  %47 = fmul double %.unpack, 1.630000e+04
  %res.i1742 = fmul nsz contract <8 x double> %6, <double 1.630000e+04, double 1.630000e+04, double 1.630000e+04, double 1.630000e+04, double 1.630000e+04, double 1.630000e+04, double 1.630000e+04, double 1.630000e+04>
  %48 = fmul double %47, %.unpack1956
  %el2.i1738 = insertelement <8 x double> undef, double %47, i32 0
  %bfactor.i1739 = shufflevector <8 x double> %el2.i1738, <8 x double> undef, <8 x i32> zeroinitializer
  %tmp.i1740 = fmul nsz contract <8 x double> %bfactor.i1739, %23
  %res.i1741 = call nsz contract <8 x double> @llvm.fmuladd.v8f64(<8 x double> %res.i1742, <8 x double> %afactor.i1775, <8 x double> %tmp.i1740)
  %.elt2235 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 2, i32 0
  %.unpack2236 = load double, double* %.elt2235, align 8
  %.unpack2238.unpack.elt = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 2, i32 1, i64 0, i64 0
  %49 = bitcast double* %.unpack2238.unpack.elt to <8 x double>*
  %50 = load <8 x double>, <8 x double>* %49, align 8
  %51 = fmul double %.unpack2236, 4.800000e+06
  %res.i1735 = fmul nsz contract <8 x double> %50, <double 4.800000e+06, double 4.800000e+06, double 4.800000e+06, double 4.800000e+06, double 4.800000e+06, double 4.800000e+06, double 4.800000e+06, double 4.800000e+06>
  %52 = fmul double %.unpack1836, 3.500000e-04
  %res.i1734 = fmul nsz contract <8 x double> %11, <double 3.500000e-04, double 3.500000e-04, double 3.500000e-04, double 3.500000e-04, double 3.500000e-04, double 3.500000e-04, double 3.500000e-04, double 3.500000e-04>
  %53 = fmul double %.unpack1836, 1.750000e-02
  %res.i1733 = fmul nsz contract <8 x double> %11, <double 1.750000e-02, double 1.750000e-02, double 1.750000e-02, double 1.750000e-02, double 1.750000e-02, double 1.750000e-02, double 1.750000e-02, double 1.750000e-02>
  %.elt2295 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 15, i32 0
  %.unpack2296 = load double, double* %.elt2295, align 8
  %.unpack2298.unpack.elt = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 15, i32 1, i64 0, i64 0
  %54 = bitcast double* %.unpack2298.unpack.elt to <8 x double>*
  %55 = load <8 x double>, <8 x double>* %54, align 8
  %56 = fmul double %.unpack2296, 1.000000e+08
  %res.i1732 = fmul nsz contract <8 x double> %55, <double 1.000000e+08, double 1.000000e+08, double 1.000000e+08, double 1.000000e+08, double 1.000000e+08, double 1.000000e+08, double 1.000000e+08, double 1.000000e+08>
  %57 = fmul double %.unpack2296, 4.440000e+11
  %res.i1731 = fmul nsz contract <8 x double> %55, <double 4.440000e+11, double 4.440000e+11, double 4.440000e+11, double 4.440000e+11, double 4.440000e+11, double 4.440000e+11, double 4.440000e+11, double 4.440000e+11>
  %.elt2335 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 16, i32 0
  %.unpack2336 = load double, double* %.elt2335, align 8
  %.unpack2338.unpack.elt = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 16, i32 1, i64 0, i64 0
  %58 = bitcast double* %.unpack2338.unpack.elt to <8 x double>*
  %59 = load <8 x double>, <8 x double>* %58, align 8
  %60 = fmul double %.unpack2336, 1.240000e+03
  %res.i1730 = fmul nsz contract <8 x double> %59, <double 1.240000e+03, double 1.240000e+03, double 1.240000e+03, double 1.240000e+03, double 1.240000e+03, double 1.240000e+03, double 1.240000e+03, double 1.240000e+03>
  %61 = fmul double %60, %.unpack1956
  %el2.i1726 = insertelement <8 x double> undef, double %60, i32 0
  %bfactor.i1727 = shufflevector <8 x double> %el2.i1726, <8 x double> undef, <8 x i32> zeroinitializer
  %tmp.i1728 = fmul nsz contract <8 x double> %bfactor.i1727, %23
  %res.i1729 = call nsz contract <8 x double> @llvm.fmuladd.v8f64(<8 x double> %res.i1730, <8 x double> %afactor.i1775, <8 x double> %tmp.i1728)
  %.elt2375 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 18, i32 0
  %.unpack2376 = load double, double* %.elt2375, align 8
  %.unpack2378.unpack.elt = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 18, i32 1, i64 0, i64 0
  %62 = bitcast double* %.unpack2378.unpack.elt to <8 x double>*
  %63 = load <8 x double>, <8 x double>* %62, align 8
  %64 = fmul double %.unpack2376, 2.100000e+00
  %res.i1723 = fmul nsz contract <8 x double> %63, <double 2.100000e+00, double 2.100000e+00, double 2.100000e+00, double 2.100000e+00, double 2.100000e+00, double 2.100000e+00, double 2.100000e+00, double 2.100000e+00>
  %65 = fmul double %.unpack2376, 5.780000e+00
  %res.i1722 = fmul nsz contract <8 x double> %63, <double 5.780000e+00, double 5.780000e+00, double 5.780000e+00, double 5.780000e+00, double 5.780000e+00, double 5.780000e+00, double 5.780000e+00, double 5.780000e+00>
  %66 = fmul double %.unpack, 4.740000e-02
  %res.i1721 = fmul nsz contract <8 x double> %6, <double 4.740000e-02, double 4.740000e-02, double 4.740000e-02, double 4.740000e-02, double 4.740000e-02, double 4.740000e-02, double 4.740000e-02, double 4.740000e-02>
  %67 = fmul double %66, %.unpack1836
  %el2.i1717 = insertelement <8 x double> undef, double %66, i32 0
  %bfactor.i1718 = shufflevector <8 x double> %el2.i1717, <8 x double> undef, <8 x i32> zeroinitializer
  %tmp.i1719 = fmul nsz contract <8 x double> %bfactor.i1718, %11
  %res.i1720 = call nsz contract <8 x double> @llvm.fmuladd.v8f64(<8 x double> %res.i1721, <8 x double> %afactor.i1791, <8 x double> %tmp.i1719)
  %68 = fmul double %.unpack2376, 1.780000e+03
  %res.i1714 = fmul nsz contract <8 x double> %63, <double 1.780000e+03, double 1.780000e+03, double 1.780000e+03, double 1.780000e+03, double 1.780000e+03, double 1.780000e+03, double 1.780000e+03, double 1.780000e+03>
  %69 = fmul double %68, %.unpack
  %el2.i = insertelement <8 x double> undef, double %68, i32 0
  %bfactor.i = shufflevector <8 x double> %el2.i, <8 x double> undef, <8 x i32> zeroinitializer
  %tmp.i = fmul nsz contract <8 x double> %bfactor.i, %6
  %res.i1713 = call nsz contract <8 x double> @llvm.fmuladd.v8f64(<8 x double> %res.i1714, <8 x double> %afactor.i1753, <8 x double> %tmp.i)
  %.elt2495 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 19, i32 0
  %.unpack2496 = load double, double* %.elt2495, align 8
  %.unpack2498.unpack.elt = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %4, i64 19, i32 1, i64 0, i64 0
  %70 = bitcast double* %.unpack2498.unpack.elt to <8 x double>*
  %71 = load <8 x double>, <8 x double>* %70, align 8
  %72 = fmul double %.unpack2496, 3.120000e+00
  %res.i1712 = fmul nsz contract <8 x double> %71, <double 3.120000e+00, double 3.120000e+00, double 3.120000e+00, double 3.120000e+00, double 3.120000e+00, double 3.120000e+00, double 3.120000e+00, double 3.120000e+00>
  %73 = fneg double %7
  %74 = fsub double %73, %36
  %75 = fadd nsz contract <8 x double> %res.i, %res.i1757
  %76 = fsub double %74, %48
  %77 = fadd nsz contract <8 x double> %75, %res.i1741
  %78 = fsub double %76, %67
  %79 = fadd nsz contract <8 x double> %77, %res.i1720
  %80 = fsub double %78, %69
  %81 = fadd nsz contract <8 x double> %79, %res.i1713
  %82 = fadd double %13, %80
  %res.i1706 = fsub nsz contract <8 x double> %res.i1795, %81
  %83 = fadd double %17, %82
  %res.i1705 = fadd nsz contract <8 x double> %res.i1788, %res.i1706
  %84 = fadd double %34, %83
  %res.i1704 = fadd nsz contract <8 x double> %res.i1764, %res.i1705
  %85 = fadd double %39, %84
  %res.i1703 = fadd nsz contract <8 x double> %res.i1751, %res.i1704
  %86 = fadd double %43, %85
  %res.i1702 = fadd nsz contract <8 x double> %res.i1749, %res.i1703
  %87 = fadd double %65, %86
  %res.i1701 = fadd nsz contract <8 x double> %res.i1722, %res.i1702
  %88 = fadd double %87, %72
  %res.i1700 = fadd nsz contract <8 x double> %res.i1701, %res.i1712
  %89 = bitcast {}* %0 to { double, [1 x [8 x double]] }**
  %90 = load { double, [1 x [8 x double]] }*, { double, [1 x [8 x double]] }** %89, align 8
  %.repack = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %90, i64 0, i32 0
  store double %88, double* %.repack, align 8
  %91 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %90, i64 0, i32 1, i64 0
  %92 = bitcast [8 x double]* %91 to <8 x double>*
  store <8 x double> %res.i1700, <8 x double>* %92, align 8
  %93 = fneg double %13
  %94 = fsub double %93, %17
  %95 = fadd nsz contract <8 x double> %res.i1795, %res.i1788
  %96 = fsub double %94, %34
  %97 = fadd nsz contract <8 x double> %95, %res.i1764
  %98 = fsub double %96, %43
  %99 = fadd nsz contract <8 x double> %97, %res.i1749
  %100 = fadd double %7, %98
  %res.i1695 = fsub nsz contract <8 x double> %res.i, %99
  %101 = fadd double %100, %64
  %res.i1694 = fadd nsz contract <8 x double> %res.i1695, %res.i1723
  %.repack2517 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %90, i64 1, i32 0
  store double %101, double* %.repack2517, align 8
  %102 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %90, i64 1, i32 1, i64 0
  %103 = bitcast [8 x double]* %102 to <8 x double>*
  store <8 x double> %res.i1694, <8 x double>* %103, align 8
  %104 = fsub double %7, %51
  %res.i1692 = fsub nsz contract <8 x double> %res.i, %res.i1735
  %105 = fadd double %104, %53
  %res.i1691 = fadd nsz contract <8 x double> %res.i1692, %res.i1733
  %106 = fadd double %105, %57
  %res.i1690 = fadd nsz contract <8 x double> %res.i1691, %res.i1731
  %107 = fadd double %106, %65
  %res.i1689 = fadd nsz contract <8 x double> %res.i1690, %res.i1722
  %.repack2520 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %90, i64 2, i32 0
  store double %107, double* %.repack2520, align 8
  %108 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %90, i64 2, i32 1, i64 0
  %109 = bitcast [8 x double]* %108 to <8 x double>*
  store <8 x double> %res.i1689, <8 x double>* %109, align 8
  %110 = fsub double %93, %52
  %111 = fadd nsz contract <8 x double> %res.i1795, %res.i1734
  %112 = fsub double %110, %53
  %113 = fadd nsz contract <8 x double> %111, %res.i1733
  %114 = fsub double %112, %67
  %115 = fadd nsz contract <8 x double> %113, %res.i1720
  %116 = fadd double %51, %114
  %res.i1684 = fsub nsz contract <8 x double> %res.i1735, %115
  %.repack2523 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %90, i64 3, i32 0
  store double %116, double* %.repack2523, align 8
  %117 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %90, i64 3, i32 1, i64 0
  %118 = bitcast [8 x double]* %117 to <8 x double>*
  store <8 x double> %res.i1684, <8 x double>* %118, align 8
  %119 = fsub double %20, %17
  %res.i1682 = fsub nsz contract <8 x double> %res.i1782, %res.i1788
  %120 = fadd double %20, %119
  %res.i1681 = fadd nsz contract <8 x double> %res.i1782, %res.i1682
  %121 = fadd double %120, %25
  %res.i1680 = fadd nsz contract <8 x double> %res.i1681, %res.i1779
  %122 = fadd double %121, %28
  %res.i1679 = fadd nsz contract <8 x double> %res.i1680, %res.i1773
  %123 = fadd double %122, %46
  %res.i1678 = fadd nsz contract <8 x double> %res.i1679, %res.i1743
  %124 = fadd double %123, %61
  %res.i1677 = fadd nsz contract <8 x double> %res.i1678, %res.i1729
  %.repack2526 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %90, i64 4, i32 0
  store double %124, double* %.repack2526, align 8
  %125 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %90, i64 4, i32 1, i64 0
  %126 = bitcast [8 x double]* %125 to <8 x double>*
  store <8 x double> %res.i1677, <8 x double>* %126, align 8
  %127 = fneg double %25
  %128 = fsub double %127, %30
  %129 = fadd nsz contract <8 x double> %res.i1779, %res.i1771
  %130 = fsub double %128, %48
  %131 = fadd nsz contract <8 x double> %129, %res.i1741
  %132 = fsub double %130, %61
  %133 = fadd nsz contract <8 x double> %131, %res.i1729
  %134 = fadd double %17, %132
  %res.i1672 = fsub nsz contract <8 x double> %res.i1788, %133
  %135 = fadd double %56, %134
  %res.i1671 = fadd nsz contract <8 x double> %res.i1732, %res.i1672
  %136 = fadd double %56, %135
  %res.i1670 = fadd nsz contract <8 x double> %res.i1732, %res.i1671
  %.repack2529 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %90, i64 5, i32 0
  store double %136, double* %.repack2529, align 8
  %137 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %90, i64 5, i32 1, i64 0
  %138 = bitcast [8 x double]* %137 to <8 x double>*
  store <8 x double> %res.i1670, <8 x double>* %138, align 8
  %139 = fneg double %20
  %140 = fsub double %139, %21
  %141 = fadd nsz contract <8 x double> %res.i1782, %res.i1781
  %142 = fsub double %140, %25
  %143 = fadd nsz contract <8 x double> %141, %res.i1779
  %144 = fadd double %142, %46
  %res.i1666 = fsub nsz contract <8 x double> %res.i1743, %143
  %.repack2532 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %90, i64 6, i32 0
  store double %144, double* %.repack2532, align 8
  %145 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %90, i64 6, i32 1, i64 0
  %146 = bitcast [8 x double]* %145 to <8 x double>*
  store <8 x double> %res.i1666, <8 x double>* %146, align 8
  %147 = fadd double %20, %21
  %148 = fadd double %147, %25
  %149 = fadd double %148, %28
  %res.i1663 = fadd nsz contract <8 x double> %143, %res.i1773
  %.repack2535 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %90, i64 7, i32 0
  store double %149, double* %.repack2535, align 8
  %150 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %90, i64 7, i32 1, i64 0
  %151 = bitcast [8 x double]* %150 to <8 x double>*
  store <8 x double> %res.i1663, <8 x double>* %151, align 8
  %152 = fneg double %28
  %153 = fsub double %152, %30
  %154 = fadd nsz contract <8 x double> %res.i1773, %res.i1771
  %res.i1661 = fneg nsz contract <8 x double> %154
  %155 = load { double, [1 x [8 x double]] }*, { double, [1 x [8 x double]] }** %89, align 8
  %.repack2538 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %155, i64 8, i32 0
  store double %153, double* %.repack2538, align 8
  %156 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %155, i64 8, i32 1, i64 0
  %157 = bitcast [8 x double]* %156 to <8 x double>*
  store <8 x double> %res.i1661, <8 x double>* %157, align 8
  %158 = fsub double %28, %43
  %res.i1659 = fsub nsz contract <8 x double> %res.i1773, %res.i1749
  %159 = fadd double %34, %158
  %res.i1658 = fadd nsz contract <8 x double> %res.i1764, %res.i1659
  %.repack2541 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %155, i64 9, i32 0
  store double %159, double* %.repack2541, align 8
  %160 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %155, i64 9, i32 1, i64 0
  %161 = bitcast [8 x double]* %160 to <8 x double>*
  store <8 x double> %res.i1658, <8 x double>* %161, align 8
  %162 = fneg double %34
  %163 = fsub double %162, %36
  %164 = fadd nsz contract <8 x double> %res.i1764, %res.i1757
  %165 = fadd double %30, %163
  %res.i1655 = fsub nsz contract <8 x double> %res.i1771, %164
  %166 = fadd double %165, %39
  %res.i1654 = fadd nsz contract <8 x double> %res.i1655, %res.i1751
  %.repack2544 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %155, i64 10, i32 0
  store double %166, double* %.repack2544, align 8
  %167 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %155, i64 10, i32 1, i64 0
  %168 = bitcast [8 x double]* %167 to <8 x double>*
  store <8 x double> %res.i1654, <8 x double>* %168, align 8
  %.repack2547 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %155, i64 11, i32 0
  store double %34, double* %.repack2547, align 8
  %169 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %155, i64 11, i32 1, i64 0
  %170 = bitcast [8 x double]* %169 to <8 x double>*
  store <8 x double> %res.i1764, <8 x double>* %170, align 8
  %171 = fsub double %36, %39
  %res.i1652 = fsub nsz contract <8 x double> %res.i1757, %res.i1751
  %.repack2550 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %155, i64 12, i32 0
  store double %171, double* %.repack2550, align 8
  %172 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %155, i64 12, i32 1, i64 0
  %173 = bitcast [8 x double]* %172 to <8 x double>*
  store <8 x double> %res.i1652, <8 x double>* %173, align 8
  %174 = fsub double %43, %46
  %res.i1650 = fsub nsz contract <8 x double> %res.i1749, %res.i1743
  %.repack2553 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %155, i64 13, i32 0
  store double %174, double* %.repack2553, align 8
  %175 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %155, i64 13, i32 1, i64 0
  %176 = bitcast [8 x double]* %175 to <8 x double>*
  store <8 x double> %res.i1650, <8 x double>* %176, align 8
  %.repack2556 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %155, i64 14, i32 0
  store double %48, double* %.repack2556, align 8
  %177 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %155, i64 14, i32 1, i64 0
  %178 = bitcast [8 x double]* %177 to <8 x double>*
  store <8 x double> %res.i1741, <8 x double>* %178, align 8
  %179 = fneg double %56
  %180 = fsub double %179, %57
  %181 = fadd nsz contract <8 x double> %res.i1732, %res.i1731
  %182 = fadd double %52, %180
  %res.i1647 = fsub nsz contract <8 x double> %res.i1734, %181
  %.repack2559 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %155, i64 15, i32 0
  store double %182, double* %.repack2559, align 8
  %183 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %155, i64 15, i32 1, i64 0
  %184 = bitcast [8 x double]* %183 to <8 x double>*
  store <8 x double> %res.i1647, <8 x double>* %184, align 8
  %185 = fneg double %61
  %res.i1646 = fneg nsz contract <8 x double> %res.i1729
  %.repack2562 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %155, i64 16, i32 0
  store double %185, double* %.repack2562, align 8
  %186 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %155, i64 16, i32 1, i64 0
  %187 = bitcast [8 x double]* %186 to <8 x double>*
  store <8 x double> %res.i1646, <8 x double>* %187, align 8
  %.repack2565 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %155, i64 17, i32 0
  store double %61, double* %.repack2565, align 8
  %188 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %155, i64 17, i32 1, i64 0
  %189 = bitcast [8 x double]* %188 to <8 x double>*
  store <8 x double> %res.i1729, <8 x double>* %189, align 8
  %190 = fneg double %64
  %191 = fsub double %190, %65
  %192 = fadd nsz contract <8 x double> %res.i1723, %res.i1722
  %193 = fsub double %191, %69
  %194 = fadd nsz contract <8 x double> %192, %res.i1713
  %195 = fadd double %67, %193
  %res.i1642 = fsub nsz contract <8 x double> %res.i1720, %194
  %196 = fadd double %195, %72
  %res.i1641 = fadd nsz contract <8 x double> %res.i1642, %res.i1712
  %.repack2568 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %155, i64 18, i32 0
  store double %196, double* %.repack2568, align 8
  %197 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %155, i64 18, i32 1, i64 0
  %198 = bitcast [8 x double]* %197 to <8 x double>*
  store <8 x double> %res.i1641, <8 x double>* %198, align 8
  %199 = fsub double %69, %72
  %res.i1639 = fsub nsz contract <8 x double> %res.i1713, %res.i1712
  %.repack2571 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %155, i64 19, i32 0
  store double %199, double* %.repack2571, align 8
  %200 = getelementptr inbounds { double, [1 x [8 x double]] }, { double, [1 x [8 x double]] }* %155, i64 19, i32 1, i64 0
  %201 = bitcast [8 x double]* %200 to <8 x double>*
  store <8 x double> %res.i1639, <8 x double>* %201, align 8
  ret void
}

There are many more named values (instead of just numbered %N temporaries) on this PR.

@chriselrod
Contributor

The difference is large when the chunk sizes are not a power of 2.
Try using Chunk(7) instead of Chunk(8), like in the example.

@KristofferC
Collaborator Author

On this PR there are a bunch of nsz contract fast-math flags on the operations. Any idea where those come from?

@KristofferC
Collaborator Author

Try using Chunk(7) instead of Chunk(8), like in the example.

How do you measure this? Just so we do the same.

@chriselrod
Contributor

You can test by using this in the script:

cfg = ForwardDiff.JacobianConfig(f!, du, u0, ForwardDiff.Chunk(5));
@time ForwardDiff.jacobian!(J, f!, du, u0, cfg);
@btime ForwardDiff.jacobian!($J, $f!, $du, $u0, $cfg);

I tested a few different chunk sizes, and it did not seem to improve runtime or compile-time performance at any of them (even though I naively thought the LLVM IR looked better at a glance).

@KristofferC
Collaborator Author

I meant how you measure the number of instructions. Are you using Cthulhu to step in or just directly calling the function with dual numbers?

@chriselrod
Contributor

chriselrod commented Nov 22, 2021

Sorry, I apparently switched ForwardDiff commits in between my comments from 5 and 1 hour ago.
Now that I've checked out this commit again, I see a nearly 2x performance improvement (roughly 1.9x); master:

julia> @time ForwardDiff.jacobian!(J, f!, du, u0, cfg);
  0.861934 seconds (4.71 M allocations: 253.155 MiB, 8.06% gc time, 99.99% compilation time)

julia> @btime ForwardDiff.jacobian!($J, $f!, $du, $u0, $cfg);
  2.163 μs (0 allocations: 0 bytes)

this PR:

julia> @time ForwardDiff.jacobian!(J, f!, du, u0, cfg);
  0.728831 seconds (3.94 M allocations: 221.083 MiB, 9.85% gc time, 99.99% compilation time)

julia> @btime ForwardDiff.jacobian!($J, $f!, $du, $u0, $cfg);
  1.164 μs (0 allocations: 0 bytes)
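
As a quick sanity check on the two @btime results above, the speedup can be computed directly. This is only a sketch using the numbers quoted in this comment; the variable names are illustrative and not part of the benchmark script.

```julia
# Timings copied from the @btime output above, in microseconds.
master_time_us = 2.163
pr_time_us = 1.164

# Ratio of master to PR runtime; values above 1 mean the PR is faster.
speedup = master_time_us / pr_time_us

println("speedup ≈ ", round(speedup; digits = 2))
```

The ratio comes out a little under 2, so "nearly 2x" describes it better than ">2x".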

Lines of llvm are 723 vs 396 for me.
Master has a lot of instances like

  %93 = fmul double %.unpack1561, 0x3F4C2E33EFF19503
  %94 = extractelement <4 x double> %90, i32 0
  %95 = insertelement <7 x double> undef, double %94, i32 0
  %96 = extractelement <4 x double> %90, i32 1
  %97 = insertelement <7 x double> %95, double %96, i32 1
  %98 = extractelement <4 x double> %90, i32 2
  %99 = insertelement <7 x double> %97, double %98, i32 2
  %100 = extractelement <4 x double> %90, i32 3
  %101 = insertelement <7 x double> %99, double %100, i32 3
  %102 = extractelement <2 x double> %92, i32 0
  %103 = insertelement <7 x double> %101, double %102, i32 4
  %104 = extractelement <2 x double> %92, i32 1
  %105 = insertelement <7 x double> %103, double %104, i32 5
  %106 = insertelement <7 x double> %105, double %.unpack1563.unpack.unpack1576, i32 6

The actual assembly doesn't look nearly so bad, and uiCA predicts a much smaller difference than I observe (links: master, PR).

I meant how you measure the number of instructions. Are you using Cthulhu to step in or just directly calling the function with dual numbers?

Cthulhu. To count the number of lines, I copy/pasted into an editor.
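
An alternative to copy/pasting is to count the lines programmatically. The sketch below assumes the function can be called directly with concrete argument types rather than stepped into with Cthulhu; `toy` and `llvm_line_count` are hypothetical names used only for illustration, and the count includes comment lines in the printed IR.

```julia
using InteractiveUtils  # provides code_llvm

# Hypothetical stand-in for the function under inspection (e.g. rhs!).
toy(x, y) = x * y + x

# Count the lines of optimized LLVM IR printed for `f` with argument types `types`.
function llvm_line_count(f, types)
    io = IOBuffer()
    # debuginfo = :none drops source-location comments so they do not inflate the count.
    code_llvm(io, f, types; debuginfo = :none)
    return count(==('\n'), String(take!(io)))
end

n = llvm_line_count(toy, Tuple{Float64, Float64})
println("lines of LLVM IR: ", n)
```

Running the same helper on both branches with the same argument types (for instance dual-number element types at the chunk size of interest) should give comparable counts, though the absolute numbers will vary with the Julia and LLVM versions.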

EDIT:
I've only been looking at the vector-mode Jacobian code, but the benchmark is executing the chunk-mode Jacobian, so of course my benchmarks won't necessarily follow the difference in assembly or LLVM IR. But it should only matter for the remainder (as it'd still be rhs! being called on the full chunk size).
