Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crashes (SIGABRT) Under Test Automation #539

Open
clintmiller opened this issue Sep 27, 2023 · 6 comments
Open

Crashes (SIGABRT) Under Test Automation #539

clintmiller opened this issue Sep 27, 2023 · 6 comments

Comments

@clintmiller
Copy link

clintmiller commented Sep 27, 2023

Environment

Operating System

Intel Mac, x86_64

ProductName:		macOS
ProductVersion:		13.4.1
ProductVersionExtra:	(c)
BuildVersion:		22F770820d

TinyTDS Version and Information

[TinyTds][v2.1.5][tsql]: /usr/local/bin/tsql
Compile-time settings (established with the "configure" script)
                            Version: freetds v1.3.20
             freetds.conf directory: /usr/local/etc
     MS db-lib source compatibility: no
        Sybase binary compatibility: yes
                      Thread safety: yes
                      iconv library: yes
                        TDS version: 7.3
                              iODBC: no
                           unixodbc: yes
              SSPI "trusted" logins: no
                           Kerberos: yes
                            OpenSSL: yes
                             GnuTLS: no
                               MARS: yes

I'm actually pointing at a post-2.1.5 commit on tiny_tds: 23ed1e4, where #527 was merged.

FreeTDS Version

freetds: 1.3.20

SQL Server

Running against 2017 for Linux in a Docker container via Docker Desktop for Mac. SQL Server Authentication.

Description

In the course of trying to upgrade an application from Ruby 2.5/Rails 5.2.x to Ruby 3.1/Rails 6.1.x, I have noticed a significant increase in automated test failures (we're using rspec-rails). These failures seem take a couple of forms:

1. An EXC_BAD_ACCESS (SIGABRT).

Terminal output:

<big long memory region dump trimmed>

Abort trap: 6

OS Crash Report:

Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib        	    0x7ff8092e61f2 __pthread_kill + 10
1   libsystem_pthread.dylib       	    0x7ff80931dee6 pthread_kill + 263
2   libsystem_c.dylib             	    0x7ff809244b45 abort + 123
3   libruby.3.1.dylib             	       0x106e845e9 die + 9
4   libruby.3.1.dylib             	       0x106e84810 rb_bug_for_fatal_signal + 544
5   libruby.3.1.dylib             	       0x106fb475b sigsegv + 91
6   libsystem_platform.dylib      	    0x7ff80934b5ed _sigtramp + 29
7   ???                           	               0x0 ???
8   tiny_tds.bundle               	       0x10c34543e nogvl_dbsqlok + 88 (result.c:130) [inlined]
9   tiny_tds.bundle               	       0x10c34543e rb_tinytds_result_ok_helper + 109 (result.c:179) [inlined]
10  tiny_tds.bundle               	       0x10c34543e rb_tinytds_result_exec_helper + 126 (result.c:185)
11  tiny_tds.bundle               	       0x10c344528 rb_tinytds_result_do + 56 (result.c:528)
12  libruby.3.1.dylib             	       0x10704864d vm_call_cfunc_with_frame + 349
<snip>
Thread 1:
0   libsystem_kernel.dylib        	    0x7ff8092e629e poll + 10
1   libruby.3.1.dylib             	       0x1070004ac timer_pthread_fn + 140
2   libsystem_pthread.dylib       	    0x7ff80931e1d3 _pthread_start + 125
3   libsystem_pthread.dylib       	    0x7ff809319bd3 thread_start + 15

Thread 2:: connection_pool.rb:323
0   libsystem_kernel.dylib        	    0x7ff8092e629e poll + 10
1   libruby.3.1.dylib             	       0x106ff4acd rb_sigwait_sleep + 557
2   libruby.3.1.dylib             	       0x106ff5fb4 native_sleep + 500
3   libruby.3.1.dylib             	       0x106ff6b9b sleep_hrtime + 331
4   libruby.3.1.dylib             	       0x106f5f475 rb_f_sleep + 85
5   libruby.3.1.dylib             	       0x10704864d vm_call_cfunc_with_frame + 349
<snip>
Thread 3:: worker-1
0   libsystem_kernel.dylib        	    0x7ff8092e20ee __psynch_cvwait + 10
1   libsystem_pthread.dylib       	    0x7ff80931e758 _pthread_cond_wait + 1242
2   libruby.3.1.dylib             	       0x106fffde5 gvl_acquire_common + 309
3   libruby.3.1.dylib             	       0x106ff7648 blocking_region_end + 216
4   libruby.3.1.dylib             	       0x106ff716f rb_nogvl + 175
5   tiny_tds.bundle               	       0x10c345433 nogvl_dbsqlok + 77 (result.c:129) [inlined]
6   tiny_tds.bundle               	       0x10c345433 rb_tinytds_result_ok_helper + 98 (result.c:179) [inlined]
7   tiny_tds.bundle               	       0x10c345433 rb_tinytds_result_exec_helper + 115 (result.c:185)
8   tiny_tds.bundle               	       0x10c344528 rb_tinytds_result_do + 56 (result.c:528)
9   libruby.3.1.dylib             	       0x10704864d vm_call_cfunc_with_frame + 349
<snip>

I'm pretty sure this was a simple unit test, given the running threads in the Ruby process. One thing that immediately catches my eye is that there appear to be two threads in the TinyTDS C code at the same time. I'm also curious about the nogvl_* functions, but only b/c of their name. I'm not sure I fully understand their purpose.

2. An EXC_CRASH (SIGABRT).

Terminal output:

/Users/clintmiller/.asdf/installs/ruby/3.1.4/lib/ruby/gems/3.1.0/gems/activerecord-sqlserver-adapter-6.1.3.0/lib/active_record/connection_adapters/sqlserver/database_statements.rb:463: warning: TinyTds: dbsqlsend() returned FAIL.

Assertion failed: (conn->in_net_tds == NULL), function tds_free_connection, file mem.c, line 1209.

OS Crash Report:

Thread 0::  Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib        	    0x7ff8092e629e poll + 10
1   libsybdb.5.dylib              	       0x1101c3ba6 tds_select + 221
2   libsybdb.5.dylib              	       0x1101c8419 tds_connection_network + 110
3   libsybdb.5.dylib              	       0x1101c8212 tds_read_packet + 235
4   libsybdb.5.dylib              	       0x1101afb94 tds_get_byte + 25
5   libsybdb.5.dylib              	       0x1101a8acf tds_process_tokens + 183
6   libsybdb.5.dylib              	       0x1101953e5 dbsqlok + 175
7   libruby.3.1.dylib             	       0x10ad8b158 rb_nogvl + 152
8   tiny_tds.bundle               	       0x11011c433 nogvl_dbsqlok + 77 (result.c:129) [inlined]
9   tiny_tds.bundle               	       0x11011c433 rb_tinytds_result_ok_helper + 98 (result.c:179) [inlined]
10  tiny_tds.bundle               	       0x11011c433 rb_tinytds_result_exec_helper + 115 (result.c:185)
11  tiny_tds.bundle               	       0x11011b528 rb_tinytds_result_do + 56 (result.c:528)
12  libruby.3.1.dylib             	       0x10addc64d vm_call_cfunc_with_frame + 349
<snip>
Thread 1:
0   libsystem_kernel.dylib        	    0x7ff8092e629e poll + 10
1   libruby.3.1.dylib             	       0x10ad944ac timer_pthread_fn + 140
2   libsystem_pthread.dylib       	    0x7ff80931e1d3 _pthread_start + 125
3   libsystem_pthread.dylib       	    0x7ff809319bd3 thread_start + 15

Thread 2:: connection_pool.rb:323
0   libsystem_kernel.dylib        	    0x7ff8092e629e poll + 10
1   libruby.3.1.dylib             	       0x10ad88acd rb_sigwait_sleep + 557
2   libruby.3.1.dylib             	       0x10ad89fb4 native_sleep + 500
3   libruby.3.1.dylib             	       0x10ad8ab9b sleep_hrtime + 331
4   libruby.3.1.dylib             	       0x10acf3475 rb_f_sleep + 85
5   libruby.3.1.dylib             	       0x10addc64d vm_call_cfunc_with_frame + 349
<snip>
Thread 3:: Timeout stdlib thread
0   libsystem_kernel.dylib        	    0x7ff8092e20ee __psynch_cvwait + 10
1   libsystem_pthread.dylib       	    0x7ff80931e758 _pthread_cond_wait + 1242
2   libruby.3.1.dylib             	       0x10ad92c06 native_cond_sleep + 502
3   libruby.3.1.dylib             	       0x10ad89e30 native_sleep + 112
4   libruby.3.1.dylib             	       0x10ad8a8b9 sleep_forever + 457
5   libruby.3.1.dylib             	       0x10ad95c17 queue_sleep + 39
6   libruby.3.1.dylib             	       0x10ac24ece rb_ensure + 350
7   libruby.3.1.dylib             	       0x10ad95b62 queue_do_pop + 226
8   libruby.3.1.dylib             	       0x10addc64d vm_call_cfunc_with_frame + 349
<snip>
Thread 4:: server.rb:76
0   libsystem_kernel.dylib        	    0x7ff8092e20ee __psynch_cvwait + 10
1   libsystem_pthread.dylib       	    0x7ff80931e758 _pthread_cond_wait + 1242
2   libruby.3.1.dylib             	       0x10ad92c06 native_cond_sleep + 502
3   libruby.3.1.dylib             	       0x10ad89e30 native_sleep + 112
4   libruby.3.1.dylib             	       0x10ad94276 thread_join_sleep + 230
5   libruby.3.1.dylib             	       0x10ac24ece rb_ensure + 350
6   libruby.3.1.dylib             	       0x10ad940bd thread_join + 157
7   libruby.3.1.dylib             	       0x10ad90d6e thread_join_m + 238
8   libruby.3.1.dylib             	       0x10addc64d vm_call_cfunc_with_frame + 349
<snip>
Thread 5:: puma reactor
0   libsystem_kernel.dylib        	    0x7ff8092e41ee kevent + 10
1   nio4r_ext.bundle              	       0x1191c6c69 kqueue_poll + 281 (ev_kqueue.c:102)
2   nio4r_ext.bundle              	       0x1191c3f55 ev_backend_poll + 21 (ev.c:4026)
3   libruby.3.1.dylib             	       0x10ad8b158 rb_nogvl + 152
4   nio4r_ext.bundle              	       0x1191c33b0 ev_run + 768 (ev.c:4216)
5   nio4r_ext.bundle              	       0x1191c902a NIO_Selector_run + 86 (selector.c:476) [inlined]
6   nio4r_ext.bundle              	       0x1191c902a NIO_Selector_select_synchronized + 186 (selector.c:431)
7   libruby.3.1.dylib             	       0x10ac24ece rb_ensure + 350
8   nio4r_ext.bundle              	       0x1191c8775 NIO_Selector_select + 101 (selector.c:409)
9   libruby.3.1.dylib             	       0x10addc64d vm_call_cfunc_with_frame + 349
<snip>
Thread 6:: puma threadpool reaper
0   libsystem_kernel.dylib        	    0x7ff8092e20ee __psynch_cvwait + 10
1   libsystem_pthread.dylib       	    0x7ff80931e758 _pthread_cond_wait + 1242
2   libruby.3.1.dylib             	       0x10ad92bd7 native_cond_sleep + 455
3   libruby.3.1.dylib             	       0x10ad89e30 native_sleep + 112
4   libruby.3.1.dylib             	       0x10ad8ab9b sleep_hrtime + 331
5   libruby.3.1.dylib             	       0x10acf3475 rb_f_sleep + 85
6   libruby.3.1.dylib             	       0x10addc64d vm_call_cfunc_with_frame + 349
<snip>
Thread 7:: puma threadpool trimmer
0   libsystem_kernel.dylib        	    0x7ff8092e20ee __psynch_cvwait + 10
1   libsystem_pthread.dylib       	    0x7ff80931e758 _pthread_cond_wait + 1242
2   libruby.3.1.dylib             	       0x10ad92bd7 native_cond_sleep + 455
3   libruby.3.1.dylib             	       0x10ad89e30 native_sleep + 112
4   libruby.3.1.dylib             	       0x10ad8ab9b sleep_hrtime + 331
5   libruby.3.1.dylib             	       0x10acf3475 rb_f_sleep + 85
6   libruby.3.1.dylib             	       0x10addc64d vm_call_cfunc_with_frame + 349
<snip>
Thread 8:: puma server
0   libsystem_kernel.dylib        	    0x7ff8092e8282 __select + 10
1   libruby.3.1.dylib             	       0x10ad8d8b9 do_select + 425
2   libruby.3.1.dylib             	       0x10ac24ece rb_ensure + 350
3   libruby.3.1.dylib             	       0x10ad8d5ca rb_thread_fd_select + 1178
4   libruby.3.1.dylib             	       0x10ac6b5dd select_call + 1293
5   libruby.3.1.dylib             	       0x10ac24ece rb_ensure + 350
6   libruby.3.1.dylib             	       0x10ac5eb6c rb_f_select + 220
7   libruby.3.1.dylib             	       0x10addc64d vm_call_cfunc_with_frame + 349
<snip>
Thread 9:: worker.rb:96
0   libsystem_kernel.dylib        	    0x7ff8092e20ee __psynch_cvwait + 10
1   libsystem_pthread.dylib       	    0x7ff80931e758 _pthread_cond_wait + 1242
2   libruby.3.1.dylib             	       0x10ad92c06 native_cond_sleep + 502
3   libruby.3.1.dylib             	       0x10ad89e30 native_sleep + 112
4   libruby.3.1.dylib             	       0x10ad8a8b9 sleep_forever + 457
5   libruby.3.1.dylib             	       0x10ad95c17 queue_sleep + 39
6   libruby.3.1.dylib             	       0x10ac24ece rb_ensure + 350
7   libruby.3.1.dylib             	       0x10ad95b62 queue_do_pop + 226
8   libruby.3.1.dylib             	       0x10addc64d vm_call_cfunc_with_frame + 349
<snip>
Thread 10:: worker-1
0   libsystem_kernel.dylib        	    0x7ff8092e20ee __psynch_cvwait + 10
1   libsystem_pthread.dylib       	    0x7ff80931e758 _pthread_cond_wait + 1242
2   libruby.3.1.dylib             	       0x10ad92c06 native_cond_sleep + 502
3   libruby.3.1.dylib             	       0x10ad89e30 native_sleep + 112
4   libruby.3.1.dylib             	       0x10ad8a8b9 sleep_forever + 457
5   libruby.3.1.dylib             	       0x10ad95c17 queue_sleep + 39
6   libruby.3.1.dylib             	       0x10ac24ece rb_ensure + 350
7   libruby.3.1.dylib             	       0x10ad95b62 queue_do_pop + 226
8   libruby.3.1.dylib             	       0x10addc64d vm_call_cfunc_with_frame + 349
<snip>
Thread 11:: worker-2
0   libsystem_kernel.dylib        	    0x7ff8092e20ee __psynch_cvwait + 10
1   libsystem_pthread.dylib       	    0x7ff80931e758 _pthread_cond_wait + 1242
2   libruby.3.1.dylib             	       0x10ad92c06 native_cond_sleep + 502
3   libruby.3.1.dylib             	       0x10ad89e30 native_sleep + 112
4   libruby.3.1.dylib             	       0x10ad8a8b9 sleep_forever + 457
5   libruby.3.1.dylib             	       0x10ad95c17 queue_sleep + 39
6   libruby.3.1.dylib             	       0x10ac24ece rb_ensure + 350
7   libruby.3.1.dylib             	       0x10ad95b62 queue_do_pop + 226
8   libruby.3.1.dylib             	       0x10addc64d vm_call_cfunc_with_frame + 349
<snip>
Thread 12:: worker-1
0   libsystem_kernel.dylib        	    0x7ff8092e20ee __psynch_cvwait + 10
1   libsystem_pthread.dylib       	    0x7ff80931e758 _pthread_cond_wait + 1242
2   libruby.3.1.dylib             	       0x10ad92c06 native_cond_sleep + 502
3   libruby.3.1.dylib             	       0x10ad89e30 native_sleep + 112
4   libruby.3.1.dylib             	       0x10ad8a8b9 sleep_forever + 457
5   libruby.3.1.dylib             	       0x10ad95c17 queue_sleep + 39
6   libruby.3.1.dylib             	       0x10ac24ece rb_ensure + 350
7   libruby.3.1.dylib             	       0x10ad95b62 queue_do_pop + 226
8   libruby.3.1.dylib             	       0x10addc64d vm_call_cfunc_with_frame + 349
<snip>
Thread 13:: worker-1
0   libsystem_kernel.dylib        	    0x7ff8092e20ee __psynch_cvwait + 10
1   libsystem_pthread.dylib       	    0x7ff80931e758 _pthread_cond_wait + 1242
2   libruby.3.1.dylib             	       0x10ad92c06 native_cond_sleep + 502
3   libruby.3.1.dylib             	       0x10ad89e30 native_sleep + 112
4   libruby.3.1.dylib             	       0x10ad8a8b9 sleep_forever + 457
5   libruby.3.1.dylib             	       0x10ad95c17 queue_sleep + 39
6   libruby.3.1.dylib             	       0x10ac24ece rb_ensure + 350
7   libruby.3.1.dylib             	       0x10ad95b62 queue_do_pop + 226
8   libruby.3.1.dylib             	       0x10addc64d vm_call_cfunc_with_frame + 349
<snip>
Thread 14 Crashed:: worker-2
0   libsystem_kernel.dylib        	    0x7ff8092e61f2 __pthread_kill + 10
1   libsystem_pthread.dylib       	    0x7ff80931dee6 pthread_kill + 263
2   libsystem_c.dylib             	    0x7ff809244b45 abort + 123
3   libsystem_c.dylib             	    0x7ff809243e5e __assert_rtn + 314
4   libsybdb.5.dylib              	       0x1101ce32f tds_free_connection.cold.1 + 35
5   libsybdb.5.dylib              	       0x1101a6509 tds_free_connection + 368
6   libsybdb.5.dylib              	       0x1101a671f tds_free_socket + 215
7   libsybdb.5.dylib              	       0x110194a27 dbclose + 239
8   tiny_tds.bundle               	       0x1101197af rb_tinytds_close + 63 (client.c:266)
9   libruby.3.1.dylib             	       0x10addc64d vm_call_cfunc_with_frame + 349
<snip>

Obviously, this case with more threads is a system/integration test that started a webserver.

Other Notes

It's interesting to note, that the application boots and runs in my development environment w/o any issues. I'm not sure how it will do with a production workload given the crashes I'm seeing in our test suite.

Next Steps

Making a minimal reproduction test-case would be useful, but I'm not quite sure where to start. On Ruby 2.5/Rails 5.2, our test suite rarely (if ever), crashed in this manner. Is there any other additional information I can provide or debug builds of TinyTDS I could use to reveal more helpful information?

@andyundso
Copy link
Contributor

Hi, couple of questions:

  • Against which SQL server do you run the test suite? What kind of of authentication do you run?
  • The crash reports seem from a Mac system. With test automation, do you mean just the test suite of your project or do you run automated tests against Mac on a continuous integration system, like GitHub Actions?
  • If you update your tiny_tds to the mentioned commit 23ed1e4 before you did the Ruby and Rails upgrade, can you reproduce these issues?
  • Do you run ARM or Intel on your Mac?

@clintmiller
Copy link
Author

Updated above w/ answers- also copied here for convenience, thank you @andyundso.

Against which SQL server do you run the test suite? What kind of of authentication do you run?

Running against 2017 for Linux in a Docker container via Docker Desktop for Mac. SQL Server Authentication. I don't think I'm having an authentication problem as my app (and tests) are able to connect to the database and execute queries.

The crash reports seem from a Mac system. With test automation, do you mean just the test suite of your project or do you run automated tests against Mac on a continuous integration system, like GitHub Actions?

By "test automation", I simply meant the execution of the test suite for this project. As part of this upgrade process, I will get the suite running in an automated CI system, either w/ updates to our existing, but dated Jenkins CI installation, or in GitHub Actions. I have not done this yet.

If you update your tiny_tds to the mentioned commit 23ed1e4 before you did the Ruby and Rails upgrade, can you reproduce these issues?

I will try this against our Rails 5.2 branch, assuming the newer TinyTDS is backwards compatible both w/ Rails 5.2 (should be, I think), and Ruby 2.5.x (??).

Do you run ARM or Intel on your Mac?

This is on an Intel Mac.

@andyundso
Copy link
Contributor

Disclaimer: I'm unfamiliar with the C code in tiny_tds, just trying to help out :).

I also own an Intel Mac, the same OS version as yours, the same FreeTDS version. I maintain two Rails projects, both with RSpec, but one runs Ruby 3.0, the other 3.1. I checked if I also ran into these issues when using the version of tiny_tds you mentioned in your issue and the SQL server 2017 in Docker. I have run about ten runs on both apps so far with no problems. Are your crashes more infrequent than every tenth run?

In general, from what I read in the code, the nogvl stuff executes code on a separate Ruby thread but releases the Global VM Lock, as we no longer execute Ruby code, but instead talk with FreeTDS, who then talks to MSSQL. Once the function finishes, it re-acquires the GVL.

EXC_BAD_ACCESS means we write memory somewhere where we should not. EXC_CRASH reacts to a SIGABRT, likely raised of freetds because of the assertion error that it hits.

Is there some consistency when you hit these errors? For example, I assume you run your test suite in random order. Once it crashes and you re-run the test suite with the same seed, does it crash again with the same error?

As mentioned, it would be interesting to know if you also hit this error when updating tiny_tds, but run the Ruby 2.5 / Rails 5.2 version of your app. tiny_tds should work with Ruby 2.5, at least our CI is green :)

@bf4
Copy link
Contributor

bf4 commented Oct 11, 2023

similar error when in rails console on Ruby 3.1.2, Rails 7.0, TinyTDS 2.1.5, freetds v1.3.20 on M1 mac running against various MS Sql Server servers e.g. Microsoft SQL Server 2016 (SP3-CU1-GDR) (KB5021128) - 13.0.7024.30 (X64)

usually after I've made a bunch of queries and have had the console open a while, but only on exit

Assertion failed: (conn->in_net_tds == NULL), function tds_free_connection, file mem.c, line 1209.
Abort trap: 6

@clintmiller
Copy link
Author

I was able to reliably trigger this with a single spec example and it's beginning to form some type of working theory for me. This is an excerpt from a Rails controller spec:

  describe 'GET finish' do
    let(:product_download) { create(:product_download) }
    let(:params) do
      { membership: product_download.membership_id, log: product_download.id, bytes: 123456 }
    end

    it 'closes the product download log log' do
      get :finish, params: params
      updated_log = ProductDownload.find(product_download.id) # <= This is the boom!
      ...
    end
  end

Test output:

Igniter::DownloadController
  GET finish
/Users/clintmiller/.asdf/installs/ruby/3.1.4/lib/ruby/gems/3.1.0/gems/activerecord-sqlserver-adapter-6.1.3.0/lib/active_record/connection_adapters/sqlserver/database_statements.rb:463: warning: TinyTds: dbsqlsend() returned FAIL.

Assertion failed: (conn->in_net_tds == NULL), function tds_free_connection, file mem.c, line 1209.
Abort trap: 6

One other important detail is that my application is a multi-tenant app, where the current tenant is a per-request concept that can be switched based on criteria like the hostname requested. One of the things that tenant switching did* is switch the database connection with the Apartment gem, which we had* forked to support the activerecord-sqlserver-adapter.

Something must be happening lifetime of these database connections when this spec is run. The database connection being used by the Rails server handling the get :finish request is closed (the request is over after all?) but is then attempted to be used again by the following line? I'm not sure that's exactly it.

*Did and *had b/c we've actually removed Apartment and the tenant switching functionality that used it to change the database connections. The tenancy concept is still in place, and requests still switch tenants as before, but we aren't manipulating the database connections during the tenant switch any longer.

@bf4
Copy link
Contributor

bf4 commented Oct 18, 2023

@clintmiller have you tried reproducing bundling against main? there may be an unreleased fix

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants