Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ACL2(p) can hang when profiling #638

Open
MattKaufmann opened this issue Sep 7, 2016 · 6 comments
Open

ACL2(p) can hang when profiling #638

MattKaufmann opened this issue Sep 7, 2016 · 6 comments

Comments

@MattKaufmann
Copy link
Contributor

Here is how I was able to get ACL2(p) to hang with
waterfall-parallelism on, both in Linux and on a Mac. I've seen this
in lots of books when trying a regression, but I'm about to push a
change to books/build/make_cert_help.pl that avoids this problem
during regressions -- though this example shows that the problem still
remains if you profile with waterfall-parallelism on.

First, in an ACL2 directory, build ACL2 and ACL2(p) and clean the
books. Then (takes only a few seconds):

cd books/clause-processors/
../build/cert.pl -j 8 --acl2 <your_acl2> generalize

Then start ACL2(p) and evaluate these forms.

(set-waterfall-parallelism t)
(profile 'prove) ; can use profile-fn in raw Lisp instead
(include-book "std/portcullis" :dir :system)
(set-debugger-enable t) ; optional
(ld "generalize.lisp") ; or certify the book

For me, the proof got stuck in the proof of SYMBOL-N-UNIQUE-N. Here's
a key part (from the top) of the backtrace.

1 > :b
 (25817328) : 0 (%PROCESS-WAIT-ON-SEMAPHORE-PTR #<A Foreign Pointer [gcable] #xB1303> 16777215 0 "semaphore wait" NIL) 437
 (258173B0) : 1 (WAIT-ON-SEMAPHORE #<CCL:SEMAPHORE #x302015741AAD> NIL "semaphore wait") 173
 (258173E8) : 2 (WAIT-ON-SEMAPHORE #<CCL:SEMAPHORE #x302015741AAD> :NOTIFICATION NIL :TIMEOUT NIL) 141
 (25817420) : 3 (MT-FUTURE-READ #S(MT-FUTURE :INDEX 699 :VALUE NIL ...)) 621
 (25817470) : 4 (WATERFALL1-TREE@PAR-PARALLEL 4 ((0) NIL . 1) ((# # #) (# # #) (# # #) (# # #)) ((SIMPLIFY-CLAUSE # # HIT-REWRITE # ...) (PREPROCESS-CLAUSE # # HIT # ...)) ((# NIL) (# # # # # ...) (NIL # # NIL) (EQUAL # #) (EQUAL # #) ...) NIL NIL NIL ((4640 # # # # ...) (ENABLED-ARRAY-0 . 5000) (#\E #\N #\A #\B #\L ...) . 0) ((EVENT-LANDMARK GLOBAL-VALUE # # DEFTHM ...) (INTERN-IN-PACKAGE-OF-SYMBOL-UNIQUE ABSOLUTE-EVENT-NUMBER . 10554) (PROOF-SUPPORTERS-ALIST GLOBAL-VALUE # # #) (INTERN-IN-PACKAGE-OF-SYMBOL-UNIQUE RUNIC-MAPPING-PAIRS #) (INTERN-IN-PACKAGE-OF-SYMBOL-UNIQUE THEOREM IMPLIES # #) ...) (DEFTHM . SYMBOL-N-UNIQUE-N) ACL2_INVISIBLE::|The Live State Itself| 536869624) 3133

One thing I noticed is that if I first traced the function
futures-resources-available, by inserting the following just before
(ld "generalize.lisp"), then the output from the trace was
interleaved.

(value :q)
(old-trace futures-resources-available)
(lp)

So I wonder if futures-resources-available needs some sort of a lock
or other method to make it atomic; but I'm just grasping at straws.

David has suggested that I file this issue against him, though he
plans not to fix the problem unless perhaps it prevents users from
being productive.

MattKaufmann pushed a commit that referenced this issue Sep 7, 2016
…ssue #638) and in part by turning off waterfall-parallelism in a few books.

Also, removed item from :doc
unsupported-waterfall-parallelism-features about memoization not being
supported with parallel execution, since that combination has been
supported for some time.  However, in its place (following a
suggestion from David Rager), added an item about profiling possibly
hanging with waterfall-parallelism enabled, pointing to GitHub Issue
#638.  (I have not yet propagated this change to doc.lisp or the
online manual, but I will likely do that pretty soon.)
@ragerdl
Copy link
Member

ragerdl commented Sep 7, 2016

Thanks for the report @MattKaufmann.

The inter-leaved futures-resources-available output is probably a red herring. That function can be entered and exited in parallel. Looking at its definition, it's starting to remind me of why it's okay for it to also execute in parallel.

This would be interesting to look into, but it's a potentially deep dive. So, I don't plan to look at it anymore unless it's bothering someone. The chances of making things worse via a "good" change are non-trivial. Also, given I currently know nothing about how CCL implements parallelism, and given there's probably not a whole lot of CCL testing of multi-threading combined with profiling, it wouldn't surprise me if there were a CCL bug (but that's not where I would start).

@MattKaufmann, thanks for fixing the cert.pl issue so that profiling is disabled. I'll close this unless someone else raises objections.

@acoglio
Copy link
Member

acoglio commented Jun 27, 2022

@MattKaufmann @ragerdl Do you think that this Issue can be closed?

@MattKaufmann
Copy link
Contributor Author

@acoglio I don't know what's best, but I have no objection to closing it.

@ragerdl
Copy link
Member

ragerdl commented Jul 2, 2022 via email

@acoglio
Copy link
Member

acoglio commented Jul 2, 2022

If it's still a bug it seems best to keep it open. Just checking!

@MattKaufmann
Copy link
Contributor Author

I don't know if it's best to keep this open. David is probably the only one who knows enough about the ACL2(p) implementation to have a chance at fixing the problem with reasonable effort, and I don't blame him at all for not being interested in doing that (he's moved on). I'm pretty sure that I've done all the debugging I want to do on this (I've never signed up for ACL2(p) maintenance, even though I've done some). I suppose I'd be willing to incorporate fixes into the ACL2 sources from someone else, IF those fixes only affect ACL2(p), not ACL2. But I'd be surprised if anyone takes that on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants