New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat #1728: add stop command for ipfs-cluster-follow #2003
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Two things that make this more difficult than it looks:
- first: it doesn't unpin things (it would need to unpin everything from IPFS). Despite what Leave cluster and remove all pins in the process #1728 says, I would only do that upon
--cleanup
. For that to work IPFS would need to be running, and the command should fail cleanly if it is not, without doing anything. Unpinning should probably be done directly talking to the daemon API with an adhoc ipfs connector. - second: to shutdown the follower peer, assuming that it is running, you would need to figure out which process it is and send a signal to it (there is no other way). You could do that by figuring out who is listening on the socket used by the service (
socketAddress
), but this assumes the follower is running locally with a unix socket. If not then it should gracefully error since there is nothing we can do (no shutdown endpoint in the service API). Write a method next to socketAddress to find the PID of the process. I'm not sure how that is done but there's probably a way. Otherwise we need to write a pid file in /var/run etc. which is more hassle.
Thanks for your tips!
If you have any futher suggestions, plz let me know :D |
I made a little change to make stop command logic clearer.
BTW, why this PR not be merged in v1.10, |
v1.10 is still not released. This is still on my todo but not P0. |
|
||
// if unpin flag is set, check if the ipfs-cluster-follow is running and unpin everything | ||
if c.Bool("unpin") { | ||
if _, err := os.Stat(filepath.Join(absPath, "api-socket")); err == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be hardened, given that api-socket can be overwritten with CLUSTER_RESTAPI_HTTPLISTENMULTIADDRESS.
What should be done instead is to getClient(...)
and then call Version(ctx)
. If the call succeeds then we can proceed with unpinning, otherwise we can abort.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think it would be worth to let getClient return the client endp
, and have an localEndpoint(endp)
function that extracts the path of the unix socket from the multiaddress, or errors when not using a unix socket (in which case we cannot kill the process).
out := make(chan api.Pin, 1024) | ||
errCh := make(chan error, 1) | ||
|
||
go func() { | ||
defer close(errCh) | ||
errCh <- client.Allocations(ctx, api.AllType, out) | ||
}() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this below, after we have successfully created a connector, otherwise no point in allocating channels and launching the allocations query.
func killFollower(absPath string) error { | ||
conn, err := net.Dial("unix", filepath.Join(absPath, "api-socket")) | ||
if err != nil { | ||
return err | ||
} | ||
defer conn.Close() | ||
connFile, err := conn.(*net.UnixConn).File() | ||
if err != nil { | ||
return err | ||
} | ||
ucred, err := syscall.GetsockoptUcred(int(connFile.Fd()), syscall.SOL_SOCKET, syscall.SO_PEERCRED) | ||
if err != nil { | ||
return err | ||
} | ||
followerP, err := os.FindProcess(int(ucred.Pid)) | ||
if err != nil { | ||
return err | ||
} | ||
return followerP.Signal(syscall.SIGTERM) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel too good about this solution. My first question would be what is the behavior on Windows and the rest of platforms we compile to (i.e. Windows)? Does it work? If not, we should move the relevant code to a file with the compiler directives to build only on working platforms, while showing a meaningful error for the rest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming this indeed doesn't work on Windows, another option would be to have ipfs-cluster-follow run
directly write its pid into a file, which can be read and terminated directly, without having to resort to socket tricks.
The pid-file creation code would be part of cmdutils
to be able to re-use it. And ofc needs to be removed when the process dies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, it's not work on Windows, and even we get the pid of ipfs-cluster-follow on Windows, maybe we also cannot gracefully terminate process on Windows. See
- https://stackoverflow.com/questions/55092139/gracefully-terminate-a-process-on-windows/62733194#62733194
- os: Interrupt is not sendable on Windows golang/go#6720 (comment)
- https://github.com/iwdgo/sigintwindows
- Windows cannot directly send SIGTERM signal to process like Unix-like Systems.
- We can use
taskkill /PID <pid> /F
or below code to terminate process easily, but it will directly close the process and we cannot do anything before closing processes
func KillFollower(pid int) error {
handle, _ := syscall.OpenProcess(syscall.PROCESS_TERMINATE, false, uint32(pid))
defer syscall.CloseHandle(handle)
return syscall.TerminateProcess(handle, 0)
}
- The one way to gracefully terminate process is use
GenerateConsoleCtrlEvent
, but we must start ipfs-cluster-follow withREATE_NEW_PROCESS_GROUP
flag to get the pid of "console process group". - Maybe we can only add a self close function on
Client
interface bypassing limitations of the Windows.
@@ -306,6 +338,27 @@ func socketAddress(absPath, clusterName string) (multiaddr.Multiaddr, error) { | |||
return ma, nil | |||
} | |||
|
|||
func killFollower(absPath string) error { | |||
conn, err := net.Dial("unix", filepath.Join(absPath, "api-socket")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The result of filepath.Join(absPath, "api-socket")
should be extracted from whatever socketAddress() returns, as that is the function that decides where this socket file lives. Following the previous comment, better to just pass the full path of the socket to this function which has been obtained before and verified to be a working local socket.
Thanks for your useful feedback, I will reconsider and fix it within a week! |
@hsanjuan idk how to gracefully terminate process on Windows, would you mind that add shutdown api to restapi.go only for enable ipfs-cluster-follow stop command on Windows?(it can also make other unix Systems' implementation more elegant, without having to resort to socket tricks |
I am a newbie of ipfs-cluster and interested in this project. So I pick up this to learn the basic knowledge of ipfs-cluster. Can you please help me review @RubenKelevra @hsanjuan :D