Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat #1728: add stop command for ipfs-cluster-follow #2003

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

loomts
Copy link

@loomts loomts commented Nov 28, 2023

I am a newbie of ipfs-cluster and interested in this project. So I pick up this to learn the basic knowledge of ipfs-cluster. Can you please help me review @RubenKelevra @hsanjuan :D

Copy link
Collaborator

@hsanjuan hsanjuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Two things that make this more difficult than it looks:

  • first: it doesn't unpin things (it would need to unpin everything from IPFS). Despite what Leave cluster and remove all pins in the process #1728 says, I would only do that upon --cleanup. For that to work IPFS would need to be running, and the command should fail cleanly if it is not, without doing anything. Unpinning should probably be done directly talking to the daemon API with an adhoc ipfs connector.
  • second: to shutdown the follower peer, assuming that it is running, you would need to figure out which process it is and send a signal to it (there is no other way). You could do that by figuring out who is listening on the socket used by the service (socketAddress), but this assumes the follower is running locally with a unix socket. If not then it should gracefully error since there is nothing we can do (no shutdown endpoint in the service API). Write a method next to socketAddress to find the PID of the process. I'm not sure how that is done but there's probably a way. Otherwise we need to write a pid file in /var/run etc. which is more hassle.

cmd/ipfs-cluster-follow/commands.go Outdated Show resolved Hide resolved
cmd/ipfs-cluster-follow/main.go Outdated Show resolved Hide resolved
cmd/ipfs-cluster-follow/commands.go Outdated Show resolved Hide resolved
@loomts
Copy link
Author

loomts commented Dec 5, 2023

Thanks for your tips!

  • I check api-socket file exist first, then unpinEverything and killFollower.
    As you said

    For that to work IPFS would need to be running, and the command should fail cleanly if it is not, without doing anything.

    if ipfs and ipfs-cluster-follow daemon all running, the api-socket will exist, so I just check this file for ensure IFPS daemon running.

  • After referencing signal issue, the killFollower method use syscall.GetsockoptUcred.

    It run as expected in my wsl ubuntu, but I not really sure in MacOS.

If you have any futher suggestions, plz let me know :D

@loomts loomts requested a review from hsanjuan December 5, 2023 05:05
@hsanjuan hsanjuan added this to the Release v1.0.8 milestone Jan 6, 2024
@hsanjuan hsanjuan self-assigned this Jan 30, 2024
cmd/ipfs-cluster-follow/commands.go Outdated Show resolved Hide resolved
cmd/ipfs-cluster-follow/commands.go Outdated Show resolved Hide resolved
cmd/ipfs-cluster-follow/commands.go Outdated Show resolved Hide resolved
cmd/ipfs-cluster-follow/commands.go Outdated Show resolved Hide resolved
cmd/ipfs-cluster-follow/main.go Outdated Show resolved Hide resolved
@loomts loomts requested a review from hsanjuan April 23, 2024 05:00
@loomts
Copy link
Author

loomts commented May 2, 2024

I made a little change to make stop command logic clearer.

  1. check unpin flag, then unpin everything if ipfs-cluster-follow daemon running, otherwise it will return.
  2. attempts to kill the ipfs-cluster-follow process. But if the process is not running, it will continue.
  3. check cleanup flag, then remove myCluster folder.

BTW, why this PR not be merged in v1.10, maybe this feature is not so much useful?

@hsanjuan
Copy link
Collaborator

hsanjuan commented May 3, 2024

v1.10 is still not released. This is still on my todo but not P0.


// if unpin flag is set, check if the ipfs-cluster-follow is running and unpin everything
if c.Bool("unpin") {
if _, err := os.Stat(filepath.Join(absPath, "api-socket")); err == nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be hardened, given that api-socket can be overwritten with CLUSTER_RESTAPI_HTTPLISTENMULTIADDRESS.

What should be done instead is to getClient(...) and then call Version(ctx). If the call succeeds then we can proceed with unpinning, otherwise we can abort.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think it would be worth to let getClient return the client endp, and have an localEndpoint(endp) function that extracts the path of the unix socket from the multiaddress, or errors when not using a unix socket (in which case we cannot kill the process).

Comment on lines +600 to +606
out := make(chan api.Pin, 1024)
errCh := make(chan error, 1)

go func() {
defer close(errCh)
errCh <- client.Allocations(ctx, api.AllType, out)
}()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this below, after we have successfully created a connector, otherwise no point in allocating channels and launching the allocations query.

Comment on lines +341 to +360
func killFollower(absPath string) error {
conn, err := net.Dial("unix", filepath.Join(absPath, "api-socket"))
if err != nil {
return err
}
defer conn.Close()
connFile, err := conn.(*net.UnixConn).File()
if err != nil {
return err
}
ucred, err := syscall.GetsockoptUcred(int(connFile.Fd()), syscall.SOL_SOCKET, syscall.SO_PEERCRED)
if err != nil {
return err
}
followerP, err := os.FindProcess(int(ucred.Pid))
if err != nil {
return err
}
return followerP.Signal(syscall.SIGTERM)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel too good about this solution. My first question would be what is the behavior on Windows and the rest of platforms we compile to (i.e. Windows)? Does it work? If not, we should move the relevant code to a file with the compiler directives to build only on working platforms, while showing a meaningful error for the rest.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming this indeed doesn't work on Windows, another option would be to have ipfs-cluster-follow run directly write its pid into a file, which can be read and terminated directly, without having to resort to socket tricks.

The pid-file creation code would be part of cmdutils to be able to re-use it. And ofc needs to be removed when the process dies.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, it's not work on Windows, and even we get the pid of ipfs-cluster-follow on Windows, maybe we also cannot gracefully terminate process on Windows. See

  1. Windows cannot directly send SIGTERM signal to process like Unix-like Systems.
  2. We can use taskkill /PID <pid> /F or below code to terminate process easily, but it will directly close the process and we cannot do anything before closing processes
func KillFollower(pid int) error {
	handle, _ := syscall.OpenProcess(syscall.PROCESS_TERMINATE, false, uint32(pid))
	defer syscall.CloseHandle(handle)
	return syscall.TerminateProcess(handle, 0)
}
  1. The one way to gracefully terminate process is use GenerateConsoleCtrlEvent, but we must start ipfs-cluster-follow with REATE_NEW_PROCESS_GROUP flag to get the pid of "console process group".
  2. Maybe we can only add a self close function on Client interface bypassing limitations of the Windows.

@@ -306,6 +338,27 @@ func socketAddress(absPath, clusterName string) (multiaddr.Multiaddr, error) {
return ma, nil
}

func killFollower(absPath string) error {
conn, err := net.Dial("unix", filepath.Join(absPath, "api-socket"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The result of filepath.Join(absPath, "api-socket") should be extracted from whatever socketAddress() returns, as that is the function that decides where this socket file lives. Following the previous comment, better to just pass the full path of the socket to this function which has been obtained before and verified to be a working local socket.

@loomts
Copy link
Author

loomts commented May 6, 2024

Thanks for your useful feedback, I will reconsider and fix it within a week!

@loomts
Copy link
Author

loomts commented May 15, 2024

@hsanjuan idk how to gracefully terminate process on Windows, would you mind that add shutdown api to restapi.go only for enable ipfs-cluster-follow stop command on Windows?(it can also make other unix Systems' implementation more elegant, without having to resort to socket tricks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants