
Directory path collision error - pipeline that fails #8383

Open
amandinesoub opened this issue Nov 22, 2022 · 2 comments

@amandinesoub

What happened?:

We have implemented a data architecture based on Pachyderm. Our first pipeline (called textblocks), which has already processed a large number of files (27,402), becomes completely blocked when we try to add a new file for processing.

More specifically, the strange behaviour we observe is as follows:

  • When we start the "textblocks" pipeline (pachctl start pipeline textblocks), all of its pods are created and the pipeline goes into running status.
pachctl list pipeline

NAME                  VERSION INPUT                                                                                 CREATED      STATE / LAST JOB  DESCRIPTION   
textblocks            1       dump_file:/**.pdf                                                                     5 weeks ago  running / success description               
kubectl get pods

NAME                           READY   STATUS    RESTARTS   AGE
etcd-0                         1/1     Running   0          26h
pachd-55f54bb966-ntfhk         1/1     Running   0          26h
pg-bouncer-7b855cb797-zzj4q    1/1     Running   0          26h
pipeline-textblocks-v1-98tjl   2/2     Running   0          126m
postgres-0                     1/1     Running   0          26h

The pipeline, which processed its existing files some time ago, automatically starts running and quickly moves to success status, since there are no new files to process - so far so good.

  • With the pipeline still running, we add just one new file to process (with a port-forward enabled). The pipeline goes back into running status and, after a few moments, moves to failure status. Looking at the logs of the available pods, we see a specific error that seems to have caused the pipeline to fail: "error":"file / directory path collision (/PARIS/2019/PARIS_2019_02_5fOdx.pdf)".
kubectl logs pipeline-textblocks-v1-98tjl --all-containers | grep error

{"pipelineName":"textblocks","workerId":"pipeline-textblocks-v1-98tjl","master":true,"ts":"2022-11-22T11:25:33.888003325Z","message":"errored transform spawner process: rpc error: code = Unknown desc = cannot clear finished commit"}
{"pipelineName":"textblocks","workerId":"pipeline-textblocks-v1-98tjl","master":true,"ts":"2022-11-22T11:25:33.892265167Z","message":"master: error running the master process, retrying in 11.04660288s: rpc error: code = Unknown desc = cannot clear finished commit"}
2022-11-22T11:25:33Z INFO pfs.API.InspectCommit {"duration":0.73315825,"request":{"commit":{"branch":{"repo":{"name":"dump_file","type":"user"},"name":"master"},"id":"76ac31a145d040fcaff5923058c43eb0"},"wait":4},"response":{"commit":{"branch":{"repo":{"name":"dump_file","type":"user"},"name":"master"},"id":"76ac31a145d040fcaff5923058c43eb0"},"origin":{"kind":1},"parent_commit":{"branch":{"repo":{"name":"dump_file","type":"user"},"name":"master"},"id":"c886358946a54b3287e77d60a7fd6d67"},"started":{"seconds":1669116332,"nanos":866730000},"finishing":{"seconds":1669116332,"nanos":866730000},"finished":{"seconds":1669116333,"nanos":844833000},"error":"file / directory path collision (/PARIS/2019/PARIS_2019_02_5fOdx.pdf)","size_bytes_upper_bound":3796419426,"details":{"size_bytes":3796419426,"compacting_time":{"nanos":376350981},"validating_time":{"nanos":270155412}}}} 
2022-11-22T11:25:33Z INFO transaction.API.BatchTransaction {"request":{"requests":[{"finish_commit":{"commit":{"branch":{"repo":{"name":"textblocks","type":"meta"},"name":"master"},"id":"76ac31a145d040fcaff5923058c43eb0"},"error":"inputs failed: dump_file","force":true}},{"finish_commit":{"commit":{"branch":{"repo":{"name":"textblocks","type":"user"},"name":"master"},"id":"76ac31a145d040fcaff5923058c43eb0"},"error":"inputs failed: dump_file","force":true}},{"update_job_state":{"job":{"pipeline":{"name":"textblocks"},"id":"76ac31a145d040fcaff5923058c43eb0"},"state":4,"reason":"inputs failed: dump_file","stats":{}}}]}} 
2022-11-22T11:25:33Z INFO transaction.API.BatchTransaction {"duration":0.01107051,"request":{"requests":[{"finish_commit":{"commit":{"branch":{"repo":{"name":"textblocks","type":"meta"},"name":"master"},"id":"76ac31a145d040fcaff5923058c43eb0"},"error":"inputs failed: dump_file","force":true}},{"finish_commit":{"commit":{"branch":{"repo":{"name":"textblocks","type":"user"},"name":"master"},"id":"76ac31a145d040fcaff5923058c43eb0"},"error":"inputs failed: dump_file","force":true}},{"update_job_state":{"job":{"pipeline":{"name":"textblocks"},"id":"76ac31a145d040fcaff5923058c43eb0"},"state":4,"reason":"inputs failed: dump_file","stats":{}}}]},"response":{"transaction":{"id":"275b4daeb0ce4640926309636307d743"},"requests":[{"finish_commit":{"commit":{"branch":{"repo":{"name":"textblocks","type":"meta"},"name":"master"},"id":"76ac31a145d040fcaff5923058c43eb0"},"error":"inputs failed: dump_file","force":true}},{"finish_commit":{"commit":{"branch":{"repo":{"name":"textblocks","type":"user"},"name":"master"},"id":"76ac31a145d040fcaff5923058c43eb0"},"error":"inputs failed: dump_file","force":true}},{"update_job_state":{"job":{"pipeline":{"name":"textblocks"},"id":"76ac31a145d040fcaff5923058c43eb0"},"state":4,"reason":"inputs failed: dump_file","stats":{}}}],"responses":[{},{},{}],"started":{"seconds":1669116333,"nanos":857966665}}} 
2022-11-22T11:25:33Z INFO pfs.API.InspectCommit {"duration":0.005549267,"request":{"commit":{"branch":{"repo":{"name":"textblocks","type":"user"},"name":"master"},"id":"76ac31a145d040fcaff5923058c43eb0"},"wait":1},"response":{"commit":{"branch":{"repo":{"name":"textblocks","type":"user"},"name":"master"},"id":"76ac31a145d040fcaff5923058c43eb0"},"origin":{"kind":2},"parent_commit":{"branch":{"repo":{"name":"textblocks","type":"user"},"name":"master"},"id":"c886358946a54b3287e77d60a7fd6d67"},"started":{"seconds":1669116332,"nanos":866730000},"finishing":{"seconds":1669116333,"nanos":856609000},"direct_provenance":[{"repo":{"name":"dump_file","type":"user"},"name":"master"},{"repo":{"name":"textblocks","type":"spec"},"name":"master"}],"error":"inputs failed: dump_file"}} 
2022-11-22T11:25:33Z ERROR pfs.API.ClearCommit {"duration":0.002795112,"error":"cannot clear finished commit","request":{"commit":{"branch":{"repo":{"name":"textblocks","type":"user"},"name":"master"},"id":"76ac31a145d040fcaff5923058c43eb0"}},"stack":["github.com/pachyderm/pachyderm/v2/src/server/pfs/server.(*driver).clearCommit\n\t/Users/avigil/Pachyderm/Pachyderm_releases_2.x/pachyderm/src/server/pfs/server/driver.go:1608","github.com/pachyderm/pachyderm/v2/src/server/pfs/server.(*apiServer).ClearCommit\n\t/Users/avigil/Pachyderm/Pachyderm_releases_2.x/pachyderm/src/server/pfs/server/api_server.go:306","github.com/pachyderm/pachyderm/v2/src/server/pfs/server.(*validatedAPIServer).ClearCommit\n\t/Users/avigil/Pachyderm/Pachyderm_releases_2.x/pachyderm/src/server/pfs/server/val_server.go:126","github.com/pachyderm/pachyderm/v2/src/pfs._API_ClearCommit_Handler.func1\n\t/Users/avigil/Pachyderm/Pachyderm_releases_2.x/pachyderm/src/pfs/pfs.pb.go:5109","github.com/pachyderm/pachyderm/v2/src/internal/middleware/auth.(*Interceptor).InterceptUnary\n\t/Users/avigil/Pachyderm/Pachyderm_releases_2.x/pachyderm/src/internal/middleware/auth/interceptor.go:292","google.golang.org/grpc.getChainUnaryHandler.func1\n\t/Users/avigil/go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:921","github.com/opentracing-contrib/go-grpc.OpenTracingServerInterceptor.func1\n\t/Users/avigil/go/pkg/mod/github.com/opentracing-contrib/go-grpc@v0.0.0-20180928155321-4b5a12d3ff02/server.go:44","google.golang.org/grpc.getChainUnaryHandler.func1\n\t/Users/avigil/go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:921","github.com/grpc-ecosystem/go-grpc-prometheus.(*ServerMetrics).UnaryServerInterceptor.func1\n\t/Users/avigil/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-prometheus@v1.2.1-0.20191002090509-6af20e3a5340/server_metrics.go:108","google.golang.org/grpc.chainUnaryServerInterceptors.func1\n\t/Users/avigil/go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:907","github.com/pachyderm/pachyderm/v2/s
rc/pfs._API_ClearCommit_Handler\n\t/Users/avigil/Pachyderm/Pachyderm_releases_2.x/pachyderm/src/pfs/pfs.pb.go:5111","google.golang.org/grpc.(*Server).processUnaryRPC\n\t/Users/avigil/go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:1082","google.golang.org/grpc.(*Server).handleStream\n\t/Users/avigil/go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:1405","google.golang.org/grpc.(*Server).serveStreams.func1.1\n\t/Users/avigil/go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:746","runtime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1371"]} 
2022-11-22T11:25:33Z ERROR pps.API.SubscribeJob {"duration":169.124623479,"error":"context canceled","request":{"pipeline":{"name":"textblocks"},"details":true},"stack":null} 

Although I have tried deleting the file cited in the error (/PARIS/2019/PARIS_2019_02_5fOdx.pdf), this does not fix the pipeline. Instead, a new log appears, similar to the previous one, but citing a different file name.
We have checked, however, and no duplicate files have been added to the architecture, so logically no path collision errors should appear.
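For context on what this error means: a path cannot be both a file and a directory prefix of another path in the same commit. The following is an illustrative sketch in plain bash (not Pachyderm's actual implementation, and the paths are hypothetical) of how such a collision can be detected:

```shell
# Illustrative sketch only: a "file / directory path collision" means one
# committed path is simultaneously a file and a directory prefix of
# another path. The paths below are hypothetical examples.
paths=(
  "/PARIS/2019"                    # suppose a file was written at this path
  "/PARIS/2019/PARIS_2019_02.pdf"  # ...while this path treats it as a directory
)
collision=""
for a in "${paths[@]}"; do
  for b in "${paths[@]}"; do
    # $a collides when it is a proper directory prefix of $b
    if [ "$a" != "$b" ] && [[ "$b" == "$a"/* ]]; then
      collision="file / directory path collision ($a)"
    fi
  done
done
echo "${collision:-no collision}"
# → file / directory path collision (/PARIS/2019)
```

If a commit contains several such conflicting pairs, removing one file only surfaces the next collision, which would be consistent with the behaviour described above.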

What you expected to happen?:

That the pipeline runs smoothly, processes the newly added file without any problems, and switches to success status.

How to reproduce it (as minimally and precisely as possible)?:
No idea ... really sorry about that.

Environment?:

  • Kubernetes version (use kubectl version):
kubectl version

Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.2", GitCommit:"9d142434e3af351a628bffee3939e64c681afa4d", GitTreeState:"clean", BuildDate:"2022-01-19T17:35:46Z", GoVersion:"go1.17.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.8-gke.1900", GitCommit:"79209257257c051b27df67c567755783eda93353", GitTreeState:"clean", BuildDate:"2022-07-15T09:23:51Z", GoVersion:"go1.17.11b7", Compiler:"gc", Platform:"linux/amd64"}
  • Pachyderm CLI and pachd server version (use pachctl version):
pachctl version

COMPONENT           VERSION             
pachctl             2.0.6               
pachd               2.0.6      
  • Cloud provider (e.g. aws, azure, gke) or local deployment (e.g. minikube vs dockerized k8s):
    GKE

  • If you deployed with helm, the values you used (helm get values pachyderm):

helm get values pachyderm

USER-SUPPLIED VALUES:
deployTarget: GOOGLE
pachd:
  externalService:
    enabled: true
  storage:
    google:
      bucket: datalchemy-datatest
      cred: | JSON
  • OS (e.g. from /etc/os-release):
    Debian
cat /etc/debian_version 
10.13
  • Others:

Your help would be greatly appreciated. Thank you in advance!

@BOsterbuhr
Contributor

Thank you @amandinesoub for opening this issue. I see you are currently on an unsupported version of Pachyderm (2.0.6); as a first step, can you please upgrade to the latest version (2.4.0)?
You should not need to run pachctl start pipeline either: when you put new data into the input repo, the pipeline starts automatically.
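For a Helm-based deployment like the one described above, the upgrade would look roughly like the following (a sketch only, assuming the release is named pachyderm and the chart repo is already added; check the official upgrade notes for any version-specific migration steps, and upgrade pachctl to the matching version as well):

```shell
# Sketch: fetch the latest chart index, then upgrade the release in place,
# keeping the previously supplied values (release/chart names assumed
# from the "helm get values pachyderm" output above).
helm repo update
helm upgrade pachyderm pachyderm/pachyderm --version 2.4.0 --reuse-values

# Afterwards, verify that pachctl and pachd report matching versions.
pachctl version
```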

@amandinesoub
Author

Thank you for your reply. We have taken your advice and updated the version of pachyderm. It seems to work fine.
Best, Amandine
