
Directory path collision error - pipeline that fails #8383

Open
amandinesoub opened this issue Nov 22, 2022 · 2 comments

@amandinesoub

What happened?:

We have implemented a data architecture based on Pachyderm. Our first pipeline (called textblocks), which has already processed a large number of files (27,402), becomes completely blocked when we try to add a new file for processing.

More specifically, the strange behaviour we observe is as follows:

  • When we start the "textblocks" pipeline (pachctl start pipeline textblocks), all of its pods are created and the pipeline goes into running status.
pachctl list pipeline

NAME                  VERSION INPUT                                                                                 CREATED      STATE / LAST JOB  DESCRIPTION   
textblocks            1       dump_file:/**.pdf                                                                     5 weeks ago  running / success description               
kubectl get pods

NAME                           READY   STATUS    RESTARTS   AGE
etcd-0                         1/1     Running   0          26h
pachd-55f54bb966-ntfhk         1/1     Running   0          26h
pg-bouncer-7b855cb797-zzj4q    1/1     Running   0          26h
pipeline-textblocks-v1-98tjl   2/2     Running   0          126m
postgres-0                     1/1     Running   0          26h

The pipeline, which processed its existing files some time ago, automatically starts running and quickly moves to success status, since there are no new files to process - so far so good.

  • With the pipeline still running, we add just one new file to process (with a port-forward enabled). The pipeline goes back into running status and, after a few moments, moves to failure status. Looking at the logs of the available pods, we see a specific error that seems to have caused the pipeline to fail: "error":"file / directory path collision (/PARIS/2019/PARIS_2019_02_5fOdx.pdf)".
kubectl logs pipeline-textblocks-v1-98tjl --all-containers | grep error

{"pipelineName":"textblocks","workerId":"pipeline-textblocks-v1-98tjl","master":true,"ts":"2022-11-22T11:25:33.888003325Z","message":"errored transform spawner process: rpc error: code = Unknown desc = cannot clear finished commit"}
{"pipelineName":"textblocks","workerId":"pipeline-textblocks-v1-98tjl","master":true,"ts":"2022-11-22T11:25:33.892265167Z","message":"master: error running the master process, retrying in 11.04660288s: rpc error: code = Unknown desc = cannot clear finished commit"}
2022-11-22T11:25:33Z INFO pfs.API.InspectCommit {"duration":0.73315825,"request":{"commit":{"branch":{"repo":{"name":"dump_file","type":"user"},"name":"master"},"id":"76ac31a145d040fcaff5923058c43eb0"},"wait":4},"response":{"commit":{"branch":{"repo":{"name":"dump_file","type":"user"},"name":"master"},"id":"76ac31a145d040fcaff5923058c43eb0"},"origin":{"kind":1},"parent_commit":{"branch":{"repo":{"name":"dump_file","type":"user"},"name":"master"},"id":"c886358946a54b3287e77d60a7fd6d67"},"started":{"seconds":1669116332,"nanos":866730000},"finishing":{"seconds":1669116332,"nanos":866730000},"finished":{"seconds":1669116333,"nanos":844833000},"error":"file / directory path collision (/PARIS/2019/PARIS_2019_02_5fOdx.pdf)","size_bytes_upper_bound":3796419426,"details":{"size_bytes":3796419426,"compacting_time":{"nanos":376350981},"validating_time":{"nanos":270155412}}}} 
2022-11-22T11:25:33Z INFO transaction.API.BatchTransaction {"request":{"requests":[{"finish_commit":{"commit":{"branch":{"repo":{"name":"textblocks","type":"meta"},"name":"master"},"id":"76ac31a145d040fcaff5923058c43eb0"},"error":"inputs failed: dump_file","force":true}},{"finish_commit":{"commit":{"branch":{"repo":{"name":"textblocks","type":"user"},"name":"master"},"id":"76ac31a145d040fcaff5923058c43eb0"},"error":"inputs failed: dump_file","force":true}},{"update_job_state":{"job":{"pipeline":{"name":"textblocks"},"id":"76ac31a145d040fcaff5923058c43eb0"},"state":4,"reason":"inputs failed: dump_file","stats":{}}}]}} 
2022-11-22T11:25:33Z INFO transaction.API.BatchTransaction {"duration":0.01107051,"request":{"requests":[{"finish_commit":{"commit":{"branch":{"repo":{"name":"textblocks","type":"meta"},"name":"master"},"id":"76ac31a145d040fcaff5923058c43eb0"},"error":"inputs failed: dump_file","force":true}},{"finish_commit":{"commit":{"branch":{"repo":{"name":"textblocks","type":"user"},"name":"master"},"id":"76ac31a145d040fcaff5923058c43eb0"},"error":"inputs failed: dump_file","force":true}},{"update_job_state":{"job":{"pipeline":{"name":"textblocks"},"id":"76ac31a145d040fcaff5923058c43eb0"},"state":4,"reason":"inputs failed: dump_file","stats":{}}}]},"response":{"transaction":{"id":"275b4daeb0ce4640926309636307d743"},"requests":[{"finish_commit":{"commit":{"branch":{"repo":{"name":"textblocks","type":"meta"},"name":"master"},"id":"76ac31a145d040fcaff5923058c43eb0"},"error":"inputs failed: dump_file","force":true}},{"finish_commit":{"commit":{"branch":{"repo":{"name":"textblocks","type":"user"},"name":"master"},"id":"76ac31a145d040fcaff5923058c43eb0"},"error":"inputs failed: dump_file","force":true}},{"update_job_state":{"job":{"pipeline":{"name":"textblocks"},"id":"76ac31a145d040fcaff5923058c43eb0"},"state":4,"reason":"inputs failed: dump_file","stats":{}}}],"responses":[{},{},{}],"started":{"seconds":1669116333,"nanos":857966665}}} 
2022-11-22T11:25:33Z INFO pfs.API.InspectCommit {"duration":0.005549267,"request":{"commit":{"branch":{"repo":{"name":"textblocks","type":"user"},"name":"master"},"id":"76ac31a145d040fcaff5923058c43eb0"},"wait":1},"response":{"commit":{"branch":{"repo":{"name":"textblocks","type":"user"},"name":"master"},"id":"76ac31a145d040fcaff5923058c43eb0"},"origin":{"kind":2},"parent_commit":{"branch":{"repo":{"name":"textblocks","type":"user"},"name":"master"},"id":"c886358946a54b3287e77d60a7fd6d67"},"started":{"seconds":1669116332,"nanos":866730000},"finishing":{"seconds":1669116333,"nanos":856609000},"direct_provenance":[{"repo":{"name":"dump_file","type":"user"},"name":"master"},{"repo":{"name":"textblocks","type":"spec"},"name":"master"}],"error":"inputs failed: dump_file"}} 
2022-11-22T11:25:33Z ERROR pfs.API.ClearCommit {"duration":0.002795112,"error":"cannot clear finished commit","request":{"commit":{"branch":{"repo":{"name":"textblocks","type":"user"},"name":"master"},"id":"76ac31a145d040fcaff5923058c43eb0"}},"stack":["github.com/pachyderm/pachyderm/v2/src/server/pfs/server.(*driver).clearCommit\n\t/Users/avigil/Pachyderm/Pachyderm_releases_2.x/pachyderm/src/server/pfs/server/driver.go:1608","github.com/pachyderm/pachyderm/v2/src/server/pfs/server.(*apiServer).ClearCommit\n\t/Users/avigil/Pachyderm/Pachyderm_releases_2.x/pachyderm/src/server/pfs/server/api_server.go:306","github.com/pachyderm/pachyderm/v2/src/server/pfs/server.(*validatedAPIServer).ClearCommit\n\t/Users/avigil/Pachyderm/Pachyderm_releases_2.x/pachyderm/src/server/pfs/server/val_server.go:126","github.com/pachyderm/pachyderm/v2/src/pfs._API_ClearCommit_Handler.func1\n\t/Users/avigil/Pachyderm/Pachyderm_releases_2.x/pachyderm/src/pfs/pfs.pb.go:5109","github.com/pachyderm/pachyderm/v2/src/internal/middleware/auth.(*Interceptor).InterceptUnary\n\t/Users/avigil/Pachyderm/Pachyderm_releases_2.x/pachyderm/src/internal/middleware/auth/interceptor.go:292","google.golang.org/grpc.getChainUnaryHandler.func1\n\t/Users/avigil/go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:921","github.com/opentracing-contrib/go-grpc.OpenTracingServerInterceptor.func1\n\t/Users/avigil/go/pkg/mod/github.com/opentracing-contrib/go-grpc@v0.0.0-20180928155321-4b5a12d3ff02/server.go:44","google.golang.org/grpc.getChainUnaryHandler.func1\n\t/Users/avigil/go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:921","github.com/grpc-ecosystem/go-grpc-prometheus.(*ServerMetrics).UnaryServerInterceptor.func1\n\t/Users/avigil/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-prometheus@v1.2.1-0.20191002090509-6af20e3a5340/server_metrics.go:108","google.golang.org/grpc.chainUnaryServerInterceptors.func1\n\t/Users/avigil/go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:907","github.com/pachyderm/pachyderm/v2/s
rc/pfs._API_ClearCommit_Handler\n\t/Users/avigil/Pachyderm/Pachyderm_releases_2.x/pachyderm/src/pfs/pfs.pb.go:5111","google.golang.org/grpc.(*Server).processUnaryRPC\n\t/Users/avigil/go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:1082","google.golang.org/grpc.(*Server).handleStream\n\t/Users/avigil/go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:1405","google.golang.org/grpc.(*Server).serveStreams.func1.1\n\t/Users/avigil/go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:746","runtime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1371"]} 
2022-11-22T11:25:33Z ERROR pps.API.SubscribeJob {"duration":169.124623479,"error":"context canceled","request":{"pipeline":{"name":"textblocks"},"details":true},"stack":null} 

Although I have tried deleting the file cited in the error (/PARIS/2019/PARIS_2019_02_5fOdx.pdf), this does not fix the pipeline. Instead, a new log appears, similar to the previous one, but citing a different file name.
We have checked, however, and no duplicate files have been added to the architecture, so logically no path collision errors should appear.
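For context on what this error means: a path cannot be both a file and a directory prefix of another path in the same commit. The following is an illustrative sketch in plain bash (not Pachyderm's actual implementation, and the paths are hypothetical) of how such a collision can be detected:

```shell
# Illustrative sketch only: a "file / directory path collision" means one
# committed path is simultaneously a file and a directory prefix of
# another path. The paths below are hypothetical examples.
paths=(
  "/PARIS/2019"                    # suppose a file was written at this path
  "/PARIS/2019/PARIS_2019_02.pdf"  # ...while this path treats it as a directory
)
collision=""
for a in "${paths[@]}"; do
  for b in "${paths[@]}"; do
    # $a collides when it is a proper directory prefix of $b
    if [ "$a" != "$b" ] && [[ "$b" == "$a"/* ]]; then
      collision="file / directory path collision ($a)"
    fi
  done
done
echo "${collision:-no collision}"
# → file / directory path collision (/PARIS/2019)
```

If a commit contains several such conflicting pairs, removing one file only surfaces the next collision, which would be consistent with the behaviour described above.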

What you expected to happen?:

That the pipeline runs smoothly, processes the newly added file without any problems, and switches to success status.

How to reproduce it (as minimally and precisely as possible)?:
No idea ... really sorry about that.

Environment?:

  • Kubernetes version (use kubectl version):
kubectl version

Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.2", GitCommit:"9d142434e3af351a628bffee3939e64c681afa4d", GitTreeState:"clean", BuildDate:"2022-01-19T17:35:46Z", GoVersion:"go1.17.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.8-gke.1900", GitCommit:"79209257257c051b27df67c567755783eda93353", GitTreeState:"clean", BuildDate:"2022-07-15T09:23:51Z", GoVersion:"go1.17.11b7", Compiler:"gc", Platform:"linux/amd64"}
  • Pachyderm CLI and pachd server version (use pachctl version):
pachctl version

COMPONENT           VERSION             
pachctl             2.0.6               
pachd               2.0.6      
  • Cloud provider (e.g. aws, azure, gke) or local deployment (e.g. minikube vs dockerized k8s):
    GKE

  • If you deployed with helm, the values you used (helm get values pachyderm):

helm get values pachyderm

USER-SUPPLIED VALUES:
deployTarget: GOOGLE
pachd:
  externalService:
    enabled: true
  storage:
    google:
      bucket: datalchemy-datatest
      cred: | JSON
  • OS (e.g. from /etc/os-release):
    Debian
cat /etc/debian_version 
10.13
  • Others:

Your help would be greatly appreciated. Thank you in advance!

@BOsterbuhr
Contributor

Thank you @amandinesoub for opening this issue. I see you are currently on an unsupported version of Pachyderm (2.0.6); as a first step, can you please upgrade to the latest version (2.4.0)?
You should not need to run pachctl start pipeline either: when you put new data into the input repo, the pipeline starts automatically.
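For a Helm-based deployment like the one described above, the upgrade would look roughly like the following (a sketch only, assuming the release is named pachyderm and the chart repo is already added; check the official upgrade notes for any version-specific migration steps, and upgrade pachctl to the matching version as well):

```shell
# Sketch: fetch the latest chart index, then upgrade the release in place,
# keeping the previously supplied values (release/chart names assumed
# from the "helm get values pachyderm" output above).
helm repo update
helm upgrade pachyderm pachyderm/pachyderm --version 2.4.0 --reuse-values

# Afterwards, verify that pachctl and pachd report matching versions.
pachctl version
```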

@amandinesoub
Author

Thank you for your reply. We have taken your advice and updated the version of pachyderm. It seems to work fine.
Best, Amandine
