AWS S3 source support for configuring CSV format delimiter during profiling #10367

esselius · 2024-04-24T11:35:30Z

I'm trying to ingest metadata from CSV files that use ";" as the delimiter, but the S3 source hard-codes sep="," when it reads CSV files for profiling:

datahub/metadata-ingestion/src/datahub/ingestion/source/s3/source.py

Lines 363 to 372 in ec21b01

    elif ext.endswith(".csv"):
        # see https://sparkbyexamples.com/pyspark/pyspark-read-csv-file-into-dataframe
        df = self.spark.read.csv(
            file,
            header="True",
            inferSchema="True",
            sep=",",
            ignoreLeadingWhiteSpace=True,
            ignoreTrailingWhiteSpace=True,
        )
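A minimal sketch of what a configurable delimiter could look like, assuming a hypothetical options class. CsvProfilingOptions, its delimiter field, and the helpers spark_csv_kwargs and header_columns are illustrative names only, not existing DataHub APIs. The keyword arguments mirror the hard-coded spark.read.csv call above, with only sep made configurable. Because spinning up Spark is heavy, the local check uses Python's stdlib csv module as a stand-in to show how the wrong delimiter collapses a ";"-separated header into a single column.

```python
import csv
import io
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class CsvProfilingOptions:
    """Hypothetical options block; not an existing DataHub config class."""

    # Default "," preserves the current hard-coded behaviour.
    delimiter: str = ","

    def spark_csv_kwargs(self) -> Dict[str, Any]:
        """Keyword arguments for spark.read.csv, matching the snippet above
        except that sep comes from the configured delimiter."""
        return {
            "header": "True",
            "inferSchema": "True",
            "sep": self.delimiter,
            "ignoreLeadingWhiteSpace": True,
            "ignoreTrailingWhiteSpace": True,
        }


def header_columns(text: str, options: CsvProfilingOptions) -> List[str]:
    """Stand-in for Spark: split the header row with the stdlib csv module."""
    reader = csv.reader(io.StringIO(text), delimiter=options.delimiter)
    return next(reader)


if __name__ == "__main__":
    sample = "id;name;amount\n1;alice;10\n"
    # Hard-coded "," treats the whole header as a single column.
    print(header_columns(sample, CsvProfilingOptions()))  # ['id;name;amount']
    # Configured ";" recovers the three columns.
    print(header_columns(sample, CsvProfilingOptions(delimiter=";")))  # ['id', 'name', 'amount']
```

With such an option, the read in the source could become self.spark.read.csv(file, **options.spark_csv_kwargs()), where options stands for wherever the setting would live in the source config (also an assumption). Defaulting to "," keeps behaviour unchanged for existing comma-separated datasets.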


github-actions · 2024-05-25T01:49:58Z

This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io

esselius added the bug (Bug report) label on Apr 24, 2024

github-actions bot added the stale label on May 25, 2024