could not convert string to float: 'data' #2

Open
multimeric opened this issue Apr 5, 2019 · 14 comments · May be fixed by #4

Comments

@multimeric

multimeric commented Apr 5, 2019

I'm also getting the following error when I try to create a plot:

$ chronqc plot -o chromqc/ chronqc_db/chronqc.stats.sqlite AshTrio chronqc_db/chronqc.default.json -f                                                                   
Started ChronQC                                                                                                                                                                                             
Traceback (most recent call last):                                                                                                                                                                          
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 211, in _prep_values                                                                                                            
    values = ensure_float64(values)                                                                                                                                                                         
  File "pandas/_libs/algos_common_helper.pxi", line 311, in pandas._libs.algos.ensure_float64                                                                                                               
ValueError: could not convert string to float: 'data'                                                                                                                                                       
                                                                                                                                                                                                            
During handling of the above exception, another exception occurred:                                                                                                                                         
                                                                                                                                                                                                            
Traceback (most recent call last):                                                                                                                                                                          
  File "/usr/local/bin/chronqc", line 11, in <module>                                                                                                                                                       
    sys.exit(main())                                                                                                                                                                                        
  File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc.py", line 168, in main                                                                                                                       
    args.func(args)                                                                                                                                                                                         
  File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc.py", line 21, in run_plot                                                                                                                    
    chronqc_plot.main(args)                                                                                                                                                                                 
  File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc_plot.py", line 676, in main                                                                                                                  
    df_chart = mean_and_stdev(df, column_name, win=win, per_sample=per_sample)                                                                                                                              
  File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc_plot.py", line 218, in mean_and_stdev                                                                                                        
    df_dup_all = rolling_mean(df_dup_all, Duplicates, win)                                                                                                                                                  
  File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc_plot.py", line 186, in rolling_mean                                                                                                          
    df_dup_all['mean'] = df_dup_all.rolling(win).mean().round(2)[Duplicates]                                                                                                                                
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 1728, in mean                                                                                                                   
    return super(Rolling, self).mean(*args, **kwargs)                                                                                                                                                       
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 1072, in mean                                                                                                                   
    return self._apply('roll_mean', 'mean', **kwargs)                                                                                                                                                       
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 841, in _apply                                                                                                                  
    values = self._prep_values(b.values)                                                                                                                                                                    
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 214, in _prep_values                                                                                                            
    "".format(values.dtype))                                                                                                                                                                                
TypeError: cannot handle this type -> object                     

My input files are:

$ cat chronqc_db/chronqc.default.json
[
    {
        "table_name": "chronqc_stats_data",
        "chart_type": "time_series_with_mean_and_stdev",
        "chart_properties": {
            "y_value": "FastQC_mqc-generalstats-fastqc-avg_sequence_length"
        }
    },
    {
        "table_name": "chronqc_stats_data",
        "chart_type": "time_series_with_mean_and_stdev",
        "chart_properties": {
            "y_value": "FastQC_mqc-generalstats-fastqc-percent_duplicates"
        }
    },
    {
        "table_name": "chronqc_stats_data",
        "chart_type": "time_series_with_mean_and_stdev",
        "chart_properties": {
            "y_value": "FastQC_mqc-generalstats-fastqc-percent_fails"
        }
    },
    {
        "table_name": "chronqc_stats_data",
        "chart_type": "time_series_with_mean_and_stdev",
        "chart_properties": {
            "y_value": "FastQC_mqc-generalstats-fastqc-percent_gc"
        }
    },
    {
        "table_name": "chronqc_stats_data",
        "chart_type": "time_series_with_mean_and_stdev",
        "chart_properties": {
            "y_value": "FastQC_mqc-generalstats-fastqc-total_sequences"
        }
    }
]
$ sqlite3 -cmd 'SELECT * from chronqc_stats_data;' chronqc_db/chronqc.stats.sqlite 
FastQC|all_sections|NIST7086_CGTACTAG_L002_R2_001|/mnt/data/TD01-GV1001_L2_R2/fastqc_data.txt|/mnt/data/TD01-GV1001_L2_R2/fastqc_data.txt|2019-01-22 00:00:00|data|95.229524961251|28.1154060602541|25.0|49.0|21523781.0|AshTrio
FastQC|all_sections|TD01-GV1001_L2.R1|/mnt/data/TD01-GV1001_L1_R1/fastqc_data.txt|/mnt/data/TD01-GV1001_L1_R1/fastqc_data.txt|2019-01-22 00:00:00|data|97.0165501126405|29.1116572397277|25.0|49.0|21523781.0|AshTrio
FastQC|all_sections|TD01-GV1001_L3.R2|/mnt/data/TD01-GV1001_L3_R2/fastqc_data.txt|/mnt/data/TD01-GV1001_L3_R2/fastqc_data.txt|2019-01-22 00:00:00|data|94.9493237371004|27.7428646507917|25.0|49.0|19865573.0|AshTrio
FastQC|all_sections|TD01-GV1001_L1.R2|/mnt/data/TD01-GV1001_L1_R2/fastqc_data.txt|/mnt/data/TD01-GV1001_L1_R2/fastqc_data.txt|2019-01-22 00:00:00|data|95.2768624146094|27.794053686974|25.0|49.0|21168890.0|AshTrio
FastQC|all_sections|TD01-GV1001_L3.R1|/mnt/data/TD01-GV1001_L3_R1/fastqc_data.txt|/mnt/data/TD01-GV1001_L3_R1/fastqc_data.txt|2019-01-22 00:00:00|data|96.7453750264339|28.8361458354103|25.0|49.0|19865573.0|AshTrio
FastQC|all_sections|TD06-GV1010_R2|/mnt/data/TD06-GV1010_R2_fastqc/fastqc_data.txt|/mnt/data/TD06-GV1010_R2_fastqc/fastqc_data.txt|2019-01-25 00:00:00|data|126.0|22.5016825110702|16.6666666666667|48.0|73074989.0|AshTrio
FastQC|all_sections|TD06-GV1009_R2|/mnt/data/TD06-GV1009_R2_fastqc/fastqc_data.txt|/mnt/data/TD06-GV1009_R2_fastqc/fastqc_data.txt|2019-01-25 00:00:00|data|126.0|21.3620029114608|16.6666666666667|48.0|64304319.0|AshTrio
FastQC|all_sections|TD06-GV1008_R2|/mnt/data/TD06-GV1008_R2_fastqc/fastqc_data.txt|/mnt/data/TD06-GV1008_R2_fastqc/fastqc_data.txt|2019-01-25 00:00:00|data|126.0|23.5528852817332|16.6666666666667|48.0|75193388.0|AshTrio
FastQC|all_sections|TD06-GV1009_R1|/mnt/data/TD06-GV1009_R1_fastqc/fastqc_data.txt|/mnt/data/TD06-GV1009_R1_fastqc/fastqc_data.txt|2019-01-25 00:00:00|data|126.0|21.1391622609802|16.6666666666667|48.0|64304319.0|AshTrio
FastQC|all_sections|TD06-GV1008_R1|/mnt/data/TD06-GV1008_R1_fastqc/fastqc_data.txt|/mnt/data/TD06-GV1008_R1_fastqc/fastqc_data.txt|2019-01-25 00:00:00|data|126.0|23.5838138955758|16.6666666666667|48.0|75193388.0|AshTrio
FastQC|all_sections|TD06-GV1010_R1|/mnt/data/TD06-GV1010_R1_fastqc/fastqc_data.txt|/mnt/data/TD06-GV1010_R1_fastqc/fastqc_data.txt|2019-01-25 00:00:00|data|126.0|22.3713007497403|16.6666666666667|48.0|73074989.0|AshTrio
@multimeric
Author

It seems to relate to df_dup_all.rolling(win).mean(), which perhaps expects all the columns to be floats? It's quite strange.
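
To illustrate the suspicion above, here is a minimal sketch (not ChronQC's actual code) of computing the rolling mean on a single numeric column rather than on the whole DataFrame, so that text columns such as Run never reach the rolling window. The function name rolling_mean_for_column and the sample values are hypothetical; only the rolling(win).mean().round(2) chain is taken from the traceback.

```python
import pandas as pd


def rolling_mean_for_column(df, column, win):
    """Return a copy of df with a 'mean' column for one numeric column."""
    out = df.copy()
    # Select the column first: the rolling window then only ever sees
    # numeric data, regardless of what other columns the frame holds.
    out["mean"] = out[column].rolling(win).mean().round(2)
    return out


# Hypothetical sample data with two text columns and one numeric column.
df = pd.DataFrame({
    "Sample": ["TD06-GV1008_R1", "TD06-GV1009_R1", "TD06-GV1010_R1"],
    "Run": ["Run-1", "Run-1", "Run-1"],
    "percent_duplicates": [24.0, 22.0, 23.0],
})

result = rolling_mean_for_column(df, "percent_duplicates", win=2)
print(result[["Sample", "percent_duplicates", "mean"]])
```

The first row has no complete window, so its mean is NaN; the text columns are carried along untouched.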

@nilesh-tawari
Owner

@TMiguelT can you print the headers of chronqc_stats_data as well?

@nilesh-tawari
Owner

It seems you are trying to plot a column whose values are all 'data' in your example, hence the error.

@multimeric
Author

Do you mean the SQLite3 database headers?

@nilesh-tawari
Owner

@TMiguelT yes

@multimeric
Author

multimeric commented Apr 8, 2019

Okay, let me run through the steps I performed (after running MultiQC):

Successfully created the database:

$ chronqc database --create --run-date-info run_date_info.csv -o . multiqc/multiqc_data/multiqc_general_stats.txt AshTrio                                               
Running ChronQC |##################################################| 100.0%                                                                                                                                 
Created ChronQC db: /mnt/chronqc_db/chronqc.stats.sqlite with 6 records                                                                                                                                     
Created ChronQC default JSON file: /mnt/chronqc_db/chronqc.default.json. Customize the JSON as needed before generating ChronQC plots. 

Then immediately after, I tried to plot (I didn't modify the JSON or the database in any way):

$ chronqc plot -o . chronqc_db/chronqc.stats.sqlite AshTrio chronqc_db/chronqc.default.json -f                                                                          
Started ChronQC                                                                                                                                                                                             
Traceback (most recent call last):                                                                                                                                                                          
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 211, in _prep_values                                                                                                            
    values = ensure_float64(values)                                                                                                                                                                         
  File "pandas/_libs/algos_common_helper.pxi", line 311, in pandas._libs.algos.ensure_float64                                                                                                               
ValueError: could not convert string to float: 'Run-1'                                                                                                                                                      
                                                                                                                                                                                                            
During handling of the above exception, another exception occurred:                                                                                                                                         

Traceback (most recent call last):
  File "/usr/local/bin/chronqc", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc.py", line 168, in main
    args.func(args)
  File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc.py", line 21, in run_plot
    chronqc_plot.main(args)
  File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc_plot.py", line 676, in main
    df_chart = mean_and_stdev(df, column_name, win=win, per_sample=per_sample)
  File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc_plot.py", line 218, in mean_and_stdev
    df_dup_all = rolling_mean(df_dup_all, Duplicates, win)
  File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc_plot.py", line 186, in rolling_mean
    df_dup_all['mean'] = df_dup_all.rolling(win).mean().round(2)[Duplicates]
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 1728, in mean
    return super(Rolling, self).mean(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 1072, in mean
    return self._apply('roll_mean', 'mean', **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 841, in _apply
    values = self._prep_values(b.values)
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 214, in _prep_values
    "".format(values.dtype))
TypeError: cannot handle this type -> object

Here's the SQLite table with headers:

$ sqlite3 chronqc_db/chronqc.stats.sqlite -header 'select * from chronqc_stats_data'
Sample|Run|Date|FastQC_mqc-generalstats-fastqc-avg_sequence_length|FastQC_mqc-generalstats-fastqc-percent_duplicates|FastQC_mqc-generalstats-fastqc-percent_fails|FastQC_mqc-generalstats-fastqc-percent_gc|FastQC_mqc-generalstats-fastqc-total_sequences|Panel
TD06-GV1008_R1|Run-1|2018-01-01 00:00:00|126.0|23.5838138955758|16.6666666666667|48.0|75193388.0|AshTrio
TD06-GV1008_R2|Run-1|2018-01-01 00:00:00|126.0|23.5528852817332|16.6666666666667|48.0|75193388.0|AshTrio
TD06-GV1009_R1|Run-1|2018-02-01 00:00:00|126.0|21.1391622609802|16.6666666666667|48.0|64304319.0|AshTrio
TD06-GV1009_R2|Run-1|2018-02-01 00:00:00|126.0|21.3620029114608|16.6666666666667|48.0|64304319.0|AshTrio
TD06-GV1010_R1|Run-1|2018-03-01 00:00:00|126.0|22.3713007497403|16.6666666666667|48.0|73074989.0|AshTrio
TD06-GV1010_R2|Run-1|2018-03-01 00:00:00|126.0|22.5016825110702|16.6666666666667|48.0|73074989.0|AshTrio

@multimeric
Author

And here's the database as a more readable table:

| Sample | Run | Date | FastQC_mqc-generalstats-fastqc-avg_sequence_length | FastQC_mqc-generalstats-fastqc-percent_duplicates | FastQC_mqc-generalstats-fastqc-percent_fails | FastQC_mqc-generalstats-fastqc-percent_gc | FastQC_mqc-generalstats-fastqc-total_sequences | Panel |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TD06-GV1008_R1 | Run-1 | 2018-01-01 00:00:00 | 126.0 | 23.5838138955758 | 16.6666666666667 | 48.0 | 75193388.0 | AshTrio |
| TD06-GV1008_R2 | Run-1 | 2018-01-01 00:00:00 | 126.0 | 23.5528852817332 | 16.6666666666667 | 48.0 | 75193388.0 | AshTrio |
| TD06-GV1009_R1 | Run-1 | 2018-02-01 00:00:00 | 126.0 | 21.1391622609802 | 16.6666666666667 | 48.0 | 64304319.0 | AshTrio |
| TD06-GV1009_R2 | Run-1 | 2018-02-01 00:00:00 | 126.0 | 21.3620029114608 | 16.6666666666667 | 48.0 | 64304319.0 | AshTrio |
| TD06-GV1010_R1 | Run-1 | 2018-03-01 00:00:00 | 126.0 | 22.3713007497403 | 16.6666666666667 | 48.0 | 73074989.0 | AshTrio |
| TD06-GV1010_R2 | Run-1 | 2018-03-01 00:00:00 | 126.0 | 22.5016825110702 | 16.6666666666667 | 48.0 | 73074989.0 | AshTrio |

@nilesh-tawari
Owner

You need to customize the JSON. In your first example you were trying to plot a column containing the string 'data', and in the example above you are plotting the "Run" column, whose values are all "Run-1". Both strings, 'data' and 'Run-1', are causing the errors. To customize the JSON, simply check what you are plotting as "y_value" and make sure it is a numerical column. Hope this helps.

@multimeric
Author

multimeric commented Apr 8, 2019

But my JSON is fine. All of the y_value fields are numerical. I don't mention the Run field anywhere:

$ cat chronqc_db/chronqc.default.json 
[
    {
        "table_name": "chronqc_stats_data",
        "chart_type": "time_series_with_mean_and_stdev",
        "chart_properties": {
            "y_value": "FastQC_mqc-generalstats-fastqc-avg_sequence_length"
        }
    },
    {
        "table_name": "chronqc_stats_data",
        "chart_type": "time_series_with_mean_and_stdev",
        "chart_properties": {
            "y_value": "FastQC_mqc-generalstats-fastqc-percent_duplicates"
        }
    },
    {
        "table_name": "chronqc_stats_data",
        "chart_type": "time_series_with_mean_and_stdev",
        "chart_properties": {
            "y_value": "FastQC_mqc-generalstats-fastqc-percent_fails"
        }
    },
    {
        "table_name": "chronqc_stats_data",
        "chart_type": "time_series_with_mean_and_stdev",
        "chart_properties": {
            "y_value": "FastQC_mqc-generalstats-fastqc-percent_gc"
        }
    },
    {
        "table_name": "chronqc_stats_data",
        "chart_type": "time_series_with_mean_and_stdev",
        "chart_properties": {
            "y_value": "FastQC_mqc-generalstats-fastqc-total_sequences"
        }
    }
]
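
One way to double-check that every y_value really is numerical is to ask SQLite for the storage type of each configured column. The sketch below is not part of ChronQC: the helper name non_numeric_y_values and the in-memory table are made up for illustration, with column names borrowed from the output above, and the second chart entry deliberately points at Run to show what a bad entry would look like.

```python
import json
import sqlite3


def non_numeric_y_values(conn, charts):
    """Return y_value columns that hold anything other than numbers or NULLs."""
    flagged = []
    for chart in charts:
        table = chart["table_name"]
        column = chart["chart_properties"]["y_value"]
        # Double quotes allow identifiers that contain hyphens. Names are
        # interpolated directly, so only use this with a trusted config.
        query = 'SELECT DISTINCT typeof("{}") FROM "{}"'.format(column, table)
        storage_types = {row[0] for row in conn.execute(query)}
        if not storage_types <= {"integer", "real", "null"}:
            flagged.append(column)
    return flagged


# Hypothetical in-memory stand-in for chronqc.stats.sqlite.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE chronqc_stats_data ('
    '"Sample" TEXT, "Run" TEXT, '
    '"FastQC_mqc-generalstats-fastqc-percent_gc" REAL)'
)
conn.execute(
    "INSERT INTO chronqc_stats_data VALUES ('TD06-GV1008_R1', 'Run-1', 48.0)"
)

# The second entry is intentionally wrong to demonstrate detection.
charts = json.loads("""
[
    {"table_name": "chronqc_stats_data",
     "chart_type": "time_series_with_mean_and_stdev",
     "chart_properties": {"y_value": "FastQC_mqc-generalstats-fastqc-percent_gc"}},
    {"table_name": "chronqc_stats_data",
     "chart_type": "time_series_with_mean_and_stdev",
     "chart_properties": {"y_value": "Run"}}
]
""")

flagged = non_numeric_y_values(conn, charts)
print(flagged)
```

An empty list for the real default JSON would support the point that the configured columns are not the ones triggering the error.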

@multimeric
Author

I'm happy to give you the files if it would help debug the issue

@multimeric
Author

I'm getting this same issue with the example files here: https://github.com/nilesh-tawari/ChronQC/tree/master/examples/multiqc_example_1.

From within examples/multiqc_example_1, I run:

$ chronqc database --create --run-date-info run_date_info.csv -o . multiqc_data/multiqc_general_stats.txt SOMATIC
Running ChronQC |##################################################| 100.0% 
Created ChronQC db: /home/michael/Programming/ChronQC/examples/multiqc_example_1/chronqc_db/chronqc.stats.sqlite with 100 records
Created ChronQC default JSON file: /home/michael/Programming/ChronQC/examples/multiqc_example_1/chronqc_db/chronqc.default.json. Customize the JSON as needed before generating ChronQC plots.

$ chronqc plot -o . chronqc_db/chronqc.stats.sqlite SOMATIC chronqc_db/chronqc.default.json
Started ChronQC
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 222, in _prep_values
    values = _ensure_float64(values)
  File "pandas/_libs/algos_common_helper.pxi", line 3182, in pandas._libs.algos.ensure_float64
  File "pandas/_libs/algos_common_helper.pxi", line 3187, in pandas._libs.algos.ensure_float64
ValueError: could not convert string to float: 'CHH13847 (15.7), CHH13848 (16.18), CHH13846 (15.21), CHH13849 (16.8)'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/chronqc", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc.py", line 168, in main
    args.func(args)
  File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc.py", line 21, in run_plot
    chronqc_plot.main(args)
  File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc_plot.py", line 676, in main
    df_chart = mean_and_stdev(df, column_name, win=win, per_sample=per_sample)
  File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc_plot.py", line 218, in mean_and_stdev
    df_dup_all = rolling_mean(df_dup_all, Duplicates, win)
  File "/usr/local/lib/python3.6/dist-packages/chronqc/chronqc_plot.py", line 186, in rolling_mean
    df_dup_all['mean'] = df_dup_all.rolling(win).mean().round(2)[Duplicates]
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 1605, in mean
    return super(Rolling, self).mean(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 1058, in mean
    return self._apply('roll_mean', 'mean', **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 844, in _apply
    values = self._prep_values(b.values)
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/window.py", line 225, in _prep_values
    "".format(values.dtype))
TypeError: cannot handle this type -> object

@multimeric
Author

I've investigated this some more, and the last version of pandas this works with is 0.22.0. If you use 0.23.0 or higher, this error will occur.
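
As a rough sketch of how a script could warn about this before plotting, the helper below compares a pandas version string against the 0.23.0 threshold reported above. The function name pandas_version_is_affected is hypothetical, the parsing is deliberately simple, and the threshold is only what was observed in this thread, not a documented pandas guarantee.

```python
def pandas_version_is_affected(version):
    """Return True if the version string is 0.23.0 or newer."""
    parts = []
    for piece in version.split(".")[:3]:
        # Keep only the leading digits so suffixes like 'rc1' are ignored.
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    # Pad short versions such as '1.0' to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts) >= (0, 23, 0)


for candidate in ["0.22.0", "0.23.0", "0.24.2"]:
    print(candidate, pandas_version_is_affected(candidate))
```

In practice the argument would be pandas.__version__; a literal list is used here so the sketch runs without pandas installed.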

multimeric linked a pull request Jun 25, 2019 that will close this issue
@nilesh-tawari
Owner

Thank you, @TMiguelT, for investigating. I will look at it and fix it to work with updated pandas.

@multimeric
Author

multimeric commented Oct 24, 2019

Excellent, that would be a much better solution than my version pinning.

2 participants