Unable to treat variable as continuous measure #109

Open
sgummidipundi opened this issue Dec 27, 2020 · 1 comment
Comments

@sgummidipundi

Hello! Would just like to say fantastic package and great syntax for the function.

I seem to be having an issue with creating a table with continuous values. I'm sure I am probably doing something incorrectly on my end since it is basic functionality. When I try to do an easy example with a single continuous variable I get an output like below:

image

This is odd, because it is clearly reading the variable as non-normal, as I specified (as indicated by the 'median [Q1, Q3]' label), but it only gives counts and frequencies, essentially treating the variable as categorical. I have also verified that the variable is of type float64. Are there any suggestions on how I can get it treated as a continuous measure?

Thanks in advance

@tompollard
Owner

Hi @sgummidipundi, you've raised a good point, which is that there is no "continuous" argument. At the moment, tableone expects you to define the categorical variables using the "categorical" argument. Anything else is then treated as continuous. I can see how this is confusing, especially when (as in your case) there are no categorical variables.

If you don't specify which variables are categorical, then tableone attempts to guess (and, from your example, clearly doesn't do a great job!). In your example, you would need to provide an empty categorical argument. I've tried to recreate the example below:

1. Generate sample data

# import packages
import pandas as pd
import tableone
# create sample dataframe
x = ([0.0] * 41639 + 
     [0.2] * 3 +
     [0.25] * 1 +
     [1] * 3 +
     [10] * 806 +
     [100] * 816 +
     [1000] * 1488 +
     [10000] * 57 +
     [100000] * 3 +
     [11000] * 2 +
     [117000] * 7 +
     [12] * 1 +
     [1200] * 267 +
     [12000] * 51)

data = pd.DataFrame(x, columns=["x"])

2. Create summary table, allowing tableone to guess the data type

Based on the large number of observations and the limited number of unique values, tableone (incorrectly!) guesses that x is categorical.

t1 = tableone.tableone(data)
print(t1.tabulate(tablefmt = "github"))
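|          |          | Missing   | Overall      |
|----------|----------|-----------|--------------|
| n        |          |           | 45144        |
| x, n (%) | 0.0      | 0         | 41639 (92.2) |
|          | 0.2      |           | 3 (0.0)      |
|          | 0.25     |           | 1 (0.0)      |
|          | 1.0      |           | 3 (0.0)      |
|          | 10.0     |           | 806 (1.8)    |
|          | 100.0    |           | 816 (1.8)    |
|          | 1000.0   |           | 1488 (3.3)   |
|          | 10000.0  |           | 57 (0.1)     |
|          | 100000.0 |           | 3 (0.0)      |
|          | 11000.0  |           | 2 (0.0)      |
|          | 117000.0 |           | 7 (0.0)      |
|          | 12.0     |           | 1 (0.0)      |
|          | 1200.0   |           | 267 (0.6)    |
|          | 12000.0  |           | 51 (0.1)     |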
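
To see why the guess in step 2 goes wrong, it can help to look at how few distinct values x has relative to its length. The sketch below uses only the standard library. The helper `looks_categorical` and its 0.005 cutoff are hypothetical, chosen for illustration, and are not taken from tableone's source.

```python
# Rebuild x from (value, count) pairs matching the sample data in step 1.
counts = [(0.0, 41639), (0.2, 3), (0.25, 1), (1, 3), (10, 806), (100, 816),
          (1000, 1488), (10000, 57), (100000, 3), (11000, 2), (117000, 7),
          (12, 1), (1200, 267), (12000, 51)]
x = [value for value, n in counts for _ in range(n)]


def unique_ratio(values):
    """Return the number of distinct values divided by the number of observations."""
    return len(set(values)) / len(values)


def looks_categorical(values, cutoff=0.005):
    """Hypothetical heuristic: flag a variable with few distinct values relative to n."""
    return unique_ratio(values) < cutoff


print(len(x), len(set(x)))   # 45144 14
print(looks_categorical(x))  # True
```

With 14 distinct values across 45144 observations, any rule of this kind will flag x as categorical even though it is stored as float64, which is why the categorical argument needs to be set explicitly.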

3. Create summary table with the categorical argument

t2 = tableone.tableone(data, categorical=[])
print(t2.tabulate(tablefmt = "github"))
|              |    | Missing   | Overall       |
|--------------|----|-----------|---------------|
| n            |    |           | 45144         |
| x, mean (SD) |    | 0         | 93.5 (1764.8) |
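
As a cross-check on step 3, the same summary can be reproduced with Python's statistics module. This is a sketch that assumes the table reports the sample standard deviation. It also computes the median and quartiles that a 'median [Q1, Q3]' summary would be based on.

```python
import statistics

# Rebuild x from (value, count) pairs matching the sample data in step 1.
counts = [(0.0, 41639), (0.2, 3), (0.25, 1), (1, 3), (10, 806), (100, 816),
          (1000, 1488), (10000, 57), (100000, 3), (11000, 2), (117000, 7),
          (12, 1), (1200, 267), (12000, 51)]
x = [float(value) for value, n in counts for _ in range(n)]

# Continuous summary, normal case: mean and sample standard deviation.
mean = statistics.mean(x)
sd = statistics.stdev(x)  # assumption: sample SD (n - 1 in the denominator)

# Continuous summary, non-normal case: median with lower and upper quartiles.
q1, median, q3 = statistics.quantiles(x, n=4)

print(f"x, mean (SD): {mean:.1f} ({sd:.1f})")                     # 93.5 (1764.8)
print(f"x, median [Q1, Q3]: {median:.1f} [{q1:.1f}, {q3:.1f}]")   # 0.0 [0.0, 0.0]
```

Because more than 92% of the observations are zero, the median and both quartiles are zero, while the mean and SD are driven by the small number of large values.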
