NA’s not allowed #4

Open
mpascariu opened this issue Aug 2, 2018 · 1 comment

Comments

@mpascariu
Owner

Testing the ungroup package on the Bangladesh data.
Issue raised by Jonas Scholey on January 22, 2018.

ug <- pclm2D(x = bang_Dx_2d$age, y = bang_Dx_2d[,-1],
             offset = bang_Nx_2d[,-1], nlast = 20)
##  Error: 'y' contains NA values

NA’s not allowed. The problem with the Bangladesh data is that the last age group is not the same across the years: from 1975 to 1981 the last observed age group is 65+, from 1982 to 1983 it’s 80+, and after that it’s 85+. The varying open age group is the source of the NAs at higher ages.
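To see where the NAs come from, it may help to tabulate the starting age of the open age group per year. The snippet below is a minimal sketch on toy data; the matrix `Dx` and the vector `age` are made-up stand-ins for `bang_Dx_2d[,-1]` and `bang_Dx_2d$age`, and only three illustrative years are shown.

```r
# Toy stand-in for the death counts: rows are age groups, columns are years.
# (Assumed layout; the real Bangladesh data may use different age groups.)
age <- c(0, 1, seq(5, 85, by = 5))
Dx <- matrix(10, nrow = length(age), ncol = 3,
             dimnames = list(as.character(age), c("1975", "1982", "1984")))
Dx[age > 65, "1975"] <- NA   # open age group is 65+
Dx[age > 80, "1982"] <- NA   # open age group is 80+

# Starting age of the last observed (open) age group in each year
open_age <- apply(Dx, 2, function(col) max(age[!is.na(col)]))
open_age
```

Every year whose open age is below the maximum contributes a block of trailing NAs, which is what `pclm2D` rejects.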

Right now my only option would be to truncate all life-tables to have a last age group of 65+ and then run the 2D PCLM on these truncated life-tables. Not good, as I throw away data this way.

I’m not sure about the proper solution. I know that PCLM needs a rectangular surface, so I propose replacing the fixed nlast parameter with a fixed omega (closing age of the life-table) parameter. This ensures that even if the starting age of the last age group varies, its upper limit does not.
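The relation between the two parameters can be sketched as simple arithmetic, assuming nlast is the width of the last age group, so that omega = (start of open age group) + nlast. The helper `nlast_from_omega()` is hypothetical and not part of the ungroup package.

```r
# Hypothetical helper (not in ungroup): width of the open age group
# implied by a fixed closing age omega.
nlast_from_omega <- function(open_age, omega) {
  stopifnot(all(omega > open_age))
  omega - open_age
}

# Open ages mentioned in this issue: 65+, 80+ and 85+
nlast_from_omega(open_age = c(65, 80, 85), omega = 105)
## [1] 40 25 20
```

Under this reading, the two calls in this issue (nlast = 20 with an 85+ group, nlast = 40 with a 65+ group) both correspond to a closing age of 105.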

So one solution would be for PCLM to recode any series of NAs over the last ages to 0s and redistribute the last observed Dx and Nx over [last observed x, omega]. You’d still have to think about how to deal with a cohort that dies out before reaching the last age group...
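The recoding step alone could look roughly like the sketch below. It only sets trailing NAs to 0 and leaves interior NAs untouched; it does not redistribute the open age group over [last observed x, omega], which would still have to happen inside PCLM. The function name `recode_trailing_na()` is made up for illustration.

```r
# Hypothetical helper: replace NAs after the last observed value in each
# column with 0, keeping interior NAs as they are.
recode_trailing_na <- function(m) {
  apply(m, 2, function(col) {
    last <- max(which(!is.na(col)))          # row of the open age group
    if (last < length(col)) {
      col[(last + 1):length(col)] <- 0       # trailing NAs only
    }
    col
  })
}

# Toy input: column "a" has one interior NA and two trailing NAs
m <- cbind(a = c(1, NA, 3, NA, NA),
           b = c(1, 2, 3, 4, 5))
out <- recode_trailing_na(m)
out
```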

More generally, I would strongly prefer an na.rm option, as present in many base R functions. As PCLM is a smoother, it may very well be used to smooth over missing values. Missing values due to varying last age groups are tricky, but missing values in between are not: you can simply smooth over them.
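To illustrate what "smoothing over" interior missing values means, here is a minimal sketch using a weighted Whittaker smoother in base R. This is not the PCLM algorithm and not how ungroup works internally; it only shows the general idea that a penalized smoother can give missing cells zero weight and let the penalty interpolate them. The function `whittaker()` and its defaults are assumptions made for this example.

```r
# Weighted Whittaker smoother (illustration only, not PCLM):
# minimize sum(w * (y - z)^2) + lambda * sum((D z)^2),
# where missing observations get weight 0.
whittaker <- function(y, lambda = 10, d = 2) {
  n  <- length(y)
  w  <- as.numeric(!is.na(y))          # weight 0 for missing cells
  y0 <- ifelse(is.na(y), 0, y)         # placeholder, ignored via zero weight
  D  <- diff(diag(n), differences = d) # d-th order difference matrix
  solve(diag(w) + lambda * crossprod(D), w * y0)
}

y <- c(1, 2, NA, 4, 5, NA, 7, 8)       # two interior missing values
z <- whittaker(y)
round(z, 6)
```

Because the observed values lie on a straight line and the penalty is on second differences, the smoother fills both gaps on that same line.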

OK, but for now let’s stick to truncating my data. The data are truncated so that the last age group is 65+ across all periods. Accordingly, I choose a higher nlast.

age_trunc <- bang_Dx_2d$age[1:18]
anyNA(age_trunc)
## [1] FALSE

bang_Dx_2d_trunc <- bang_Dx_2d[1:18, -1]
anyNA(bang_Dx_2d_trunc)
## [1] FALSE

bang_Nx_2d_trunc <- bang_Nx_2d[1:18, -1]
anyNA(bang_Nx_2d_trunc)
## [1] FALSE

ug <- pclm2D(x = age_trunc, y = bang_Dx_2d_trunc,
             offset = bang_Nx_2d_trunc, nlast = 40)
## Ungrouping offset Ungrouping data
## Error in if ((d < tol || dd < 0.1) && it >= 4) break: missing value where TRUE/FALSE needed

Don’t know what went wrong here.

@mpascariu
Owner Author

The last example is working now:

> ug <- pclm2D(x = age_trunc, y = bang_Dx_2d_trunc,
+              offset = bang_Nx_2d_trunc, nlast = 40)
   |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 26s   Ungrouping data  
