You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Currently the domain_mapping() function only allows 1 to 1 mapping, when reading what to auto categorise based on the lookup file. If attempting to map a variable to multiple domains using the lookup file, the function behaves buggy. For example, tested with following rows in look up file:
AVAIL_FROM_DT,Health,8
AVAIL_FROM_DT, Unsure,0
which resulted in function asking to manually classify the AVAIL_FROM_DT variable (and this is what shows up in the log). Both rows in the lookup file are essentially ignored, but function still assumes the variable as being auto-classified with the sanity check at the end:
! Please check the auto categorised data elements are accurate:
[1] DataClass DataElement Domain_code
<0 rows> (or 0-length row.names)
Either the functionality can be removed all together and added in readme docs, or can have functionality to allow mapping of variable to multiple domains.
Quick fix for now
The README file mentions that the lookup file can only deal with 1:1 mappings (i.e. do not list the same DataElement on more than one row in the lookup file). This is fine for now - as the purpose of the lookup file is to categorise variables into pre-defined domains that do not overlap ([0] NO MATCH / UNSURE, [1] METADATA, [2] ALF ID, [3] OTHER ID, [4] DEMOGRAPHICS). If the user provided their own lookup file, this is where the bug could become relevant. One to many mappings are still possible if DataElements are not included in the lookup file
The text was updated successfully, but these errors were encountered:
RayStick
changed the title
Allowing for one to many mappings (variables to domains)
Allowing for one to many mappings in the lookup (variables to domains)
Apr 2, 2024
RayStick
changed the title
Allowing for one to many mappings in the lookup (variables to domains)
Allowing for one to many mappings via the lookup (variables to domains)
Apr 2, 2024
RayStick
changed the title
Allowing for one to many mappings via the lookup (variables to domains)
Allowing for one to many mappings (variables to domains) via the lookup
Apr 2, 2024
I think the quick fix (explained above) is okay so I am closing this issue. Thanks for raising it @Rainiefantasy, as it led me to changing the README file to make it clear how it should be made. If you think not, please let me know and I will re-open :D
Currently the domain_mapping() function only allows 1 to 1 mapping, when reading what to auto categorise based on the lookup file. If attempting to map a variable to multiple domains using the lookup file, the function behaves buggy. For example, tested with following rows in look up file:
which resulted in function asking to manually classify the AVAIL_FROM_DT variable (and this is what shows up in the log). Both rows in the lookup file are essentially ignored, but function still assumes the variable as being auto-classified with the sanity check at the end:
Either the functionality can be removed all together and added in readme docs, or can have functionality to allow mapping of variable to multiple domains.
Quick fix for now
The README file mentions that the lookup file can only deal with 1:1 mappings (i.e. do not list the same DataElement on more than one row in the lookup file). This is fine for now - as the purpose of the lookup file is to categorise variables into pre-defined domains that do not overlap ([0] NO MATCH / UNSURE, [1] METADATA, [2] ALF ID, [3] OTHER ID, [4] DEMOGRAPHICS). If the user provided their own lookup file, this is where the bug could become relevant. One to many mappings are still possible if DataElements are not included in the lookup file
The text was updated successfully, but these errors were encountered: