Error when trying to enable PKChunking #124

Open
charithlw opened this issue Sep 13, 2022 · 4 comments
Labels
bug Unintended behavior that should be corrected

Comments

@charithlw

Issue description

Hi, I'm running into an issue when trying to enable PKChunking on a query I'm making to Salesforce. I'm following your answer on Stack Overflow, but I keep getting the error message Error in if (grepl("/services/data/v[0-9]{2}.[0-9]{1}/jobs/ingest", url)) { : the condition has length > 1.

Do you have any idea what might be going wrong?

I've included the query, the verbose output and my session info. I'm new to Salesforce and to your package, so please let me know what other information would help you troubleshoot this issue.

Query

library(glue)
library(salesforcer)

campaign_member_query_test <- glue(
  "
  SELECT Campaign.Business_Unit_Affiliation__c,
    Campaign.Product__c,
    CM_Contact_CSID__c,
    Contact_Has_Test_Flag__c,
    Event_Registration__c,
    Event_Registration_Date__c,
    Event_Amount_Raised__c,
    RC_Registrant_Status__c,
    RC_RegistrationDate__c
  FROM CampaignMember
  "
)

campaign_members <- sf_query(soql = campaign_member_query_test,
                             object_name = "CampaignMember",
                             api_type = "Bulk 1.0",
                             PKChunkingHeader = list(`Sforce-Enable-PKChunking` = TRUE),
                             interval_seconds = 10,
                             max_attempts = 400, verbose = TRUE)

Verbose output

--HTTP Request----------------
POST https://ncca-ltd.my.salesforce.com/services/async/54.0/job
--Headers---------------------
Accept: application/xml; Content-Type: application/xml; Sforce-Enable-PKChunking: TRUE; X-SFDC-Session: 00D28000001KHkz!AQsAQJy6fp88C4qo6pARBwqxJTBXESEt1fI4TX7p.lWSM2kB_QjOMb_TsK.sOVrAFy1byzSsMbJXk3A7qHCsOeoopqORRwca
--Body------------------------
<?xml version="1.0" encoding="UTF-8"?>
<jobInfo xmlns="http://www.force.com/2009/06/asyncapi/dataload">
  <operation>query</operation>
  <object>CampaignMember</object>
  <concurrencyMode>Parallel</concurrencyMode>
  <contentType>CSV</contentType>
</jobInfo>


--HTTP Request----------------
POST https://ncca-ltd.my.salesforce.com/services/async/54.0/job/7502x00000EnbMjAAJ/batch
--Headers---------------------
Accept: application/json, text/xml, application/xml, */*; Content-Type: text/csv; charset=UTF-8; X-SFDC-Session: 00D28000001KHkz!AQsAQJy6fp88C4qo6pARBwqxJTBXESEt1fI4TX7p.lWSM2kB_QjOMb_TsK.sOVrAFy1byzSsMbJXk3A7qHCsOeoopqORRwca
--Body------------------------
Uploaded TXT file: C:\Users\Charith\AppData\Local\Temp\RtmpuGaHBg\file5e7c3a9e62bd
Attempt #1

--HTTP Request----------------
GET https://ncca-ltd.my.salesforce.com/services/async/54.0/job/7502x00000EnbMjAAJ/batch/7512x00000JkKEaAAN
--Headers---------------------
Accept: application/json, text/xml, application/xml, */*; Content-Type: ; X-SFDC-Session: 00D28000001KHkz!AQsAQJy6fp88C4qo6pARBwqxJTBXESEt1fI4TX7p.lWSM2kB_QjOMb_TsK.sOVrAFy1byzSsMbJXk3A7qHCsOeoopqORRwca

--HTTP Request----------------
GET https://ncca-ltd.my.salesforce.com/services/async/54.0/job/7502x00000EnbMjAAJ/batch
--Headers---------------------
Accept: application/json, text/xml, application/xml, */*; Content-Type: ; X-SFDC-Session: 00D28000001KHkz!AQsAQJy6fp88C4qo6pARBwqxJTBXESEt1fI4TX7p.lWSM2kB_QjOMb_TsK.sOVrAFy1byzSsMbJXk3A7qHCsOeoopqORRwca
Attempt #1
Error in if (grepl("/services/data/v[0-9]{2}.[0-9]{1}/jobs/ingest", url)) { : 
  the condition has length > 1
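For context, this message comes from R itself rather than from Salesforce. grepl() is vectorised and returns one TRUE/FALSE per input string, while if() needs a single value; since R 4.2.0 a condition of length > 1 is an error instead of a warning (the session info in this report shows R 4.2.1). The sketch below reproduces this in base R. The URLs are made-up placeholders shaped like the ones in the verbose log, and api_for_url() is a hypothetical helper, not a salesforcer function.

```r
# Same pattern as in the error message
pattern <- "/services/data/v[0-9]{2}.[0-9]{1}/jobs/ingest"

# Hypothetical batch URLs: a PK-chunked Bulk 1.0 job yields several batches
urls <- c(
  "https://example.my.salesforce.com/services/async/54.0/job/JOB_ID/batch/BATCH_1",
  "https://example.my.salesforce.com/services/async/54.0/job/JOB_ID/batch/BATCH_2"
)

# grepl() is vectorised: one TRUE/FALSE per URL, so length 2 here
is_bulk2 <- grepl(pattern, urls)

# if() needs a single TRUE/FALSE; since R 4.2.0 a longer condition is an error
res <- tryCatch(
  if (grepl(pattern, urls)) "Bulk 2.0" else "Bulk 1.0",
  error = function(e) conditionMessage(e)
)
print(res)

# Checking one URL at a time keeps every condition scalar
api_for_url <- function(url) {
  stopifnot(length(url) == 1)
  if (grepl(pattern, url)) "Bulk 2.0" else "Bulk 1.0"
}
apis <- vapply(urls, api_for_url, character(1), USE.NAMES = FALSE)
print(apis)
```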

Session Info

Session info ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.2.1 (2022-06-23 ucrt)
 os       Windows 10 x64 (build 22000)
 system   x86_64, mingw32
 ui       RStudio
 language (EN)
 collate  English_Australia.utf8
 ctype    English_Australia.utf8
 tz       Australia/Sydney
 date     2022-09-13
 rstudio  2022.07.1+554 Spotted Wakerobin (desktop)
 pandoc   2.18 @ C:/Program Files/RStudio/bin/quarto/bin/tools/ (via rmarkdown)

─ Packages ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 package     * version   date (UTC) lib source
 afp.data    * 0.1.0     2022-09-07 [1] local
 anytime       0.3.9     2020-08-27 [1] CRAN (R 4.2.1)
 askpass       1.1       2019-01-13 [1] CRAN (R 4.2.1)
 assertthat    0.2.1     2019-03-21 [1] CRAN (R 4.2.1)
 base64enc     0.1-3     2015-07-28 [1] CRAN (R 4.2.0)
 bit           4.0.4     2020-08-04 [1] CRAN (R 4.2.1)
 bit64         4.0.5     2020-08-30 [1] CRAN (R 4.2.1)
 cachem        1.0.6     2021-08-19 [1] CRAN (R 4.2.1)
 callr         3.7.0     2021-04-20 [1] CRAN (R 4.2.1)
 cli           3.3.0     2022-04-25 [1] CRAN (R 4.2.1)
 crayon        1.5.1     2022-03-26 [1] CRAN (R 4.2.1)
 curl          4.3.2     2021-06-23 [1] CRAN (R 4.2.1)
 data.table    1.14.2    2021-09-27 [1] CRAN (R 4.2.1)
 DBI           1.1.3     2022-06-18 [1] CRAN (R 4.2.1)
 devtools      2.4.3     2021-11-30 [1] CRAN (R 4.2.1)
 digest        0.6.29    2021-12-01 [1] CRAN (R 4.2.1)
 dplyr       * 1.0.9     2022-04-28 [1] CRAN (R 4.2.1)
 ellipsis      0.3.2     2021-04-29 [1] CRAN (R 4.2.1)
 evaluate      0.15      2022-02-18 [1] CRAN (R 4.2.1)
 fansi         1.0.3     2022-03-24 [1] CRAN (R 4.2.1)
 fastmap       1.1.0     2021-01-25 [1] CRAN (R 4.2.1)
 fs            1.5.2     2021-12-08 [1] CRAN (R 4.2.1)
 generics      0.1.2     2022-01-31 [1] CRAN (R 4.2.1)
 glue        * 1.6.2     2022-02-24 [1] CRAN (R 4.2.1)
 hms           1.1.1     2021-09-26 [1] CRAN (R 4.2.1)
 htmltools     0.5.2     2021-08-25 [1] CRAN (R 4.2.1)
 httpuv        1.6.5     2022-01-05 [1] CRAN (R 4.2.1)
 httr          1.4.3     2022-05-04 [1] CRAN (R 4.2.1)
 janitor       2.1.0     2021-01-05 [1] CRAN (R 4.2.1)
 jsonlite      1.8.0     2022-02-22 [1] CRAN (R 4.2.1)
 knitr         1.39      2022-04-26 [1] CRAN (R 4.2.1)
 later         1.3.0     2021-08-18 [1] CRAN (R 4.2.1)
 lifecycle     1.0.1     2021-09-24 [1] CRAN (R 4.2.1)
 lubridate     1.8.0     2021-10-07 [1] CRAN (R 4.2.1)
 magrittr      2.0.3     2022-03-30 [1] CRAN (R 4.2.1)
 memoise       2.0.1     2021-11-26 [1] CRAN (R 4.2.1)
 mime          0.12      2021-09-28 [1] CRAN (R 4.2.0)
 openssl       2.0.2     2022-05-24 [1] CRAN (R 4.2.1)
 pillar        1.7.0     2022-02-01 [1] CRAN (R 4.2.1)
 pkgbuild      1.3.1     2021-12-20 [1] CRAN (R 4.2.1)
 pkgconfig     2.0.3     2019-09-22 [1] CRAN (R 4.2.1)
 pkgload       1.3.0     2022-06-27 [1] CRAN (R 4.2.1)
 prettyunits   1.1.1     2020-01-24 [1] CRAN (R 4.2.1)
 processx      3.6.1     2022-06-17 [1] CRAN (R 4.2.1)
 promises      1.2.0.1   2021-02-11 [1] CRAN (R 4.2.1)
 ps            1.7.1     2022-06-18 [1] CRAN (R 4.2.1)
 purrr         0.3.4     2020-04-17 [1] CRAN (R 4.2.1)
 R6            2.5.1     2021-08-19 [1] CRAN (R 4.2.1)
 Rcpp          1.0.8.3   2022-03-17 [1] CRAN (R 4.2.1)
 readr         2.1.2     2022-01-30 [1] CRAN (R 4.2.1)
 remotes       2.4.2     2021-11-30 [1] CRAN (R 4.2.1)
 rlang         1.0.3     2022-06-27 [1] CRAN (R 4.2.1)
 rlist         0.4.6.2   2021-09-03 [1] CRAN (R 4.2.1)
 rmarkdown     2.14      2022-04-25 [1] CRAN (R 4.2.1)
 rstudioapi    0.13      2020-11-12 [1] CRAN (R 4.2.1)
 salesforcer * 1.0.1     2022-03-01 [1] CRAN (R 4.2.1)
 sessioninfo   1.2.2     2021-12-06 [1] CRAN (R 4.2.1)
 shiny         1.7.1     2021-10-02 [1] CRAN (R 4.2.1)
 snakecase     0.11.0    2019-05-25 [1] CRAN (R 4.2.1)
 stringi       1.7.6     2021-11-29 [1] CRAN (R 4.2.0)
 stringr       1.4.0     2019-02-10 [1] CRAN (R 4.2.1)
 tibble        3.1.7     2022-05-03 [1] CRAN (R 4.2.1)
 tidyselect    1.1.2     2022-02-21 [1] CRAN (R 4.2.1)
 tzdb          0.3.0     2022-03-28 [1] CRAN (R 4.2.1)
 usethis       2.1.6     2022-05-25 [1] CRAN (R 4.2.1)
 utf8          1.2.2     2021-07-24 [1] CRAN (R 4.2.1)
 vctrs         0.4.1     2022-04-13 [1] CRAN (R 4.2.1)
 vroom         1.5.7     2021-11-30 [1] CRAN (R 4.2.1)
 withr         2.5.0     2022-03-03 [1] CRAN (R 4.2.1)
 xfun          0.31      2022-05-10 [1] CRAN (R 4.2.1)
 XML           3.99-0.10 2022-06-09 [1] CRAN (R 4.2.0)
 xml2          1.3.3     2021-11-30 [1] CRAN (R 4.2.1)
 xtable        1.8-4     2019-04-21 [1] CRAN (R 4.2.1)
 yaml          2.3.5     2022-02-21 [1] CRAN (R 4.2.0)
 zip           2.2.0     2021-05-31 [1] CRAN (R 4.2.1)
@charithlw
Author

As an aside, when I run the same query without the argument PKChunkingHeader = list('Sforce-Enable-PKChunking' = TRUE) in the call to sf_query, I get the following error message:

Error:
! Column name `result` must not be duplicated.
Use .name_repair to specify repair.
Caused by error in `repaired_names()`:
! Names must be unique.These names are duplicated:
  * "result" at locations 1 and 2.

Not sure if that helps or confuses... 😆

@charithlw
Author

I should perhaps also add (although this may be more appropriate as a separate issue) that when I use the Bulk 2.0 API I get the following error message:

Error in rbindlist(l = l, fill = fill, idcol = idcol, ...) :
  Class attribute on column 7 of item 2 does not match with column 9 of item 1.

I actually get this error quite often when using the Bulk 2.0 option with sf_query, which is why I was originally using Bulk 1.0.

Not sure how to debug this...

StevenMMortimer added a commit that referenced this issue Sep 17, 2022
Attempt to fix the error caused by having more than one query batch created for ones using the PKChunking option (#124)

Also, this commit includes changes to as_tibble() that make the name repair argument explicit.
StevenMMortimer added a commit that referenced this issue Sep 17, 2022
Provide this argument in an attempt to fix issues like the one mentioned in #124 where the sf_query() function bombs out because of duplicated column names in the tibble.
@StevenMMortimer
Owner

Thanks @charithlw for flagging these issues! They seem to be three different but slightly related issues, so I'll address each one in the order you mentioned them.

Issue 1: Error in if (grepl("/services/data/...

PKChunked queries are a bit unusual: they create one bulk batch with the state "NotProcessed", which only provides the chunking instructions and needs to be ignored, and then each chunk is created as a separate batch. The existing code wasn't handling those batches properly and was sending more than one at a time when polling for the result; hence the error about more than one URL. I've updated the underlying functions behind sf_query() to fix the issue in this commit. It would be great if you could install the package from the GitHub dev branch and let me know whether it fixes the issue you were having:

# install.packages("remotes")
remotes::install_github("StevenMMortimer/salesforcer", ref = "dev")
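A minimal base-R sketch of the batch handling described above. It is not salesforcer's actual implementation: the batch ids, the data.frame layout and the polling loop are assumptions for illustration. It drops the "NotProcessed" batch that only carries the chunking instructions, then builds one URL per chunk and handles each on its own, so no check ever receives more than one URL.

```r
# Hypothetical batch listing for one PK-chunked Bulk 1.0 query job
# (ids and states are made up for illustration)
batches <- data.frame(
  id    = c("BATCH_ORIGINAL", "BATCH_CHUNK_1", "BATCH_CHUNK_2", "BATCH_CHUNK_3"),
  state = c("NotProcessed", "Completed", "Completed", "InProgress"),
  stringsAsFactors = FALSE
)

# The original batch only holds the chunking instructions, so skip it
chunks <- batches[batches$state != "NotProcessed", , drop = FALSE]

# One status URL per chunk (same URL shape as in the verbose log)
job_url <- "https://example.my.salesforce.com/services/async/54.0/job/JOB_ID"
chunk_urls <- paste0(job_url, "/batch/", chunks$id)

# Poll each chunk on its own; a real client would send a GET per URL
# and wait until every chunk reports "Completed" before fetching results
for (u in chunk_urls) {
  message("would poll: ", u)
}
done <- all(chunks$state == "Completed")
```

Looping over single URLs is the simplest way to keep each condition scalar; a vectorised check with any() or all() would also work, but only if the calling code is written to expect a vector.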

Issue 2: Error: ! Column name `result` must not be duplicated.

It's not obvious which function is generating the error. My guess is one of the as_tibble() calls, because as_tibble() checks for duplicated column names. I've updated every call to as_tibble() in the package code with the argument .name_repair = "unique", so that all instances specify the same name repair method. I'm not sure it will fix the issue, but it's probably a good change to make regardless.
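The difference .name_repair makes can be seen directly with {tibble} (version 3.x assumed). The list below is hypothetical data with two elements named "result". The default strategy ("check_unique") raises the same kind of error reported above, whereas "unique" renames the duplicates by appending a position suffix.

```r
library(tibble)

# Two columns that both end up named "result" (hypothetical data)
parsed <- list(result = c("a", "b"), result = c("c", "d"))

# Default name handling ("check_unique") refuses duplicated names
err <- tryCatch(as_tibble(parsed), error = function(e) conditionMessage(e))

# With .name_repair = "unique" the duplicates are renamed instead;
# suppressMessages() hides the "New names:" note tibble prints
tbl <- suppressMessages(as_tibble(parsed, .name_repair = "unique"))
print(names(tbl))
```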

Issue 3: Error in rbindlist...

It looks like the {data.table} package is working on a way to let rbindlist() ignore column attributes when binding different lists into a single data.frame, which would avoid errors like this one (Rdatatable/data.table#5446). My guess is that in one batch the query coerces a column to a date, while another batch has all NULLs for that column, so it is returned as a boolean (which can happen). That said, I can't really dig into your specific issue, and we'll have to wait until the {data.table} package sorts things out. In the meantime, have you tried running the query with Bulk 2.0 and specifying the argument guess_types = FALSE? This returns every column as character, and you can convert the types after everything has been put together. It's one possible workaround, but you'd have to try it and see if it works.
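A small reproduction of the kind of mismatch described above, assuming {data.table} 1.14.x. The two batches and their values are made up: one batch holds a Date column and the other an all-NA (logical) column. Binding them as typed is expected to fail with a class-attribute error, while binding everything as character and converting afterwards follows the same idea as guess_types = FALSE in sf_query(). The helper as_character_dt() is hypothetical.

```r
library(data.table)

# Hypothetical results from two query batches: in batch 1 the date column
# was parsed as Date; in batch 2 every value was empty, so it is logical
batch1 <- data.table(Id = "003_A", Event_Registration_Date__c = as.Date("2022-09-01"))
batch2 <- data.table(Id = "003_B", Event_Registration_Date__c = NA)

# Binding the typed batches is expected to fail on the class attribute
err <- tryCatch(rbindlist(list(batch1, batch2)),
                error = function(e) conditionMessage(e))

# Same idea as guess_types = FALSE: bind everything as character first...
as_character_dt <- function(dt) dt[, lapply(.SD, as.character)]
combined <- rbindlist(list(as_character_dt(batch1), as_character_dt(batch2)))

# ...then convert the columns once the full result is assembled
combined[, Event_Registration_Date__c := as.Date(Event_Registration_Date__c)]
print(combined)
```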

@StevenMMortimer StevenMMortimer added the bug Unintended behavior that should be corrected label Sep 17, 2022
@StevenMMortimer StevenMMortimer added this to the v1.0.2 milestone Sep 17, 2022
@charithlw
Author

Hi @StevenMMortimer, I can confirm that Issue 3 was solved by the workaround you suggested, i.e. using guess_types = FALSE. So happy about that! I'll test the other two fixes and get back to you shortly. :)
