Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add BulkImport APIs and cron #966

Open
wants to merge 21 commits into
base: feat/bulk-import-base
Choose a base branch
from

Conversation

anku255
Copy link
Contributor

@anku255 anku255 commented Mar 20, 2024

Summary of change

(A few sentences about this PR)

Related issues

Test Plan

(Write your test plan here. If you changed any code, please provide us with clear instructions on how you verified your
changes work. Bonus points for screenshots and videos!)

Documentation changes

(If relevant, please create a PR in our docs repo, or create a checklist here
highlighting the necessary changes)

Checklist for important updates

  • Changelog has been updated
    • If there are any db schema changes, mention those changes clearly
  • coreDriverInterfaceSupported.json file has been updated (if needed)
  • pluginInterfaceSupported.json file has been updated (if needed)
  • Changes to the version if needed
    • In build.gradle
  • If added a new paid feature, edit the getPaidFeatureStats function in FeatureFlag.java file
  • Had installed and ran the pre-commit hook
  • If there are new dependencies that have been added in build.gradle, please make sure to add them
    in implementationDependencies.json.
  • Update function getValidFields in io/supertokens/config/CoreConfig.java if new aliases were added for any core config (similar to the access_token_signing_key_update_interval config alias).
  • Issue this PR against the latest non released version branch.
    • To know which one it is, run find the latest released tag (git tag) in the format vX.Y.Z, and then find the
      latest branch (git branch --all) whose X.Y is greater than the latest released tag.
    • If no such branch exists, then create one from the latest released branch.
  • If added a foreign key constraint on app_id_to_user_id table, make sure to delete from this table when deleting the user as well if deleteUserIdMappingToo is false.

@anku255 anku255 changed the base branch from master to feat/bulk-import-base March 20, 2024 07:30
@anku255 anku255 requested a review from sattvikc April 18, 2024 11:20
build.gradle Show resolved Hide resolved
@@ -26,6 +26,8 @@
import io.supertokens.pluginInterface.multitenancy.TenantIdentifier;
import io.supertokens.pluginInterface.multitenancy.exceptions.TenantOrAppNotFoundException;
import io.supertokens.webserver.api.accountlinking.*;
import io.supertokens.webserver.api.bulkimport.BulkImportAPI;
import io.supertokens.webserver.api.bulkimport.DeleteBulkImportUserAPI;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not related to this file, but you need to update the CDI stuff in WebserverAPI.java as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

src/main/java/io/supertokens/bulkimport/BulkImport.java Outdated Show resolved Hide resolved
src/main/java/io/supertokens/bulkimport/BulkImport.java Outdated Show resolved Hide resolved
// so we have to use an array.
String[] errorMessage = { e.getMessage() };

if (e instanceof StorageTransactionLogicException) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should also add the error stack in the message? So the saved error can look like:
{
errorMsg: "...",
errorStack: "..."
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a utils function somewhere in the code to get the error stack from an exception as a string

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should provide error stack. We don't expect our users to debug the issue by looking at the code. The error message should be clear enough to understand which field has the issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yea, but users may ask on support that they are getting this error and then for us, it will be way easier to know from where. This can be skipped only if the error messages are globally unique.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We decided to add error codes in the error message to make them globally unique.

Comment on lines 61 to 63
if (arr.size() == 0) {
throw new ServletException(new WebserverAPI.BadRequestException("Field name 'ids' cannot be an empty array"));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in this case, just return status code 200 ok

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the right JSON output format ofc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its better to return an error because calling this API with no ids is not a valid case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it makes it harder for people to call the API. Cause if they have a loop in which they are calling the API, they will be forced to check if the input is empty before calling the API (as opposed to being flexible enough to allow them to check it after and then exit the loop)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. I have updated it.

JsonObject input = InputParser.parseJsonObjectOrThrowError(req);
JsonArray users = InputParser.parseArrayOrThrowError(input, "users", false);

if (users.size() <= 0 || users.size() > BulkImport.MAX_USERS_TO_ADD) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in case it's 0, you just return 200 status code with correct json output.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its better to return an error because calling this API with 0 users is not a valid case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it makes it harder for people to call the API. Cause if they have a loop in which they are calling the API, they will be forced to check if the input is empty before calling the API (as opposed to being flexible enough to allow them to check it after and then exit the loop)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. I have updated it.

this.allUserRoles = allUserRoles;
this.allExternalUserIds = new HashSet<>();
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the bulkimport user type, we also need to allow plain text password cause it will be used during lazy migration task

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done.

public class ProcessBulkImportUsers extends CronTask {

public static final String RESOURCE_KEY = "io.supertokens.ee.cronjobs.ProcessBulkImportUsers";
private Map<String, SQLStorage> userPoolToStorageMap = new HashMap<>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to allow parallelism:

  • Allow devs to configure the parallelism. Default is 1.
  • The prop can be called BULK_MIGRATION_PARALLELISM. This is a SaaS protected prop and can be added to PROTECTED_CONFIGS in CoreConfig.java. It should have the @NotConflictingInApp annotation.
  • You can make a pool of proxy storages here based on the value of this prop, and create those number of threads as well.

.collect(JsonArray::new, JsonArray::add, JsonArray::addAll);

JsonObject errorResponseJson = new JsonObject();
errorResponseJson.add("errors", errors);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is an example of how this output looks like?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{
  "errors": [
    "Role role1 does not exist.",
    "Role role2 does not exist.",
    "Invalid tenantId: t1 for emailpassword recipe.",
    "Invalid tenantId: t1 for thirdparty recipe.",
    "Invalid tenantId: t1 for passwordless recipe."
  ]
}

}

JsonObject input = InputParser.parseJsonObjectOrThrowError(req);
JsonObject jsonUser = InputParser.parseJsonObjectOrThrowError(input, "user", false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i dont think the input should have a user prop. Cause then its like:

{
   user: {...}
}

Instead. it should just have all the details in the root of the json input

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return StorageUtils.getBulkImportStorage(storage).getBulkImportUsersCount(appIdentifier, status);
}

public static synchronized AuthRecipeUserInfo importUser(Main main, AppIdentifier appIdentifier,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a lot of code repetition between this func and the processUser func in the cronjob. Please refactor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored it.

Comment on lines 136 to 142
// `BulkImportProxyStorage` uses `BulkImportProxyConnection`, which overrides the `.commit()` method on the Connection object.
// The `initStorage()` method runs `select * from table_name limit 1` queries to check if the tables exist but these queries
// don't get committed due to the overridden `.commit()`, so we need to manually commit the transaction to remove any locks on the tables.

// Without this commit, a call to `select * from bulk_import_users limit 1` in `doesTableExist()` locks the `bulk_import_users` table,
// causing other queries to stall indefinitely.
bulkImportProxyStorage.commitTransactionForBulkImportProxyStorage();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be moved to the storage layer? it can be override the initStorage function, call the super impl, and then call commitTransactionForBulkImportProxyStorage?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

Comment on lines 293 to 294
StorageTransactionLogicException exception = (StorageTransactionLogicException) e;
errorMessage[0] = exception.actualException.getMessage();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StorageQueryException usually means something like the db is down or something. All errors that are due to semantic issues have a different type (like emailalreadyexistsexception). Please double check error types

@anku255 anku255 mentioned this pull request May 29, 2024
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants