Add integration to salesforce #16

Open
dgarnitz opened this issue Aug 15, 2023 · 8 comments

Comments

@dgarnitz
Owner

VectorFlow should be able to ingest raw data from Salesforce.

Some open questions to explore prior to implementation:

  • can this be done through the existing API, or does it need a separate ingestion worker?
  • how do we share credentials and do this securely?
  • what file formats can be expected?
@asadnhasan

Thanks for raising this feature request. Ingesting data from Salesforce into VectorFlow is definitely something we should explore supporting. Here are some initial thoughts on your questions:

Regarding ingestion - It looks like Salesforce's APIs can export data as JSON, XML, or CSV (the REST API returns JSON or XML, while CSV comes from the Bulk API). I think the best approach would be to build a separate lightweight ingestion worker specifically for Salesforce data. This worker could authenticate with Salesforce using OAuth, make the API calls to export data, do any needed parsing and validation, and then pass the transformed data to VectorFlow's main ingestion pipeline.
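To make the "make API calls to export data" step concrete, here is a minimal sketch of paging through a SOQL query with the Salesforce REST API's query resource. It assumes an OAuth access token and instance URL have already been obtained; the API version, the function names, and the injectable `fetch` parameter are illustrative assumptions, not anything VectorFlow defines.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict, Iterator

API_VERSION = "v58.0"  # assumption: pin whichever API version your org supports


def _authorized(url: str, access_token: str) -> urllib.request.Request:
    """Attach the OAuth bearer token to a request for the given URL."""
    headers = {"Authorization": f"Bearer {access_token}", "Accept": "application/json"}
    return urllib.request.Request(url, headers=headers)


def build_query_request(instance_url: str, access_token: str, soql: str) -> urllib.request.Request:
    """Build the first request against the REST query resource."""
    query = urllib.parse.urlencode({"q": soql})
    url = f"{instance_url.rstrip('/')}/services/data/{API_VERSION}/query?{query}"
    return _authorized(url, access_token)


def http_get_json(request: urllib.request.Request) -> Dict[str, Any]:
    """Default fetcher: perform the HTTP call and decode the JSON body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_records(
    instance_url: str,
    access_token: str,
    soql: str,
    fetch: Callable[[urllib.request.Request], Dict[str, Any]] = http_get_json,
) -> Iterator[Dict[str, Any]]:
    """Yield every record, following nextRecordsUrl until the result is done."""
    request = build_query_request(instance_url, access_token, soql)
    while True:
        page = fetch(request)
        yield from page.get("records", [])
        if page.get("done", True):
            return
        # nextRecordsUrl is a path relative to the instance URL.
        next_url = instance_url.rstrip("/") + page["nextRecordsUrl"]
        request = _authorized(next_url, access_token)
```

Passing `fetch` in as a parameter keeps the paging logic testable without a live Salesforce org; the same loop would work whether it lives in a separate worker or behind an endpoint in the existing API.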

For security - OAuth should allow secure authentication for the ingestion worker to access Salesforce. We can encrypt any credentials stored in configuration. Restricting the worker to only access the needed Salesforce data exports will also be important.

Suggested file formats - Salesforce's APIs support JSON, XML, and CSV. CSV may be the easiest to work with in VectorFlow if we can get full data exports. For more targeted exports, JSON or XML may be required. The ingestion worker would need to do some parsing before passing data to VectorFlow in a supported format like Parquet.
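As a rough illustration of the parsing step, the sketch below normalizes a CSV or JSON export into one list of plain dictionaries. The function name and the decision to drop the per-record `attributes` metadata that Salesforce includes in JSON query results are assumptions for illustration; XML and the Parquet write are left out.

```python
import csv
import io
import json
from typing import Any, Dict, List


def parse_export(payload: str, fmt: str) -> List[Dict[str, Any]]:
    """Turn a CSV or JSON export into a list of flat record dictionaries."""
    if fmt == "csv":
        # The first row is treated as the header, one dict per data row.
        return [dict(row) for row in csv.DictReader(io.StringIO(payload))]
    if fmt == "json":
        data = json.loads(payload)
        # Accept either a full query response or a bare list of records.
        records = data["records"] if isinstance(data, dict) else data
        # Drop Salesforce's "attributes" metadata so both formats look alike.
        return [{k: v for k, v in r.items() if k != "attributes"} for r in records]
    raise ValueError(f"unsupported format: {fmt!r}")
```

Note that CSV values arrive as strings while JSON keeps native types, so any schema mapping would still need to reconcile the two.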

Possible next steps:

  • Explore Salesforce OAuth authentication flows for the ingestion worker
  • Test sample data exports from the Salesforce API in JSON, XML, and CSV
  • Prototype a basic ingestion worker that extracts a sample export, parses the data, and writes it to Parquet
  • Evaluate how exported data maps to VectorFlow's expected input schema (important)

@dgarnitz
Owner Author

How feasible would it be to use this: https://llamahub.ai/l/tools-salesforce?

I don't think we should have a separate Salesforce worker. An endpoint, /salesforce, in the existing API should do the trick. Can you choose which format (e.g. JSON or CSV) the data is exported in?
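One way the format choice could surface in such an endpoint is as a field in the request body. The sketch below validates a hypothetical payload; the field names `soql` and `format`, the default of JSON, and the function itself are assumptions for illustration and are not part of VectorFlow's existing API.

```python
from typing import Any, Dict

SUPPORTED_FORMATS = {"json", "csv"}  # assumption: the two formats raised in this thread


def validate_salesforce_request(body: Dict[str, Any]) -> Dict[str, str]:
    """Check a hypothetical POST /salesforce payload and return normalized values."""
    soql = str(body.get("soql", "")).strip()
    if not soql:
        raise ValueError("'soql' is required")
    # Default to JSON when the caller does not pick a format.
    fmt = str(body.get("format", "json")).lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format {fmt!r}; expected one of {sorted(SUPPORTED_FORMATS)}")
    return {"soql": soql, "format": fmt}
```

A route handler in whatever web framework the API uses could call this first and return a 400 response when it raises.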

@mmabrouk

Small note: I'd look into how Airbyte solves this:
https://github.com/airbytehq/airbyte/tree/f54bd550aae9b4bf19220b50af47da0adc3b4ff1/airbyte-integrations/connectors/source-salesforce

@dgarnitz
Owner Author

We are planning to add an Airbyte connector; maybe we can access the Salesforce data through that.

@david-vectorflow
Collaborator

@asadnhasan are you still planning on working on this?

@asadnhasan

@david-vectorflow Yes, David, I am still working on it.

@dgarnitz
Owner Author

dgarnitz commented Apr 4, 2024

@asadnhasan Hey, do you still have an interest in building this out?

@syedzaidi-kiwi

Yes, absolutely, I would love to contribute. I will come back with something in 2-3 days.
