feat: Added delete document functionality #464

Open. Wants to merge 5 commits into base: main

azaylamba
Contributor

Issue #149:

Description of changes: Added delete functionality for all types of documents (Files, Texts, Q&A and Websites). The feature deletes the document from the S3 upload bucket, the S3 processed bucket, the DynamoDB documents table and the OpenSearch index, and also updates the DynamoDB workspaces table. The major code changes are:

  1. Added a delete button in the UI for each document row.
  2. Added a confirmation dialog (a Modal) from which the user can cancel or confirm the deletion.
  3. Created an AWS Step Functions state machine for the delete-document workflow. This keeps the whole process organised, and it is automatically rolled back if any operation in the step function fails.

The major components and what they do:

  1. documents-tab.tsx contains the delete button and the handling of the confirmation Modal.
  2. documents-client.ts has the deleteDocument function, which calls the backend API.
  3. The delete_document function in lib/chatbot-api/functions/api-handler/routes/documents.py handles the API request.
  4. deleteDocumentWorkflow is created in lib/rag-engines/workspaces/index.ts.
  5. delete-document.ts defines the internal structure of the delete-document workflow.
  6. The Lambda function that handles the workflow is in lib/rag-engines/workspaces/functions/delete-document-workflow/delete/index.py.
  7. The state machine execution is started in the delete_document function of lib/shared/layers/python-sdk/python/genai_core/documents.py.
  8. The actual deletion of documents happens in the delete_open_search_document function of lib/shared/layers/python-sdk/python/genai_core/opensearch/delete.py.

The request flow is: documents-client -> documents.py (api handler) -> documents.py (genai_core) -> index.py (delete-document-workflow) -> delete.py (genai_core/opensearch)
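For reviewers who want a picture of the hand-off from the genai_core layer to the state machine, here is a minimal sketch of how an execution could be started. It is not the code in this PR: the function name `start_delete_workflow` and the payload keys `workspace_id` and `document_id` are assumptions made for illustration. The only library call it relies on is the boto3 Step Functions client method `start_execution`, and the client is passed in so that the function can be exercised with a stub.

```python
import json


def start_delete_workflow(sfn_client, state_machine_arn, workspace_id, document_id):
    """Start one execution of the delete-document state machine.

    sfn_client is expected to behave like boto3.client("stepfunctions").
    The payload keys are hypothetical; the real workflow may expect others.
    """
    payload = {"workspace_id": workspace_id, "document_id": document_id}
    response = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(payload),  # Step Functions takes the input as a JSON string
    )
    # The execution ARN can be stored to track the deletion later.
    return response["executionArn"]
```

In a deployed Lambda, `sfn_client` would typically be created once at module level with `boto3.client("stepfunctions")`, and the state machine ARN would come from an environment variable set by the CDK stack.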

As part of this change, I also updated the version of opensearch-py. The upgrade was originally needed because the earlier version did not allow calling HTTP methods directly, but those direct calls later turned out to be unnecessary. I have kept the upgrade for future use, as it should have no impact.
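To illustrate what deleting through a client method rather than a raw HTTP call can look like, here is a minimal sketch. It is an assumption, not the contents of delete_open_search_document: the function name `delete_document_chunks` and the `document_id` field in the index mapping are hypothetical. It uses the opensearch-py client method `delete_by_query`, and the client is passed in so that the function can be tested with a stub.

```python
def delete_document_chunks(client, index_name, document_id):
    """Delete every indexed chunk that belongs to one document.

    client is expected to behave like opensearchpy.OpenSearch.
    The "document_id" field name is an assumption about the index mapping.
    """
    body = {"query": {"term": {"document_id": document_id}}}
    response = client.delete_by_query(index=index_name, body=body)
    # The delete-by-query response reports how many documents were removed.
    return response.get("deleted", 0)
```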

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

azaylamba and others added 2 commits April 21, 2024 00:33
@azaylamba
Contributor Author

@massi-ang @bigadsoleiman Could you please review this?

sql.SQL("DELETE FROM {table} WHERE document_id = %s").format(table=table_name),
(document_id,),
)
print(f"Deleted document {document_id} from {table_name}")

The commit is missing here. Add cursor.connection.commit().

Contributor Author

Thanks for pointing that out, @gbone-restore. I have added the commit statement.
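The point of the review comment can be sketched as follows. This is an illustration rather than the code in the PR: it uses the standard-library sqlite3 module as a stand-in for the PostgreSQL connection so that it runs without a database server, which is why the placeholder is `?` instead of `%s`. The function name `delete_document_rows` and the identifier check on the table name are assumptions.

```python
import sqlite3


def delete_document_rows(cursor, table_name, document_id):
    """Delete all rows for one document, then commit the transaction.

    Without the commit, the DELETE stays inside an open transaction and is
    not visible to other connections.
    """
    # Table names cannot be bound as parameters, so reject anything that is
    # not a plain identifier before interpolating it into the statement.
    if not table_name.isidentifier():
        raise ValueError(f"unexpected table name: {table_name!r}")
    cursor.execute(
        f'DELETE FROM "{table_name}" WHERE document_id = ?',
        (document_id,),
    )
    deleted = cursor.rowcount
    cursor.connection.commit()  # the call the reviewer asked for
    return deleted
```

With psycopg2 the same shape applies: build the statement with `sql.SQL(...)` and `sql.Identifier(...)`, execute it, then call `cursor.connection.commit()`.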

3 participants