PAS-148 | Data export #465

Open
wants to merge 22 commits into
base: main

jonashendrickx (Member) commented Mar 6, 2024

Ticket

Description

We need to be able to export individual applications with all their relevant data, so customers can migrate to self-hosted instances.

Shape

Applications with many users, or enterprise applications with event logging enabled, can end up with a very large number of records, so the export needs to be designed with that in mind.

Considerations:

  • We need to be able to enable paging easily at a later stage.
  • CSV:
    • RFC-4180 compliant
    • Easy to enable paging later
    • CsvHelper already includes functionality that should make paging easy to add later.
    • UTF-8 encoded

Remarks:

  • This PR isn't meant to include paging at this stage.
  • This PR isn't meant to include encryption at this stage.

The export is split into one file per entity type. Every file is a CSV document compliant with RFC 4180. Splitting into a file per entity type should, in theory, also make it easier to enable paging at a later stage, once a file grows beyond a certain size.
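
To illustrate what RFC 4180 compliance means for a single record, here is a minimal sketch of the quoting rules. In this PR that work is delegated to CsvHelper; the class and method names below (`Rfc4180Sketch`, `EscapeField`, `ToRecord`, `ToUtf8`) are hypothetical and only serve the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper, not part of this PR: shows the RFC 4180 quoting rules.
public static class Rfc4180Sketch
{
    // A field is quoted only when it contains a comma, a double quote, CR or LF.
    // Embedded double quotes are escaped by doubling them.
    public static string EscapeField(string value)
    {
        bool needsQuotes = value.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        return needsQuotes ? "\"" + value.Replace("\"", "\"\"") + "\"" : value;
    }

    // Fields are joined with commas and each record ends with CRLF.
    public static string ToRecord(IEnumerable<string> fields) =>
        string.Join(",", fields.Select(EscapeField)) + "\r\n";

    // Encodes all records as UTF-8 bytes; GetBytes does not prepend a byte order mark.
    public static byte[] ToUtf8(IEnumerable<IEnumerable<string>> rows) =>
        Encoding.UTF8.GetBytes(string.Concat(rows.Select(ToRecord)));
}
```

Because every record is self-delimiting, a later paging step could in principle cut a file at any record boundary and repeat the header row in each part.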

POST /backup/schedule: Schedules a new backup job.

GET /backup/jobs: Retrieves a list of past and present backup jobs with their status.

Example response:

[
  { "jobId": "guid", "status": "pending", "createdAt": "ISO-8601", "lastUpdatedAt": "ISO-8601" },
  { "jobId": "guid", "status": "failed", "createdAt": "ISO-8601", "lastUpdatedAt": "ISO-8601" },
  { "jobId": "guid", "status": "running", "createdAt": "ISO-8601", "lastUpdatedAt": "ISO-8601" },
  { "jobId": "guid", "status": "completed", "createdAt": "ISO-8601", "lastUpdatedAt": "ISO-8601" }
]
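
A client could read this response roughly as follows. This is a sketch under assumptions: `BackupJobDto` and `BackupJobsSketch` are hypothetical names, the field types are guessed from the example (a GUID, a status string, ISO 8601 timestamps), and `lastUpdatedAt` is treated as optional.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical DTO mirroring the fields of the example response.
public sealed record BackupJobDto(
    Guid JobId,
    string Status,
    DateTimeOffset CreatedAt,
    DateTimeOffset? LastUpdatedAt);

public static class BackupJobsSketch
{
    // Web defaults map camelCase JSON names onto the PascalCase properties.
    private static readonly JsonSerializerOptions Options = new(JsonSerializerDefaults.Web);

    // Parses the body of GET /backup/jobs; a JSON null body yields an empty list.
    public static List<BackupJobDto> Parse(string json) =>
        JsonSerializer.Deserialize<List<BackupJobDto>>(json, Options) ?? new List<BackupJobDto>();
}
```

Note that the default `System.Text.Json` options reject trailing commas, so the array must be strictly valid JSON.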

The exported data is stored as a UTF-8 encoded byte array.

public class ArchiveJob : PerTenant
{
    public Guid Id { get; set; }

    public AccountMetaInformation Application { get; set; }

    public DateTime CreatedAt { get; set; }

    public DateTime? UpdatedAt { get; set; }

    public JobStatus Status { get; set; } = JobStatus.Pending;

    public List<Archive> Archives { get; set; } = new();
}

public class Archive : PerTenant
{
    public Guid Id { get; set; }

    /// <summary>
    /// The identifier of the backup job that created this archive.
    /// </summary>
    public Guid JobId { get; set; }

    public DateTime CreatedAt { get; set; }

    public Type? Entity { get; set; }

    [MaxLength(100 * 1024 * 1024, ErrorMessage = "Data cannot be larger than 100MB.")]
    public byte[] Data { get; set; }

    public AccountMetaInformation Application { get; set; } = null!;
}

The upload process would likely involve a user creating a new organization and importing an old app by uploading the exported files. The restoration would only be able to start if the schemas match and all documents have been uploaded successfully.

ApplicationEvents

An additional migration for ApplicationEvents had to be included, because it did not inherit from PerTenant as required by the conventions we laid out for DbTenantContext. This conflicted with the generics I implemented in BackupWorkerService, and leaving it as it was would also have significantly hurt maintainability.

To prevent any unnecessary migrations from being executed, which could result in data loss, the database column was mapped manually to its original name. Only the model snapshot was updated, to reflect the mapping from the column name to the CLR property.
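
As a rough sketch of what such a manual column mapping can look like in EF Core: the property name `Tenant`, the column name `TenantId`, and the `SketchContext` class are assumptions for illustration only, not the names used in this PR.

```csharp
using System;
using Microsoft.EntityFrameworkCore;

// Hypothetical shape of the base class; the real PerTenant may differ.
public abstract class PerTenant
{
    public string Tenant { get; set; } = "";
}

public class ApplicationEvent : PerTenant
{
    public Guid Id { get; set; }
}

public class SketchContext : DbContext
{
    public SketchContext(DbContextOptions<SketchContext> options) : base(options) { }

    public DbSet<ApplicationEvent> ApplicationEvents => Set<ApplicationEvent>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<ApplicationEvent>(builder =>
        {
            builder.HasKey(x => x.Id);

            // Keep the existing column name (assumed here to be "TenantId") so the
            // inherited CLR property maps onto it and no data has to be moved.
            builder.Property(x => x.Tenant).HasColumnName("TenantId");
        });
    }
}
```

With a mapping like this the generated migration should contain no column operations, which is why only the snapshot changes.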

Screenshots

n/a

Checklist

I did the following to ensure that my changes were tested thoroughly:

  • Unit tests
  • Integration tests

I did the following to ensure that my changes did not introduce new security vulnerabilities:

  • Secured endpoints with secret key.
  • Sanitization for macros was skipped, given these backups are not meant to be opened in a program like Microsoft Excel.

codecov bot commented Mar 7, 2024

Codecov Report

Attention: Patch coverage is 47.23502%, with 1374 lines in your changes missing coverage. Please review.

Project coverage is 33.91%. Comparing base (a194a43) to head (0574e1e).

Files                                                  Patch %  Lines
...7_MakeApplicationEventInheritPerTenant.Designer.cs  0.00%    503 Missing ⚠️
.../Sqlite/20240311143910_AddBackupTables.Designer.cs  0.00%    502 Missing ⚠️
...vice/Migrations/Mssql/MsSqlContextModelSnapshot.cs  0.00%    97 Missing ⚠️
...ce/Migrations/Sqlite/SqliteContextModelSnapshot.cs  0.00%    97 Missing ⚠️
...igrations/Sqlite/20240311143910_AddBackupTables.cs  0.00%    65 Missing ⚠️
src/Service/Backup/BackupWorkerService.cs              0.00%    50 Missing ⚠️
src/Service/Backup/BackupBackgroundService.cs          0.00%    29 Missing ⚠️
src/Service/Backup/BackupService.cs                    78.57%   6 Missing and 3 partials ⚠️
src/Service/Models/Archive.cs                          0.00%    6 Missing ⚠️
src/Common/Backup/Mapping/EntityFrameworkMap.cs        64.28%   3 Missing and 2 partials ⚠️
... and 6 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #465      +/-   ##
==========================================
+ Coverage   32.63%   33.91%   +1.28%     
==========================================
  Files         504      525      +21     
  Lines       26394    28985    +2591     
  Branches      819      833      +14     
==========================================
+ Hits         8613     9831    +1218     
- Misses      17670    19038    +1368     
- Partials      111      116       +5     

jonashendrickx changed the title from "WIP | Data export" to "PAS-148 | Data export" on Mar 11, 2024
jonashendrickx marked this pull request as ready for review on March 11, 2024 15:25
jonashendrickx requested a review from a team as a code owner on March 11, 2024 15:25
}

private async Task BackupEntityAsync<TEntity>(Guid groupId, string tenant) where TEntity : PerTenant
{

Member Author

@abergs & @jrmccannon For this I needed to make ApplicationEvent inherit from PerTenant.

@@ -87,9 +90,10 @@ protected override void OnModelCreating(ModelBuilder modelBuilder)
        modelBuilder.Entity<ApplicationEvent>(builder =>
        {
            builder.HasKey(x => x.Id);

Member Author

This adds backwards compatibility, so we don't need to migrate any data; it just maps the existing column to the new CLR property.

abergs (Member) commented Mar 12, 2024

Re-iterating earlier comment: Let's review this PR but do not merge it until we've shipped some of the things currently in main to prod.

3 participants