Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Meta: refactoring #3754

Open
tombuildsstuff opened this issue Feb 5, 2024 · 0 comments
Open

Meta: refactoring #3754

tombuildsstuff opened this issue Feb 5, 2024 · 0 comments
Assignees

Comments

@tombuildsstuff
Copy link
Member

tombuildsstuff commented Feb 5, 2024

Due to the rapidly shifting requirements from when Pandora was first built, there’s a decent chunk of tech debt which requires cleanup.

This issue looks to outline the different phases required to get there, which will both unify the models and terminology used in the codebase, reduce the complexity of importer-rest-api-specs - and allow us to roll out support for Microsoft Graph.

Whilst the phases are outlined in the next section - I want to call out the notes section at the end, and specifically the updated terminology section - with the intention of standardising the technology.

Phases

These phases need to be done as a part of the refactor - some of these need to be done sequentially, others can be done in parallel - but ultimately all are needed.

  1. ✅ Preparation/lift & shift. tools/data-api-sdk: Phase 1 and Phase 3 #3784
  2. ✅ Updating ./tools/data-api to use the new Models. tools/data-api: refactoring to use the new Data API SDK [Phase 2] #3786
  3. ✅ Building out the updated SDK for ./tools/data-api-sdk. tools/data-api-sdk: Phase 1 and Phase 3 #3784
  4. ✅ Refactoring ./tools/data-api-differ to use the new SDK. tools/data-api-differ: refactoring to use the new Data API SDK [Phase 4] #3789
  5. ✅ Refactoring ./tools/generator-terraform to use the new SDK. tools/generator-terraform: refactoring to use the new Data API SDK [Phase 5] #3800
  6. ✅ Refactoring ./tools/generator-go-sdk to use the new SDK. (Foundational work in tools/generator-go-sdk: refactoring prior to moving to the new Data API SDK/Models [Phase 6 Part 1] #3802 / Type changes in tools/generator-go-sdk: refactoring to use the Data API SDK models [Phase 6 Part 2] #3804)
  7. ✅ Refactoring ./tools/wrapper-automation to use the new SDK. tools/wrapper-automation and automation: updating to use the new Data API SDK types [Phase 7] #3790
  8. ✅ Refactoring ./tools/importer-rest-api-specs to use the new test assertion helpers.
  9. ✅ Preparation for cleanup of ./tools/importer-rest-api-specs.
  10. ✅ Refactoring dataapigeneratorjson.
  11. 🔨 Building out ./tools/data-api-repository.
  12. 🔨 Refactoring ./tools/importer-rest-api-specs.
  13. 🔨 Updating ./tools/importer-rest-api-specs/components/terraform to use the new SDK types throughout.
  14. ✅ Updating ./tools/importer-rest-api-specs/components/parser to use the new SDK types.
  15. Updating ./tools/importer-msgraph-metadata so that it uses the new repository.
  16. Updating ./tools/version-bumper and ./tools/importer-rest-api-specs.
  17. Removing ./tools/sdk.

(✅ = done / 📋 = waiting for review / 🔨 = in progress)

At the end of it, the resulting directory structure of ./tools should look similar to below:

├── data-api
│   ├── internal
│   │   ├── commands
│   │   └── endpoints
├── data-api-differ
│   ├── internal
│   │   ├── changes
│   │   ├── commands
│   │   ├── differ
│   │   ├── log
│   │   └── views
├── data-api-repository
│   ├── internal
│   │   ├── storage
│   │   └── transforms
├── data-api-sdk
│   ├── v1
│   │   └── models
├── generator-go-sdk
│   ├── internal
│   │   ├── featureflags
│   │   └── generator
├── generator-terraform
│   ├── internal
│   │   ├── cmd
│   │   ├── generator
│   │   │   ├── datasource
│   │   │   ├── definitions
│   │   │   ├── helpers
│   │   │   ├── mappings
│   │   │   ├── models
│   │   │   ├── pluginsdkattributes
│   │   │   ├── resource
│   │   │   │   └── docs
├── importer-msgraph-metadata
│   ├── internal
│   │   ├── cmd
│   │   └── stages
│   │   │   └── apidefinitions
├── importer-rest-api-specs
│   ├── internal
│   │   ├── cmd
│   │   ├── featureflags
│   │   ├── stages
│   │   │   ├── parser (or `apidefinitions`?)
│   │   │   │   ├── discovery
│   │   │   │   └── parser
│   │   │   │   │   ├── cleanup
│   │   │   │   │   ├── commonschema
│   │   │   │   │   ├── constants
│   │   │   │   │   ├── dataworkarounds
│   │   │   │   │   ├── internal
│   │   │   │   │   ├── resourceids
│   │   │   │   │   └── testdata
│   │   │   ├── terraform
│   │   │   │   ├── examples
│   │   │   │   ├── helpers
│   │   │   │   ├── resources
│   │   │   │   ├── schema
│   │   │   │   │   └── processors
│   │   │   │   └── testing

├── version-bumper
│   └── internal
│   │   └── cmd
└── wrapper-automation
│   └── internal
│   │   └── cmd

Phase 1: preparation/lift & shift

This phase involves scaffolding out ./tools/data-api-sdk so that we have a foundation to work from. Notably this stage only moves the models and scaffolds out the functionality - but does not build out the SDK clients.

This phase involves the initial scaffolding out of ./tools/data-api-sdk, including:

  1. Scaffolding out the shell for ./internal/data-api-sdk. (tools/data-api-sdk: scaffolding out the Data API SDK #3761)
  2. Implements the Source Data Type concept (see below). (tools/data-api-sdk: scaffolding out the Data API SDK #3761)
  3. Implement the /health endpoint in the SDK. (tools/data-api-sdk: scaffolding out the Data API SDK #3761)
  4. Copy ./tools/internal/data-api/models into ./tools/data-api-sdk/v1/models - intentionally with no changes but prefixing each filename with wip_ to enable review of each type.
  5. Work through each of the duplicated models, renaming as needed - but introducing a temporary temp_aliases.go file, which contains type aliases for any renamed types. This both enables both us to quickly refactor to the existing models - by using deprecations to identify the relevant renames.
  6. Any helper methods (e.g. GolangTypeName, TopLevelObjectDefinition) should be moved from the SDK into a separate package.

Phase 2: updating ./tools/data-api to use the new Models

This phase involves updating the Data API such that it returns the types defined in ./tools/data-api-sdk, rather than ./tools/data-api/models.

Notably this does not yet (temporarily) split out the repositories into ./tools/data-api-repository - this is done in a later stage.

Phase 3: building out the updated SDK for ./tools/data-api-sdk

This phase builds out both the endpoints in the SDK - and a friendlier means of retrieving all of the data in one go (so that we can define this once, rather than variants in each tool).

The result of this should be Request and Response objects similar to the Go SDK that we’re generating, for consistency purposes.

Phase 4: refactoring ./tools/data-api-differ to use the new SDK

This phase involves refactoring ./tools/data-api-differ so that information is sourced using the SDK located at ./tools/data-api-sdk rather than ./tools/sdk/resourcemanager.

Currently this tool manually hits the health endpoint to check the Data API is available - this should be updated to use the (new) SDK method.

This also includes adopting the Source Data Type concept as a CLI argument (e.g. ./data-api-differ breaking-changes resource-manager [..]).

Phase 5: refactoring ./tools/generator-terraform to use the new SDK

This phase involves refactoring ./tools/generator-terraform so that it uses the SDK located at ./tools/data-api-sdk rather than ./tools/sdk/resourcemanager.

It’d be worth doing some general cleanup/evaluation at this point (including adopting an internal package).

This also includes adopting the Source Data Type concept as a CLI argument (e.g. ./generator-terraform resource-manager [..]).

Phase 6: refactoring ./tools/generator-go-sdk to use the new SDK

This phase involves refactoring ./tools/generator-go-sdk so that it uses the SDK located at ./tools/data-api-sdk rather than ./tools/sdk/resourcemanager.

This also includes adopting the Source Data Type concept as a CLI argument (e.g. ./generator-go-sdk resource-manager [..]).

It’d be worth doing some general cleanup/evaluation at this point (including adopting an internal package) - plus potentially removing support for the old base layer (if it’s around as much work as porting the old one over temporarily).

Once this is done, we should be able to integrate @manicminer’s Microsoft Graph extensions (such as Common Types support) from #2721 - meaning that during Phase 15, we should be able to see the results of generating a Graph SDK end-to-end to validate everything works as expected.

Phase 7: refactoring ./tools/wrapper-automation to use the new SDK

This phase involves updating ./tools/wrapper-automation to use the new SDK. Currently this tool manually hits the health endpoint to check the Data API is available - this should be updated to use the (new) SDK method.

Phase 8: refactoring ./tools/importer-rest-api-specs to use the new test assertion helpers

This phase involves updating ./tools/importer-rest-api-specs/components/parser to use the new test assertion helpers.

This should both dramatically reduce the size of the tests (making them more readable) but also makes refactoring easier down the line.

Phase 9: preparation for cleanup of ./tools/importer-rest-api-specs

This stage involves updating the ./tools/importer-rest-api-specs/components/dataapigeneratorjson package to consume the new SDK types as an input.

This will likely include extending ./tools/importer-rest-api-specs/components/transformer to include transforming to the new SDK model types, to enable a gradual transition (since we need to update each stage of importer-rest-api-specs to return/consume these types, done in future phases).

Phase 10: refactoring dataapigeneratorjson

This stage involves updating dataapigeneratorjson so that it takes just the Service (including all API Versions) to write those to disk - which enables us to split this logic out in the next stage.

Phase 11: building out ./tools/data-api-repository

This phase involves building out ./tools/data-api-repository:

  1. Copying the types used for storage on disk into ./tools/data-api-repository/models/storage.
  2. Updating ./tools/importer-rest-api-specs/components/terraform/dataapigeneratorjson to use these new models.
  3. Updating ./tools/data-api/internal/repositories to use these new models.
  4. Moving the repository from ./tools/data-api/internal/repositories to ./tools/data-api-repository.
  5. Consolidating the repository from ./tools/importer-rest-api-specs/components/terraform/dataapigeneratorjson into ./tools/data-api-repository.
  6. Cleanup: removing the types used for storage on disk into ./tools/data-api-repository/internal/storage (since these are an implementation detail).
  7. Cleanup: consolidating the transformation logic into ./tools/data-api-repository/internal/transforms.

At the end of this stage, all logic related to reading/writing the API Definitions to disk should be contained within ./tools/data-api-repository (which implements a single interface).

Phase 12: refactoring ./tools/importer-rest-api-specs

This phase involves:

  1. Removing the schema and segments commands - since these are no longer needed.
  2. Refactoring the logic for both the Discovery/Parsing of API Definitions, and the Building of Terraform Definitions into packages, presumably named parser (or apidefinitions?) and terraform.
  3. Consolidating the stages from ./tools/importer-rest-api-specs/pipeline into the new parser/terraformpackages above.

Phase 13: updating ./tools/importer-rest-api-specs/components/terraform to use the new SDK types throughout

This phase involves updating the ./components/terraform package to both consume these types - and use them throughout.

Phase 14: updating ./tools/importer-rest-api-specs/components/parser to use the new SDK types

This phase involves updating ./components/parser within ./tools/importer-rest-api-specs to return the new SDK types.

Whilst this package is unlikely to be able to use these types throughout - since we require additional context/information as we’re going along - in some cases (namely the ObjectDefinition types) this should be possible.

At this stage we should be able to remove the deprecated aliases from ./tools/data-api-sdk.

Phase 15: updating ./tools/importer-msgraph-metadata so that it uses the new repository.

This phase involves both updating ./tools/importer-msgraph-metadata so that it accounts for the breaking changes in this upstream PR - and so that it writes the API Definitions to disk using the new repository.

Given ./tools/generator-go-sdk will have been updated at this point, at this point we should be able to generate the Microsoft Graph SDK end-to-end.

-> Note: Initially we’re planning to import only a subset of the available services (namely the types used in manicminer/hamilton and hashicorp/terraform-provider-azuread) - but this will expand in time.

Phase 16: updating ./tools/version-bumper and ./tools/importer-rest-api-specs

This phase involves moving the *.hcl types and parsing logic from ./tools/sdk/config into (working name) ./tools/config-sdk.

Phase 17: removing ./tools/sdk

This phase involves removing the old SDK at ./tools/sdk - since this should no longer be using used.

Notes

Source Data Type concept

As discussed offline, when adding new sources of Data (e.g. Microsoft Graph or Data Plane) we currently need to work through each tool and update it to account for that.

This concept pulls the Data Source definitions out of the Data API - giving us a consistent set of constants we can use across the project - but also allowing us to define a slice of AvailableSourceDataTypes.

This means that downstream tooling (such as ./tools/data-api-differ, ./tools/generator-go-sdk and ./tools/generater-terraform) can use these types - and by using the AvailableSourceDataTypes slice, allows these to have “basic” support for the new Source Data type simply by adding it to the slice. This also means that we can use these values in the CLI for consistency (e.g. ./generator-terraform resource-manager [..] and ./generator-terraform microsoft-graph [..]).

Updated Terminology

Whilst the majority of the codebase uses consistent terminology, we should use this opportunity to standardise this - namely using:

Core

  • Service to refer to a Service within the Azure API (e.g. Compute or ContainerService).
  • API Version to refer to the specific API version for a Service (e.g. 2020-01-01 for the Service Compute).
  • API Resource to refer to the grouping of resources within an API Version.
  • Source Data Type to refer to the Type of Source Data (e.g. ResourceManager or MicrosoftGraph) containing one or more Services.
  • Source Data Location (or Origin?) to refer where the Source Data was retrieved from - for example Azure/azure-rest-api-specs, microsoftgraph/msgraph-metadata, handwritten.

SDK Related Terminology

  • SDK Model to refer to a Model returned by the SDK.
  • SDK Field to refer to a Field within an SDK Model.
  • SDK Object Definition to refer to the Object Definition used in an SDK Field.
  • SDK Object Definition Type - refers to the Type of value that this SDK Object Definition represents (e.g. a Boolean, String, Reference).
  • SDK Constant to refer to a Constant returned by the SDK.
  • SDK Constant Type to refer to the Type of Constant (e.g. integer, float or string).
  • SDK Constant Value - used to describe the key, value and a description for a value associated with an SDK Constant.
  • SDK Operation to refer to a given operation within the Azure API (e.g. POST /endpoint).
  • SDK Option Object Definition to refer to the Object Definition type used for the Operation.
  • SDK Date Format - refers to the Date Format used in the SDK. Currently we only support RFC3339 but additional types are defined/used in the Azure API.

Removals / Notes:

  • SDK Field Validation will be removed. This exists from when the Terraform Schema and SDK Models used the same models - but this approach was abandoned due to it being problematic.
  • CustomFieldType used in importer-rest-api-specs will be consolidated into SDK Object Definition. This exists from before ObjectDefinition was fully populated, and has been glue-code/needing cleanup ever since.

Resource ID Related Terminology

  • Resource ID - refers to the object describing a Resource ID, containing at least one Resource ID Segment Type.
  • Resource ID Segment Type - refers to the Type of Segment (User Specified, Resource Group Name) within a Resource ID.

Terraform Related Terminology

  • Terraform Definition - refers to the wrapper type used to describe zero or more Terraform Resource Definition(s) and (in the future) zero of more Terraform Data Source Definition(s).
  • Terraform Resource Definition to refer to the description of a Terraform Resource (e.g. a label of X, these models, uses these SDK methods).
  • Terraform Data Source Definition to refer to the description of a Terraform Data Source (e.g. a label of X, these models, uses these SDK methods) - a placeholder for the future.
  • Terraform Schema Model to refer to a Model used in the Terraform Schema for a given Terraform Resource Definition or (in the future) Terraform Data Source Definition.
  • Terraform Schema Field to refer to a Field within a Terraform Schema Model.
  • Terraform Schema Object Definition - refers to the Object Definition used within the Terraform Schema.
  • Terraform Schema Object Definition Type - refers to the Type of value used for this Terraform Schema Object Definition.
  • Terraform Method Definition - describes the SDK Operation and the timeout which should be used for this Terraform Method (Create, Read, Update, Delete). This is used in both the Terraform Resource Definition and (in the future) in the Terraform Data Source Definition.
  • Terraform Documentation Definition - describes the documentation (category, description, example usage) for a Terraform Resource Definition and Terraform Data Source Definition.
  • Terraform Schema Field Documentation Definition - describes the documentation (currently markdown, potentially "related examples" in the future) for a Terraform Resource Definition and Terraform Schema Field.
  • Terraform Schema Field Validation Definition - describes the validation used for a Terraform Schema Field.
  • Terraform Schema Field Validation Type - describes the type of validation used for a Terraform Schema Field (e.g. PossibleValues or Range [in the future]).
  • Terraform Schema Field Possible Values Definition - describes the values used for the Possible Values validation (e.g. these allowed values using this type).
  • Terraform Schema Field Possible Values Type - describes the type of values used for the Possible Values validation (e.g. float, int, string).
  • Terraform Resource Tests Definition - describes the tests used for a Terraform Resource Definition.
  • Terraform Data Source Tests Definition - describes the tests used for a Terraform Data Source Definition - a placeholder for the future.
  • Terraform Test Definition - describes a test, comprising a Terraform Configuration and a reference to the associated Terraform Test Data Variables.
  • Terraform Test Data Variables - describes the variables used within the test data for the Terraform Test Definition.

Notes:

  • Today the Terraform Schema items are named Resource (with the intention of Data Source using a different struct) - but this assumes we're flattening these down.

Mappings related Terminology

(TODO, but this is unchanged from today)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant