Structure and format for airtable class objects #15

Open
elipousson opened this issue Jun 5, 2023 · 2 comments

Comments

@elipousson

elipousson commented Jun 5, 2023

This is a bit of a brain dump, so I hope it makes sense. Here is a quick summary of the existing class structure, along with potential new directions and alternate approaches.

Background

The new implementation of airtable class objects in the pull request for migrating the package to {httr2} #14 includes four S3 object types:

  • airtable
  • airtable_base_schema
  • airtable_table_schema
  • airtable_fields_schema

An airtable object is a list with a base ID, a table ID or name, a user-facing base URL, and an API request URL. A table URL is included if table is specified as a table ID (but not as a table name). A base name/description, user permissions level, and table name can optionally be included. The base name (called description in the airtable object to avoid a conflict with the table name value) is pulled from the metadata if description is NULL. The table value is also assumed to be the table name if the supplied table is not a table ID. One or more views can be included (although multiple views will break other functions).
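As a rough illustration of the structure described above, here is a minimal sketch in base R. The constructor name, the element names, and the class name are all hypothetical and are not the package's actual implementation; the only outside facts assumed are that Airtable table IDs are "tbl" followed by 14 alphanumeric characters and that API requests go to https://api.airtable.com/v0/.

```r
# Hypothetical constructor; names are illustrative, not rairtable's API.
new_airtable_sketch <- function(base, table, view = NULL, description = NULL) {
  # Assumption: a table ID is "tbl" followed by 14 alphanumeric characters
  table_is_id <- grepl("^tbl[[:alnum:]]{14}$", table)

  structure(
    list(
      base = base,
      table = table,
      # The table name is only known up front when `table` is not an ID
      name = if (table_is_id) NULL else table,
      view = view,
      description = description,
      # User-facing base URL
      url = paste0("https://airtable.com/", base),
      # A user-facing table URL can only be built from a table ID
      table_url = if (table_is_id) {
        paste0("https://airtable.com/", base, "/", table)
      } else {
        NULL
      },
      # API request URL; table names need URL encoding
      request_url = paste0(
        "https://api.airtable.com/v0/", base, "/",
        utils::URLencode(table, reserved = TRUE)
      )
    ),
    class = c("airtable_sketch", "list")
  )
}

by_id <- new_airtable_sketch("appWAFKLoOMO4HzOD", "tblzdIsyVI8TrAgca")
by_name <- new_airtable_sketch("appWAFKLoOMO4HzOD", "Table 1")
```

Here by_id carries a table_url but no name, while by_name carries a name but no table_url, which mirrors the asymmetry described above.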

airtable_base_schema is a list that includes airtable_table_schema and airtable_fields_schema as components.

These were all part of the existing development branch, but the pull request converted them from environment objects to list objects so they can use the S3 vector class functions available through vctrs.

Possible challenges with existing airtable class objects

Currently, airtable class objects represent a single table, but the values are not validated when the object is created, so it is possible to create an airtable object with a table or view that does not exist within the specified base. Would we want to add validation so this is no longer possible? If so, is an airtable object effectively just a subset of the data included within an airtable_table_schema?
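One way the validation question could play out is sketched below. It assumes a simplified, hypothetical schema (a list with one entry per table, each holding an id, a name, and a character vector of view IDs); the real airtable_base_schema may be shaped differently, and the function name is made up for illustration.

```r
# Hypothetical validator: checks a table (ID or name) and optional view
# against a simplified base schema.
validate_airtable_sketch <- function(table, view = NULL, schema) {
  ids <- vapply(schema, function(x) x[["id"]], character(1))
  nms <- vapply(schema, function(x) x[["name"]], character(1))

  # Accept either a table ID or a table name
  i <- which(ids == table | nms == table)
  if (length(i) != 1) {
    stop("`table` must match exactly one table in the base schema.", call. = FALSE)
  }

  if (!is.null(view) && !all(view %in% schema[[i]][["views"]])) {
    stop("`view` must exist within the specified table.", call. = FALSE)
  }

  invisible(TRUE)
}

# Made-up schema for illustration only
schema <- list(
  list(id = "tblAAAAAAAAAAAAAA", name = "Projects", views = "viwAAAAAAAAAAAAAA"),
  list(id = "tblBBBBBBBBBBBBBB", name = "People", views = "viwBBBBBBBBBBBBBB")
)

validate_airtable_sketch("Projects", "viwAAAAAAAAAAAAAA", schema)
```

A check like this needs the schema at construction time, which is part of why a validated airtable object starts to look like a subset of the table schema.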

I'd also prefer a clearer hierarchy where there is an object that represents a single base, an object that represents a single table, and an object that represents a single view (potentially using sub-classes, e.g. airtable, airtable_tbl, airtable_view).
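A minimal sketch of that hierarchy using plain S3 sub-classes follows. The constructors and the describe() generic are hypothetical and exist only to show how method dispatch would walk from view to table to base; nothing here reflects existing package code.

```r
# Hypothetical constructors: each level adds one field and one class
new_airtable_base <- function(base) {
  structure(list(base = base), class = "airtable")
}

new_airtable_tbl <- function(base, table) {
  x <- new_airtable_base(base)
  x$table <- table
  class(x) <- c("airtable_tbl", class(x))
  x
}

new_airtable_view <- function(base, table, view) {
  x <- new_airtable_tbl(base, table)
  x$view <- view
  class(x) <- c("airtable_view", class(x))
  x
}

# Illustrative generic: each method extends the result of its parent class
describe <- function(x, ...) UseMethod("describe")
describe.airtable <- function(x, ...) paste0("base ", x$base)
describe.airtable_tbl <- function(x, ...) paste0(NextMethod(), " > table ", x$table)
describe.airtable_view <- function(x, ...) paste0(NextMethod(), " > view ", x$view)

v <- new_airtable_view("appX", "tblY", "viwZ")
describe(v)
```

Because a view object also inherits from airtable_tbl and airtable, any function written for a base or table would keep working when handed a view.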

Correspondence between API and class structure

I think one goal for the class structure of the package should be a clear correspondence between the package class structure and the data model built into the Airtable Web API.

Right now, I think airtable and airtable_table_schema are both close equivalents to the table model: https://airtable.com/developers/web/api/model/table-model. We could add a similar object to serve as an equivalent for a table config object: https://airtable.com/developers/web/api/model/table-config

airtable_fields_schema should exist as (or be convertible to) an array of field config objects: https://airtable.com/developers/web/api/field-model

Ideally, when create_table() is implemented, it could take an airtable_table_schema and create an identically structured table. Similarly, an airtable_fields_schema could be used to add a set of fields using create_field() (or some additional function, e.g. create_fields()).
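To make the conversion concrete, here is a sketch of turning a fields schema into an array of field config objects and wrapping it in a table config. The input layout (parallel name, type, and options elements) and both helper names are assumptions made for illustration; the output follows the name/type/options shape of the Airtable field config documentation.

```r
# Hypothetical fields schema: parallel vectors plus a list of per-field options
fields_schema <- list(
  name = c("Name", "Status"),
  type = c("singleLineText", "singleSelect"),
  options = list(
    NULL,
    list(choices = list(list(name = "Todo"), list(name = "Done")))
  )
)

# Convert the schema into an unnamed list of field configs
as_field_configs_sketch <- function(fields) {
  configs <- Map(
    function(name, type, options) {
      config <- list(name = name, type = type)
      # Only include options for field types that have them
      if (!is.null(options)) {
        config$options <- options
      }
      config
    },
    fields$name, fields$type, fields$options
  )
  # Drop the names Map() adds so the result serializes as a JSON array
  unname(configs)
}

# Wrap the field configs in a table config for a create-table request body
as_table_config_sketch <- function(name, fields) {
  list(name = name, fields = as_field_configs_sketch(fields))
}

table_config <- as_table_config_sketch("Tasks", fields_schema)
```

A create_fields() helper could reuse the same field configs and issue one create-field request per element.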

The view metadata endpoint is part of the Enterprise API but it effectively includes an implicit view object model that we could also use as a base for an additional class or sub-class: https://airtable.com/developers/web/api/get-view-metadata

Object type for airtable class objects

While the option to return to an environment object as a base might make it easier to set the active view, I think I prefer the idea of sticking with the infrastructure that vctrs offers for vector/list-style S3 objects.
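The trade-off comes down to reference versus copy semantics, which this small base R sketch illustrates. The helper name and the view IDs are made up for the example.

```r
# The same setter behaves differently for environments and lists
set_view_sketch <- function(x, view) {
  x$view <- view
  x
}

# Environment: modified in place, so no reassignment is needed
at_env <- new.env()
at_env$view <- "viwOld"
ignored <- set_view_sketch(at_env, "viwNew")
env_view <- at_env$view

# List: the function modifies a copy, so the original is unchanged...
at_list <- list(view = "viwOld")
ignored <- set_view_sketch(at_list, "viwNew")
list_view_before <- at_list$view

# ...unless the result is assigned back
at_list <- set_view_sketch(at_list, "viwNew")
list_view_after <- at_list$view
```

With list-based objects, setting the active view means reassigning the object, which is a little more typing but keeps the objects compatible with vctrs and with the usual copy-on-modify expectations in R.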

@elipousson
Author

elipousson commented Jun 10, 2023

As some food for thought on alternate structures for representing Airtable base data in R, I put together a function that converts a base into a dm object (created with the dm package).

library(rairtable)
library(rlang)
library(purrr)
#> 
#> Attaching package: 'purrr'
#> The following objects are masked from 'package:rlang':
#> 
#>     %@%, flatten, flatten_chr, flatten_dbl, flatten_int, flatten_lgl,
#>     flatten_raw, invoke, splice

options("rairtable.pat_default" = "TEST_AIRTABLE_PAT")

dm_airtable <- function(url = NULL, ..., cell_format = "json") {
  base <- rairtable:::get_base_id(
    url = url, ...
  )
  
  table_list <- list_base_tables(base = base)
  
  table_fields <-
    map(
      seq_along(table_list[["name"]]),
      function(i) {
        tbl_fields <- table_list[["fields"]][[i]]
        
        tbl_options <- tbl_fields[["options"]]
        
        vctrs::new_data_frame(
          list(
            "name" = tbl_fields[["name"]],
            "id" = tbl_fields[["id"]],
            "is_primaryFieldId" = tbl_fields[["id"]] %in% table_list[["primaryFieldId"]],
            "linkedTableId" = tbl_options[["linkedTableId"]],
            "prefersSingleRecordLink" = tbl_options[["prefersSingleRecordLink"]],
            "inverseLinkFieldId" = tbl_options[["inverseLinkFieldId"]]
          )
        )
      }
    )
  
  table_fields <- set_names(
    table_fields,
    table_list[["name"]]
  )
  
  table_pk_cols <-
    map(
      table_fields,
      function(x) {
        x[x[["is_primaryFieldId"]], ][["name"]]
      }
    )
  
  table_pk_cols <- as.character(table_pk_cols)
  
  table_records <- map(
    table_list[["id"]],
    function(id) {
      list_records(
        base = base,
        table = id,
        cell_format = cell_format
      )
    }
  )
  
  id_col <- getOption("rairtable.id_col", "airtable_record_id")
  
  base_dm <- dm::new_dm(
    tables = set_names(
      table_records,
      table_list[["name"]]
    )
  )
  
  table_names <- set_names(table_list[["name"]], table_list[["id"]])
  
  for (i in seq_along(table_names)) {
    tbl_name <- table_names[[i]]
    tbl_pk_col <- table_pk_cols[[i]]
    
    base_dm <-
      dm::dm_add_pk(
        dm = base_dm,
        table = !!sym(tbl_name),
        columns = !!sym(tbl_pk_col)
      )
    
    linked_fields <-
      table_fields[[i]][!is.na(table_fields[[i]][["linkedTableId"]]), ]
    
    if (nrow(linked_fields) > 0) {
      for (r in seq_len(nrow(linked_fields))) {
        # Look up the name of the linked table from its ID
        ref_table <- table_names[[linked_fields[["linkedTableId"]][[r]]]]

        ref_table_fields <-
          table_fields[[ref_table]]

        inverse_link_field <-
          linked_fields[["inverseLinkFieldId"]][[r]]

        if (!any(is.na(inverse_link_field))) {
          # Use the field in the linked table that points back to this table
          ref_col <-
            ref_table_fields[ref_table_fields[["id"]] == inverse_link_field, ][["name"]]
        } else {
          # Otherwise fall back to the primary field of the linked table
          # (not the current table, whose column may not exist in ref_table)
          ref_col <- table_pk_cols[[match(ref_table, table_list[["name"]])]]
        }

        base_dm <-
          dm::dm_add_fk(
            dm = base_dm,
            table = !!sym(tbl_name),
            columns = !!sym(linked_fields[["name"]][[r]]),
            ref_table = !!sym(ref_table),
            ref_columns = !!sym(ref_col)
          )
      }
    }
  }
  
  base_dm
}

# https://airtable.com/shrJ4mMhUfh2hD5Ew
rairtable_dm <- dm_airtable(
  url = "https://airtable.com/appWAFKLoOMO4HzOD/tblzdIsyVI8TrAgca/viwcKRa10BYSBRee4?blocks=hide"
  )

rairtable_dm |> 
  dm::dm_draw(view_type = "all")

Created on 2023-06-09 with reprex v2.0.2
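The reprex above leans on the internal rairtable:::get_base_id() to resolve the url argument. As a rough idea of what that kind of parsing involves, here is a hypothetical helper (not the package's implementation) that pulls base, table, and view IDs out of an Airtable URL, assuming each ID is a three-letter prefix followed by 14 alphanumeric characters.

```r
# Hypothetical URL parser; not the package's internal implementation
parse_airtable_url_sketch <- function(url) {
  extract <- function(prefix) {
    pattern <- paste0(prefix, "[[:alnum:]]{14}")
    match <- regmatches(url, regexpr(pattern, url))
    # regmatches() returns character(0) when nothing matches
    if (length(match) == 0) NA_character_ else match
  }

  list(
    base = extract("app"),
    table = extract("tbl"),
    view = extract("viw")
  )
}

ids <- parse_airtable_url_sketch(
  "https://airtable.com/appWAFKLoOMO4HzOD/tblzdIsyVI8TrAgca/viwcKRa10BYSBRee4?blocks=hide"
)
```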

@elipousson
Author

A few more ideas based on my continued work with table configurations and models for the pull request:

  • airtable_fields_schema should be renamed airtable_field_model and should include field names, IDs, types, and options (for consistency with the Airtable Web API documentation)
  • airtable_table_schema should be renamed airtable_table_model (again for consistency)
  • The airtable function should be able to take an airtable_table_schema/airtable_table_model object and create an airtable object. This would allow airtable_base to simply return an airtable_base_schema object rather than the current arrangement, where it returns a list with a base ID string, an airtable_base_schema, and a list of airtable objects.
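The last idea could be handled with S3 dispatch on the first argument. The sketch below uses a hypothetical generic name (airtable_sketch) and assumes a table model is a list with id, name, and fields elements; neither is taken from the package.

```r
# Hypothetical generic: build an airtable object from different inputs
airtable_sketch <- function(x, ...) UseMethod("airtable_sketch")

# From a table ID or name supplied as a string
airtable_sketch.character <- function(x, base, ...) {
  structure(list(base = base, table = x), class = "airtable")
}

# From a table model: reuse the ID, name, and fields it already carries
airtable_sketch.airtable_table_model <- function(x, base, ...) {
  structure(
    list(base = base, table = x$id, name = x$name, fields = x$fields),
    class = "airtable"
  )
}

# Made-up table model for illustration only
model <- structure(
  list(id = "tblAAAAAAAAAAAAAA", name = "Projects", fields = c("Name", "Status")),
  class = "airtable_table_model"
)

from_model <- airtable_sketch(model, base = "appAAAAAAAAAAAAAA")
from_string <- airtable_sketch("Projects", base = "appAAAAAAAAAAAAAA")
```

With something like this in place, a base schema could be mapped over its table models to produce airtable objects on demand instead of storing them alongside the schema.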
