INF-427: Download country/institution data in CSV format #56

alexmassen-hane · 2022-11-29T07:56:03Z

Use fetch to grab the JSON from Cloudflare KV storage, convert it to CSV, and let the user download it by clicking the button.

Need to figure out a few kinks with how the Cloudflare worker-api returns its data.

alexmassen-hane · 2023-01-04T03:02:11Z

Modified the API worker to pull the entity data from KV storage, parse the timeseries and repository data into CSV-like arrays, and then zip them up for download when /api/download/[entity.type]/[entity.id] is requested.

The clickable download link for the zip on the COKI OA website for the specific country or institution appears as

alexmassen-hane · 2023-01-04T05:45:43Z

Instead of including the schemas in a separate file in the zip, I have just added a link to the data page for the user to investigate. Also shortened the names of the CSVs to "{entity.id}_repositories.csv" and "{entity.id}_yearly_aggregated_stats.csv".

alexmassen-hane · 2023-02-07T06:42:18Z

Having to pin flexsearch to v0.7.2, as there are issues with the type definitions in v0.7.3. Once those issues have been fixed we can consider upgrading.

nextapps-de/flexsearch#342
nextapps-de/flexsearch#364

Also added the necessary config so that the yarn types:check step in the tests ignores library files, as there are conflicts between the current node and cloudflare type definitions.

jdddog

Hey Alex, thanks for the PR, it is looking good. The main feedback is about:

Using the MetadataLink object for the download UI functionality.
Using routing features from itty-router so that you don't have to manually parse route parameters.
Using a JSON to CSV converter library, as it will reduce the amount of conversion code that we have to write and will handle details such as quoting strings that contain commas.

jdddog · 2023-02-14T03:35:32Z

workers-api/src/types.ts

@@ -105,3 +122,17 @@ export interface EntityRequest extends Request {
  };
  query: {};
 }
+
+export interface dataRequest extends Request {


As dataRequest is an interface for an object rather than a function, it should use upper CamelCase (e.g. DataRequest).

Also take a look at how the params for EntityRequest are defined so that you don't have to manually parse the entityType and id fields in downloadDataZipHandler.
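
The idea of declaring route parameters on the request type, rather than splitting the URL by hand, can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `DataRequest` shape, the `describeDownload` helper, and the example values are hypothetical, and it assumes the router has already filled in `params` from a pattern such as `/api/download/:entityType/:id`.

```typescript
// Hypothetical request shape: the router is assumed to have populated `params`
// from a route pattern such as /api/download/:entityType/:id.
interface DataRequest {
  url: string;
  params: {
    entityType: string;
    id: string;
  };
}

// The handler reads the typed params directly instead of parsing request.url.
function describeDownload(req: DataRequest): string {
  const { entityType, id } = req.params;
  return `${entityType}/${id}`;
}

// Example values are made up for illustration.
const example: DataRequest = {
  url: "https://example.org/api/download/country/NZL",
  params: { entityType: "country", id: "NZL" },
};
console.log(describeDownload(example));
```

With the params typed on the interface, a typo such as `req.params.entitytype` is caught by the type checker rather than surfacing at runtime.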

workers-api/src/types.ts

workers-api/src/router.test.ts

workers-api/src/router.ts

workers-api/src/downloadDataZip.ts

e2e/downloadZIP.spec.ts

jdddog · 2023-02-14T04:38:56Z

workers-api/src/downloadDataZip.ts

+export const parseEntityTimeseriesToCSV = async (entityData: Array<Year>) => {
+  const dataToCSV = [];
+
+  // Obtain headers from the Entity data.
+  // Scan through all years for columns that exist.
+  var numColumns = 0;
+  var columnHeaders: Array<string> = ["year"];
+  for (const dataLine of entityData) {
+    if (numColumns < Object.keys(dataLine.stats).length) {
+      numColumns = Object.keys(dataLine.stats).length;
+      for (const column in dataLine.stats) {
+        columnHeaders.push(`${column}`);
+      }
+    }
+  }
+  dataToCSV.push(columnHeaders);
+
+  // Transform data from the enitity object to csv like array
+  var dataLineTimeSeries: Array<string> = [""];
+  for (const entityTimeSeriesData of entityData) {
+    const statsArray = Object.keys(entityTimeSeriesData.stats).map(key => `${entityTimeSeriesData.stats[key]}`);
+    dataLineTimeSeries = [`${entityTimeSeriesData.year}`].concat(statsArray);
+    dataToCSV.push(dataLineTimeSeries);
+  }
+
+  // Add commas, line breaks and join into large string
+  let csvContent: string = dataToCSV.map(e => e.join(",")).join("\n");
+
+  return csvContent;
+};
+
+export const parseEntityRepositoriesToCSV = async (entityData: Array<Repository>) => {
+  const dataToCSV = [];
+
+  // Obtain headers from the Entity data.
+  var numColumns = 0;
+  var columnHeaders: Array<string> = [];
+  for (const dataLine of entityData) {
+    if (numColumns < Object.keys(dataLine).length) {
+      numColumns = Object.keys(dataLine).length;
+      for (const column in dataLine) {
+        columnHeaders.push(`${column}`);
+      }
+    }
+  }
+  dataToCSV.push(columnHeaders);
+
+  // Transform data from the enitity object to csv like array
+  // Some of the repository names have commas in them and need to be removed
+  for (const entityRepositories of entityData) {
+    const repository = Object.keys(entityRepositories).map(key => `${entityRepositories[key]}`.replaceAll(",", ""));
+    dataToCSV.push(repository);
+  }
+
+  // Add the commas and line breaks
+  let csvContent: string = dataToCSV.map(e => e.join(",")).join("\n");
+
+  return csvContent;
+};


I feel like there would be less code to maintain and it would be more robust to use a JSON to CSV library, e.g. https://juanjodiaz.github.io/json2csv/#/parsers/parser. You could use the Parser object.

It would be more robust because they already handle string quoting when there are commas in the repository names etc.
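
To illustrate what "handling string quoting" means in practice, here is a small self-contained sketch. It is not the json2csv implementation and not the code from this PR; the `Row` type and the `escapeField`, `collectHeaders` and `toCsv` names are hypothetical. It quotes fields in the usual CSV style (wrap in double quotes, double any embedded quotes) instead of deleting commas, and builds the header row from the union of keys across all rows so that a later row with extra columns does not produce duplicated headers.

```typescript
type Row = Record<string, string | number | null | undefined>;

// Quote a field only when it contains a comma, a double quote, or a line break.
// Embedded double quotes are doubled, as is conventional for CSV.
function escapeField(value: unknown): string {
  const text = value === null || value === undefined ? "" : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Collect the union of keys across all rows, in first-seen order, without duplicates.
function collectHeaders(rows: Row[]): string[] {
  const headers: string[] = [];
  for (const row of rows) {
    for (const key of Object.keys(row)) {
      if (headers.indexOf(key) === -1) {
        headers.push(key);
      }
    }
  }
  return headers;
}

// Emit one header line, then one line per row; missing values become empty fields.
function toCsv(rows: Row[]): string {
  const headers = collectHeaders(rows);
  const lines = [headers.map(escapeField).join(",")];
  for (const row of rows) {
    lines.push(headers.map(h => escapeField(row[h])).join(","));
  }
  return lines.join("\n");
}

// Made-up sample data: the first name contains a comma and is kept intact.
const sample: Row[] = [
  { id: "r1", name: "Zenodo, CERN" },
  { id: "r2", name: "arXiv", total: 3 },
];
console.log(toCsv(sample));
```

A maintained library covers more cases than this (configurable delimiters, nested objects, streaming), which is the argument for using one rather than extending a hand-written converter.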

workers-api/src/downloadDataZip.ts

jdddog · 2023-02-14T05:23:06Z

components/details/DownloadDataLink.tsx

+        if (data != undefined) {
+          var a = document.createElement("a");
+          a.href = window.URL.createObjectURL(data);
+          a.download = `COKI_data_${entity.entity_type}_${entity.id}.zip`;


Could we call the file Country name + COKI Dataset.zip, e.g. "Mali COKI Dataset.zip"?

components/details/DownloadDataLink.tsx

jdddog

Hey Alex, the changes look great. It looks like TypeScript support is coming for the json2csv package, so we can update it when that is released.

I've left a few smaller comments. I also need to do a few things, including making the icon and checking the authors and license of the jszip library.

workers-api/jest.config.js

jdddog · 2023-03-06T21:29:02Z

workers-api/package.json

@@ -35,8 +37,11 @@
    "typescript": "^4.4.4"
  },
  "dependencies": {
-    "flexsearch": "^0.7.21",
+    "@json2csv/node": "^6.1.2",


It should be @json2csv/plainjs rather than @json2csv/node.

jdddog · 2023-03-06T21:29:49Z

workers-api/package.json

    "itty-router": "^2.6.6",
+    "jszip": "^3.10.1",


Apache might be compatible with GPL, but GPL is not compatible with Apache, so I need to check with Cameron to make sure the dual licensing is fine.

workers-api/src/json2csv.d.ts

workers-api/src/downloadZip.ts

workers-api/src/downloadZip.test.ts

Co-authored-by: Alex Massen-Hane <alex_mh23@outlook.com> Co-authored-by: Jamie Diprose <5715104+jdddog@users.noreply.github.com>

alexmassen-hane changed the title ~~Allow users to download data from the website for each country and institution~~ INF-427: Download country/institution data in CSV format Nov 29, 2022

alexmassen-hane force-pushed the INF-427-download-csv branch 2 times, most recently from eb74fa4 to c4dca76 Compare January 4, 2023 02:15

alexmassen-hane force-pushed the INF-427-download-csv branch from 01ab518 to 634a894 Compare January 13, 2023 03:40

alexmassen-hane force-pushed the INF-427-download-csv branch from 6b944fa to 03c4b62 Compare February 3, 2023 06:18

jdddog marked this pull request as ready for review February 13, 2023 23:17

jdddog self-requested a review February 13, 2023 23:18

jdddog requested changes Feb 14, 2023

jdddog self-requested a review March 6, 2023 21:23

jdddog requested changes Mar 6, 2023

jdddog force-pushed the INF-427-download-csv branch 3 times, most recently from 81a6984 to a6d46a0 Compare April 3, 2023 00:42

CSV download feature

692f613

Co-authored-by: Alex Massen-Hane <alex_mh23@outlook.com> Co-authored-by: Jamie Diprose <5715104+jdddog@users.noreply.github.com>

jdddog force-pushed the INF-427-download-csv branch from a6d46a0 to 692f613 Compare April 3, 2023 01:18

jdddog merged commit 9d3ce36 into develop Apr 3, 2023
3 checks passed

jdddog deleted the INF-427-download-csv branch September 12, 2023 21:50
