Process IEP documents to extract student strengths, parent vision and dates #1937

kevinrobinson · 2018-07-24T15:01:19Z

The intent here is to make the pieces of the IEP that matter most at the start of the school year more visible on the profile page, in line with work towards student voice and "best light." Other parts of the IEP, like goals and the service grid, are out of scope. This is a Somerville-only feature.

Here's a draft of what might be involved:

Update IepImportJob to track and log total counts for PDFs processed, IepDocument records updated, and records created. This will involve hashing the document to see if it has changed, and storing that hash in IepDocument, similar to the approach with importing student photos.
Update IepImportJob to collect each newly created or updated IepDocument.
Write a function that takes an IepDocument that has a PDF stored in S3, and extracts the student strengths, parent vision and dates from the PDF document. A starting point might be https://github.com/studentinsights/studentinsights/compare/feature/iep-at-a-glance-parsing, or some plain Ruby library.
Update IepImportJob to add a step at the end that accepts the list of created or updated IepDocuments and, for each one, extracts the student strengths, parent vision and dates and stores them in the IepDocument record. Count and log the total number of IepDocument records that have changed, and count and log the PDF documents that couldn't be processed. Also send a Rollbar warning for each IepDocument record that could not be processed, sending the id without any other data.
At this point, we should know which documents can't be parsed, and can work to get that number down to zero.
Ship the import job, run it, and manually verify the documents from the past week.
Run a backfill on past documents, checking a sample to verify them manually. Additionally, the logging and Rollbar warnings should report any documents that aren't processed successfully.
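
To make the counting and change-detection steps above concrete, here is a rough sketch in plain Ruby. It is not the IepImportJob code: `FakeIepDocument`, the `pdf_digest` field, `import_pdf`, `extract_all`, and the `warn_fn` callable are all hypothetical names made up for illustration, and SHA-256 is just one reasonable choice of hash. In the real job, `warn_fn` would presumably wrap a Rollbar warning that carries only the record id.

```ruby
require 'digest'

# Hypothetical stand-in for an IepDocument record. The real model is an
# ActiveRecord class; the `pdf_digest` and `parsed` fields are assumptions.
FakeIepDocument = Struct.new(:id, :pdf_digest, :parsed, keyword_init: true)

# Imports one PDF. Returns the created or updated document, or nil when the
# bytes hash to the same digest as last time (nothing changed).
def import_pdf(documents_by_key, key, pdf_bytes, stats)
  digest = Digest::SHA256.hexdigest(pdf_bytes)
  stats[:pdfs_processed] += 1

  existing = documents_by_key[key]
  if existing.nil?
    doc = FakeIepDocument.new(id: documents_by_key.size + 1, pdf_digest: digest)
    documents_by_key[key] = doc
    stats[:created] += 1
    return doc
  end

  return nil if existing.pdf_digest == digest

  existing.pdf_digest = digest
  stats[:updated] += 1
  existing
end

# Final step: run the extractor over each changed document, counting
# successes and failures. On failure, report the id and nothing else.
def extract_all(changed_documents, extractor, stats, warn_fn)
  changed_documents.each do |doc|
    begin
      doc.parsed = extractor.call(doc)
      stats[:parsed] += 1
    rescue StandardError
      stats[:unparseable] += 1
      warn_fn.call(doc.id)
    end
  end
  stats
end
```

Keeping the warning payload down to the id means no student data leaves the system through error reporting, and the `unparseable` count is the number to drive down to zero before running the backfill.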


kevinrobinson · 2019-09-27T19:15:17Z

Some of this was done in https://github.com/studentinsights/studentinsights/blob/master/app/lib/iep_text_parser.rb, where documents are parsed on-demand. Performance is fine, but the main issue is variability in the data format.
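
For a sense of what handling that variability can look like, here is a small sketch of heading-based extraction over text that has already been pulled out of a PDF. It is not the code in iep_text_parser.rb: the heading patterns, `SECTION_HEADINGS`, `extract_sections`, and `missing_sections` are all assumptions for illustration, and the real documents may label sections differently.

```ruby
# Hypothetical heading patterns, loose about case and spacing.
# The real documents may use different labels.
SECTION_HEADINGS = {
  strengths: /student\s+strengths/i,
  vision: /vision\s+statement/i
}.freeze

# Collects the lines under each recognized heading until the next recognized
# heading. Lines before the first recognized heading are ignored.
def extract_sections(text)
  sections = {}
  current = nil
  text.each_line do |raw|
    line = raw.strip
    next if line.empty?

    heading = SECTION_HEADINGS.find { |_key, pattern| line.match?(pattern) }
    if heading
      current = heading.first
      sections[current] ||= []
    elsif current
      sections[current] << line
    end
  end
  sections.transform_values { |lines| lines.join(' ') }
end

# Lists the expected sections that are absent or empty, so a caller can
# treat the document as not fully parsed.
def missing_sections(sections)
  SECTION_HEADINGS.keys.reject { |key| sections[key] && !sections[key].empty? }
end
```

Returning the missing sections explicitly, rather than failing silently, makes it possible to count how many documents deviate from the expected format.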

That work was motivated by looking for information relevant to reading for the reader profile.

kevinrobinson added the production label Sep 27, 2019
