Process IEP documents to extract student strengths, parent vision and dates #1937

Open
kevinrobinson opened this issue Jul 24, 2018 · 1 comment

@kevinrobinson (Contributor)

The intent here is to make the most important IEP information for the start of the school year more visible on the profile page, in line with work toward student voice and "best light." Other parts of the IEP, like goals and the service grid, are out of scope. This is a Somerville-only feature.

Here's a draft of what might be involved:

  • Update IepImportJob to track and log total counts of PDFs processed, IepDocument records updated, and IepDocument records created. This will involve hashing each document to see whether it has changed and storing that hash in IepDocument, similar to the approach used when importing student photos.
  • Update IepImportJob to collect each newly created or updated IepDocument.
  • Write a function that takes an IepDocument whose PDF is stored in S3 and extracts the student strengths, parent vision and dates from the PDF. A starting point might be https://github.com/studentinsights/studentinsights/compare/feature/iep-at-a-glance-parsing, or some plain Ruby library.
  • Update IepImportJob to add a final step that accepts the list of created or updated IepDocuments and, for each one, extracts the student strengths, parent vision and dates and stores them in the IepDocument record. Count and log the total number of IepDocument records that changed, and count and log the PDF documents that couldn't be processed. Also send a Rollbar warning for each IepDocument record that could not be processed, sending only its id and no other data.
  • At this point, we should know which documents can't be parsed, and can work to get that number down to zero.
  • Ship the import job, run it, and manually verify the documents from the past week.
  • Run a backfill on past documents and manually verify a sample of them. The logging and Rollbar warnings should also report any documents that aren't processed successfully.
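The steps above can be sketched in plain Ruby, leaving out Rails, S3 and PDF-to-text conversion. This is a minimal sketch under stated assumptions, not the project's implementation: `IepDocumentStub` is a hypothetical stand-in for the IepDocument model, the heading labels `Student Strengths` and `Parent Vision` and the MM/DD/YYYY date format are guesses, and each input is assumed to be an `[id, pdf_bytes, text]` triple whose `text` was already extracted from the PDF. It hashes each PDF (SHA-256 from Ruby's standard `digest` library, an arbitrary choice) to skip unchanged documents, tallies the counts the issue asks for, and collects the ids of documents that could not be parsed.

```ruby
require 'date'
require 'digest'

# Hypothetical stand-in for the IepDocument model; real column names may differ.
IepDocumentStub = Struct.new(:id, :pdf_digest, :student_strengths, :parent_vision, :dates)

# Hypothetical heading labels; real IEP PDFs vary in wording and layout.
SECTION_HEADINGS = {
  student_strengths: /^\s*Student Strengths\s*:?\s*$/i,
  parent_vision: /^\s*Parent Vision\s*:?\s*$/i
}.freeze

# Assumes dates are written as MM/DD/YYYY.
DATE_PATTERN = /\b(\d{1,2}\/\d{1,2}\/\d{4})\b/

# Returns the text between a heading and the next known heading, or nil.
def extract_section(lines, heading)
  start = lines.index { |line| line.match?(heading) }
  return nil if start.nil?
  body = lines[(start + 1)..].take_while do |line|
    SECTION_HEADINGS.values.none? { |h| line.match?(h) }
  end
  text = body.join("\n").strip
  text.empty? ? nil : text
end

# Parses already-extracted PDF text; returns nil if a section is missing
# or a date is invalid, so the caller can count it as a failure.
def extract_iep_fields(text)
  lines = text.lines.map(&:chomp)
  strengths = extract_section(lines, SECTION_HEADINGS[:student_strengths])
  vision = extract_section(lines, SECTION_HEADINGS[:parent_vision])
  return nil if strengths.nil? || vision.nil?
  dates = text.scan(DATE_PATTERN).flatten.map { |s| Date.strptime(s, '%m/%d/%Y') }
  { student_strengths: strengths, parent_vision: vision, dates: dates }
rescue ArgumentError # Date::Error is a subclass of ArgumentError
  nil
end

# Creates or updates stub records by comparing digests, parses new or changed
# documents, and returns [counts, failed_ids].
def import_iep_pdfs(pdfs, documents_by_id)
  counts = Hash.new(0)
  failed_ids = []
  pdfs.each do |id, pdf_bytes, text|
    counts[:pdfs_processed] += 1
    digest = Digest::SHA256.hexdigest(pdf_bytes)
    doc = documents_by_id[id]
    if doc.nil?
      doc = documents_by_id[id] = IepDocumentStub.new(id, digest)
      counts[:created] += 1
    elsif doc.pdf_digest != digest
      doc.pdf_digest = digest
      counts[:updated] += 1
    else
      next # unchanged PDF, skip re-parsing
    end
    fields = extract_iep_fields(text)
    if fields.nil?
      failed_ids << id # only the id would be reported, e.g. in a Rollbar warning
      next
    end
    fields.each { |key, value| doc[key] = value }
    counts[:fields_updated] += 1
  end
  [counts, failed_ids]
end
```

In the real job, each id in `failed_ids` would be sent as a Rollbar warning, as the issue describes. One design question this sketch leaves open: a document that fails to parse still keeps its digest, so after the parser improves, a backfill would need to bypass the digest check to retry it.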
@kevinrobinson (Contributor, Author)

kevinrobinson commented Sep 27, 2019

Some of this was done in https://github.com/studentinsights/studentinsights/blob/master/app/lib/iep_text_parser.rb, where documents are parsed on demand. Performance is fine; the main issue is variability in the data format.

That work was motivated by looking for information relevant to reading for the reader profile.
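Since the remaining problem noted above is format variability rather than speed, one way to make heading detection more tolerant is to normalize each line and match it against several wording variants per section. This is a hypothetical sketch, not how iep_text_parser.rb actually works: `HEADING_VARIANTS`, `normalize_heading`, `classify_heading` and `split_sections` are illustrative names, and the variant strings are guesses that would need to be checked against real Somerville IEP documents.

```ruby
# Hypothetical wording variants for each heading; a real list would come
# from sampling actual IEP documents.
HEADING_VARIANTS = {
  student_strengths: ['student strengths', 'strengths and key evaluation results'],
  parent_vision: ['parent vision', 'vision statement', 'parent and/or student vision']
}.freeze

# Lowercases the line and replaces digits and punctuation (except '/') with
# spaces, so numbering, colons and dashes don't block a match.
def normalize_heading(line)
  line.downcase.gsub(/[^a-z\/ ]/, ' ').squeeze(' ').strip
end

# Returns the section key whose variant the normalized line starts with, or nil.
def classify_heading(line)
  normalized = normalize_heading(line)
  HEADING_VARIANTS.each do |key, variants|
    return key if variants.any? { |v| normalized.start_with?(v) }
  end
  nil
end

# Splits text into { section_key => body_text }, ignoring text before the
# first recognized heading.
def split_sections(text)
  sections = {}
  current = nil
  text.each_line do |raw|
    line = raw.chomp
    key = classify_heading(line)
    if key
      current = key
      sections[current] ||= []
    elsif current
      sections[current] << line
    end
  end
  sections.transform_values { |lines| lines.join("\n").strip }
end
```

Matching with `start_with?` tolerates numbering, punctuation and trailing text such as "Vision Statement - What is the vision for this student?", but it can also misfire on body text that happens to begin with a heading phrase, and any unrecognized heading is folded into the preceding section, so a production parser would likely need tighter rules.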
