
Improve DIBELS data, and potentially the import process #1940

Open
kevinrobinson opened this issue Jul 24, 2018 · 6 comments
@kevinrobinson
Contributor

Currently, the DibelsRow class ignores subjects and collapses everything together. New Bedford and Somerville export data in different shapes, so this impacts them differently.
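To make "ignores subjects" concrete, here's a minimal sketch of the shape of the problem, paraphrased from this thread rather than taken from the actual DibelsRow source; the CSV field names are guesses:

```ruby
# Paraphrased sketch, not the actual DibelsRow code: the row keeps the
# assessment name but never reads assessment_subject, so a distinction like
# "PSF-Phoneme Segmentation" vs "NWF-Nonsense Word Fluency" is lost on import.
class DibelsRow
  def initialize(csv_row)
    @assessment_name = csv_row[:assessment_name] # kept (eg, "DIBELS")
    # csv_row[:assessment_subject] is dropped here, collapsing all subjects
    @benchmark = csv_row[:assessment_performance_level] # field name is a guess
  end
end
```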

We may want to consider solving this differently by simplifying the human flow of data from reading teacher > Insights, rather than routing it through IT, Aspen, export, and import as well.

Somerville

For Somerville, DibelsRow mostly makes sense, since the "subject" is redundant with the "name" and just indicates what time of year the assessment was given (eg, fall / winter / spring). We throw this away in the import, but it's encoded in assessment_name. For Somerville, it also looks like the date taken might not be accurate - all records are set to the same handful of dates across years (eg, 2017-01-01). This could be happening on UH or JB's end, but it seems like it probably doesn't reflect the actual assessment date. See the emails "Winter DIBELS for Upload to Aspen" and "Spring DIBELS and F&P Data for Aspen Import" for more, although these don't represent all exchanges, and the format for data exchange has changed over time. All of these are generated from a "master" XLS that UH has.

assessment_growth is a collapsing of several rows into the format "NC: 28,NW: 8,O: 29", and we need to investigate whether there are bugs there - spot-checking Somerville student #5475, it doesn't match the "Spring DIBELS and F&P Data for Aspen Import" spreadsheet. The CSV says this is "NWFCS" - Nonsense Word Fluency-Correct Sounds, but Insights says this is "PSF" - Phoneme Segmentation Fluency. For this student, it's the difference between being at the "K-End" benchmark or the "1-Mid" benchmark. Based on that, we should spot-check these some more and see how prevalent they are.
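For reference, parsing that collapsed format back apart is straightforward; a minimal sketch, assuming the string is always comma-separated "KEY: value" pairs (the parse_subtests helper name is hypothetical, not a real method in the codebase):

```ruby
# Hypothetical helper: split a collapsed assessment_growth value like
# "NC: 28,NW: 8,O: 29" back into a hash of subtest scores.
def parse_subtests(assessment_growth)
  assessment_growth.split(',').map do |pair|
    key, value = pair.split(':').map(&:strip)
    [key, Integer(value)] # raises if the value isn't a clean integer
  end.to_h
end

parse_subtests('NC: 28,NW: 8,O: 29')
# => {"NC"=>28, "NW"=>8, "O"=>29}
```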

New Bedford

For New Bedford, they use assessment_subject to store things like "PSF-Phoneme Segmentation" and just put "DIBELS" in the assessment_name field. And since we consider the "subject" part of the Assessment record but throw it away during import in DibelsRow, we don't have this information.

We should fix this, either by uploading directly to Insights in a particular format, or by rewriting DibelsRow to work differently by district, while also splitting DibelsAssessment out into its own Insights table.

Also, New Bedford DIBELS data isn't working correctly yet for another reason. There's a validation on StudentAssessment that enforces a "unique data point for student and assessment_id", and the Dibels#find_assessment_id code relies on this, using find. But since there's only one DIBELS Assessment in Insights, and New Bedford has different DIBELS data points all on the same day, this collapses all kinds of data points on top of each other when combined with #find_assessment_id. This means the importer reads a row, finds the assessment, and blows away the scores - and then does this for each DIBELS record for a student. If the ordering were consistent and defined, this would be idempotent, but it's not necessarily, so depending on the sort order of the export, we could get thrashing on these scores on every import job. Email KR for a test case.
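A simplified sketch of that failure mode, built only from the description above (the method and field names here are illustrative, not the real importer code):

```ruby
# Illustration of the collapsing, not the actual importer. With one shared
# DIBELS assessment_id and several New Bedford subtest rows on the same
# date, every row upserts into the same StudentAssessment record.
def import_dibels_rows(rows_for_student, dibels_assessment_id)
  rows_for_student.each do |row|
    record = StudentAssessment.find_or_initialize_by(
      student_id: row[:student_id],
      assessment_id: dibels_assessment_id, # always the single DIBELS Assessment
      date_taken: row[:date_taken]         # same day for every subtest
    )
    # Each subtest row overwrites the previous one's values, so the final
    # score depends on the (undefined) sort order of the export.
    record.update!(performance_level: row[:benchmark])
  end
end
```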

@kevinrobinson
Contributor Author

Looking at app[run.3869], there are ~2000 lines in the logs with things like:

DibelsRow: couldn't parse DIBELS benchmark: Benchmark
DibelsRow: couldn't parse DIBELS benchmark: Check ISF
DibelsRow: couldn't parse DIBELS benchmark: Benchmark
DibelsRow: couldn't parse DIBELS benchmark: Benchmark
DibelsRow: couldn't parse DIBELS benchmark: Benchmark
DibelsRow: couldn't parse DIBELS benchmark: #DIV/0!

@alexsoble can you check out this issue and update this to reflect the latest status based on the awesome work you did in #1969 and #1973?

@alexsoble
Member

Those lines are for incoming DIBELS rows that do not have a benchmark score that matches "CORE", "STRATEGIC", or "INTENSIVE". I sent an email to Uri and you asking if we should do anything differently for rows where Benchmark == "Benchmark".
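For context, a minimal sketch of what that matching amounts to, assuming the parser just pattern-matches the incoming benchmark string against the known levels (names here are illustrative, not the actual DibelsRow implementation):

```ruby
# Illustrative only: incoming benchmark strings are checked against the
# known levels; anything else (eg, "Benchmark", "Check ISF", "#DIV/0!")
# is logged and skipped rather than stored.
VALID_BENCHMARKS = %w[CORE STRATEGIC INTENSIVE].freeze

def parse_benchmark(raw, log)
  value = raw.to_s.strip.upcase
  return value if VALID_BENCHMARKS.include?(value)

  log.puts("DibelsRow: couldn't parse DIBELS benchmark: #{raw}")
  nil
end
```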

Somerville

DIBELS data is now stored in its own table, with strong validations at both the database level and the Rails level. This means no more invalid data.

There are now no more "#DIV/0!" or other invalid values anywhere in the product. The tradeoff is that subtest results don't show up in the Student Profile like they used to. But subtest results are now stored in our database separately from benchmark scores, and we could easily visualize them. We could even do a "hover for subtest results" or "click for subtests" UI element for DIBELS when subtest results exist. Note that all this applies just to Somerville.
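As a rough illustration of what "validations at both the database level and the Rails level" can look like, here's a hypothetical sketch; the column names are guesses based on this thread (benchmark, DibelsResult#subtest_scores), not the actual Student Insights schema:

```ruby
# Hypothetical sketch, not the actual schema or model.
class CreateDibelsResults < ActiveRecord::Migration[5.2]
  def change
    create_table :dibels_results do |t|
      t.references :student, null: false # database-level NOT NULL constraints
      t.string :benchmark, null: false
      t.date :date_taken, null: false
      t.string :subtest_scores
      t.timestamps
    end
  end
end

class DibelsResult < ApplicationRecord
  belongs_to :student
  # Rails-level validations mirroring the database constraints, plus a
  # whitelist so values like "#DIV/0!" can never be stored.
  validates :benchmark, inclusion: { in: %w[CORE STRATEGIC INTENSIVE] }
  validates :date_taken, presence: true
end
```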

New Bedford

All DIBELS data has been deleted from the New Bedford instance, since we still need to get a handle on how best to import, store, and visualize New Bedford's DIBELS data.

~

@kevinrobinson shall we close this issue and make a new one that says "Import and display DIBELS data for New Bedford"?

@alexsoble
Member

Update from Uri: If it says Benchmark, that's another word for Core (wonder if that's what they use in New Bedford).

@kevinrobinson
Contributor Author

@alexsoble awesome, thanks! 👍

I think there's more work here for Somerville too, as described in the issue, and my understanding is that there are still some bits we'd need to do to finish those off. I'll write out what I see so I can remember later, but let's consider this handed off - and thanks for updating this!

Somerville

  • the bug case mentioned above for Somerville student 5475 looks good!
  • store the test "number" within the year; this is exported in assessment_name and assessment_subject in the export
  • date_taken seems like the date uploaded; clarify whether this is accurate (esp. for graphing next to services vs. something rougher like the test "number")
  • investigate scores that say "Check" ($ cat assessment_export.txt | grep "Check ", or see the comment above for logging during the import process). These look like students being flagged for follow-up on particular subtests, but this convention hasn't been used in ~8 years, so it's probably good to explicitly drop these records.
  • investigate scores that are #DIV/0! ($ cat assessment_export.txt | grep "DIV/", or see the comment above for logging during the import process). This looks like bad data coming from Aspen (or the spreadsheet), and only from one point 5 years ago, so it's probably good to explicitly drop these records.

Subtest scores

  • subtest scores are stored in DibelsResult#subtest_scores (with the overall score category stripped), but aren't processed or validated yet. If we want to show these, we need to split them out (see the sketch below). Since these string formats are produced by the Excel code, it might be simpler to just read the spreadsheet directly and remove the layers of processing.
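A hypothetical sketch of that split-and-validate step, assuming subtest_scores uses the same comma-separated format discussed earlier; the helper name and the subtest code list are guesses, not confirmed conventions:

```ruby
# Hypothetical: keep only subtests we recognize with clean integer scores,
# dropping anything malformed instead of storing it.
KNOWN_SUBTESTS = %w[ISF LNF PSF NWF ORF NC NW O].freeze # list is a guess

def validated_subtests(subtest_scores_string)
  subtest_scores_string.split(',').each_with_object({}) do |pair, scores|
    key, value = pair.split(':').map(&:strip)
    next unless KNOWN_SUBTESTS.include?(key) && value =~ /\A\d+\z/
    scores[key] = value.to_i
  end
end

validated_subtests('NC: 28,NW: 8,O: #DIV/0!')
# => {"NC"=>28, "NW"=>8}
```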

New Bedford

  • preserve exported data about subtests - if we imported as-is with the new table format, we wouldn't have a place to store the numeric score or the subtest name. Doing this needs a Somerville migration (see the sketch below), a separate table per district, or figuring out a standard format for getting data on subtests and removing some of the collapsing that's happening upstream in the Excel code that we then have to unwind.
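A hypothetical sketch of the migration option, just to show what "a place to store the numeric score or subtest name" could mean; column names are guesses, not an agreed-on schema:

```ruby
# Hypothetical sketch of the "Somerville migration" option: add columns so
# New Bedford's per-subtest rows have somewhere to land.
class AddSubtestFieldsToDibelsResults < ActiveRecord::Migration[5.2]
  def change
    add_column :dibels_results, :subtest_name, :string   # eg, "PSF-Phoneme Segmentation"
    add_column :dibels_results, :subtest_score, :integer # numeric score from the export
  end
end
```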

@kevinrobinson
Contributor Author

This is quite stale; a newer thread of work is related to #2472.

@kevinrobinson
Contributor Author

To finish this off, we should deprecate code related to DibelsResult and consider removing it altogether. This would simplify things for educators given the other reading work happening, but the first step is adding deprecation comments.
