-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
589/add year selector #642
base: develop
Are you sure you want to change the base?
Conversation
I'm not understanding what the 'Year' dropdown in the filter section does. Does that mean year of employment? |
Yes, so for example, if someone were trying to find an officer they interacted with in 2018, they could select "2018" in the dropdown. @camfassett pointed out in this issue that as we obtain rosters for 2019, we want historic data to remain accessible to the public. It's a good way to see how officer's assignments or roster sizes change over time. |
Could someone please review this when they get a chance? Thanks! 🙏 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work on this! I left a couple comments, mostly about avoiding static lists of years. Let me know what you think about that.
56a7759
to
f606b88
Compare
@r4v5 @redshiftzero Could one of you please review this PR? @dismantl already reviewed it once and his main feedback was to remove the hardcoded lists of years and to instead use a |
f606b88
to
6a8e1c7
Compare
48e1e60
to
b4fbab1
Compare
|
b4fbab1
to
1704181
Compare
dd010ad
to
bb076a1
Compare
OpenOversight/app/utils.py
Outdated
@@ -62,6 +62,14 @@ def dept_choices(): | |||
return db.session.query(Department).all() | |||
|
|||
|
|||
def year_choices(): | |||
years = [year[0] for year in db.session.query(cast(func.extract('year', Officer.employment_date), Integer)).filter(Officer.employment_date.isnot(None)).order_by(func.extract('year', Officer.employment_date))] | |||
years_deduped = list(set(years)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize this isn't a super efficient way to dedupe the years. Please let me know if anyone can think of a more performant approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we just hardcode the years to be from like the year we founded OO or something (open to discussion how far back to go), to the current year?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though the current code isn't ideal, I prefer it to hardcoding a list of years. I think hardcoding is brittle and vulnerable to breakage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I understand what exactly we want to get here. I think whatever years we offer to select should be an uninterrupted range. If it was 1990, 1995, 2003 and 2010 a user would probably be somewhat confused. If you want 2005, do you select 2003 or 2010? I would either hardcode the start year (maybe 2014 or something, I don't think we have much data from before that) and go until current year or take the lowest that's in the database and offer every year from that year on until current year
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My goal with year_choices()
is to display every year for which a department has roster information.
take the lowest that's in the database and offer every year from that year on until current year
I went for this approach.
bb076a1
to
2dc4ef7
Compare
Still working on my in-depth code review here, but I noticed in the meantime that the "year" selector only appears in the filter bar on the left-hand side of the officer listing page, but not on the "find an officer" page (e.g, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review done! Sorry if this is too nitpicky, I guess that's what happens when you do a line-by-line review. Also github says there's a merge conflict so we need to fix that too. From a high level this looks like a great new feature, thanks @McEileen
OpenOversight/app/main/forms.py
Outdated
@@ -202,6 +202,9 @@ class AddOfficerForm(Form): | |||
query_factory=unit_choices, get_label='descrip', | |||
allow_blank=True, blank_text=u'None') | |||
employment_date = DateField('Employment Date', validators=[Optional()]) | |||
last_employment_date = DateField('Last Employment Date', validators=[Optional()]) | |||
last_employment_details = StringField('Last Employment Details', default='', validators=[ | |||
Regexp(r'\w*'), Length(max=50), Optional()]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we restricting this comment to just alphanumeric characters? What about punctuation, etc? It seems like a length-limited text field should be pretty open ended.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My first thought was, "shoot, you're right." Then I tested this and was able to successfully enter last employment details with punctuation.
When I changed the validator from Regexp(r'\w*')
to Regexp(r'\w')
, then I was no longer able to enter punctuation. It appears that the wildcard adds flexibility to this regex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue here is that Regexp(r'\w*')
doesn't do anything. It matches literally any string. Regexp(r'\w')
only fails if the first character of the string is punctuation. So the question here really is which strings should be allowed and which shouldn't?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my 2019 self used this regex because it's being used in most of the other string fields in main/forms.py
. Since @abandoned-prototype is right that it matches literally any string, I changed the regex to require alphanumeric characters in its input. The field now allows punctuation, but will throw an error if the input is only punctuation.
OpenOversight/app/main/forms.py
Outdated
@@ -248,6 +251,9 @@ class EditOfficerForm(Form): | |||
gender = SelectField('Gender', choices=GENDER_CHOICES, coerce=lambda x: x or None, | |||
validators=[AnyOf(allowed_values(GENDER_CHOICES))]) | |||
employment_date = DateField('Employment Date', validators=[Optional()]) | |||
last_employment_date = DateField('Last Employment Date', validators=[Optional()]) | |||
last_employment_details = StringField('Last Employment Details', default='', validators=[ | |||
Regexp(r'\w*'), Length(max=50), Optional()]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto the above comment about alphanumeric chars
((salary.overtime_pay and 'overtime_pay' in row and | ||
Decimal('%.2f' % salary.overtime_pay) == Decimal('%.2f' % float(row['overtime_pay']))) or | ||
(not salary.overtime_pay and ('overtime_pay' not in row or not row['overtime_pay']))): | ||
((salary.overtime_pay and 'overtime_pay' in row and Decimal('%.2f' % salary.overtime_pay) == Decimal('%.2f' % float(row['overtime_pay']))) or (not salary.overtime_pay and ('overtime_pay' not in row or not row['overtime_pay']))): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This conditional was super confusing before and is super duper confusing now! Not gonna lie I have no idea what's going on here. Do you think you could shed some light w/ a comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only appears in the diff because of a formatting change. Since I didn't write the logic, I'm not the right person to add an explanatory comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is comparing the imported salary with the officer's existing salaries, due to type and formatting differences.
OpenOversight/app/main/views.py
Outdated
@@ -481,12 +476,15 @@ def list_officer(department_id, page=1, race=[], gender=[], rank=[], min_age='16 | |||
if request.args.get('rank') and all(rank in rank_choices for rank in request.args.getlist('rank')): | |||
form_data['rank'] = request.args.getlist('rank') | |||
|
|||
if request.args.get('year') and request.args.get('year') in year_choices(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this was already happening the code base before you added this line, but I've always learned to use is not None
to check for None
in Python, as opposed to relying on the falsy-ness of None
. This is recommended by PEP 8 (see the 2nd bullet point under the heading): https://www.python.org/dev/peps/pep-0008/#programming-recommendations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL!
OpenOversight/app/utils.py
Outdated
@@ -62,6 +62,14 @@ def dept_choices(): | |||
return db.session.query(Department).all() | |||
|
|||
|
|||
def year_choices(): | |||
years = [year[0] for year in db.session.query(cast(func.extract('year', Officer.employment_date), Integer)).filter(Officer.employment_date.isnot(None)).order_by(func.extract('year', Officer.employment_date))] | |||
years_deduped = list(set(years)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we just hardcode the years to be from like the year we founded OO or something (open to discussion how far back to go), to the current year?
OpenOversight/app/utils.py
Outdated
@@ -300,6 +310,15 @@ def filter_by_form(form, officer_query, department_id=None): | |||
officer_query = officer_query.filter(db.or_(db.and_(Officer.birth_year <= min_birth_year, | |||
Officer.birth_year >= max_birth_year), | |||
Officer.birth_year == None)) # noqa | |||
if form.get('year') and form.get('year') in year_choices(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above w/ regard to checking the truthyness of None
OpenOversight/tests/conftest.py
Outdated
@@ -81,17 +82,29 @@ def pick_salary(): | |||
return random.randint(100, 100000000) / 100 | |||
|
|||
|
|||
def generate_random_date(year, month): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmmmm randomness in tests seems bad. What about picking a random number once and then hardcoding that into the tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless, if this is standard practice in our test suite already, then I guess it's fine to keep doing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's okay because a lot of the other methods in conftest.py
, like pick_birth_date()
or pick_gender()
, use random.randint()
or random.choice()
. Let me know if you disagree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is true that we do this in a lot of places and I think it is actually a problem. I saw an issue with the randomness of our test-data just yesterday.
But given that it is already so widespread I guess it's fine to keep doing it for now. And if it's just the actual value randomness is not as bad as deciding if a field is None or not based on randomness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR #756 offers an alternate implementation that might have better random distribution since it hashes the officer name: https://github.com/lucyparsons/OpenOversight/pull/756/files#diff-4eef8ece740b94822be839e43d28d7a7R42
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw an issue with the randomness of our test-data just yesterday.
What was the problem you saw with test data randomness? Is it something that we need to open an issue for?
I can use the implementation from PR #756, but I would also like to know more about the problems with our existing use of random test data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will open an issue on the concrete problem I saw and probably add some ideas I have for moving forward
@@ -118,6 +119,86 @@ def test_filter_by_partial_unique_internal_identifier_returns_officers(mockdata) | |||
assert returned_identifier == identifier | |||
|
|||
|
|||
def test_year_filter_selects_officer_who_quit_that_year(mockdata): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you covered a lot of corner cases, nice job
Yep, I only put the "year" selector on the browse page, since that's where the original issue, #589, requested it. Still, thanks for opening (ba-dum tiss?) my eyes to this. |
2dc4ef7
to
34972cc
Compare
I responded to feedback. Here's a video of the feature in action. |
34972cc
to
8eef74c
Compare
I responded to feedback. Thanks to everyone who offered their thoughts. |
def pick_date(seed: bytes = None, start_year=2000, end_year=2020): | ||
# source: https://stackoverflow.com/questions/40351791/how-to-hash-strings-into-a-float-in-01 | ||
# Wanted to deterministically create a date from a seed string (e.g. the hash or uuid on an officer object) | ||
from struct import unpack | ||
from hashlib import sha256 | ||
|
||
def bytes_to_float(b): | ||
return float(unpack('L', sha256(b).digest()[:8])[0]) / 2 ** 64 | ||
|
||
if seed is None: | ||
seed = str(uuid.uuid4()).encode('utf-8') | ||
|
||
return datetime.datetime(start_year, 1, 1, 00, 00, 00) \ | ||
+ datetime.timedelta(days=365 * (end_year - start_year) * bytes_to_float(seed)) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice approach. If we want to go with this I think we should put the bytes_to_float
function in a utils-file are something so that we can use it everywhere, and I suggest rewriting the function as
def bytes_to_float(seed = None):
if seed is None:
seed = uuid.uuid4()
return float(unpack('L', sha256(str(seed).encode('utf-8')).digest()[:8])[0]) / 2 ** 64
that way any object that has a str
function can be used as seed.
if we want the result to be deterministic we have to make sure that the input is deterministic too. In this case it seems that the last name is used, which is unfortunately randomly chosen.
oldest_year = db.session.query(cast(func.extract('year', Officer.employment_date), Integer)) \ | ||
.filter(Officer.employment_date.isnot(None)) \ | ||
.order_by(func.extract('year', Officer.employment_date)) \ | ||
.limit(1).first()[0] | ||
current_year = datetime.datetime.now().year | ||
year_choices = [str(year) for year in range(oldest_year, current_year + 1)] | ||
return year_choices |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good. I think for the oldest year we should use a SELECT min(employment_date) FROM officers
to make it as efficient as possible. I looked up how to do that in sqlalchemy and it seems
oldest_year = db.session.query(func.min(Officer.employment_date)).scalar().year
should be do the trick.
What do we want to do in the case that there is no officer with a (not NULL) employment date?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I keep coming back to this and I am think we should just hard-code the oldest_year
. I think we now have some officers in the database whose employment_date is before 1950, which will make the drop-down menu pretty long and we really don't have good data for those year anyways. So my suggestion would be to hard-code it to 1980 or something, but I am also fine with the suggestion I made, if we can come up with a solution for an officer table without any non-NULL employment dates.
{% endif %} | ||
{% if officer.last_employment_details %} | ||
<tr> | ||
<td><b>Last Employment Date Details</b></td> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we change the label that we display to the users to something that is easier understood? Like "Details on Resignation" or "Details on End of Employment" or something? Not exactly sure how this field will be used though
Status
Ready for review
Description of Changes
Fixes #604 and #589 .
Changes proposed in this pull request:
last_employment_date
andlast_employment_details
fields to all officers.last_employment_date
andlast_employment_details
fields to the add/edit officer forms.year
dropdown selector to the page that displays all of a department's officers.Notes for Deployment
Screenshots (if appropriate)
Tests and linting
I have rebased my changes on current
develop
pytests pass in the development environment on my local machine
(There are two failing tests,
test_admin_can_disable_user
andtest_lastname_capitalization
, however, they were failing before this PR.flake8
checks pass