Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

589/add year selector #642

Open
wants to merge 6 commits into
base: develop
Choose a base branch
from
Open

589/add year selector #642

wants to merge 6 commits into from

Conversation

McEileen
Copy link
Collaborator

Status

Ready for review

Description of Changes

Fixes #604 and #589 .

Changes proposed in this pull request:

  • Add last_employment_date and last_employment_details fields to all officers.
  • Add last_employment_date and last_employment_details fields to the add/edit officer forms.
  • Add a year dropdown selector to the page that displays all of a department's officers.

Notes for Deployment

Screenshots (if appropriate)

YearSelector

Tests and linting

  • I have rebased my changes on current develop

  • pytests pass in the development environment on my local machine
    (There are two failing tests, test_admin_can_disable_user and test_lastname_capitalization, however, they were failing before this PR.

  • flake8 checks pass

@dismantl
Copy link
Member

I'm not understanding what the 'Year' dropdown in the filter section does. Does that mean year of employment?

@McEileen
Copy link
Collaborator Author

Yes, so for example, if someone were trying to find an officer they interacted with in 2018, they could select "2018" in the dropdown. @camfassett pointed out in this issue that as we obtain rosters for 2019, we want historic data to remain accessible to the public. It's a good way to see how officer's assignments or roster sizes change over time.

@McEileen
Copy link
Collaborator Author

Could someone please review this when they get a chance? Thanks! 🙏

Copy link
Member

@dismantl dismantl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work on this! I left a couple comments, mostly about avoiding static lists of years. Let me know what you think about that.

OpenOversight/app/main/forms.py Outdated Show resolved Hide resolved
.vscode/settings.json Outdated Show resolved Hide resolved
OpenOversight/app/utils.py Outdated Show resolved Hide resolved
OpenOversight/migrations/versions/f01cb1bc6293_.py Outdated Show resolved Hide resolved
@McEileen McEileen mentioned this pull request Apr 22, 2019
3 tasks
@McEileen McEileen force-pushed the 589/add-year-selector branch 2 times, most recently from 56a7759 to f606b88 Compare June 21, 2019 21:45
@McEileen
Copy link
Collaborator Author

@r4v5 @redshiftzero Could one of you please review this PR? @dismantl already reviewed it once and his main feedback was to remove the hardcoded lists of years and to instead use a QuerySelectField. Thank you! 🙏

@McEileen McEileen changed the title 589/add year selector [WIP] 589/add year selector Feb 24, 2020
@McEileen
Copy link
Collaborator Author

McEileen commented Feb 24, 2020

I found a bug in this PR where officers that started working for a department after the year selected would still appear in results. For example, if a user selected "1970" from the drop-down selector, an officer that started working in "2005" would still appear in the results.
I will fix this bug and then ping someone for a re-review.

The PR is ready for re-review!
I fixed the bug where the year selector returned officers who worked for the department after the selected year. I also fixed a bug where adding a new officer to the department would cause the year_choices() method to return duplicate years.
@redshiftzero @r4v5 @dismantl @b-meson @brianmwaters Your thoughts are appreciated.

@McEileen McEileen changed the title [WIP] 589/add year selector 589/add year selector Mar 13, 2020
@McEileen McEileen force-pushed the 589/add-year-selector branch 2 times, most recently from dd010ad to bb076a1 Compare March 13, 2020 01:28
@@ -62,6 +62,14 @@ def dept_choices():
return db.session.query(Department).all()


def year_choices():
years = [year[0] for year in db.session.query(cast(func.extract('year', Officer.employment_date), Integer)).filter(Officer.employment_date.isnot(None)).order_by(func.extract('year', Officer.employment_date))]
years_deduped = list(set(years))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize this isn't a super efficient way to dedupe the years. Please let me know if anyone can think of a more performant approach.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we just hardcode the years to be from like the year we founded OO or something (open to discussion how far back to go), to the current year?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though the current code isn't ideal, I prefer it to hardcoding a list of years. I think hardcoding is brittle and vulnerable to breakage.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand what exactly we want to get here. I think whatever years we offer to select should be an uninterrupted range. If it was 1990, 1995, 2003 and 2010 a user would probably be somewhat confused. If you want 2005, do you select 2003 or 2010? I would either hardcode the start year (maybe 2014 or something, I don't think we have much data from before that) and go until current year or take the lowest that's in the database and offer every year from that year on until current year

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My goal with year_choices() is to display every year for which a department has roster information.

take the lowest that's in the database and offer every year from that year on until current year

I went for this approach.

@brianmwaters
Copy link
Collaborator

Still working on my in-depth code review here, but I noticed in the meantime that the "year" selector only appears in the filter bar on the left-hand side of the officer listing page, but not on the "find an officer" page (e.g, /find). This seems to sort of make sense, but just want to verify that that's by design and not an oversight (ba-dum tiss)

Copy link
Collaborator

@brianmwaters brianmwaters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review done! Sorry if this is too nitpicky, I guess that's what happens when you do a line-by-line review. Also github says there's a merge conflict so we need to fix that too. From a high level this looks like a great new feature, thanks @McEileen

@@ -202,6 +202,9 @@ class AddOfficerForm(Form):
query_factory=unit_choices, get_label='descrip',
allow_blank=True, blank_text=u'None')
employment_date = DateField('Employment Date', validators=[Optional()])
last_employment_date = DateField('Last Employment Date', validators=[Optional()])
last_employment_details = StringField('Last Employment Details', default='', validators=[
Regexp(r'\w*'), Length(max=50), Optional()])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we restricting this comment to just alphanumeric characters? What about punctuation, etc? It seems like a length-limited text field should be pretty open ended.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My first thought was, "shoot, you're right." Then I tested this and was able to successfully enter last employment details with punctuation.

OOGeneralInformation

When I changed the validator from Regexp(r'\w*') to Regexp(r'\w'), then I was no longer able to enter punctuation. It appears that the wildcard adds flexibility to this regex.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue here is that Regexp(r'\w*') doesn't do anything. It matches literally any string. Regexp(r'\w') only fails if the first character of the string is punctuation. So the question here really is which strings should be allowed and which shouldn't?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my 2019 self used this regex because it's being used in most of the other string fields in main/forms.py. Since @abandoned-prototype is right that it matches literally any string, I changed the regex to require alphanumeric characters in its input. The field now allows punctuation, but will throw an error if the input is only punctuation.

@@ -248,6 +251,9 @@ class EditOfficerForm(Form):
gender = SelectField('Gender', choices=GENDER_CHOICES, coerce=lambda x: x or None,
validators=[AnyOf(allowed_values(GENDER_CHOICES))])
employment_date = DateField('Employment Date', validators=[Optional()])
last_employment_date = DateField('Last Employment Date', validators=[Optional()])
last_employment_details = StringField('Last Employment Details', default='', validators=[
Regexp(r'\w*'), Length(max=50), Optional()])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto the above comment about alphanumeric chars

((salary.overtime_pay and 'overtime_pay' in row and
Decimal('%.2f' % salary.overtime_pay) == Decimal('%.2f' % float(row['overtime_pay']))) or
(not salary.overtime_pay and ('overtime_pay' not in row or not row['overtime_pay']))):
((salary.overtime_pay and 'overtime_pay' in row and Decimal('%.2f' % salary.overtime_pay) == Decimal('%.2f' % float(row['overtime_pay']))) or (not salary.overtime_pay and ('overtime_pay' not in row or not row['overtime_pay']))):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This conditional was super confusing before and is super duper confusing now! Not gonna lie I have no idea what's going on here. Do you think you could shed some light w/ a comment?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only appears in the diff because of a formatting change. Since I didn't write the logic, I'm not the right person to add an explanatory comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is comparing the imported salary with the officer's existing salaries, due to type and formatting differences.

@@ -481,12 +476,15 @@ def list_officer(department_id, page=1, race=[], gender=[], rank=[], min_age='16
if request.args.get('rank') and all(rank in rank_choices for rank in request.args.getlist('rank')):
form_data['rank'] = request.args.getlist('rank')

if request.args.get('year') and request.args.get('year') in year_choices():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this was already happening the code base before you added this line, but I've always learned to use is not None to check for None in Python, as opposed to relying on the falsy-ness of None. This is recommended by PEP 8 (see the 2nd bullet point under the heading): https://www.python.org/dev/peps/pep-0008/#programming-recommendations

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL!

@@ -62,6 +62,14 @@ def dept_choices():
return db.session.query(Department).all()


def year_choices():
years = [year[0] for year in db.session.query(cast(func.extract('year', Officer.employment_date), Integer)).filter(Officer.employment_date.isnot(None)).order_by(func.extract('year', Officer.employment_date))]
years_deduped = list(set(years))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we just hardcode the years to be from like the year we founded OO or something (open to discussion how far back to go), to the current year?

@@ -300,6 +310,15 @@ def filter_by_form(form, officer_query, department_id=None):
officer_query = officer_query.filter(db.or_(db.and_(Officer.birth_year <= min_birth_year,
Officer.birth_year >= max_birth_year),
Officer.birth_year == None)) # noqa
if form.get('year') and form.get('year') in year_choices():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above w/ regard to checking the truthyness of None

@@ -81,17 +82,29 @@ def pick_salary():
return random.randint(100, 100000000) / 100


def generate_random_date(year, month):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmmm randomness in tests seems bad. What about picking a random number once and then hardcoding that into the tests?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless, if this is standard practice in our test suite already, then I guess it's fine to keep doing it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's okay because a lot of the other methods in conftest.py, like pick_birth_date() or pick_gender(), use random.randint() or random.choice(). Let me know if you disagree.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is true that we do this in a lot of places and I think it is actually a problem. I saw an issue with the randomness of our test-data just yesterday.
But given that it is already so widespread I guess it's fine to keep doing it for now. And if it's just the actual value randomness is not as bad as deciding if a field is None or not based on randomness.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR #756 offers an alternate implementation that might have better random distribution since it hashes the officer name: https://github.com/lucyparsons/OpenOversight/pull/756/files#diff-4eef8ece740b94822be839e43d28d7a7R42

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

 I saw an issue with the randomness of our test-data just yesterday.

What was the problem you saw with test data randomness? Is it something that we need to open an issue for?

I can use the implementation from PR #756, but I would also like to know more about the problems with our existing use of random test data.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will open an issue on the concrete problem I saw and probably add some ideas I have for moving forward

@@ -118,6 +119,86 @@ def test_filter_by_partial_unique_internal_identifier_returns_officers(mockdata)
assert returned_identifier == identifier


def test_year_filter_selects_officer_who_quit_that_year(mockdata):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you covered a lot of corner cases, nice job

@McEileen
Copy link
Collaborator Author

Still working on my in-depth code review here, but I noticed in the meantime that the "year" selector only appears in the filter bar on the left-hand side of the officer listing page, but not on the "find an officer" page (e.g, /find). This seems to sort of make sense, but just want to verify that that's by design and not an oversight (ba-dum tiss)

Yep, I only put the "year" selector on the browse page, since that's where the original issue, #589, requested it. Still, thanks for opening (ba-dum tiss?) my eyes to this.

@McEileen
Copy link
Collaborator Author

I responded to feedback. Here's a video of the feature in action.

@McEileen
Copy link
Collaborator Author

McEileen commented Aug 2, 2020

I responded to feedback. Thanks to everyone who offered their thoughts.

Comment on lines +84 to +98
def pick_date(seed: bytes = None, start_year=2000, end_year=2020):
# source: https://stackoverflow.com/questions/40351791/how-to-hash-strings-into-a-float-in-01
# Wanted to deterministically create a date from a seed string (e.g. the hash or uuid on an officer object)
from struct import unpack
from hashlib import sha256

def bytes_to_float(b):
return float(unpack('L', sha256(b).digest()[:8])[0]) / 2 ** 64

if seed is None:
seed = str(uuid.uuid4()).encode('utf-8')

return datetime.datetime(start_year, 1, 1, 00, 00, 00) \
+ datetime.timedelta(days=365 * (end_year - start_year) * bytes_to_float(seed))

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice approach. If we want to go with this I think we should put the bytes_to_float function in a utils-file are something so that we can use it everywhere, and I suggest rewriting the function as

    def bytes_to_float(seed = None):
        if seed is None:
            seed = uuid.uuid4()
        return float(unpack('L', sha256(str(seed).encode('utf-8')).digest()[:8])[0]) / 2 ** 64

that way any object that has a str function can be used as seed.

if we want the result to be deterministic we have to make sure that the input is deterministic too. In this case it seems that the last name is used, which is unfortunately randomly chosen.

Comment on lines +81 to +87
oldest_year = db.session.query(cast(func.extract('year', Officer.employment_date), Integer)) \
.filter(Officer.employment_date.isnot(None)) \
.order_by(func.extract('year', Officer.employment_date)) \
.limit(1).first()[0]
current_year = datetime.datetime.now().year
year_choices = [str(year) for year in range(oldest_year, current_year + 1)]
return year_choices
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. I think for the oldest year we should use a SELECT min(employment_date) FROM officers to make it as efficient as possible. I looked up how to do that in sqlalchemy and it seems
oldest_year = db.session.query(func.min(Officer.employment_date)).scalar().year
should be do the trick.
What do we want to do in the case that there is no officer with a (not NULL) employment date?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I keep coming back to this and I am think we should just hard-code the oldest_year. I think we now have some officers in the database whose employment_date is before 1950, which will make the drop-down menu pretty long and we really don't have good data for those year anyways. So my suggestion would be to hard-code it to 1980 or something, but I am also fine with the suggestion I made, if we can come up with a solution for an officer table without any non-NULL employment dates.

{% endif %}
{% if officer.last_employment_details %}
<tr>
<td><b>Last Employment Date Details</b></td>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we change the label that we display to the users to something that is easier understood? Like "Details on Resignation" or "Details on End of Employment" or something? Not exactly sure how this field will be used though

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a "last employment date" field
4 participants