Allow dates to be only year #743

Closed
gary-host-laptop opened this issue Mar 15, 2021 · 34 comments · Fixed by #3059

Comments

@gary-host-laptop

Is your feature request related to a problem? Please describe.
Currently dates need to be a full date with day, month and year.

Describe the solution you'd like
To be able to set only the year, since many older or lesser-known books don't have that specific data available.

Describe alternatives you've considered
N/A

Additional context
N/A

@mouse-reeve
Member

#863 doesn't do this, but it's adjacent

@ghost

ghost commented Apr 5, 2021

i got a dev environment set up and looked around the codebase a bit; i'd love to work on this! how would you want it implemented in terms of ui/database though? a date input can't be used for just a year (and likewise the DateTimeField uses datetime which doesn't support just a year). i was thinking maybe using a toggle switch to change the form to use a number input (sorry for the quick terrible mockup)
[image: mockup of the proposed toggle switch]

and then have the form's clean method check the toggle box, and if set, set the date to datetime(year, 1, 1) along with an additional year_only flag set to true

does that sound alright? if there are other ideas please lmk :)

@ghost

ghost commented Apr 5, 2021

alternatively, could drop the toggle switch and just display both inputs. might be a little confusing in the ui, but would be less code on the frontend and backend

@mouse-reeve
Member

I think the toggle is a good idea, if it isn't too much effort to get the frontend stuff working for it. I wish there was a better database solution to this (since year-only dates will be indistinguishable from books actually published on January 1st), but I'm not thinking of one! Which is all to say, go for it @void-witch

@ghost

ghost commented Apr 5, 2021

cool! yeah, the only "better" solution i can think of would be a custom field that saves to a string, but that's super complex and also you lose db optimizations because of not using a date type
i'll look into it and report back when i have something! :3

@mouse-reeve mouse-reeve assigned ghost Apr 5, 2021
@ghost ghost removed their assignment Apr 18, 2021
@hughrun
Contributor

hughrun commented Jan 17, 2022

This issue is still open but the associated PR is closed (rather than merged) and the user working on it is no longer registered. @arkhi @mouse-reeve do you know the current status? This kinda annoys me every time I add a book because I only ever have the year of publication (which usually is all we want), so I'd be happy to assist if help is needed.

@arkhi
Contributor

arkhi commented Jan 18, 2022

@hughrun: As far as I know (from a few months back), this issue is stalled and could definitely use a hand. Thanks for bringing that up!

@mouse-reeve
Member

The change to the date picker widget that I have a PR for above should make the UI aspect of this much more straightforward. The form validation still requires a full date at this point, however.

@Ryuno-Ki
Contributor

Ryuno-Ki commented May 3, 2022

Instead of using the flag … would it be an option to split the field up into three parts?
Akin to https://developer.mozilla.org/en-US/docs/Web/HTML/Element/input/date#html

Implementation wise that would be a custom input instead of Django's native ones.

@mouse-reeve
Member

Yep! We had the exact same idea, and a couple months ago I switched to a custom three part Django form widget for publication dates. The fields are still all required, but from a UI perspective it's ready for that, and the flag is no longer necessary.

Current UI:
[screenshot of the current UI, taken 2022-05-03]

@mxamber

mxamber commented Sep 7, 2022

Any news on this? Just ran into the same issue when I tried to manually add a book and it wouldn't accept 2021 as publishing date, but I don't know the exact day it was published on.

@mouse-reeve
Member

Unfortunately this is still stalled (it bugs me all the time, too). I had a vague recollection of there being a way to store dates like this in Postgres but I can't find it and now I wonder if it was just wishful thinking. Barring that, perhaps the best solution is to store the date precision as an enum (down to the year, month, or day) in addition to the date, and default to January 1st for absent information. From the user's perspective, you would leave the day and/or month field blank, in the database it would be stored as a regular date, and then in the display of the book, only the relevant part of the date would be shown.
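
A minimal sketch of the "date plus precision enum" idea described above, in plain Python rather than Django. The names `Precision`, `StoredDate`, `store` and `display` are made up for illustration and are not BookWyrm code; the only behaviour taken from the comment is that blank parts default to 1 and that only the known parts are displayed.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Precision(Enum):
    """How much of the stored date the user actually supplied (hypothetical name)."""
    YEAR = "year"
    MONTH = "month"
    DAY = "day"


@dataclass(frozen=True)
class StoredDate:
    value: date           # always a complete date, so it sorts and compares normally
    precision: Precision  # which parts of `value` are real rather than filler


def store(year: int, month: Optional[int] = None, day: Optional[int] = None) -> StoredDate:
    """Fill blank parts with 1 and record which parts were really given."""
    if month is None:
        if day is not None:
            raise ValueError("a day without a month is ambiguous")
        return StoredDate(date(year, 1, 1), Precision.YEAR)
    if day is None:
        return StoredDate(date(year, month, 1), Precision.MONTH)
    return StoredDate(date(year, month, day), Precision.DAY)


def display(stored: StoredDate) -> str:
    """Show only the parts that were supplied."""
    if stored.precision is Precision.YEAR:
        return f"{stored.value.year:04d}"
    if stored.precision is Precision.MONTH:
        return f"{stored.value.year:04d}-{stored.value.month:02d}"
    return stored.value.isoformat()
```

With this shape, `store(2022)` and `store(2022, 1, 1)` hold the same date value but different precisions, which is what lets a year-only entry be told apart from a book genuinely published on 1 January.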

@hughrun
Contributor

hughrun commented Sep 8, 2022

If we're storing a date object in Postgres it has to have a day and month:

https://www.postgresql.org/docs/current/datatype-datetime.html

I was working on this a couple of months ago but gave up because I couldn't work out how to set a default value if the input is empty when the user saves the form. As far as I can tell that's the only solution: set any empty values to 01 before storing in the database.

@arkhi
Contributor

arkhi commented Sep 11, 2022

If the data is not necessarily a date (just YYYY, a YYYY-MM or a full date) and it’s annoying for everyone, then maybe the data type is not the right one?

Could this be three fields made of numbers (0-31, 1-12, -infinity to current year) and that’s it? That would avoid adding erroneous data (January 1st) in many cases. In case of a full date, the switch mentioned in a previous comment could work too but that would make the DB more difficult to deal with, I guess.

@Ryuno-Ki
Contributor

The drawback is invalid dates such as 31st February …

@arkhi
Contributor

arkhi commented Sep 30, 2022

> Could this be three fields made of numbers (0-31, 1-12, -infinity to current year) and that’s it? That would avoid adding erroneous data (January 1st) in many cases. In case of a full date, the switch mentioned in a previous comment could work too but that would make the DB more difficult to deal with, I guess.

> The drawback is invalid dates such as 31st February …

Since data should be checked before being stored anyway, what would be the roadblock with checking for validity before the date or its components are stored?
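
As a rough illustration of checking the separate number fields before they are stored, the sketch below substitutes 1 for any unknown part and lets `datetime.date` reject impossible combinations. The function name `is_valid_partial_date` is hypothetical, and unknown parts are represented as `None` here (an assumption; the comment above does not say how blanks would be encoded). Note that `datetime.date` only accepts years 1 to 9999, so earlier years would need different handling.

```python
from datetime import date
from typing import Optional


def is_valid_partial_date(
    year: int, month: Optional[int] = None, day: Optional[int] = None
) -> bool:
    """Return True if the supplied parts could belong to a real calendar date."""
    if day is not None and month is None:
        return False  # a day only makes sense within a known month
    try:
        # Substitute 1 for unknown parts purely for the check; nothing is stored.
        date(
            year,
            1 if month is None else month,
            1 if day is None else day,
        )
    except ValueError:
        return False
    return True
```

The substituted 1 is used only for validation, so a year-only entry can still be stored without a made-up month and day.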

@Ryuno-Ki
Contributor

Dates can be tricky: https://github.com/kdeldycke/awesome-falsehood#dates-and-time

Just saying. I see a lot of maintenance burden here.

@arkhi
Contributor

arkhi commented Sep 30, 2022

Dates are tricky overall, indeed, but we’re not aiming for precision here, quite the opposite. :)

I’m no python developer but wouldn’t a dateutil.parser.isoparse be enough?

@Ryuno-Ki
Contributor

I defer to @mouse-reeve for a decision. Keep in mind that migrations are likely necessary. They have a better connection to fellow instances to make a judgement call on whether it is worth it.

@hughrun
Contributor

hughrun commented Oct 1, 2022

The reality is that the vast majority of bibliographic metadata in the world only lists a publication year, or at most a month as well. The only reason to care about a specific day is that the Bookwyrm database field was originally a datetime.

@mouse-reeve
Member

As I see it, both database-level solutions (either using a datetime field and storing the precision, or storing the year and month and day as separate date fields) have advantages and disadvantages, and it's not super obvious which is better in the application as it is. I'm favoring using a datetime field and precision enum because it would mean that it's not super hard to do date math. If you're manipulating the dates in python, @arkhi is quite right that parsing the date after loading the object would be a simple solution. However, if you wanted to query, for example, every book published between 3 and 5 years ago, or sort a queryset by publication date, you'd have a very challenging query ahead of you.

Since querying based on publication time seems like a pretty expected use case to me (it may also be how the author page is sorted, I don't recall off the top of my head), I'm inclined to keep the database field as a datetime field. Does that make sense to yall? Am I missing something in my reasoning? Extremely grateful for all your input ❤️
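
To make the querying argument concrete, here is a small sketch using Python's built-in `sqlite3` in place of Postgres. The table and column names are invented for the example, the dates are stored as ISO strings (which compare correctly as text), and a fixed window is used instead of "3 to 5 years ago" so the result does not depend on today's date. The point is only that a complete date column keeps range filters and ordering to a single plain query, regardless of the precision column beside it.

```python
import sqlite3
from typing import List

# Hypothetical schema: a full date plus a precision marker, as discussed above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (title TEXT, published_date TEXT, precision TEXT)")
conn.executemany(
    "INSERT INTO book VALUES (?, ?, ?)",
    [
        ("Full date", "2020-03-15", "day"),
        ("Year only", "2018-01-01", "year"),       # month and day are filler
        ("Year and month", "2019-06-01", "month"),  # day is filler
        ("Too old", "2010-01-01", "year"),
    ],
)


def published_between(start: str, end: str) -> List[str]:
    """Titles published in [start, end], oldest first."""
    rows = conn.execute(
        "SELECT title FROM book "
        "WHERE published_date BETWEEN ? AND ? "
        "ORDER BY published_date",
        (start, end),
    )
    return [title for (title,) in rows]
```

With three separate nullable number columns, the same filter would need to compare year, month and day individually and decide how blanks sort.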

@hughrun
Contributor

hughrun commented Oct 15, 2022

I think there may possibly be a solution that pleases everyone here.

@mouse-reeve's explanation makes sense. Generally dates should always be Dates as @Ryuno-Ki points out above.

However, the original point of this issue is that it's a crappy user experience to have to enter full dates, given that most of the time the publication month and day are unknown to the user and almost everyone else. It seems unlikely that there would be a pressing use-case for searching publication dates with more specificity than a year, so can we set the value of a "blank" day or month to 1 such that if a user enters only a year, Bookwyrm sets the date to 1 January {year}?

I did some testing and it seems this might be as easy as swapping out two lines of code in SelectDateWidget within forms/widgets.py:

# old
if not self.is_required:
    month_choices.insert(0, self.month_none_value) # self.month_none_value equates to "(0, '---')"
# new
if not self.is_required:
    month_choices.insert(0, (1, '---'))

I find Django widgets a little bamboozling, however, so it's possible this might cause problems I haven't anticipated.

@mouse-reeve
Member

Agreed! I don't think there's anything up in the air about how the UI should work from the user's perspective; you should be able to leave the day and/or month blank as fitting. The reason for the database discussion is, once you've created a date that's like 2022-01-01, how do you know if that's a book that was published in 2022, or a book that was published in January of 2022? If only the year ever mattered, it would be a trivial problem, but I do think there can be publication dates that are meaningful down to the day (when I'm waiting for the next book in a series, for example, I care very much when specifically it will get released).

I'm glad you dug into that widget; I think I copy/pasted it nearly wholesale from the django source code so that's a valuable insight into how it works even though git blame acts like I wrote it 😂

@chdorner
Member

Once we get around to fixing this on a database level, #2660 needs attention as well given that any book coming in via ActivityPub's federation, or the connectors which transform the data into ActivityPub's format first, uses dateutil.parser.parse which fills in the blanks with the current date and turns a string like "2022" into datetime.datetime(2022, 2, 13, 0, 0) (Feb 13 being the date today).
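
One way to avoid the "fill in the blanks with today" behaviour is to parse the partial forms explicitly. The sketch below uses only the standard library and is not what BookWyrm does; `parse_partial` and its returned precision strings are hypothetical names. It accepts `YYYY`, `YYYY-MM` and `YYYY-MM-DD`, fills missing parts with 1, and reports which parts were actually present.

```python
import re
from datetime import date
from typing import Tuple

# Year, optionally followed by -MM, optionally followed by -DD.
_PARTIAL_ISO = re.compile(r"^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$")


def parse_partial(text: str) -> Tuple[date, str]:
    """Parse a possibly incomplete ISO date into (date, precision)."""
    match = _PARTIAL_ISO.match(text.strip())
    if match is None:
        raise ValueError(f"not a YYYY, YYYY-MM or YYYY-MM-DD string: {text!r}")
    year, month, day = match.groups()
    precision = "day" if day else "month" if month else "year"
    # Missing parts become 1, never "today"; date() rejects impossible dates.
    return date(int(year), int(month or 1), int(day or 1)), precision
```

If staying with dateutil is preferred, `dateutil.parser.parse` also takes a `default` argument that supplies the missing parts, though on its own that does not record which parts were missing.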

@chdorner
Member

chdorner commented Feb 17, 2023

I've been low-key thinking about this a bit over the past days. What do you think about this proposal:

We create a new Django field type. Naming is hard, but let's say for now we'd call it DateWithNullablePartsField, it would work like this:

  • on a database level it's a plain old string storing the date as YYYY-MM-DD
  • in Python the field is a value object with data accessors for .day, .month, .year
  • Day and month parts would be allowed to be null, but not the year; if the year is unknown, the whole field and column should be null
  • when translating the field from Python to the value that Django will send to the database we:
    1. construct an ephemeral string in the format YYYY-MM-DD; importantly, null months and days use the value 01 here, because:
    2. we parse this ephemeral string into a Python date with a strict parser (not dateutil) to validate the date
    3. if valid, construct a similar string, but replace the nullable parts with 00 (i.e. 2023-02-00 for February 2023)
    4. send this string with the null parts zeroed out to Django to send to the database
  • when translating the field from the database to Python we:
    1. parse the string with a regex (since 2023-02-00 is an invalid date)
    2. fill in the values into the value object

I'm still not entirely sure how to support ordering based on the first-/published dates with those field types. Maybe the solution to that could be that we store two columns each, one as described above, and one as an actual date field where the nullable parts default to 01. We could then use that date field for ordering.
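
The round trip described in this proposal can be sketched in plain Python, leaving out the Django field plumbing. `PartialDate`, `to_db`, `from_db` and `sortable_date` are illustrative names only; the validation here builds a `datetime.date` directly rather than parsing an intermediate string, which is a simplification of steps 1 and 2.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional

_STORED = re.compile(r"^(\d{4})-(\d{2})-(\d{2})$")


@dataclass(frozen=True)
class PartialDate:
    """Value object with nullable month and day (hypothetical)."""
    year: int
    month: Optional[int] = None
    day: Optional[int] = None

    def sortable_date(self) -> date:
        """Real date with null parts defaulted to 1; raises ValueError if invalid."""
        if self.day is not None and self.month is None:
            raise ValueError("day given without month")
        return date(
            self.year,
            1 if self.month is None else self.month,
            1 if self.day is None else self.day,
        )

    def to_db(self) -> str:
        """Validate using 01 placeholders, then store 00 for the null parts."""
        self.sortable_date()  # rejects e.g. 31 February
        return f"{self.year:04d}-{self.month or 0:02d}-{self.day or 0:02d}"

    @classmethod
    def from_db(cls, stored: str) -> "PartialDate":
        """Read the stored string with a regex, since 2023-02-00 is not a valid date."""
        match = _STORED.match(stored)
        if match is None:
            raise ValueError(f"unexpected stored value: {stored!r}")
        year, month, day = (int(part) for part in match.groups())
        return cls(year, month or None, day or None)
```

`sortable_date()` is the kind of value that could feed the second, real date column suggested above for ordering.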

@chdorner
Member

I have two commits in this draft PR, each showcasing a separate solution to this. Both are proofs of concept and only deal with the Book edit form rendering and submitting. They don't include any changes for where we render the dates or federating them out to other instances.

I will humbly remove the "good first issue" label because this is anything but easy 😅

@chdorner chdorner removed the good first issue Good for newcomers label Feb 25, 2023
@mxamber

mxamber commented Mar 26, 2023

Just ran into this again. tbh it's baffling, I don't think I've ever seen any book that had a full YYYY MM DD date in it, only ever years.

@mxamber

mxamber commented Mar 26, 2023

I'm manually importing all the niche books from my Goodreads import that aren't in the database (speaking of which: Inventaire often imports complete word salad), and the inability to file books with only a year is currently messing it all up.

@arkhi
Contributor

arkhi commented Mar 27, 2023

I’m mentioning my previous proposal again. :)

@chdorner
Member

chdorner commented Mar 29, 2023

> I’m mentioning my #743 (comment) again. :)

@arkhi your proposal works fine by itself, there's even a proof of concept implementation in this commit, it even validates the date as an actual date by filling in the missing parts with "01", but then not storing that.
The main issue with this approach is that it'll break federation with older BookWyrm instances which don't know these imprecise dates. A workaround could be to introduce a new date field for the imprecise one so that old versions still look for the real date. Not the nicest way of handling this, unfortunately.

Another approach is in this other commit, keeping just one date field with missing parts filled in with "01", but adding a second field describing the precision of the date. Rendering out those two fields (i.e. date: "2022-03-06", precision: "month") would keep backwards compatibility intact and the ActivityPub formats slightly cleaner than the first solution.

There's a bit of a discussion which happened on #2691 but then died down. Would be great to restart this and have a few more points of view on those two approaches, or even better, a new nicer one if we can find it :)

@mxamber

mxamber commented Mar 29, 2023

> The main issue with this approach is that it'll break federation with older BookWyrm instances which don't know these imprecise dates. A workaround could be to introduce a new date field for the imprecise one so that old versions still look for the real date. Not the nicest way of handling this, unfortunately.

Store fuzzy date, fill day/month with 01 for the old format, only display that if no fuzzy date is found, deprecate the old field somewhere medium future down the road (at a point when it can reasonably be assumed that most-to-all instances have since updated to some version that includes the new field)?
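
A rough sketch of that transition scheme: send both a fuzzy value and a 01-filled legacy value, and on the receiving side prefer the fuzzy one when it exists. The key names `publishedDate` and `publishedDateFuzzy` and both function names are placeholders chosen for this example; the thread does not settle what the fields would be called.

```python
from typing import Dict, Mapping, Optional

# Hypothetical key names for the serialized book object.
LEGACY_KEY = "publishedDate"
FUZZY_KEY = "publishedDateFuzzy"


def outgoing_fields(
    year: int, month: Optional[int] = None, day: Optional[int] = None
) -> Dict[str, str]:
    """Build both representations so old and new instances can each read one."""
    if month is None:
        day = None  # a day is meaningless without a month
    fuzzy = f"{year:04d}"
    if month is not None:
        fuzzy += f"-{month:02d}"
    if day is not None:
        fuzzy += f"-{day:02d}"
    # Older instances only understand a full date, so blanks become 01.
    legacy = f"{year:04d}-{month or 1:02d}-{day or 1:02d}"
    return {FUZZY_KEY: fuzzy, LEGACY_KEY: legacy}


def date_to_display(incoming: Mapping[str, str]) -> Optional[str]:
    """Prefer the fuzzy value; fall back to the legacy date from older instances."""
    return incoming.get(FUZZY_KEY) or incoming.get(LEGACY_KEY)
```

Once most instances send the fuzzy key, the fallback branch and the legacy key could be dropped, which is the deprecation step suggested above.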

> Another approach is in this other commit, keeping just one date field with missing parts filled in with "01", but adding a second field describing the precision of the date. Rendering out those two fields (i.e. date: "2022-03-06", precision: "month") would keep backwards compatibility intact and the ActivityPub formats slightly cleaner than the first solution.

But at the same time make any future switch to fuzzy-only harder, I imagine.

@hughrun
Contributor

hughrun commented Apr 1, 2023

The problem with @arkhi's solution of dispensing with date as the type is - as @mouse-reeve pointed out - a date type is very useful for quick sorts and database queries.

I still think the easiest and cleanest solution is to simply replace any missing days and months with 01 (the first day of the month and/or January). I don't see this being a significant problem - I suppose the main drawback is that a lot of books will erroneously be listed as published on 1 January of their publication year, but I'm not clear on why this would cause any particular problem, especially since exact dates are most important when it comes to future dates (when is the next book in this series coming out?) and it's primarily quite recently published books that are likely to have specific publication dates.

If that's the approach, then there is a very simple built-in solution. We don't need to futz around with custom date parsers.

@gary-host-laptop
Author

> The problem with @arkhi's solution of dispensing with date as the type is - as @mouse-reeve pointed out - a date type is very useful for quick sorts and database queries.

> I still think the easiest and cleanest solution is to simply replace any missing days and months with 01 (the first day of the month and/or January). I don't see this being a significant problem - I suppose the main drawback is that a lot of books will erroneously be listed as published on 1 January of their publication year, but I'm not clear on why this would cause any particular problem, especially since exact dates are most important when it comes to future dates (when is the next book in this series coming out?) and it's primarily quite recently published books that are likely to have specific publication dates.

> If that's the approach, then there is a very simple built-in solution. We don't need to futz around with custom date parsers.

It does cause problems; I've encountered this many times while adding books. Say you have the month and the year but not the day: is it really the first of that month, or is the 01 just filling the gap? Then there are cases where books are marked as January with a day other than 01 even though they were not released in January, perhaps because someone changed the day to some other number, and now you might think the book really was released on some day in January. In my opinion that's bad practice for a project that is, in one way or another, about archiving, and it shouldn't be adopted simply because the proper solution is harder to code.

dato added a commit to dato/bookwyrm that referenced this issue Oct 20, 2023
Some dates (publication dates, author dates) are meant as _literals_. What
the user inputs through a `SelectDateWidget` should be preserved as-is.
Django's otherwise-excellent support for timezones interferes with it (see

Until a better fate of these columns is determined (do we migrate them to a
DateField?), and as a stop-gap measure, we can start being faithful to the
data by storing them in the Eastern-most timezone.

This is particularly important because 1/1/YYYY is a common pattern in
publication dates, given bookwyrm-social#743.
7 participants