Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data model #7

Open
audiomuze opened this issue Jan 1, 2023 · 51 comments
Open

Data model #7

audiomuze opened this issue Jan 1, 2023 · 51 comments

Comments

@audiomuze
Copy link

audiomuze commented Jan 1, 2023

I've cheekily installed and run Deaftone and had a peek at the database it generated.

I noticed you're populating track number into the the track and disk fields in the songs table.

I also saw that many tags I'd anticipate would be imported were not. These are the tags I typically store in my file based metdatata. Most comply to musicbrainz' tagging model but others are added with the intention to one day leverage them through a self-developed app or someone else's. Bottom line is this: There are some data points which do not exist in online music databases but which users need to be able to populate if they so wish in order to get the most out of the UX. Such fields e.g. albumrating and trackrating need to be able to be applied within an app, but also need to be importable from file tags for users to be able to leverage prior effort where they've done so.

acousticbrainz_mood
acoustid_id
album
albumrating
albumartist
analysis
arranger
artist
asin
barcode
bootleg
catalog
catalognumber
compilation
composer
conductor
country
date
discnumber
discsubtitle
engineer
ensemble
explicit
fingerprint
genre
isrc
label
live
lyricist
lyrics
mixer
mood
movement
movementnumber
musicbrainz_albumartistid
musicbrainz_albumid
musicbrainz_artistid
musicbrainz_discid
musicbrainz_releasegroupid
musicbrainz_releasetrackid
musicbrainz_trackid
musicbrainz_workid
originaldate
originalreleasedate
originalyear
part
performancedate
performer
personnel
producer
recordinglocation
releasetype
remixer
replaygain_album_gain
replaygain_album_peak
replaygain_track_gain
replaygain_track_peak
review
albumtag
radioban
tracktag
style
subtitle
theme
title
tracknumber
trackrating
upc
version
work
writer

In any event, I've been thinking of rolling my own open source music server for some time now and have documented some of my thinking based on my own experience, what I've observed from others as well as the good, bad and ugly of Logitechmediaserver, Ampache, Navidrome, countless desktop apps and Roon. It all starts with the data model. I've a number of thoughts I'd like to share early in the piece if you're interested?

@112RG
Copy link
Contributor

112RG commented Jan 2, 2023

My plan is to have all the tags from the MusicBrainz tagging model in the database. Not sure about custom tags yet since they will be only useful for a certain amount of people. And if we started adding custom tags we might end up with a heap of random little tags. From my current understanding, there is no way to dynamically add columns to the database sadly.

What I want Deaftone to do is basically Stream. And provide you with the metadata. And the clients work with the data how they want. With the future possibility of a recommendation engine.

Please share your thoughts in this thread. I am interested in hearing them

@audiomuze
Copy link
Author

audiomuze commented Jan 2, 2023

In the spirit of a picture is worth 1000 words:

image
image
image
image
image
image
image

End goal is an open source music server and client ecosystem that leverages FOSS components to create a state of the art immersive, flexible, UX oriented music exploration and consumption interface, focused on users with local libraries as well as those using streaming services.

All pre-existing user tags they've put effort into that affect the data model will be importable, and everything will be exportable in a form that can be leveraged to update file tags or a sqlite database / table to enable SQL updates to metadata and re-importation into app. Favourites, track and album ratings etc. will be able to be mapped to the data model and imported from file tags.

Data model will allow for:

  • user ratings for tracks
  • user ratings for albums
  • editorial ratings
  • user defined genres (alongside Tivo/Rovi or RYM tree as default)
  • per user library, interface & settings based on profiles. This includes selecting multiple folders across multiple locations and overlapping folder selection with other users using same server. As an example a tree might look like:
    image
    A user could specify their library as being "." or any combination of the subdirs and they would only see that portion of the totality in their library. This means they don't get to see/hear music that's included in "their" library. From a performance perspective this probably implies a master and per user database, with the user database being filled first from the master (disk is cheap)
  • scrobbling via listenbrainz (incl. ability to pull last.fm content or scrobble to it
  • preservation and visualisation of listening history
  • DJ mode will not impact user listening history unless a user opts DJ in. This will be a defaulted config that can be toggled at time of playback
  • do no harm - no music file will ever be written to, moved or deleted by the app, but the app will enable items to be flagged and browsed by flag, and flag lists to be exported as actionable lists for users if they choose to
  • proper library management workflows will exist to deal with missing or incorrect content and unidentified files and albums. Inferences will be made where necessary and require user confirmation where ambiguity exists. Absent user changes the inferences are adopted
  • security by design - app will not need to be run as root, rather own or user uid
  • all audio files will have a uuid that enables tracking irrespective of location in filesystem
  • methods will exist to enable embedding of uuid's into file based tags meaning file attributes are persistent unless deliberately deleted and making UI impervious to file moves. This may require exporting for users to import into their file based tags to make immutable e.g. track_uuid
  • native input to enable playback of any audio stream through server / endpoint interface
  • proper handling of box sets (support for discsubtitle, per disc cover art)
  • classical music will be handled appropriately. It'll be possible to browse by composer, genre, period, instrumentation
  • Music on the Go will be natively supported via the same endpoint app
  • volume levelling
  • DSP, parametric EQ, presets, Room Wizard?

Conceptually that's it at a high level and I'm sure there are gaps, but it's what I've pulled together in fits and starts.

@112RG
Copy link
Contributor

112RG commented Jan 3, 2023

I like the idea of most of this. But there are some features I am still not too sure about.

  1. Not sure about having a per-user library.
  2. I am not fond of the idea of having a UI to play songs on the server. I want it to be done via Orpheus which can be run as a standalone web UI or a Tauri desktop app. The server UI would be only for the management of data and the server itself such as editing metadata etc

I want to try and keep the server KISS. I also want to extract as much metadata as possible and give it to the user to do what they want with it. Meaning I want Deaftone to provide ways to search and query it.

But it's up to apps like Orpheus to take advantage of it. I would rather have a separate app for playing browsing etc. As it would keep a lot of it out of the Rust codebase which some users in the future might not want to contribute to. But having an easy-to-read and use NodeJS Vue app, allows more people to contribute to it. My weak spot in development is UI design and CSS so Orpheus has been a big challenge for me to write. And that also seems to be a big thing most people also find trouble doing, so writing a GUI app is a very slow undertaking and writing a good one is just as hard.

My inspiration for writing Deaftonen came from 1) I don't like Golang too much 2) I hate the subsonic API 3) I wanted a free alternative to Roon at least UI-wise. Which is my inspiration for Orpheus UI which I still have a long way to go on.

The most common thing I see when people are asking for good self-hosted options is

  1. Roon is the best but most expensive
  2. Plex amp is good but again paid and proprietary
  3. Subsonic apps have no good mobile app to use. And very few desktop apps

So those are the targets I am trying to hit with Deaftone and Orpheus which when Tauri mobile matures more will have a single codebase for desktop and for mobile

@audiomuze
Copy link
Author

Understood re not integrating a web ui in the server. LMS has had a built-in web ui from the outset and these days most people use the Material Skin plugin (HTML5, CSS, Javascript) as a front-end rather than the server's built-in web ui.

One thing to bear in mind regarding playback is that using DSP etc. requires grunt so ideally any unpacking and processing should be kept server-side (and will benefit from Rust performance also), enabling playback and browsing to be initiated by less powerful hardware - suspect I'm preaching to the choir here.

Roon has a more attractive UI and much better leverage of metadata than others, and has some excellent features in the convolution and presets dept. But, it's architecturally flawed, predominantly streaming oriented, becoming increasingly cloud based, a walled garden from a data perspective, and has many bugs that are left unaddressed. It now requires internet access to play local files. Even with a lifetime license I've abandoned it in favour of LMS and use the Material Skin plugin (which didn't exist in 2015 when I first switched to Roon) and don't find I'm missing much at all.
In fact, I find I'm generally less frustrated and get the same performance from LMS on a i3-7100T w/ 16GB 2400 MT/s RAM as I get from Roon on a Ryzen 4750G with 32GB 3200 MT/s RAM serving the same library.

None of the other apps out there do a great job with metadata and music exploration. Roon stands out because the bar is so low rather than because of what it is.

@audiomuze
Copy link
Author

Coming back to a per-user library. I've seen many users ask whether they could segregate "their music" from that of other family members, but at the same time want to use one solution for all to enjoy their music. A common complaint is they don't want to see their so' or kid's music among theirs because it serves as a distraction when browsing etc. and can cause DJ type capability to play stuff they don't like. Equally, if building dynamic playlists and specifying that only stuff that's never been played, or only 4 rated tracks, favourited tracks etc. as selection criteria you'd inevitably get music that's not in "your" collection, which would sully the listening experience. Also, if one ever provided analytics about listening habits (or the data to enable the analytics to be carried out), what's listened to, what's not, favourite genres etc. the analysis is against a population that extends beyond the population relevant to that particular user.

From these perspectives a user having a profile makes sense. All users would then need to do is to organise their music on disc in a way that enables them to choose parts of a directory tree or devices attached to the server that represent those parts of the music collection they're interested in. Adding music etc. would entail an admin profile to initiate an update/ ingestion scan with the server then indicating to the user that they need to run a refresh of their "library" if they want to see changes / newly ingested stuff (of course that could also be automated).

@112RG
Copy link
Contributor

112RG commented Jan 3, 2023

For playback it will be server side on Deaftone. And will have various ways of playing. But I want it to be kept simple and basically all queue management to be done by the GUI app. I ideally want to keep /stream/:id just as it is. But this is subject to change based on features wanted etc.

I am still looking at other ways of streaming the audio. But found it a little challenging to understand but I am looking at other music servers and how they do it.

Your case for per user library has swayed me I never thought of it that way. I am not sure the best way to go about splitting libraries where it be done through the database or not. But as you said having a seperate dir for different people would also be handy. But it would most likely need its own seperate table in the database which could get tricky. I might have to look at Jellyfin to see if the handle it similar or not

My plan is to basically just have a UI that is as nice to look at as Roons. And with a good API on Deaftone hopefully somedays people make there own apps who knows

@audiomuze
Copy link
Author

I am still looking at other ways of streaming the audio. But found it a little challenging to understand but I am looking at other music servers and how they do it.

That's the other key weakness in most of the apps - they for the most part assume you want to be tethered to your desk or listen using your mobile or tablet with little to nothing in the way of being able to send audio from the server to anything else.

If you implement SlimProto you've solved the "play audio to any hi-fi" endpoint issue, nothing more to be done. All a user needs to to is pop squeezelite on a Raspberry Pi using the likes of Ropieee XL or other x64 or ARM device with I2S or USB out and they've a hi-fi grade transport. As an added bonus using RopieeeXL provides DLNA, Airplay®, Spotify®, HQPlayer NAA and UPnP-2-Roon Bridge. I suspect the Squeezelite code will be more informative than the old Slimdevices SlimProto documentation.

Playback scenarios that should ideally be catered for:

  • mobile / tablet
  • desktop / laptop
  • Squeezelite
  • DLNA enabled devices
  • Airplay enabled devices
  • HQPlayer

With that covered you've pretty much got a server that can stream to just about anything.

@audiomuze
Copy link
Author

audiomuze commented Jan 3, 2023

Your case for per user library has swayed me I never thought of it that way. I am not sure the best way to go about splitting libraries where it be done through the database or not. But as you said having a seperate dir for different people would also be handy. But it would most likely need its own seperate table in the database which could get tricky. I might have to look at Jellyfin to see if the handle it similar or not

I've also not given a hell of a lot of thought to the technical implementation, but my gut is that separate databases are the simplest way to go, with one being the master or source of truth where album ingestion, metadata etc. is concerned. It in effect represents the totality of all users' music and serves as the feed from which individual databases are populated and the only place where metadata updates/edits are done.

SQLite's limitations allow two or more databases to be associated to the same database connection using ATTACH and to operate as if they were a single database, to a maximum of 125 simultaneously attached databases. Frankly I can't see anyone needing to link more than a handful to cater for a household.

ATTACH 'userUUID.db ' AS userUUID and thereafter reference the tables as userUUID.tablename and treat them as if they were another table in the first connected database.

User runs front-end app, chooses/authenticates to profile and Deaftone addresses their database directly. Given the structures are identical, a single code base handles all databases equally.

All that needs to exist on top of that is for all ingestion/ editing etc. to be done using the source of truth deaftone.sqlite and Deaftone keeping tabs on which DB's have been synched so users get a deferrable sync prompt when selecting their profile.

I think the issue of each user being able to select whatever portions of a directory tree or which devices they want included is important because it enables an efficient structure that avoids the need to duplicate music/content.

tunes
├── am
├── common
├── rylee
└── new_tunes

  • am & rylee contain music only am & rylee respectively enjoy.
  • common contains music both enjoy
  • new_tunes contains recent purchases from whoever chooses to put stuff there for others to sample / be aware of

am defines their folder selections as:

/tunes/am
/tunes/common

rylee defines their folder selections as:

/tunes/rylee
/tunes/common
/tunes/new_tunes
/qnap

deaftone defines its library as

/tunes
/qnap

It populates and updates to am and rylee DB's from deaftone.sqlite by reference to their folder selections. This also minimises hammering of the drives containing the music.

@112RG
Copy link
Contributor

112RG commented Jan 4, 2023

I do like the idea of this. I wonder about the work to get it done. There could be a lot of edge cases. And I haven't even got the deletion of media done atm correctly. A lot of the work to support this would have to be done in tasks. Such as syncing DB x with z. Especially when deleting artists and albums it might get tricky.

Since I really wanted to keep using SeaORM on the handlers. And mainly the raw SQL in the scanner portion

@112RG
Copy link
Contributor

112RG commented Jan 4, 2023

Currently I have this. I am unsure what most of there datatypes are meant to be

pub struct Model {
    #[sea_orm(primary_key, auto_increment = false)]
    pub id: String,
    #[sea_orm(unique)]
    pub path: String,
    pub acoustid_id: Option<String>,
    pub acoustid_fingerprint: Option<String>,
    pub album: String,
    pub album_artist: Option<String>,
    pub alubm_sort: Option<String>,
    pub arranger: Option<String>,
    pub artist: String,
    pub artist_sort: Option<String>,
    pub artists: Option<String>,
    pub asin: Option<String>,
    pub barcode: Option<String>,
    pub bpm: Option<String>,
    pub catalognumber: Option<String>,
    pub comment_description: Option<String>,
    pub compilation: Option<String>,
    pub composer: Option<String>,
    pub composersort: Option<String>,
    pub conductor: Option<String>,
    pub copyright: Option<String>,
    pub director: Option<String>,
    pub discnumber: Option<String>,
    pub discsubtitle: Option<String>,
    pub encodedby: Option<String>,
    pub encodersettings: Option<String>,
    pub engineer: Option<String>,
    pub gapless: Option<String>,
    pub genre: Option<String>,
    pub grouping: Option<String>,
    pub key: Option<String>,
    pub isrc: Option<String>,
    pub language: Option<String>,
    pub license: Option<String>,
    pub lyricist: Option<String>,
    pub lyrics_description: Option<String>,
    pub media: Option<String>,
    pub djmixer: Option<String>,
    pub mixer: Option<String>,
    pub mood: Option<String>,
    pub movement: Option<String>,
    pub movementtotal: Option<String>,
    pub movementnumber: Option<String>,
    pub musicbrainz_artistid: Option<String>,
    pub musicbrainz_discid: Option<String>,
    pub musicbrainz_originalartistid: Option<String>,
    pub musicbrainz_originalalbumid: Option<String>,
    pub musicbrainz_recordingid: Option<String>,
    pub musicbrainz_albumartistid: Option<String>,
    pub musicbrainz_releasegroupid: Option<String>,
    pub musicbrainz_albumid: Option<String>,
    pub musicbrainz_trackid: Option<String>,
    pub musicbrainz_workid: Option<String>,
    pub musicip_fingerprint: Option<String>,
    pub musicip_puid: Option<String>,
    pub originalalbum: Option<String>,
    pub originalartist: Option<String>,
    pub originalfilename: Option<String>,
    pub originaldate: Option<String>,
    pub originalyear: Option<String>,
    pub performer_instrument: Option<String>,
    pub podcast: Option<String>,
    pub podcasturl: Option<String>,
    pub producer: Option<String>,
    pub _rating: Option<String>,
    pub label: Option<String>,
    pub releasecountry: Option<String>,
    pub date: Option<String>,
    pub releasestatus: Option<String>,
    pub releasetype: Option<String>,
    pub remixer: Option<String>,
    pub replaygain_album_gain: Option<String>,
    pub replaygain_album_peak: Option<String>,
    pub replaygain_album_range: Option<String>,
    pub replaygain_reference_loudness: Option<String>,
    pub replaygain_track_gain: Option<String>,
    pub replaygain_track_peak: Option<String>,
    pub replaygain_track_range: Option<String>,
    pub script: Option<String>,
    pub show: Option<String>,
    pub showsort: Option<String>,
    pub showmovement: Option<String>,
    pub subtitle: Option<String>,
    pub totaldiscs: Option<String>,
    pub totaltracks: Option<String>,
    pub tracknumber: Option<String>,
    pub title: Option<String>,
    pub titlesort: Option<String>,
    pub website: Option<String>,
    pub work: Option<String>,
    pub writer: Option<String>,
    #[sea_orm(column_name = "createdAt")]
    pub created_at: String,
    #[sea_orm(column_name = "updatedAt")]
    pub updated_at: String,
    #[sea_orm(column_name = "albumId")]
    pub album_id: Option<String>,
}

@audiomuze
Copy link
Author

audiomuze commented Jan 4, 2023

There could be a lot of edge cases. And I haven't even got the deletion of media done atm correctly. A lot of the work to support this would have to be done in tasks. Such as syncing DB x with z. Especially when deleting artists and albums it might get tricky.

Off the cuff I'd think the following principles should “simplify” matters :

  • if playback history exists for any track uuid in anyone's playback history the composition, track, composer, artist and album record associated with that track uuid become immutable, albeit the item is clearly not a candidate for playback
  • a user can elect to scrap deleted music from their listening history, in which case an update/refresh scan to deaftone.sqlite would remove those track uuid's and all associated composition, track, composer, artist and album records to the extent they're unique to that track uuid
  • where Deaftone detects filesystem changes and ingests these, the changes should be cascaded to the individual db's that reference the same uuid's.

To make this work, I come back to the need ultimately, to be able to embed a uuid in the file based metadata of every track that's ingested into Deaftone. Whilst I'm not a fan of unleashing an app other than a tagger to add metadata, I suspect most users wouldn't care. For those that are averse to it, it should be trivial enough for them to export the results of

SELECT id,
       path
  FROM songs
 ORDER BY path;

to a text file which they can then feed into a tagging engine they're comfortable with.

@112RG
Copy link
Contributor

112RG commented Jan 4, 2023

I don't want to modify the music files at all. It would be nice to see how plex handles all this.

There is going to be a lot of overhead in a partial scan of checking and updating songs in many Databases. And for people with very large libraries that do not use multi-user, it would be a slow down for them. This is something that maybe in the future Deaftone could support but it seems a lot of work and would add a lot more complexity to the database.

I am almost tempted to say set up multiple Deaftone instances when wanting multi-users. Since Deaftone uses very minimal system resources

I think this can be done with as you said

/dad
/mum
/common

Where the dad user has /mnt/a/music/dad mum has hers and common is shared between both. This would still only be scanning files once since. Files in mum are unique to her and common is just merged to all users somehow whether threw attach or something else. But even this will add additional over head on queries to songs albums etc. Even with a attached since we could have to join the queries or maybe we can create a virtual table somehow?

So if say 1 user was using Deaftone all they would have is deaftone.sqlite

But if you enabled multi-user. You would have mum.sqlite dad.sqlite common.sqlite. Sure we would have 3 scanners running it would cause no more disk access than what is mentioned above. Then with playlist generation for say mum, it would only take into account common and mum. I think a design like this rather than syncing from a master DB is a far better option

@audiomuze
Copy link
Author

audiomuze commented Jan 4, 2023

I think a design like this rather than syncing from a master DB is a far better option

If it's a possibility to have multiple Deaftone instances running on one server that's fine, but that introduces the complexity of which endpoint is connecting to which Deaftone server.

Also, where does the need for metadata correction get handled e.g. missing or incorrect metadata?

@audiomuze
Copy link
Author

audiomuze commented Jan 4, 2023

Currently I have this. I am unsure what most of there datatypes are meant to be

Taking a quick peek at beets' DDL (beets relies heavily on MusicBrainz):

CREATE TABLE albums (
    id                   INTEGER PRIMARY KEY,
    artpath              BLOB,
    added                REAL,
    albumartist          TEXT,
    albumartist_sort     TEXT,
    albumartist_credit   TEXT,
    album                TEXT,
    genre                TEXT,
    style                TEXT,
    discogs_albumid      INTEGER,
    discogs_artistid     INTEGER,
    discogs_labelid      INTEGER,
    year                 INTEGER,
    month                INTEGER,
    day                  INTEGER,
    disctotal            INTEGER,
    comp                 INTEGER,
    mb_albumid           TEXT,
    mb_albumartistid     TEXT,
    albumtype            TEXT,
    albumtypes           TEXT,
    label                TEXT,
    mb_releasegroupid    TEXT,
    asin                 TEXT,
    catalognum           TEXT,
    script               TEXT,
    language             TEXT,
    country              TEXT,
    albumstatus          TEXT,
    albumdisambig        TEXT,
    releasegroupdisambig TEXT,
    rg_album_gain        REAL,
    rg_album_peak        REAL,
    r128_album_gain      REAL,
    original_year        INTEGER,
    original_month       INTEGER,
    original_day         INTEGER
CREATE TABLE items (
    id                   INTEGER PRIMARY KEY,
    path                 BLOB,
    album_id             INTEGER,
    title                TEXT,
    artist               TEXT,
    artist_sort          TEXT,
    artist_credit        TEXT,
    album                TEXT,
    albumartist          TEXT,
    albumartist_sort     TEXT,
    albumartist_credit   TEXT,
    genre                TEXT,
    style                TEXT,
    discogs_albumid      INTEGER,
    discogs_artistid     INTEGER,
    discogs_labelid      INTEGER,
    lyricist             TEXT,
    composer             TEXT,
    composer_sort        TEXT,
    work                 TEXT,
    mb_workid            TEXT,
    work_disambig        TEXT,
    arranger             TEXT,
    grouping             TEXT,
    year                 INTEGER,
    month                INTEGER,
    day                  INTEGER,
    track                INTEGER,
    tracktotal           INTEGER,
    disc                 INTEGER,
    disctotal            INTEGER,
    lyrics               TEXT,
    comments             TEXT,
    bpm                  INTEGER,
    comp                 INTEGER,
    mb_trackid           TEXT,
    mb_albumid           TEXT,
    mb_artistid          TEXT,
    mb_albumartistid     TEXT,
    mb_releasetrackid    TEXT,
    trackdisambig        TEXT,
    albumtype            TEXT,
    albumtypes           TEXT,
    label                TEXT,
    acoustid_fingerprint TEXT,
    acoustid_id          TEXT,
    mb_releasegroupid    TEXT,
    asin                 TEXT,
    isrc                 TEXT,
    catalognum           TEXT,
    script               TEXT,
    language             TEXT,
    country              TEXT,
    albumstatus          TEXT,
    media                TEXT,
    albumdisambig        TEXT,
    releasegroupdisambig TEXT,
    disctitle            TEXT,
    encoder              TEXT,
    rg_track_gain        REAL,
    rg_track_peak        REAL,
    rg_album_gain        REAL,
    rg_album_peak        REAL,
    r128_track_gain      REAL,
    r128_album_gain      REAL,
    original_year        INTEGER,
    original_month       INTEGER,
    original_day         INTEGER,
    initial_key          TEXT,
    length               REAL,
    bitrate              INTEGER,
    bitrate_mode         TEXT,
    encoder_info         TEXT,
    encoder_settings     TEXT,
    format               TEXT,
    samplerate           INTEGER,
    bitdepth             INTEGER,
    channels             INTEGER,
    mtime                REAL,
    added                REAL
);

@audiomuze
Copy link
Author

audiomuze commented Jan 4, 2023

Currently I have this. I am unsure what most of there datatypes are meant to be

Some additional comments (I'll edit this iteratively):

  • musicip_puid is defunct, it's a hangover from the days of MusicIP which was ultimately absorbed into Gracenote and killed
  • originalfilename purpose?
  • totaldiscs purpose?
  • totaltracks purpose?

@112RG
Copy link
Contributor

112RG commented Jan 4, 2023

I think a design like this rather than syncing from a master DB is a far better option

If it's a possibility to have multiple Deaftone instances running on one server that's fine, but that introduces the complexity of which endpoint is connecting to which Deaftone server.

Also, where does the need for metadata correction get handled e.g. missing or incorrect metadata?

My idea for metadata correction/editing. Is mostly likely editing through a UI of somekind. This would only be for Album or Artist information. Where song metadata is up to your tags.

If you say readded a album and the metadata entry for it somehow become a orphan. It would be relinked from a editor UI of somekind

I will have to test the performance of doing such large joins. Since my music collection is small to most I might have to generate data.

Its looking more and more like. multi user might need you to setup a database like Postgres or something. We are stretching the limits of what SQLite can handle

@audiomuze
Copy link
Author

I will have to test the performance of doing such large joins. Since my music collection is small to most I might have to generate data.

Mine is inordinately large to most. Shout if you want my current deaftone.sqlite uploaded somewhere or if there are tests you'd like me to run.

Its looking more and more like. multi user might need you to setup a database like Postgres or something. We are stretching the limits of what SQLite can handle

Updates are controlled and written infrequently as maintenance tasks, so I'd be very surprised if that's the case.

I can easily have a handful or more users concurrently interacting with LMS' database via UI and playing back to as many endpoints without performance issues.

@audiomuze
Copy link
Author

My idea for metadata correction/editing. Is mostly likely editing through a UI of somekind. This would only be for Album or Artist information. Where song metadata is up to your tags.

Personally I think the less editing done in the app the better, unless it's basically enabling a user to correct metadata in a corrections table of some sort that they can then export and use as data to correct their tags with.

There is, however, an opportunity to enable a lot of enrichment of underlying file tags (leading to a much richer UX) by leveraging the database to identify what is able to be enriched and generating the enriched data. At some point I resorted to exporting all my file tags to a flat table, using SQL to do the enrichment and then writing the changed metadata back to the underlying file tags. I still have many of these queries saved if you ever wanted to leverage.

@112RG
Copy link
Contributor

112RG commented Jan 4, 2023

Mine is inordinately large to most. Shout if you want my current deaftone.sqlite uploaded somewhere or if there are tests you'd like me to run.

How large is your collection?

Its looking more and more like. multi user might need you to setup a database like Postgres or something. We are stretching the limits of what SQLite can handle

Does LMS have per user libraries handled the same way we are talking about with common libraries etc?

@112RG
Copy link
Contributor

112RG commented Jan 4, 2023

My idea for metadata correction/editing. Is mostly likely editing through a UI of somekind. This would only be for Album or Artist information. Where song metadata is up to your tags.

Personally I think the less editing done in the app the better, unless it's basically enabling a user to correct metadata in a corrections table of some sort that they can then export and use as data to correct their tags with.

There is, however, an opportunity to enable a lot of enrichment of underlying file tags (leading to a much richer UX) by leveraging the database to identify what is able to be enriched and generating the enriched data. At some point I resorted to exporting all my file tags to a flat table, using SQL to do the enrichment and then writing the changed metadata back to the underlying file tags. I still have many of these queries saved if you ever wanted to leverage.

I am not looking to make a tag editor since there are applications out there far better than what I could do. Basically, your tags are your own to edit. All metadata Deaftone will bring is Album / Artist metadata

@audiomuze
Copy link
Author

audiomuze commented Jan 4, 2023

Mine is inordinately large to most. Shout if you want my current deaftone.sqlite uploaded somewhere or if there are tests you'd like me to run.

How large is your collection?

50336 directories - extrapolate from there.

Its looking more and more like. multi user might need you to setup a database like Postgres or something. We are stretching the limits of what SQLite can handle

Does LMS have per user libraries handled the same way we are talking about with common libraries etc?

No, it has no concept of users, but the UI can be concurrently accessed from any device that speaks SlimProto or interacts with its web server. I've had plenty occasions where 10 or so are visiting and everyone is choosing tunes for queueing via their mobile.

@112RG
Copy link
Contributor

112RG commented Jan 4, 2023

Mine is inordinately large to most. Shout if you want my current deaftone.sqlite uploaded somewhere or if there are tests you'd like me to run.

How large is your collection?

50336 directories - extrapolate from there.

Its looking more and more like. multi user might need you to setup a database like Postgres or something. We are stretching the limits of what SQLite can handle

I am guessing in 10TB plus. Which is if so would criple Sqlite when doing a join

Does LMS have per user libraries handled the same way we are talking about with common libraries etc?

No, it has no concept of users, but the UI can be concurrently accessed from any device that speaks SlimProto or interacts with its web server. I've had plenty occasions where 10 or so are visiting and everyone is choosing tunes for queueing via their mobile.

Yes but those queries wouldn't require joins. Which having a common and mum db of tracks would require. I imagine LMS handles such large libraries by not sending you every Album in 1 web request rather in paginates it

@audiomuze
Copy link
Author

am not looking to make a tag editor since there are applications out there far better than what I could do. Basically, your tags are your own to edit. All metadata Deaftone will bring is Album / Artist metadata

100% agreed. However, as an example, here's a scenario that the server can help improve for users:

analysis of an ingested library shows 15 performances of a particular composition with the same title. 5 of them have the same composer entry associated with them, the rest have no composer associated, but the most of the performances are by the same group of artists. The server could spit out a list / populate a table with each track's uuid, path, tile and the suggested composer. Still up to the user whether or not they want to leverage that data or not. Without that, they're left with the impossible task of grooming that which is not easily seen without the benefit of some insight.

@audiomuze
Copy link
Author

audiomuze commented Jan 4, 2023

I am guessing in 10TB plus. Which is if so would criple Sqlite when doing a join

I suspect we're not understanding one another?
Under what I'd proposed, if you had distinct, per user databases a user would only be interacting with one database at any juncture. You may have different users interacting with different databases concurrently.

The only exception to that would be if an administrative user pushed changes from the master database to user databases - something that'd presumably only ever happen occasionally when they've done heavy tag editing or ingested new music. If ingestion makes it easy to identify what's newly ingested then updates should be limited in scope and relatively quick. Equally purging or updating records can be flagged to limit scope of changes to only those affected records.

Under your suggestion it should also be a non issue. Each user simply initiates a refresh on demand from their client.

@audiomuze
Copy link
Author

And I haven't even got the deletion of media done atm correctly.

How do you plan to handle detecting updates to underlying files - inotify or relying on changes to last mod date and user manually initiating a refresh?

@112RG
Copy link
Contributor

112RG commented Jan 4, 2023

And I haven't even got the deletion of media done atm correctly.

How do you plan to handle detecting updates to underlying files - inotify or relying on changes to last mod date and user manually initiating a refresh?

Changes to modtime of folder the file is in

I am guessing in 10TB plus. Which is if so would criple Sqlite when doing a join

I suspect we're not understanding one another? Under what I'd proposed, if you had distinct, per user databases a user would only be interacting with one database at any juncture. You may have different users interacting with different databases concurrently.

The only exception to that would be if an administrative user pushed changes from the master database to user databases - something that'd presumably only ever happen occasionally when they've done heavy tag editing or ingested new music. If ingestion makes it easy to identify what's newly ingested then updates should be limited in scope and relatively quick. Equally purging or updating records can be flagged to limit scope of changes to only those affected records.

Under your suggestion it should also be a non issue. Each user simply initiates a refresh on demand from their client.

Under what we where talking about you had a common DB for users. Where they would have common tracks. How in say in a request of albums. Meant to get both the albums in there personal collection and the ones in the common collection in the same query?

Also your original idea of master and separate DB per user then they sync from master is also just as intensive. Espically when trying to find orphan data that isn't in the master DB anymore

Where as my idea has a common db and a DB for each user common only contains items in the common directory. Rather your master DB which has all information in it

@112RG
Copy link
Contributor

112RG commented Jan 4, 2023

Reading navidrome/navidrome#192 and here sentriz/gonic#50 it might be easier to implement some way talked about in these

@audiomuze
Copy link
Author

Reading navidrome/navidrome#192 and here sentriz/gonic#50 it might be easier to implement some way talked about in these

I'll have a read and revert. I think before you settle on a solution the data architecture should be be bedded down both in terms of what it will enable users to do and in terms of what the server needs in order to efficiently manage the dataset. Some of the data points will come from tags, others need to come from the filesystem and the file attributes themselves e.g. file type, bit depth and sampling rate, last modded.

Changes to modtime of folder the file is in

On this that'd imply rescanning the entire folder whereas only one or two files in a folder may have been updated. Perhaps the modtime of folder is indicator of which folders to focus on, but rescanning is focused only on those files that have had a mod time change?

Roon has an option to display albums by mod date/time. It's very handy for being able to browse a collection in the order acquired. When I make changes to file tags, my tagger is typically set to preserve mode time/date so that the order isn't disrupted. My tagger then has the ability to add a second to mod time, which causes Roon to detect the change and rescan the associated files (they rely on inotify).

@audiomuze
Copy link
Author

audiomuze commented Jan 4, 2023

Under what we where talking about you had a common DB for users. Where they would have common tracks. How in say in a request of albums. Meant to get both the albums in there personal collection and the ones in the common collection in the same query?

I'd not understood it that way, I'd understood only:

  • master db and seperate db per user populated from master (what I'd proposed). Master has ingested all folders from all users into a single db. Each user db is in turn populated where path like x or path like y or path like z etc. where x, y & z are the filepaths specified by the user in their .toml
  • separate db's for each user, each involving a scan of the paths they specify in their .toml ((what I'd understood you'd proposed - apologies if I misunderstood, which from your comment below, is the case)

Also your original idea of master and separate DB per user then they sync from master is also just as intensive. Espically when trying to find orphan data that isn't in the master DB anymore

I'm not sure I agree. Here's why:

If, following the inaugural import scan, but preceding any update scans you add a trigger (that updates a field when a record is changed) to every pertinent table in the master db :

CREATE TRIGGER sqlmods
         AFTER UPDATE
            ON tablename
      FOR EACH ROW
          WHEN old.sqlmodded IS NULL
BEGIN
    UPDATE tablename
       SET sqlmodded = TRUE
     WHERE rowid = NEW.rowid;
END;

you'll automatically have a record in master of every record change in every table in master. db

Given you also have a UUID for every record in albums, artists, directories & songs, running an update query matching the UUID from master db to user db is not a complex update. The only additional update required would be a delete query on the user db where it has a uuid not present in the corresponding table in master db.

Once the updates are pushed you drop the trigger in master, clear the sqlmodded flags and don't reinstate the the trigger until the next update scan.

Where as my idea has a common db and a DB for each user common only contains items in the common directory. Rather your master DB which has all information in it

I think the same as I outlined above could be applied in this scenario?

@audiomuze
Copy link
Author

audiomuze commented Jan 4, 2023

I am not looking to make a tag editor since there are applications out there far better than what I could do. Basically, your tags are your own to edit. All metadata Deaftone will bring is Album / Artist metadata

I think before you settle on a solution the data architecture should be be bedded down both in terms of what it will enable users to do and in terms of what the server needs in order to efficiently manage the dataset. Some of the data points will come from tags, others need to come from the filesystem and the file attributes themselves e.g. file type, bit depth and sampling rate, last modded.

Following on from above comments, let's take a look at two tracks from well known albums. I've wiped all tags and then tagged using only Picard (and added one custom tag of my own to illustrate a point).

Popular Music example:

ACOUSTID_FINGERPRINT
ACOUSTID_ID=178704ed-ee31-4e51-bc72-9783c0fcd0f7
ALBUM=Purple Rain
ALBUMARTIST=Prince and The Revolution
ALBUMARTISTSORT=Prince and Revolution, The
ARRANGER=Lisa Coleman
ARRANGER=Prince
ARRANGER=The Revolution
ARTIST=Prince and The Revolution
ARTISTS=Prince
ARTISTS=The Revolution
ARTISTSORT=Prince and Revolution, The
ASIN=B0718ZCS6B
BARCODE=093624913207
CATALOGNUMBER=547374-2
COMPOSER=Prince
COMPOSERSORT=Prince
DATE=2017-06-23
DISCNUMBER=1
DISCSUBTITLE=The Complete Album (2015 Paisley Park remaster)
DISCTOTAL=4
ISRC=USWB10001880
ISRC=USWB10600305
ISRC=USWB11700470
LABEL=NPG Records
LABEL=Warner Bros. Records
LANGUAGE=eng
LYRICIST=Prince
MEDIA=CD
MUSICBRAINZ_ALBUMARTISTID=070d193a-845c-479f-980e-bef15710653e
MUSICBRAINZ_ALBUMARTISTID=4c8ead39-b9df-4c56-a27c-51bc049cfd48
MUSICBRAINZ_ALBUMID=627be6c3-b4f3-4460-8875-9357747979a7
MUSICBRAINZ_ARTISTID=070d193a-845c-479f-980e-bef15710653e
MUSICBRAINZ_ARTISTID=4c8ead39-b9df-4c56-a27c-51bc049cfd48
MUSICBRAINZ_RELEASEGROUPID=b93a7c47-a6d4-33f2-9034-53fdd991f4ba
MUSICBRAINZ_RELEASETRACKID=45f634da-e517-4309-b511-0427b4f0583e
MUSICBRAINZ_TRACKID=85a752f8-cbad-42e6-8828-bd81318e3c7d
ORIGINALDATE=1984-06-25
ORIGINALYEAR=1984
performer=Bobby Z. (drums (drum set))
performer=Bobby Z. (percussion)
performer=Brown Mark (bass guitar)
performer=Brown Mark (vocals (additional))
performer=David Coleman (cello)
performer=Doctor Fink (keyboard)
performer=Doctor Fink (vocals (additional))
performer=Lisa Coleman (keyboard)
performer=Lisa Coleman (vocals (additional))
performer=Novi Novog (viola)
performer=Novi Novog (violin)
performer=Prince
performer=Prince (vocals, lead)
performer=Suzie Katayama (cello)
performer=The Revolution
performer=Wendy Melvoin (guitar)
performer=Wendy Melvoin (vocals (additional))
PRODUCER=Prince
PRODUCER=The Revolution
RELEASECOUNTRY=US
RELEASESTATUS=official
RELEASETYPE=album
RELEASETYPE=soundtrack
SCRIPT=Latn
TITLE=Purple Rain
TOTALDISCS=4
TOTALTRACKS=9
TRACKNUMBER=9
TRACKTOTAL=9
VERSION=(CD & DVD - GC  Rhino  Warner Bros. #9362491320)[2496.0 kHz]

Classical music example:

TITLE=Concerto for Violin and Orchestra No. 2 in D major, K. 211
TRACKNUMBER=1
ACOUSTID_FINGERPRINT
ACOUSTID_ID=9896ee67-0bda-4462-b03a-8878a7d10686
MUSICBRAINZ_RELEASEGROUPID=a32bbca4-a3ad-3ba8-b3a7-5c2b23827b53
ORIGINALDATE=2005-10-14
ORIGINALYEAR=2005
RELEASETYPE=album
MUSICBRAINZ_ALBUMID=5819ce23-548f-407d-92cb-876727993a76
ASIN=B000AD1IQ2
LABEL=Deutsche Grammophon
CATALOGNUMBER=00289 477 5925
CATALOGNUMBER=0289 477 5925
SCRIPT=Latn
RELEASESTATUS=official
DATE=2005-11-01
ALBUM=The Violin Concertos / Sinfonia concertante
RELEASECOUNTRY=DE
BARCODE=028947759256
MUSICBRAINZ_ALBUMARTISTID=b972f589-fb0e-474e-b64a-803b0364fa75
MUSICBRAINZ_ALBUMARTISTID=206e3a6b-301f-4a0d-8632-7dfd7927cbcf
MUSICBRAINZ_ALBUMARTISTID=fa37018e-3557-4cb7-973f-555003c30174
ALBUMARTIST=Wolfgang Amadeus Mozart; Anne-Sophie Mutter, London Philharmonic Orchestra
ALBUMARTISTSORT=Mozart, Wolfgang Amadeus; Mutter, Anne-Sophie, London Philharmonic Orchestra
TOTALDISCS=2
DISCNUMBER=1
DISCSUBTITLE=Violin Concertos Nos. 2, 1, 5
MEDIA=CD
TOTALTRACKS=9
MUSICBRAINZ_TRACKID=665b689f-a4f2-4ec4-9135-a4fb1ab007b3
MUSICBRAINZ_ARTISTID=b972f589-fb0e-474e-b64a-803b0364fa75
ARTIST=Wolfgang Amadeus Mozart
ARTISTSORT=Mozart, Wolfgang Amadeus
ARTISTS=Wolfgang Amadeus Mozart
ISRC=DEN960500071
CONDUCTOR=Yuri Bashmet
ENGINEER=Ingmar Haas
ENGINEER=Wolf-Dieter Karwatky
PRODUCER=Reinhild Schmidt
PRODUCER=Ulrich Vette
LANGUAGE=zxx
COMPOSER=Wolfgang Amadeus Mozart
COMPOSERSORT=Mozart, Wolfgang Amadeus
MUSICBRAINZ_RELEASETRACKID=e10628ab-c5be-3124-8429-23ed592440e8
MOVEMENTNAME=Allegro moderato
MOVEMENT=1
WORK=Concerto for Violin and Orchestra no. 2 in D major, K. 211
MUSICBRAINZ_WORKID=5507316e-1299-49e2-a81a-3bbca43b763a
SHOWMOVEMENT=1
TRACKTOTAL=9
DISCTOTAL=2
performer=Boris Garlitsky (concertmaster)
performer=Anne-Sophie Mutter (violin (solo))
performer=London Philharmonic Orchestra (orchestra)

Looking at that data I'd anticipate a music server that intends to overcome the shortcomings we'd previously touched on would incorporate the following concepts:

Baseline assumptions:

  1. server's data model is impervious to albums moving around on disc whether that's through wholesale changes to storage location or changes to foldernames or filenames or some or all of these. This is key to retaining user listening history, play count etc.
  2. assume each folder that contains audio is an album on its own
  3. where you encounter foldername/cd1...cd2...cd3 etc. or multiple discnumbers in single folder assume it's a multi-disc album / box set. At this juncture discubtitle, discrete cover art for each disc and possible even different albumartists come into play, but within the confines of a box set / multi-disc album
  4. accept that users may have multiple versions of the same album, sometimes with identical folder names, but not the same path. Do not conflate these and merge them into a single album (Navidrome and many others do. Where this is the case provide a way to differentiate them (albumuuid) or leverage differentiation the user has provided by tags (this is where the VERSION tag comes into play in my metadata (because at least one app leverages it). The MusicBrainz equivalent is _releasecomment).
  5. support user tag mapping to server tag model e.g. in the example above enable VERSION in file tags to be mapped to _releasecomment within the server's data model
  6. albumartists table containing every unique albumartist
  7. albums table containing every unique album name
  8. performers table containing every unique performer (anticipate albumartists would represent a subset of performers)
  9. compositions table containing every unique composition
  10. composers table containing every unique composer and a boolean to indicate whether classical composer
  11. labels table containing every unique record label
  12. producers table containing every unique producer
  13. performances/tracks/songs table (what you've called songs) setting out every unique performance
  14. credits table mapping performers to roles on particular performances (so there's a need to parse the performer tag
  15. I'd anticipate that the following would be browseable/ searchable objects within the app:
  • albumartists
  • performers / artists
  • albums
  • composers (differentiating classical from non-classical)
  • compositions
  • works
  • labels
  • genres
  • producers
  • engineers

I'd also anticipate that the app would handle presentation of Classical music appropriately as opposed to how MusicBrainz have retrofitted tagging of Classical music by stuffing the Composer name in as the first entry in albumartist and artist fields (should be disregarded in those fields).

  • Sometimes you like a songwriter and you want to find what they’ve written in your lib = composer / lyricist
  • Other times you might want to find all performances of a song/composition. Finding all title and composer matches allows you to browse that across the lib, regardless of who the performer is. Implementing the concept of compositions and linking tracks to the composition makes this even simpler
  • Same can be said for Label - Some labels tend to sign similar artists. Being able to browse all albums released under a particular label is handy if you’re looking for something to listen to but not sure what
  • Similarly producers - there are producers who I generally find make albums I like, regardless of albumartist. So browsing by producer is a great way to discover stuff in lib.
  • File type, bit depth, sampling rate and gender should also be able to be selected when browsing to be able to find specific types of content . As an example on the gender front it should be possible to filter a browse by selecting say some genres, perhaps a few years or decades and specify only selected genders where they're not a band.

Coming back to tag mapping and functionality that leverages tags where they exist, there are a number of fields I think are key requirements to enable users to be able to interact with their music collections (particularly where some form of DJ, dynamic playlisting etc. comes into the picture:

  • albumrating (smallint). Enables user to rate album, browse/filter by rating as criterion (>;>=;=<=;<)
  • bootleg (boolean). Differentiates official albums from bootlegs, useful for when listing albums. Could mean adding Bootleg entry to releasetype in db and simply mapping during ingestion
  • compilation (boolean) . Definitively determines whether or not album is considered a Various Artists album and thus assigned to albumartist = "Various Artists"
  • discsubtitle (to be displayed alongside discnumber when browsing multi-disc albums and box sets. May leverage different cover art also
  • explicit (boolean). Enables avoiding of any explicit content in user profile (e.g. kids profile), DJ, dynamic playlisting or even album playback (clearly a configurable behaviour attached to user profile)
  • live (boolean) . Denotes live performance.
  • releasetype. Enables performer content to be organised by Albums, EPs, Singles, Bootlegs, Compilations, Appearances, Various Artists
  • albumtag. Enables users to add custom tag entries that enables albums to be browsed by tag. e.g. "Music from when I lived in Oz";"Albums introduced by Rylee" etc.
  • radioban
  • tracktag. Same story as albumtag, applied to tracks.
  • style. Map to genre tag
  • subtitle. Append to track title whenever track title is displayed e..g. Live & Let Die [Live in Wherever]; Title [XYZ remix] etc.
  • mood. Keywords that populate a mood table to enable browsing
  • theme. Keywords that populate a theme table to enable browsing
  • trackrating. Same as albumrating but applies to tracks
  • version. Map to _releasecomment and append to album name wherever album name is displayed. Enables user to find specific release of an album and also avoids combining them and having 5 track 1, track 2 etc.
  • writer. Map to composer

Of course all of the above serves not only to enrich browsing, discovery and actively engaged listening, they also serve as great seed for dynamic playlists, DJ and the like.

@audiomuze
Copy link
Author

audiomuze commented Jan 5, 2023

Reading navidrome/navidrome#192 and here sentriz/gonic#50 it might be easier to implement some way talked about in these

Have had a read. Unless I've misunderstood it'll add constraints to every SQL query (particularly the Navidrome proposal). It feels to me like they're retrofitting. You have the opportunity to begin with the end in mind. What are your thoughts on tracking changes to master via a trigger, which then enables update queries based only on matching uuid's where there's been a change, which should be clean and efficient.

Architecturally I think of the master db as the central repository. Server keeps track of the last update date/time to each user db.

Any filesystem changes, additions and deletions are picked up by a single scanning operation which only updates the master repository. Additions are added and flagged, deletions moved to holding tables for running a uuid based update query on each user DB. Until all user DB's have been updated, records in holding tables persist. Post update they're purged. Server sets notification flag for each affected user profile. Admin can "push" updates or users can "pull" updates.

If approached this way should it not alleviate any performance concerns? Updates are unlikely to be a daily affair in any event?

@112RG
Copy link
Contributor

112RG commented Jan 5, 2023

Reading navidrome/navidrome#192 and here sentriz/gonic#50 it might be easier to implement some way talked about in these

I'll have a read and revert. I think before you settle on a solution the data architecture should be be bedded down both in terms of what it will enable users to do and in terms of what the server needs in order to efficiently manage the dataset. Some of the data points will come from tags, others need to come from the filesystem and the file attributes themselves e.g. file type, bit depth and sampling rate, last modded.

Changes to modtime of folder the file is in

On this that'd imply rescanning the entire folder whereas only one or two files in a folder may have been updated. Perhaps the modtime of folder is indicator of which folders to focus on, but rescanning is focused only on those files that have had a mod time change?

Roon has an option to display albums by mod date/time. It's very handy for being able to browse a collection in the order acquired. When I make changes to file tags, my tagger is typically set to preserve mode time/date so that the order isn't disrupted. My tagger then has the ability to add a second to mod time, which causes Roon to detect the change and rescan the associated files (they rely on inotify).

Well based on the assumption that most of the time you have albums in separate folders. Its fine to scan the entire folder again. Rather then keep track of each files modtime. I don't want to use inotify. Then that would require you to have deaftone running when you do changes to the files. Also I have created_at and updated_at collum in the DB. So you can still browser via recently acquired only if the file changes. If the path changes it will be a new entry in the DB so will get a new created_at time.

Getting deaftone to work properly with classical music. Is something to be worked on after we have a solid scanner written. Since again there a lot of edge cases for classical.

Again albums are specific to path. So albums with the same names are fine

sqlx::query("SELECT * FROM albums WHERE name = ? AND path = ?")

I don't think we need separate tables for performance etc. We can just have 1 table called songs filled with all the metadata from the songs.

Supporting listening history correctly when files change is going to be hard. I wonder if listening history should be tracked to SONG_TITLE + ALBUM_TITLE + RELEASE_YEAR or something. Do you have any ideas? I don't want to track it via path

Reading navidrome/navidrome#192 and here sentriz/gonic#50 it might be easier to implement some way talked about in these

Have had a read. Unless I've misunderstood it'll add constraints to every SQL query (particularly the Navidrome proposal). It feels to me like they're retrofitting. You have the opportunity to begin with the end in mind. What are your thoughts on tracking changes to master via a trigger, which then enables update queries based only on matching uuid's where there's been a change, which should be clean and efficient.

Architecturally I think of the master db as the central repository. Server keeps track of the last update date/time to each user db.

Any filesystem changes, additions and deletions are picked up by a single scanning operation which only updates the master repository. Additions are added and flagged, deletions moved to holding tables for running a uuid based update query on each user DB. Until all user DB's have been updated, records in holding tables persist. Post update they're purged. Server sets notification flag for each affected user profile. Admin can "push" updates or users can "pull" updates.

If approached this way should it not alleviate any performance concerns? Updates are unlikely to be a daily affair in any event?

How are you going to search say the master and mum DB? You would have to do a join on 2 databases correct? That is where I am saying performance comes into play

@audiomuze
Copy link
Author

How are you going to search say the master and mum DB? You would have to do a join on 2 databases correct? That is where I am saying performance comes into play

Simple - every user has the complete database that comprises the albums within the tree path's they defined. The only scanning that is ever done is to master. Master the populates user databases... insert into uuid.sqlite.table where path like ...

@112RG
Copy link
Contributor

112RG commented Jan 5, 2023

How are you going to search say the master and mum DB? You would have to do a join on 2 databases correct? That is where I am saying performance comes into play

Simple - every user has the complete database that comprises the albums within the tree path's they defined. The only scanning that is ever done is to master. Master the populates user databases... insert into uuid.sqlite.table where path like ...

See I think that's gets tricky when say 2 users have the same artist etc. Then wouldn't our implementation be almost exactly the same as what was talked about in the navidrome thread IE

It's not a super trivial thing. I think, from the top of my head, there's at least the following steps (please correct me if I'm wrong):

Create new database table base_folder with id (a UUID), path ("/path/to/folder1") and name (to give it a fancy name "Harry's Audiobooks Folder")

Write the server code to pick up the (comma separated?) list of base folders from the configfile and populate this table

New database table base_folder_access that maps users to base folders, with columns user_id and base_folder_id (the primary keys of these tables)

Create a folders API endpoint to allow the web UI to read/modify this table

Write the web UI code to allow admins to populate this table (ie, assign base folders to users, enter the fancy name), but DO NOT ALLOW THE API TO CREATE A NEW BASE FOLDER - SECURITY RISK!

Add to the media_file table:

new field base_folder_id, the UUID that references the base folder
new field relative_path ("/path/to/mediafile.mp3") that stores the relative path of the file, constructed by stripping the base > folder path from the mediafile's path field (which would be "/path/to/folder1/path/to/mediafile.mp3")

Change the scanner so it populates these fields correctly, and ideally "intelligent" enough that when the base folder changes but > the relative path stays the same (ie, a user moves his whole music folder to a new location), ND doesn't create new records for > each song but just updates the base_folder_id and path.

Change all the relevant SQL queries to filter for only the base folders that the user has access to (that's relatively trivial for songs, > but you also want to filter out all artists and albums (and genres) that the user has no access to.

Its basically the same except new we need to manage syncs across databases

@audiomuze
Copy link
Author

audiomuze commented Jan 5, 2023

I don't think we need separate tables for performance etc. We can just have 1 table called songs filled with all the metadata from the songs.

This implies it'll be very difficult to deliver what R does e.g. browse all performances of a composition ? by unless users stuff the credits in every instance of every performance through manual tagging. Not a train smash as long as the DB is accessible and can be enriched through queries, even if via SQLite directly.

@audiomuze
Copy link
Author

audiomuze commented Jan 5, 2023

Supporting listening history correctly when files change is going to be hard. I wonder if listening history should be tracked to SONG_TITLE + ALBUM_TITLE + RELEASE_YEAR or something. Do you have any ideas? I don't want to track it via path

If the path changes it will be a new entry in the DB so will get a new created_at time.

You're already generating a UUID for each track, album etc. Create an exportable list of uuid and path for each track and tell users that if they write that UUID to their file tags, their entire library, listening history etc. is impervious to changes in location, filename etc. Those that want it opt in and take care of it on the side, those that don't, live with compromise.

@112RG
Copy link
Contributor

112RG commented Jan 5, 2023

I don't think we need separate tables for performance etc. We can just have 1 table called songs filled with all the metadata from the songs.

This implies it'll be very difficult to deliver what R does e.g. browse all performances of a composition ? by unless users stuff the credits in every instance of every performance through manual tagging. Not a train smash as long as the DB is accessible and can be enriched through queries, even if via SQLite directly.

I see why no music app has done true multi-user libraries. Like this and basically done separate libraries

Maybe we should settle on browsing each library differently. Like plex

@112RG
Copy link
Contributor

112RG commented Jan 5, 2023

Supporting listening history correctly when files change is going to be hard. I wonder if listening history should be tracked to SONG_TITLE + ALBUM_TITLE + RELEASE_YEAR or something. Do you have any ideas? I don't want to track it via path

If the path changes it will be a new entry in the DB so will get a new created_at time.

You're already generating a UUID for each track, album etc. Create an exportable list of uuid and path for each track and tell users that if they write that UUID to their file tags, their entire library, listening history etc. is impervious to changes in location, filename etc. Those that want it opt in and take care of it on the side, those that don't, live with compromise.

Again I don't really to have custom tags

@audiomuze
Copy link
Author

audiomuze commented Jan 5, 2023

See I think that's gets tricky when say 2 users have the same artist etc. Then wouldn't our implementation be almost exactly the same as what was talked about in the navidrome thread IE

Not sure I'm following. Let's say universe is /common, /davetunes /jontunes, /newstuff.

Jon chooses /common and /jontunes.
Dave chooses /common and /davetunes
Mary chooses /newtunes

Server scans and ingests /common, /davetunes /jontunes, /newstuff.

On initial setup of users server runs

insert into jon.sqlite.tablex from deaftone.sqlite.tablex where deaftone.sqlite.tablex.path like '/jontunes%'
insert into dave.sqlite.tablex from deaftone.sqlite.tablex where path like '/common%' or path like '/davetunes%'
insert into mary.sqlite.tablex from deaftone.sqlite.tablex where path like ''/newtunes%'

After that's done for each relevant table each user now has their own db and does their own thing. Deafton now only interacts with their particular database without a need to reference deaftone.sqlite again.

Where music is added, altered or deleted Deaftone runs an update scan against deaftone.sqlite (and the sqlmod trigger is in play as described earlier). deaftone.sqlite is once again the source of truthall user databases now need an update from it. Easily achieved by reference to altered record uuids. This kind of update query shouldn't take long at all - you could even just recreate each user's tables other than playback history if that was a quicker update (albeit I'd think not).

This whole scenario is an occasional thing, not a daily occurence.

Change all the relevant SQL queries to filter for only the base folders that the user has access to (that's relatively trivial for songs, > but you also want to filter out all artists and albums (and genres) that the user has no access to.

Its basically the same except new we need to manage syncs across databases

My way you're never filtering out. Navidrome's way you're forever filtering out - impacting performance at every select statement.

@audiomuze
Copy link
Author

I don't think we need separate tables for performance etc. We can just have 1 table called songs filled with all the metadata from the songs.

This implies it'll be very difficult to deliver what R does e.g. browse all performances of a composition ? by unless users stuff the credits in every instance of every performance through manual tagging. Not a train smash as long as the DB is accessible and can be enriched through queries, even if via SQLite directly.

I see why no music app has done true multi-user libraries. Like this and basically done separate libraries

Maybe we should settle on browsing each library differently. Like plex

The one has nothing to do with the other - one or multiple users would face the same limitations.

@audiomuze
Copy link
Author

audiomuze commented Jan 5, 2023

Again I don't really to have custom tags

I'm not suggesting you take responsibility for embedding them in file tags, just that you give users the data to be able to do so themselves with their own tagger. There's no other viable method that's impervious to change.

The app that is going to be the best out there is the one that's prepared to say to hell with it, for this to work there are some things that aren't negotiable, because the benefit by far outweighs the cost. Without the odd custom metadata field that survives a db rebuild and/or metadata and/or filesystem change whatever is done will inevitably be a compromise and a fallible one at that.

@112RG
Copy link
Contributor

112RG commented Jan 5, 2023

Is there a problem with having it? Per user databases and no common browsing like plex does it?

Having multiple connections to multiple databases is a pain. And using SQL ATTACH. Is not supported with SeaORM which is what I am using on the web handlers. And I didn't really wanna stray away from it as it allows me to have cleaner code on the web handlers.

When we get into the realm of multiple databases I think it starts to be complicated and I think there are edge cases we haven't thought about and which adds a whole layer of complexity to changes down the line. Having artists that aren't artists you listen to in your library aren't a distraction at all. It won't cause issues with DJ or playlist generation since we can filter out most of the other music with genre etc. Or a recommendation engine like LMS. If I am totally wrong about this and there has been other people talking about seeing music being a distraction please link some Reddit threads or something for me to read about it

Since the whole idea of master and mum, dad. Databases are based on not wanting to see the music of others.

Even with multiple libraries like Plex. SeaORM is the limiting factor for this since it cant support databases using the attach command and again I don't really wanna stray away from it in the web handler since it really helps keep the code clean. And I am talking about getting a QueryFilter added currently SeaQL/sea-orm#1360 (reply in thread)

Cant all these issues of smart playlist generation and viewing other people's music be easily fixed with genre-based filtering?

@audiomuze
Copy link
Author

audiomuze commented Jan 5, 2023

Is there a problem with having it? Per user databases and no common browsing like plex does it?

Nope. If each user defines their own folders and there happens to be crossover then it simply means that when each user does their own ingestion they're ingesting things that have previously been ingested by another user. Depending on how you organise the database and handle updates someone like me might ingest everything then make copies of that DB for different user profiles and run a delete query for all out of scope stuff followed by a vacuum operation. I suspect that'd be faster than re-ingesting common content for every user where a large collection is concerned. Clearly that kind of behaviour would be deemed outlier.

Cant all these issues of smart playlist generation and viewing other people's music be easily fixed with genre-based filtering?

Possibly, assuming they agree on what belongs in what genres.

I think drop the concept of a master db, have each scan whatever they define as the paths / devices they want in-scope and they get their own that lives independently.

@audiomuze
Copy link
Author

audiomuze commented Jan 5, 2023

Just as an aside for my own edification I ran a little experiment with deaftone.sqlite having 650k and 14.4k records in the songs and albums tables respectively. I used .schema output to create an identically structured database am.sqlite and then ran the following:

sqlite3 deaftone.sqlite
attach 'am.sqlite' as am;
insert into am.artists select * from artists;
insert into am.songs select * from songs;

collectively it didn't even take 2 seconds to replicate.

When we get into the realm of multiple databases I think it starts to be complicated and I think there are edge cases we haven't thought about and which adds a whole layer of complexity to changes down the line. Having artists that aren't artists you listen to in your library aren't a distraction at all. It won't cause issues with DJ or playlist generation since we can filter out most of the other music with genre etc. Or a recommendation engine like LMS. If I am totally wrong about this and there has been other people talking about seeing music being a distraction please link some Reddit threads or something for me to read about it

Will take a gander and see if I can dig out some forum posts etc. Personally I think it's naive to believe genres can solve not seeing stuff you don't want in your view of a lib. Genres is one of those nebulous topics in music that everyone will have a different view on - both in terms of the genres they consider valid and want to use and in terms of what artists and tracks belong to what genre. Some people want to assign their genre scheme at track level, others at album, others both, and they'll all have different views on what a valid genre taxonomy comprises.

@112RG
Copy link
Contributor

112RG commented Feb 1, 2023

@audiomuze I have updated the album and artist models. Please see if they detect the right tags on your songs

@audiomuze
Copy link
Author

@audiomuze I have updated the album and artist models. Please see if they detect the right tags on your songs

pulled, compiled and ran latest code. No attempt is made to ingest contents from the media directory:

Version: 0.0.3 | Media Directory: /qnap/qnap2/zzl/ | Database: ./deaftone.sqlite
2023-02-17T08:14:27.617045Z  INFO deaftone: listening on 0.0.0.0:3030
2023-02-17T08:14:27.618559Z  INFO deaftone::services::task: Started task service

@112RG
Copy link
Contributor

112RG commented Feb 17, 2023

Oh sorry about that I am in the middle of writing a TaskServer so it doesn't by default start scanning

@audiomuze
Copy link
Author

Let me know when done and I'll reattempt.

@audiomuze
Copy link
Author

audiomuze commented Jul 10, 2023

@112RG, this sorted now?

@112RG
Copy link
Contributor

112RG commented Jul 10, 2023

Yes you can check here https://github.com/Deaftone/Deaftone#installation you will need to build from source though

@audiomuze
Copy link
Author

@112RG I'm ready to take a proper look at this now if you'd like. I think it'd be useful to take an album with tons of good metadata on MusicBrainz e.g. Fleetwood Mac - Rumours tag it using Picard (incorporating the plugins that pull the additional metadata) and then ensuring the data model and tag import caters to it. Also learned a little re how MB tags can be used to aid disambiguation of performers, albums etc.. Let me know if you're keen?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants