Support storing multivalued composer/lyricist/arranger MBIDs in the database #5698

@Maxr1998

Motivation

beets currently stores the names of writers and arrangers in the database and in the audio file tags. I use these fields to build automated playlists (with the smartplaylist plugin) for specific writers. However, this approach has several issues:

  • Writer names are not necessarily unique, so collisions pull unrelated tracks into my playlists.
  • Matching against these single-valued tags is harder than it should be: names can overlap as substrings, exact matches are impossible because a single field can hold multiple writers, and handling the separator (a comma) with a regular expression is unnecessarily complex.
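
To make the matching problem concrete, here is a small sketch in plain Python. The track titles, names, and IDs are made up for illustration, and the `writer_ids` list only stands in for what a multivalued MBID field would provide; none of this is beets code.

```python
import re

# Hypothetical comma-joined writer fields, as stored today.
tracks = {
    "Track A": "John Smith, Jane Doe",
    "Track B": "John Smithson",
}
target = "John Smith"

# A naive substring match also catches "John Smithson".
substring_hits = [title for title, field in tracks.items() if target in field]

# An exact match on one name has to account for the separator.
pattern = re.compile(rf"(?:^|,\s*){re.escape(target)}(?:\s*,|$)")
regex_hits = [title for title, field in tracks.items() if pattern.search(field)]

# With a multivalued ID field, matching is a plain membership test.
# These IDs are placeholders, not real MBIDs.
writer_ids = {
    "Track A": ["id-john-smith", "id-jane-doe"],
    "Track B": ["id-john-smithson"],
}
id_hits = [title for title, ids in writer_ids.items() if "id-john-smith" in ids]
```

The regular expression works, but it has to be rebuilt for every name and still cannot tell apart two different people who share a name, which an ID-based match can.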

Proposal

Thus, I suggest adding new fields to the database and possibly to files:

  • mb_composerids
  • mb_lyricistids
  • mb_arrangerids

All of these should be multivalued fields for future-proofing. Legacy single-valued fields that simply join the values with a separator could be added as well, but I don't think they are strictly necessary.

Implementation concepts

As a plugin

This feature could be implemented as a plugin, in theory:

Plugin implementation
from beets.plugins import BeetsPlugin
from beets.dbcore import types

MB_COMPOSERIDS = "mb_composerids"
MB_LYRICISTIDS = "mb_lyricistids"
MB_ARRANGERIDS = "mb_arrangerids"


class WriterMbidsPlugin(BeetsPlugin):
    item_types = {
        MB_COMPOSERIDS: types.MULTI_VALUE_DSV,
        MB_LYRICISTIDS: types.MULTI_VALUE_DSV,
        MB_ARRANGERIDS: types.MULTI_VALUE_DSV,
    }

    def __init__(self):
        super().__init__()
        self.register_listener("mb_track_extract", self.track_extract)

    @staticmethod
    def track_extract(data):
        """Collect composer, lyricist, and arranger MBIDs from a recording."""
        recording = data
        composer_mbids = []
        lyricist_mbids = []
        arranger_mbids = []

        # Composers and lyricists are read from the works this recording performs.
        for work_relation in recording.get("work-relation-list", ()):
            if work_relation["type"] != "performance":
                continue

            work = work_relation["work"]
            for artist_relation in work.get("artist-relation-list", ()):
                if "type" in artist_relation:
                    artist_type = artist_relation["type"]
                    if artist_type == "composer":
                        composer_mbids.append(artist_relation["artist"]["id"])
                    elif artist_type == "lyricist":
                        lyricist_mbids.append(artist_relation["artist"]["id"])

        # Arrangers are read from the recording's own artist relations.
        for artist_relation in recording.get("artist-relation-list", ()):
            if "type" in artist_relation:
                artist_type = artist_relation["type"]
                if artist_type == "arranger":
                    arranger_mbids.append(artist_relation["artist"]["id"])

        info = {}
        if composer_mbids:
            info[MB_COMPOSERIDS] = composer_mbids
        if lyricist_mbids:
            info[MB_LYRICISTIDS] = lyricist_mbids
        if arranger_mbids:
            info[MB_ARRANGERIDS] = arranger_mbids
        return info
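
To show what input shape the listener expects and what it returns, here is a standalone sketch that mirrors the extraction logic above without importing beets. The function name `extract_writer_mbids` and the sample recording (including its placeholder IDs) are invented for illustration; only the dictionary keys are taken from the plugin code.

```python
def extract_writer_mbids(recording):
    """Standalone mirror of the plugin's extraction logic (hypothetical helper)."""
    found = {"composer": [], "lyricist": [], "arranger": []}

    # Composers and lyricists come from works linked via "performance" relations.
    for work_relation in recording.get("work-relation-list", ()):
        if work_relation["type"] != "performance":
            continue
        for relation in work_relation["work"].get("artist-relation-list", ()):
            if relation.get("type") in ("composer", "lyricist"):
                found[relation["type"]].append(relation["artist"]["id"])

    # Arrangers come from the recording's own artist relations.
    for relation in recording.get("artist-relation-list", ()):
        if relation.get("type") == "arranger":
            found["arranger"].append(relation["artist"]["id"])

    # Only emit fields that actually have values, as the plugin does.
    return {f"mb_{role}ids": ids for role, ids in found.items() if ids}


# Made-up recording data; the IDs are placeholders, not real MBIDs.
sample_recording = {
    "work-relation-list": [
        {
            "type": "performance",
            "work": {
                "artist-relation-list": [
                    {"type": "composer", "artist": {"id": "composer-1"}},
                    {"type": "lyricist", "artist": {"id": "lyricist-1"}},
                    {"type": "composer", "artist": {"id": "composer-2"}},
                ]
            },
        },
    ],
    "artist-relation-list": [
        {"type": "arranger", "artist": {"id": "arranger-1"}},
        {"type": "producer", "artist": {"id": "producer-1"}},
    ],
}

result = extract_writer_mbids(sample_recording)
```

Each returned value is a Python list, which is exactly what triggers the problem described next.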

However, one issue currently prevents this: plugins cannot use multivalued fields, because beets doesn't handle list values for flexible fields:

Stacktrace
…
  File "/usr/lib/python3.13/site-packages/beets/dbcore/db.py", line 679, in add
    self.store()
    ~~~~~~~~~~^^
  File "/usr/lib/python3.13/site-packages/beets/library.py", line 396, in store
    super().store(fields)
    ~~~~~~~~~~~~~^^^^^^^^
  File "/usr/lib/python3.13/site-packages/beets/dbcore/db.py", line 616, in store
    tx.mutate(
    ~~~~~~~~~^
        "INSERT INTO {} "
        ^^^^^^^^^^^^^^^^^
    ...<2 lines>...
        (self.id, key, value),
        ^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/usr/lib/python3.13/site-packages/beets/dbcore/db.py", line 987, in mutate
    cursor = self.db._connection().execute(statement, subvals)
sqlite3.ProgrammingError: Error binding parameter 3: type 'list' is not supported

From what I understand, this is because beets applies the SQL type conversion (to_sql) only to built-in fields, not to flexible fields:

Current code
# beets/dbcore/db.py
    def store(self, fields: Optional[Iterable[str]] = None):
        …
        for key in fields:
            if key != "id" and key in self._dirty:
                self._dirty.remove(key)
                assignments.append(key + "=?")
                value = self._type(key).to_sql(self[key]) # !!! TYPE IS CONVERTED !!!
                subvars.append(value)
        with db.transaction() as tx:
            # Main table update.
            if assignments:
                query = "UPDATE {} SET {} WHERE id=?".format(
                    self._table, ",".join(assignments)
                )
                subvars.append(self.id)
                tx.mutate(query, subvars)

            # Modified/added flexible attributes.
            for key, value in self._values_flex.items():
                if key in self._dirty:
                    self._dirty.remove(key)
                    tx.mutate(
                        "INSERT INTO {} "
                        "(entity_id, key, value) " # !!! NO TYPE CONVERSION !!!
                        "VALUES (?, ?, ?);".format(self._flex_table),
                        (self.id, key, value),
                    )
        …

To fix this, the value could be converted to SQL for flexible fields as well:

diff --git a/beets/dbcore/db.py b/beets/dbcore/db.py
old mode 100644
new mode 100644
index 4645d4b..53c462d
--- a/beets/dbcore/db.py
+++ b/beets/dbcore/db.py
@@ -611,6 +611,7 @@ class Model(ABC):
             for key, value in self._values_flex.items():
                 if key in self._dirty:
                     self._dirty.remove(key)
+                    value = self._type(key).to_sql(value)
                     tx.mutate(
                         "INSERT INTO {} "
                         "(entity_id, key, value) "
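
To illustrate both the failure and the effect of this one-line change, here is a minimal, self-contained sketch using only Python's sqlite3 module. It does not use beets: the (entity_id, key, value) columns mirror the INSERT statement above, while the table name, the `to_sql`/`from_sql` helpers, and the `|` delimiter are stand-ins assumed for illustration, not beets' actual implementation.

```python
import sqlite3

DELIMITER = "|"  # assumed delimiter, chosen only for this sketch


def to_sql(values):
    """Hypothetical stand-in for a multivalued type's to_sql()."""
    return DELIMITER.join(values)


def from_sql(text):
    """Hypothetical inverse: split the stored string back into a list."""
    return text.split(DELIMITER) if text else []


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flex_attrs (entity_id INTEGER, key TEXT, value TEXT)")
insert = "INSERT INTO flex_attrs (entity_id, key, value) VALUES (?, ?, ?)"

mbids = ["id-1", "id-2"]  # placeholder IDs

# Binding the list directly fails, as in the stack trace above.
# sqlite3.Error is caught because the concrete subclass varies by Python version.
try:
    conn.execute(insert, (1, "mb_composerids", mbids))
    raw_insert_failed = False
except sqlite3.Error:
    raw_insert_failed = True

# Converting the value first lets the same statement succeed.
conn.execute(insert, (1, "mb_composerids", to_sql(mbids)))

stored = conn.execute(
    "SELECT value FROM flex_attrs WHERE key = ?", ("mb_composerids",)
).fetchone()[0]
restored = from_sql(stored)
conn.close()
```

The sketch only covers the write path. Reading the value back into a list would rely on the field's type being known when the item is loaded, which `from_sql` stands in for here.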

As a core feature

Alternatively, implementation as a core feature could also work. This would require a few changes to the Item model, but could otherwise be handled mostly in the mbsync plugin. The "bug" described above wouldn't impact this implementation, but should probably still be addressed.

Implementation

I'm interested in implementing this myself. Please tell me which option you prefer, and I will open a pull request.
