Skip to content

Migration 20260403000001 introduces cross-table collation mismatch -> Error 1267 on /v1/sidebar/sync #150

@an9xyz

Description

@an9xyz

Summary

modules/category/sql/20260403000001_category_legacy01.sql introduces a JOIN key between two tables with mismatched collations. When MySQL 8 actually has to compare the two columns (i.e. the JOIN returns rows), it errors out with Error 1267 (HY000): Illegal mix of collations (utf8mb4_0900_ai_ci,IMPLICIT) and (utf8mb4_general_ci,IMPLICIT) for operation '=', breaking /v1/sidebar/sync (fail-closed for the follow tab).

Symptom

Production log:

{"level":"error","msg":"【Sidebar】sidebar sync: category query failed (follow tab fail-closed)",
 "error":"Error 1267 (HY000): Illegal mix of collations (utf8mb4_0900_ai_ci,IMPLICIT) and (utf8mb4_general_ci,IMPLICIT) for operation '='"}
[GIN] | 400 | 131.347337ms | POST "/v1/sidebar/sync"

Query (modules/message/db_group_category.go:65, QueryCategorySettingsByGroupNos):

SELECT gs.group_no, gs.category_id, IFNULL(gs.category_sort,0), IFNULL(gc.sort,0)
  FROM group_setting gs
  LEFT JOIN group_category gc
    ON gs.category_id = gc.category_id   -- collation mismatch
   AND gs.uid         = gc.uid           -- collation mismatch
   AND gc.status != 2
 WHERE gs.group_no IN (?) AND gs.uid = ?

Root cause

modules/category/sql/20260403000001_category_legacy01.sql does two things in the same file:

  1. CREATE TABLE group_category ... COLLATE=utf8mb4_general_ci (explicit, follows the historical convention)
  2. ALTER TABLE group_setting ADD COLUMN category_id VARCHAR(32) ...no explicit COLLATE, so the column inherits the table's collation, which on MySQL 8 prod databases is utf8mb4_0900_ai_ci (silent MySQL 8 default applied to many legacy tables that were also created without an explicit COLLATE)

Result: group_setting.category_id is utf8mb4_0900_ai_ci, group_category.category_id is utf8mb4_general_ci. The JOIN compiles, runs, and works for any row where one side returns no match — and only errors out when both sides actually produce comparable rows. That is why CI/unit tests do not catch it: the fixtures don't drive the offending row combination.

Reproduction

-- on a fresh MySQL 8 prod-like DB after running all migrations
SELECT gs.group_no, gs.category_id
  FROM group_setting gs
  LEFT JOIN group_category gc
    ON gs.category_id = gc.category_id AND gs.uid = gc.uid
 WHERE gs.uid = '<any uid that has both a group_setting row and a group_category row with matching category_id>';
-- -> ERROR 1267 (HY000)

Scope of the underlying drift

Audit on our prod (im_prod):

  • ~60 tables are utf8mb4_0900_ai_ci (legacy tables created without explicit COLLATE on MySQL 8): user, group, group_setting, message, message_extra, friend, device, ...
  • ~22 business tables are utf8mb4_general_ci (newer tables that did set COLLATE per the documented convention): group_category, user_oidc_identity, thread, thread_member, thread_setting, user_api_key, user_verification, user_conversation_ext, ...

Any future JOIN between the two groups will explode the same way. group_category × group_setting is the one that surfaced first because /v1/sidebar/sync actively exercises it.

Suggested fixes (pick one)

Option A — Patch the offending migration (cleanest going forward)

Update 20260403000001_category_legacy01.sql so the new column matches the table it is being added to:

ALTER TABLE `group_setting`
  ADD COLUMN `category_id` VARCHAR(32) DEFAULT NULL COLLATE utf8mb4_0900_ai_ci COMMENT '用户自定义类别ID';

This only helps fresh installs / dev. Existing databases that already ran the migration still have the wrong collation and need an additive forward-repair migration.

Option B — Add a forward-repair migration

A new migration that explicitly aligns the JOIN sides. Two valid directions; the right one depends on the project-wide convention you actually want:

-- Direction 1: align new general_ci tables down to the prod-majority 0900_ai_ci
ALTER TABLE `group_category` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;

-- Direction 2: align old 0900_ai_ci tables up to the documented general_ci
ALTER TABLE `group_setting` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;

We picked Direction 1 in our prod (5,434 rows, ~1s) because fewer tables to change and matches the majority of legacy tables. A full sweep in the other direction would touch 60+ tables including hot tables like message*.

Option C — Patch the query

Add COLLATE utf8mb4_general_ci casts in db_group_category.go:65. Treats one symptom; does not fix the underlying drift and risks defeating idx_uid_category on group_setting.

Recommendations

  1. Apply Option A + a Direction-1 forward-repair migration, scoped to the minimum set of tables (the new general_ci ones that have a JOIN partner on the 0900_ai_ci side). Lowest blast radius, fixes prod, keeps the bigger sweep out of the hot path.
  2. Document the actual convention. The codebase comments say utf8mb4_general_ci, but the runtime majority is utf8mb4_0900_ai_ci. Pick one and update the migration style guide so future CREATE TABLE / ADD COLUMN are explicit.
  3. CI lint: reject any CREATE TABLE / ADD COLUMN <varchar/text> in *.sql migrations that lacks an explicit COLLATE clause. This is the recurring root cause.

Environment

  • MySQL 8.0.x
  • octo-server main (commit at the time of investigation: a8f56d15 Merge branch 'test' into 'main')

Metadata

Metadata

Assignees

No one assigned

    Labels

    priority:P1High — this sprinttype:bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions