
Possible data corruption with memcached #525

@raivil


Hi!
First of all thanks for the great work on this gem.
Here's my setup: I'm deploying a Rails 7 app to AWS and using ElastiCache for Memcached as the store for IdentityCache, the Rails cache (which also backs the session store), and the ActionPolicy cache.

The app runs as expected: users can access it, data is cached in memcached, and I can see the usage metrics (bytes read/written, connection counts, CPU, unused memory, etc.).
After some time, the app started raising a lot of exceptions and users could no longer access it.
The app was barely used (we were testing the infrastructure), and the memcached metrics and health look OK (lots of spare resources).
Some examples of exceptions:

  • Dalli::UnmarshalError: Unable to unmarshal value: marshal data too short
  • Dalli::DalliError: Response error: "?x??UOk$E7*?
  • Dalli::DalliError: Response error: VA
  • Dalli::DalliError: Response error: NS
  • Dalli::DalliError: Response error: ".\u0000[:idc_cached_nilf1670773635.8781602
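The first error carries the message Ruby's own `Marshal.load` raises when it is handed fewer bytes than the serialized object needs. The plain-Ruby sketch below (no memcached involved) reproduces that message by truncating a marshaled payload. It only shows that the bytes read back were incomplete or not the ones that were written; it says nothing about why that happened here. `marshal_error_for` is a hypothetical helper.

```ruby
# Hypothetical helper: returns the error message Marshal raises for the
# given bytes, or nil if they load cleanly.
def marshal_error_for(bytes)
  Marshal.load(bytes)
  nil
rescue ArgumentError => e
  e.message
end

payload = Marshal.dump({ id: 1, name: "example" })

# Keep only the first 4 bytes, simulating a value that comes back truncated.
truncated = payload.byteslice(0, 4)

puts marshal_error_for(truncated) # prints "marshal data too short"
```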

Flushing the memcached data fixed the issue, but it happened again the next day.

What could be causing this? I've double-checked the configs everywhere (sample below) and have no clue so far. In the meantime, I'm moving the IdentityCache data to a separate memcached instance.

Any clues? Thoughts?

Thank you!

gems:
dalli (3.2.3)
identity_cache (1.2.0)
rails (7.0.4)

memcached 1.6.17 (Docker image on CI) and 1.6.16 (AWS ElastiCache)

app configs:

  config.session_store :cache_store,
                       key: "the_session_key",
                       expire_after: 24.hours

  config.cache_store = :mem_cache_store, "memcached.server.url",
                       { pool: { size: 30 }, protocol: :meta, expires_in: 24.hours,
                         failover: false } # avoids more cache consistency issues

  config.action_policy.cache_store = :mem_cache_store, "memcached.server.url",
                                     { pool: { size: 30 }, protocol: :meta, expires_in: 1.hour,
                                       failover: false } # avoids more cache consistency issues
  config.identity_cache_store = :mem_cache_store, "memcached.server.url",
                                { pool: { size: 30 }, protocol: :meta, expires_in: 6.hours,
                                  failover: false } # avoids more cache consistency issues
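All three stores above point at the same `memcached.server.url` without a namespace, so they share a single keyspace. Rails' `ActiveSupport::Cache` stores accept a `namespace:` option that prefixes every key. The plain-Ruby sketch below imitates that idea with a hypothetical `NamespacedStore` writing into a Hash that stands in for the shared server. It illustrates key isolation only; it is not evidence that key collisions caused the errors above, and the real stores use their own key formats.

```ruby
# Hypothetical stand-in for one memcached server shared by several stores.
SHARED_SERVER = {}

# Hypothetical store that prefixes every key with its namespace, similar in
# spirit to the :namespace option of ActiveSupport::Cache stores.
class NamespacedStore
  def initialize(server, namespace)
    @server = server
    @namespace = namespace
  end

  def write(key, value)
    @server[full_key(key)] = value
  end

  def read(key)
    @server[full_key(key)]
  end

  private

  # With a nil namespace every store would write to the bare key.
  def full_key(key)
    @namespace ? "#{@namespace}:#{key}" : key
  end
end

rails_cache    = NamespacedStore.new(SHARED_SERVER, "rails")
identity_cache = NamespacedStore.new(SHARED_SERVER, "idc")

rails_cache.write("user/1", "rails value")
identity_cache.write("user/1", "identity value")

puts rails_cache.read("user/1")    # prints "rails value"
puts identity_cache.read("user/1") # prints "identity value"
```

Without the namespaces, the second write would overwrite the first, since both stores would use the bare key `user/1`.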

Update 1:
The errors appear after changing model classes and deploying a new release.
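If entries written before a deploy are read by code that expects a different record shape, one common mitigation is to embed a version in the cache key so that a schema change moves readers to fresh keys. IdentityCache's own key format isn't shown in this issue and this sketch does not depend on it: `versioned_key` is a hypothetical helper, a Hash stands in for the cache, and the added `published_at` column is an invented example.

```ruby
require "digest"

# Hypothetical helper: append a short digest of the expected column names,
# so a change in the record's shape produces a different key.
def versioned_key(base_key, column_names)
  version = Digest::SHA256.hexdigest(column_names.sort.join(","))[0, 8]
  "#{base_key}:v#{version}"
end

cache = {}

# Before the deploy: the model has id and name.
old_key = versioned_key("Article/1", %w[id name])
cache[old_key] = { "id" => 1, "name" => "old shape" }

# After the deploy: a hypothetical published_at column was added, so the key
# changes and new code misses the old entry instead of decoding it.
new_key = versioned_key("Article/1", %w[id name published_at])
puts cache.key?(new_key) # prints "false"
```

The trade-off is a burst of cache misses right after each schema change, in exchange for never decoding an entry written for the old shape.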

Example stack trace

Error message:
no implicit conversion of Symbol into Integer
…r/bundle/ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/encoder.rb:71:in `[]'
…r/bundle/ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/encoder.rb:71:in `record_from_coder'
…r/bundle/ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/encoder.rb:21:in `block in decode'
…dle/ruby/3.1.0/gems/activesupport-7.0.4/lib/active_support/notifications.rb:208:in `instrument'
…r/bundle/ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/encoder.rb:20:in `decode'
…/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/cached/primary_index.rb:74:in `cache_decode'
…ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/cache_key_loader.rb:38:in `load'
…/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/cached/primary_index.rb:17:in `fetch'
…by/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/with_primary_index.rb:127:in `fetch_by_id'
…by/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/with_primary_index.rb:67:in `fetch_by_slug'
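The top frame is a `[]` call inside `record_from_coder`. In Ruby, this exact message is raised when `[]` is called with a Symbol on an Array (or a String), which only accept Integer indices, while the same call on a Hash simply returns the value or nil. The sketch below reproduces the message with hypothetical coder values. It does not show IdentityCache's real coder format, only that the decoded value had a different shape than the code reading it expected.

```ruby
# Hypothetical coder shapes: the reader expects a Hash, but an entry written
# by different code comes back as an Array.
expected_coder   = { class: "Article", attributes: { "id" => 1 } }
unexpected_coder = ["Article", { "id" => 1 }]

# Mimics a reader that indexes the coder by a Symbol key.
def attributes_from(coder)
  coder[:attributes]
end

p attributes_from(expected_coder)

begin
  attributes_from(unexpected_coder)
rescue TypeError => e
  puts e.message # prints "no implicit conversion of Symbol into Integer"
end
```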
