The other day I was struggling with a very weird error when upgrading to Ruby 3. The initial migrations for a Rails app would fail with “Mysql2::Error: Unknown MySQL error (ActiveRecord::StatementInvalid)”, but only in certain environments. The error would occur when Rails tried to check what migrations are already applied by looking at the schema_migrations table.

2022-01-19 09:12:59.205 +0000 [DEBUG]    (1.7ms)  SELECT GET_LOCK('4252831219231700070', 0)
/usr/local/bundle/gems/mysql2-0.5.3/lib/mysql2/client.rb:131: warning: rb_tainted_str_new_cstr is deprecated and will be removed in Ruby 3.2
2022-01-19 09:12:59.222 +0000 [DEBUG]    (2.0ms)  SELECT `schema_migrations`.`version` FROM `schema_migrations` ORDER BY `schema_migrations`.`version` ASC
2022-01-19 09:12:59.224 +0000 [DEBUG]    (1.8ms)  SELECT RELEASE_LOCK('4252831219231700070')
rails aborted!
ActiveRecord::StatementInvalid: Mysql2::Error: Unknown MySQL error
/usr/local/bundle/gems/mysql2-0.5.3/lib/mysql2/client.rb:131:in `_query'

The error could only be reproduced in environments that had long-living databases (dev databases only live as long as the environment they belong to). It wouldn’t occur even if I duplicated a database from scratch using CREATE TABLE LIKE and INSERT ... SELECT statements. I checked the Rails source code for unintentional breakage too, but there was nothing suspicious (not to mention it did work with fresh databases without issue). This made me suspect that there would be some kind of incompatibility between old and new versions used… at some layer.

While the “Unknown MySQL error” isn’t the most helpful error message, I was soon pointed to a post about dealing with a similar error (in Django). Since the error was happening on queries and not on auth, the query cache issue became my focus. There are a few tickets on both the MySQL and the MariaDB trackers for similar issues (MySQL #86318, #84639, MBDEV-27063), sadly all “unresolved” or “won’t fix”.

I realized that the root cause for the issue is that in the new Ruby 3 environment the library underlying the mysql2 Ruby gem changed from libmysqlclient (which was used on the Ruby 2 Ubuntu environment) to libmariadb (used on the new Ruby 3 Debian env). While I thought the two are compatible, as apparent from the tickets above, sometimes the same query cache corruption resulting in “malformed packet” errors occurs between versions of just libmysqlclient as well. Going back to Ubuntu was not feasible due to other library compatibility reasons, so I had to figure out how to fix the cache.

The post suggests just disabling the query cache, but I was hesitant to pull that plug since I didn’t know what would be the performance penalties (which, as it turns out aren’t significant at all). So I first checked for a way to just flush the cache, and indeed there was RESET QUERY CACHE and FLUSH TABLES. The former “just” clears the cache, which was what I needed. I confirmed on one of the repro environments that RESET QUERY CACHE indeed fixes the problem (might need a client reconnect).

It however did not fix the issue in production, or rather say it only fixed it partially. Production being under constant load, queries that occurred in the old released environment first would still get cached in incompatible ways. At first I suspected that maybe RESET QUERY CACHE wasn’t enough and tried FLUSH TABLES, but this had side effects that make it highly unpreferable in production. Namely, as noted in its docs (which I noticed way too late), it “forces all tables in use to be closed” which also involved force aborting queries to said tables. Running this in a production env with sufficient load will very likely result in users seeing 500 errors (“Mysql2::Error::ConnectionError: Lost connection to MySQL server during query (ActiveRecord::StatementInvalid)”) which is obviously not desirable.

Since the old, live env would keep producing incompatible cache (and thus errors in the new environment), the following operation was put in place:

  1. disable query cache with query_cache_limit = 0. This as a side effect clears the query cache
  2. confirm that the Ruby 3 environment works parallel to the live Ruby 2 env
  3. release the Ruby 3 environment and retire the Ruby 2 one
  4. re-enable query cache by restoring the Aurora parameter group to its previous value

I suspect that the reason this issue (which sounds pretty grave to me) is marked as “won’t fix” at MySQL is that with MySQL 8, query cache as a whole has been removed. It’s just natural they don’t want to spend more hours on a feature that failed its promise.