I was looking at fulltext search options for MySQL when I found out that there is a Japanese-specific parser plugin, MeCab, that makes indexing more meaningful. Japanese doesn’t put spaces between words, so fulltext search usually has a very hard time with it. MeCab uses a dictionary to split text into actual words, in contrast to the n-gram approach, which just chops the text into overlapping chunks of a fixed number of characters.
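To make the n-gram side of that contrast concrete, here is a minimal sketch in Python. The function name char_ngrams and the sample sentence are my own illustration, not anything taken from MySQL or MeCab, and MySQL's real ngram parser (whose chunk size is set by ngram_token_size) differs in details such as stopword handling.

```python
def char_ngrams(text: str, n: int = 2) -> list[str]:
    """Return every run of n consecutive characters in text (a sliding window)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [text[i:i + n] for i in range(len(text) - n + 1)]


# Hypothetical sample sentence: "to live in Tokyo Metropolis".
sample = "東京都に住む"
bigrams = char_ngrams(sample, 2)
print(bigrams)
```

Note that the bigrams include 京都 (Kyoto), a word that is not actually in the sentence, so a bigram index would match a search for it. A dictionary-based tokenizer would be expected to split along real word boundaries instead, roughly 東京 / 都 / に / 住む, which is the whole appeal of MeCab here.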
Let’s check my MySQL version first… Apparently I have 5.7, which supposedly ships with the MeCab plugin. Except it doesn’t if you use Ubuntu, because apparently the dependency rules for the universe repo don’t let them include it. That’s a huge pain in the ass, since I now have to look for the libpluginmecab.so file myself, and finding it wasn’t exactly easy.
Sure, I’m not very well versed in the workings of open source dev communities, so I had no idea where I was supposed to look. I figured that if they can’t include the plugin file in the repo, they might make it available elsewhere. I eventually found it in the community server .deb package, so I naively tried just extracting it and putting it in my plugin folder (which is /usr/lib/mysql/plugin/ in my case).
And it worked! Oh wait, no it didn’t. It still died with an error, but at least it was finding and reading the plugin file. Of course the error message in the MySQL console didn’t tell me what actually went wrong, so I went and looked at the output of tail /var/log/mysql/error.log. And ta-da! “[ERROR] Mecab: createModel() failed: param.cpp(69) [ifs] no such file or directory: /etc/mecabrc”
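Since the error log tends to be noisy, a small filter can help pick out just the MeCab lines. This is only a sketch: the helper names are made up for illustration, the unrelated demo line is invented, and reading the real log file will usually need root.

```python
from pathlib import Path


def mecab_log_lines(log_text: str) -> list[str]:
    """Return the lines of an error log that mention MeCab, in original order."""
    return [line for line in log_text.splitlines() if "mecab" in line.lower()]


def read_mecab_errors(path: str = "/var/log/mysql/error.log") -> list[str]:
    """Read a log file and keep only the MeCab-related lines."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return mecab_log_lines(text)


# Demo on the message quoted above plus one invented, unrelated line.
demo = (
    "[Note] some unrelated startup message\n"
    "[ERROR] Mecab: createModel() failed: param.cpp(69) [ifs] "
    "no such file or directory: /etc/mecabrc\n"
)
for line in mecab_log_lines(demo):
    print(line)
```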
Except the mecabrc file is there all right. Even after rebooting the system I couldn’t get it to see the file, and now I’m kinda fed up. As if all this wasn’t enough, MeCab also refuses to work with UTF-8. I was naive enough to install it from the Ubuntu repos with apt, and that bit me: it installed with the default EUC-JP encoding instead of UTF-8, for whatever magical reason. And even after re-encoding all the dictionary files in /usr/share/mecab/dic/ipadic/ with nkf, as suggested on Japanese blogs, MeCab refuses to handle UTF-8 encoded text files.
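For reference, the re-encoding step itself is simple; here is roughly what the nkf conversion does to one file, sketched in Python. The helper names and the sample line are hypothetical (real ipadic entries have more columns). One caveat, stated as an assumption rather than something verified in this post: MeCab normally loads a compiled binary dictionary, so converting the source text files alone may not change what it actually reads.

```python
from pathlib import Path


def reencode_eucjp_to_utf8(data: bytes) -> bytes:
    """Decode EUC-JP bytes and return the same text encoded as UTF-8."""
    return data.decode("euc_jp").encode("utf-8")


def reencode_file(src: str, dst: str) -> None:
    """Write a UTF-8 copy of an EUC-JP encoded file (paths chosen by the caller)."""
    Path(dst).write_bytes(reencode_eucjp_to_utf8(Path(src).read_bytes()))


# Hypothetical dictionary-style line, encoded as EUC-JP for the demo.
original = "東京,名詞,固有名詞".encode("euc_jp")
converted = reencode_eucjp_to_utf8(original)
print(converted.decode("utf-8"))
```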
Honestly, I’m out of ideas, so this is a reminder to myself: if I ever actually set up a production server on Ubuntu, I’ll have to build MeCab and MySQL from source or give up on Japanese fulltext search, unless someone tells me by then how to set this mess straight.