mroonga - An open-source storage engine for fast fulltext search with MySQL.

3.3.1. Storage mode

Here we explain how to use storage mode of mroonga

3.3.1.2. How to get search score

Note

In version 1.0.0 or before, mroonga used a special column named _score to get search score. From version 1.0.0, it follows MySQL's standard way to get search score.

We often want to display more relevant results first in full text search. We use search score in such case.

We can get search score by MySQL's standard way [1], i.e. we use MATCH...AGAINST in one of columns in SELECT or ORDER BY.

Let's try.

mysql> INSERT INTO diaries (content) VALUES ("今日は晴れました。明日も晴れるでしょう。");
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO diaries (content) VALUES ("今日は晴れましたが、明日は雨でしょう。");
Query OK, 1 row affected (0.00 sec)

mysql> SELECT *, MATCH (content) AGAINST ("晴れ") FROM diaries WHERE MATCH (content) AGAINST ("晴れ") ORDER BY MATCH (content) AGAINST ("晴れ") DESC;
+----+--------------------------------------------------------------+------------------------------------+
| id | content                                                      | MATCH (content) AGAINST ("晴れ") |
+----+--------------------------------------------------------------+------------------------------------+
|  3 | 今日は晴れました。明日も晴れるでしょう。 |                                  2 |
|  1 | 明日の天気は晴れでしょう。                      |                                  1 |
|  4 | 今日は晴れましたが、明日は雨でしょう。    |                                  1 |
+----+--------------------------------------------------------------+------------------------------------+
3 rows in set (0.00 sec)

The result having the search word 晴れ more, i.e. id = 3 message having the higher search score, is displayed first. And you also get search score by using MATCH AGAINST in SELECT phrase.

You can use AS to change the attribute name.

mysql> SELECT *, MATCH (content) AGAINST ("晴れ") AS score FROM diaries WHERE MATCH (content) AGAINST ("晴れ") ORDER BY MATCH (content) AGAINST ("晴れ") DESC;
+----+--------------------------------------------------------------+-------+
| id | content                                                      | score |
+----+--------------------------------------------------------------+-------+
|  3 | 今日は晴れました。明日も晴れるでしょう。 |     2 |
|  1 | 明日の天気は晴れでしょう。                      |     1 |
|  4 | 今日は晴れましたが、明日は雨でしょう。    |     1 |
+----+--------------------------------------------------------------+-------+
3 rows in set (0.00 sec)

3.3.1.5. How to get the record ID

Groonga assigns a unique number to identify the record when a record is added in the table.

To make the development of applications easier, you can get this record ID by SQL in mroonga

To get the record ID, you need to create a column named _id when you create a table.

mysql> CREATE TABLE memos (
    ->   _id INT,
     >   content VARCHAR(255),
    ->   UNIQUE KEY (_id) USING HASH
    -> ) ENGINE = mroonga;
Query OK, 0 rows affected (0.04 sec)

Tye typo of _id column should be integer one (TINYINT, SMALLINT, MEDIUMINT, INT or BIGINT).

You can create an index for _id column, but it should be HASH type.

Let's add records in the table by INSERT. Since _id column is implemented as a virtual column and its value is assigned by groonga, you cannot specify the value when updating. So you need to exclude it from setting columns, or you need to use null as its value.

mysql> INSERT INTO memos VALUES (null, "今夜はさんま。");
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO memos VALUES (null, "明日はmroongaをアップデート。");
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO memos VALUES (null, "帰りにおだんご。");
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO memos VALUES (null, "金曜日は肉の日。");
Query OK, 1 row affected (0.00 sec)

To get the record ID, you invoke SELECT with _id column.

mysql> SELECT * FROM memos;
+------+------------------------------------------+
| _id  | content                                  |
+------+------------------------------------------+
|    1 | 今夜はさんま。                    |
|    2 | 明日はmroongaをアップデート。 |
|    3 | 帰りにおだんご。                 |
|    4 | 金曜日は肉の日。                 |
+------+------------------------------------------+
4 rows in set (0.00 sec)

By using last_insert_grn_id function, you can also get the record ID that is assigned by the last INSERT.

mysql> INSERT INTO memos VALUES (null, "冷蔵庫に牛乳が残り1本。");
Query OK, 1 row affected (0.00 sec)

mysql> SELECT last_insert_grn_id();
+----------------------+
| last_insert_grn_id() |
+----------------------+
|                    5 |
+----------------------+
1 row in set (0.00 sec)

last_insert_grn_id function is included in mroonga as a User-Defined Function (UDF), but if you have not yet register it in MySQL by CREATE FUNCTION, you need to invoke the following SQL for defining a function.

mysql> CREATE FUNCTION last_insert_grn_id RETURNS INTEGER SONAME 'ha_mroonga.so';

As you can see in the example above, you can get the record ID by _id column or last_insert_grn_id function. It will be useful to use this value in the ensuing SQL queries like UPDATE.

mysql> UPDATE memos SET content = "冷蔵庫に牛乳はまだたくさんある。" WHERE _id = last_insert_grn_id();
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

3.3.1.6. Logging

Mroonga outputs the logs by default.

Log files are located in MySQL's data directory with the filename groonga.log.

Here is the example of the log.

2010-10-07 17:32:39.209379|n|b1858f80|mroonga 1.10 started.
2010-10-07 17:32:44.934048|d|46953940|hash get not found (key=test)
2010-10-07 17:32:44.936113|d|46953940|hash put (key=test)

The default log level is NOTICE, i.e. we have important information only and we don't have debug information etc.).

You can get the log level by mroonga_log_level system variable, that is a global variable. You can also modify it dynamically by using SET phrase.

mysql> SHOW VARIABLES LIKE 'mroonga_log_level';
+-------------------+--------+
| Variable_name     | Value  |
+-------------------+--------+
| mroonga_log_level | NOTICE |
+-------------------+--------+
1 row in set (0.00 sec)

mysql> SET GLOBAL mroonga_log_level=DUMP;
Query OK, 0 rows affected (0.00 sec)

mysql> SHOW VARIABLES LIKE 'mroonga_log_level';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| mroonga_log_level | DUMP  |
+-------------------+-------+
1 row in set (0.00 sec)

Available log levels are the followings.

  • NONE
  • EMERG
  • ALERT
  • CRIT
  • ERROR
  • WARNING
  • NOTICE
  • INFO
  • DEBUG
  • DUMP

You can reopen the log file by FLUSH LOGS. If you want to rotate the log file without stopping MySQL server, you can do in the following procedure.

  1. change the file name of groonga.log (by using OS's mv command etc.).
  2. invoke "FLUSH LOGS" in MySQL server (by mysql command or mysqladmin command).

3.3.1.7. Choosing appropriate columns

Groonga uses one file per column to store data, and mroonga accesses needed columns only when accessing a table to utilise this characteristic.

This optimisation is done automatically in mroonga internal, you don't need any specific configuration.

Imagine that we have a table with 20 columns like below.

CREATE TABLE t1 (
  c1 INT PRIMARY KEY AUTO_INCREMENT,
  c2 INT,
  c3 INT,
  ...
  c11 VARCHAR(20),
  c12 VARCHAR(20),
  ...
  c20 DATETIME
) ENGINE = mroonga DEFAULT CHARSET utf8;

When we run SELECT phrase like the following, mroonga reads data from columns that are referred by SELECT phrase and WHERE phrase only (and it does not access columns that not required internally).

SELECT c1, c2, c11 FROM t1 WHERE c2 = XX AND c12 = "XXX";

In this case above, only columns c1, c2, c11 and c12 are accessed, and we can process the SQL rapidly.

3.3.1.8. Optimisation for counting rows

In MySQL's storage engine interface, there is no difference between counting rows like COUNT(*) and normal data retrieving by SELECT. So access to data that is not included in SELECT result can happen even if you just want to count rows.

Tritonn (MySQL + Senna), that is mroonga's predecessor, introduced "2ind patch" to skip needless access to data and solved this performance issue.

Mroonga also has the optimisation for counting rows.

In the following SELECT, for example, needless read of columns are skipped and you can get the result of counting rows with the minimal cost.

SELECT COUNT(*) FROM t1 WHERE MATCH(c2) AGAINST("hoge");

You can check if this optimisation works or not by the status variable.

mysql> SHOW STATUS LIKE 'mroonga_count_skip';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| mroonga_count_skip | 1     |
+--------------------+-------+
1 row in set (0.00 sec)

Each time the optimisation for counting rows works, mroonga_count_skip status variable value is increased.

Note : This optimisation is implemented by using the index. It only works in the case where we records can be specified only by the index.