Optimization
Below you will find tips on how to optimize your database queries, identify bottlenecks, and improve your database structure.
For MariaDb, the manufacturer provides instructions for optimizing server operation, data structure, and queries. The same applies to PostgreSQL, where performance tips are offered. Since server operation for both server systems is handled by the RRZ, users do not have access to the server configuration, but can follow the instructions on data structure in their database and when creating queries. Since a large number of completely different databases and use cases are hosted on the RRZ's database servers, the servers cannot be configured for a specific case. Please also note the information on quotas, access restrictions, and function restrictions.
The RRZ can provide you with information that can help you identify problems in the operation of the database servers. Please contact the RRZ service line and provide the name of the database and the corresponding ID. Specifically, these are:
- Slow log: Upon request, the RRZ can provide information about queries that took longer than 10 seconds to respond.
- Queries without the use of indexes: Upon request, the RRZ can collect and provide information about queries that did not use indexes (only for MariaDb).
- SQL errors: Upon request, the RRZ can collect and provide information about queries that encountered an error.
- The actual database size (see below).
Here is an example that shows you how the data structure of a table can be improved based on an entry found in the slow log file of the MariaDb server. The commands explain select (shows which data types are involved in a query) and procedure analyse() (shows whether a data type can be better represented in a different way) are used for this purpose.
The slow log entry:
# Time: 161121 15:03:15
# User@Host: *******[*******] @ zd-dbproxy-2.rrz.uni-hamburg.de [134.100.55.69]
# Thread_id: 631826 Schema: mydatabase QC_hit: No
# Query_time: 5.464161 Lock_time: 0.000096 Rows_sent: 1 Rows_examined: 343320
use mydatabase;
SET timestamp=1479736995;
SELECT COUNT(DISTINCT `resourcemiscId`) AS `count` FROM `mytable1_resource_misc` LEFT JOIN `mytable1_resource_creator` AS `mytable1_c` ON `mytable1_c`.`resourcecreatorResourceId` = `resourcemiscId` LEFT JOIN `mytable1_creator` ON `creatorId` = `mytable1_c`.`resourcecreatorCreatorId` WHERE (`resourcecreatorCreatorId` = '5304' OR `creatorSameAs` = '5304');
use explain select:
explain SELECT COUNT(DISTINCT `resourcemiscId`) AS `count` FROM `mytable1_resource_misc` LEFT JOIN `mytable1_resource_creator` AS `mytable1_c` ON `mytable1_c`.`resourcecreatorResourceId` = `resourcemiscId` LEFT JOIN `mytable1_creator` ON `creatorId` = `mytable1_c`.`resourcecreatorCreatorId` WHERE (`resourcecreatorCreatorId` = '5304' OR `creatorSameAs` = '5304');
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | mytable1_resource_misc | index | NULL | resourcemiscCollection | 5 | NULL | 111510 | Using index |
| 1 | SIMPLE | mytable1_c | ref | resourcecreatorResourceId | resourcecreatorResourceId | 4 |mydatabase.mytable1_resource_misc.resourcemiscId | 1 | |
| 1 | SIMPLE | mytable1_creator | eq_ref | PRIMARY | PRIMARY | 4 | mydatabase.mytable1_c.resourcecreatorCreatorId | 1 | Using where |
3 rows in set (0.00 sec)
So, in the query, the resourcemiscCollection field from the mytable1_resource_misc table is used as an index. That's good, but can the index be optimized? This is what procedure analyse() is good for:
select resourcemiscCollection from mytable1_resource_misc procedure analyse();
| Field_name | Min_value | Max_value | Min_length | Max_length | Empties_or_zeros | Nulls | Avg_value_or_avg_length | Std | Optimal_fieldtype |
| mydatabase.mytable1_resource_misc.resourcemiscCollection | 176 | 1806 | 3 | 4 | 0 | 12247 | 304.6646 | 261.2061 | SMALLINT(4) UNSIGNED |
1 row in set (0.05 sec)
This means that the best data type for the index would be SMALLINT(4) UNSIGNED. If resourcemiscCollection is defined differently, e.g. as INT(11), the query would not be processed at the maximum possible speed.
Database size
With MariaDb (MySQL), fragmentation of the storage space for the relevant tables can occur, especially when using the (default) InnoDB storage engine. As a result, a database then occupies more storage space in the file system than is displayed by MariaDb. This can cause the storage space quota for databases to become active and block the database for write access, even though the size of the database is actually still below the quota.
With the command
SELECT table_schema as “Database Name”,
ROUND( SUM( (data_length + index_length)/1024/1024), 2)
as “database size (MB)” FROM information_schema.tables
where table_schema = ‘mydatabase’;
you can determine the database size of the mydatabase database (in MB) – but only the size of the actual data stock, not the fragmentation overhead. Only the RRZ database administrators can determine this. However, we can tell you the actual storage usage of your database upon request.
We therefore recommend that you do not immediately request an increase when a quota is exceeded, but first eliminate any fragmentation overhead from InnoDB tables. This can be done with the OPTIMIZE TABLE command.