Applies to openSUSE Leap 15.4

12 Tuning I/O performance

I/O scheduling controls how input/output operations are submitted to storage. openSUSE Leap offers various I/O algorithms, called elevators, that suit different workloads. Elevators can help reduce seek operations and can prioritize I/O requests.

The best-suited I/O elevator depends not only on the workload but also on the hardware. Single ATA disk systems, SSDs, RAID arrays, and network storage systems, for example, each require different tuning strategies.

12.1 Switching I/O scheduling

openSUSE Leap picks a default I/O scheduler at boot time. The scheduler can be changed on the fly for each block device. This makes it possible to set different algorithms, for example, for the device hosting the system partition and the device hosting a database.

The default I/O scheduler is chosen for each device based on whether the device reports itself as a rotational disk. For rotational disks, the BFQ I/O scheduler is picked. Other devices default to MQ-DEADLINE or NONE.
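The rotational flag that drives this choice can be read from the `queue/rotational` attribute of a device in sysfs (1 for rotational, 0 otherwise). The following is a minimal sketch; the function name `device_kind` and its optional second argument (an alternative sysfs root, useful for dry runs) are illustrative assumptions, not part of openSUSE Leap.

```shell
# Print "rotational" or "non-rotational" for a block device.
# $1: device name (for example, sda)
# $2: sysfs root, defaults to /sys (hypothetical parameter for dry runs)
device_kind() {
    flag_file="${2:-/sys}/block/$1/queue/rotational"
    if [ ! -r "$flag_file" ]; then
        echo "cannot read $flag_file" >&2
        return 1
    fi
    # The kernel reports 1 for rotational devices and 0 for all others.
    if [ "$(cat "$flag_file")" = "1" ]; then
        echo "rotational"
    else
        echo "non-rotational"
    fi
}

# Example call, commented out so that the block has no side effects:
# device_kind sda
```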

To change the elevator for a specific device in the running system, run the following command:

> echo SCHEDULER | sudo tee /sys/block/DEVICE/queue/scheduler

Here, SCHEDULER is one of bfq, none, kyber, or mq-deadline. DEVICE is the block device (sda for example). Note that this change does not persist across reboots. To change the I/O scheduler for a particular device permanently, copy /usr/lib/udev/rules.d/60-io-scheduler.rules to /etc/udev/rules.d/60-io-scheduler.rules, and edit the latter file to suit your needs.
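The runtime switch can be wrapped in a small helper that first checks whether the requested scheduler is offered for the device. This is a sketch under assumptions: the name `set_scheduler` and the optional sysfs root argument are hypothetical, and the function must run with root privileges to write to the real /sys.

```shell
# Switch the I/O scheduler of a device after checking that it is offered.
# $1: device (for example, sda)
# $2: scheduler (bfq, none, kyber, or mq-deadline)
# $3: sysfs root, defaults to /sys (hypothetical parameter for dry runs)
set_scheduler() {
    sched_file="${3:-/sys}/block/$1/queue/scheduler"
    if [ ! -w "$sched_file" ]; then
        echo "cannot write $sched_file" >&2
        return 1
    fi
    # Strip the brackets around the active entry, put one name per line,
    # then look for an exact match of the requested scheduler.
    if ! sed 's/[][]//g' "$sched_file" | tr ' ' '\n' | grep -qxF -- "$2"; then
        echo "$2 is not available for $1" >&2
        return 1
    fi
    echo "$2" > "$sched_file"
}

# Example call as root, commented out so that the block has no side effects:
# set_scheduler sda bfq
```

On a real system, the kernel keeps the full list in the scheduler file and moves the brackets to the newly selected entry.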

12.2 Available I/O elevators with blk-mq I/O path

Below is a list of elevators available on openSUSE Leap for devices that use the blk-mq I/O path. If an elevator has tunable parameters, they can be set with the command:

echo VALUE > /sys/block/DEVICE/queue/iosched/TUNABLE

In the command above, VALUE is the desired value for the TUNABLE and DEVICE is the block device.
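As a hedged illustration, the write can be combined with a read-back so that the value accepted by the kernel is visible immediately. The function name `set_tunable` and the optional sysfs root argument are assumptions made for this sketch. The iosched directory only contains the tunables of the currently active scheduler, so the helper refuses to write a file that does not exist.

```shell
# Write a scheduler tunable and print the value reported afterwards.
# $1: device, $2: tunable, $3: value
# $4: sysfs root, defaults to /sys (hypothetical parameter for dry runs)
set_tunable() {
    tunable_file="${4:-/sys}/block/$1/queue/iosched/$2"
    if [ ! -w "$tunable_file" ]; then
        echo "$2 is not a writable tunable of the active scheduler on $1" >&2
        return 1
    fi
    # Only read back if the write succeeded.
    echo "$3" > "$tunable_file" && cat "$tunable_file"
}

# Example call as root, commented out so that the block has no side effects:
# set_tunable sda fifo_batch 1
```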

To find out what elevators are available for a device (sda for example), run the following command (the currently selected scheduler is listed in brackets):

> cat /sys/block/sda/queue/scheduler
[mq-deadline] kyber bfq none
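For scripts, it can be convenient to extract only the bracketed entry. The sketch below does this with sed and lists the active scheduler of every block device; the function names and the optional sysfs root argument are illustrative assumptions. If the scheduler file contains no bracketed entry, `active_scheduler` prints nothing.

```shell
# Print only the active scheduler, that is, the entry shown in brackets.
# $1: device
# $2: sysfs root, defaults to /sys (hypothetical parameter for dry runs)
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "${2:-/sys}/block/$1/queue/scheduler"
}

# Print "DEVICE: SCHEDULER" for every block device below the sysfs root.
# $1: sysfs root, defaults to /sys
list_schedulers() {
    for f in "${1:-/sys}"/block/*/queue/scheduler; do
        [ -r "$f" ] || continue
        # Reduce .../block/DEVICE/queue/scheduler to DEVICE.
        dev=${f%/queue/scheduler}
        dev=${dev##*/}
        printf '%s: %s\n' "$dev" "$(active_scheduler "$dev" "${1:-/sys}")"
    done
}

# Example call, commented out so that the block has no side effects:
# list_schedulers
```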

12.2.1 MQ-DEADLINE

MQ-DEADLINE is a latency-oriented I/O scheduler. MQ-DEADLINE has the following tunable parameters:

Table 12.1: MQ-DEADLINE tunable parameters

File

Description

writes_starved

Controls how many times reads are preferred over writes. A value of 3 means that three read operations can be done before writes and reads are dispatched on the same selection criteria.

Default is 3.

read_expire

Sets the deadline (current time plus the read_expire value) for read operations in milliseconds.

Default is 500.

write_expire

Sets the deadline (current time plus the write_expire value) for write operations in milliseconds.

Default is 5000.

front_merges

Enables (1) or disables (0) attempts to front merge requests.

Default is 1.

fifo_batch

Sets the maximum number of requests per batch (deadline expiration is only checked for batches). This parameter makes it possible to balance latency against throughput. When set to 1 (that is, one request per batch), it results in "first come, first served" behavior and usually the lowest latency. Higher values usually increase throughput.

Default is 16.

12.2.2 NONE

When NONE is selected as I/O elevator option for blk-mq, no I/O scheduler is used, and I/O requests are passed down to the device without further I/O scheduling interaction.

NONE is the default for NVM Express devices. With no overhead compared to other I/O elevator options, it is considered the fastest way of passing down I/O requests on multiple queues to such devices.

There are no tunable parameters for NONE.

12.2.3 BFQ (Budget Fair Queueing)

BFQ is a fairness-oriented scheduler. It is described as "a proportional-share storage-I/O scheduling algorithm based on the slice-by-slice service scheme of CFQ. But BFQ assigns budgets, measured in number of sectors, to processes instead of time slices." (Source: linux-4.12/block/bfq-iosched.c)

BFQ makes it possible to assign I/O priorities to tasks. These priorities are taken into account during scheduling decisions (see Section 8.3.3, “Prioritizing disk access with ionice”).

The BFQ scheduler has the following tunable parameters:

Table 12.2: BFQ tunable parameters

File

Description

slice_idle

Specifies, in milliseconds, how long to idle while waiting for the next request on an empty queue.

Default is 8.

slice_idle_us

Same as slice_idle but in microseconds.

Default is 8000.

low_latency

Enables (1) or disables (0) BFQ's low latency mode. This mode prioritizes certain applications (for example, if interactive) such that they observe lower latency.

Default is 1.

back_seek_max

Maximum value (in KB) for backward seeking.

Default is 16384.

back_seek_penalty

Used to compute the cost of backward seeking.

Default is 2.

fifo_expire_async

Sets the timeout of asynchronous requests in milliseconds.

Default is 250.

fifo_expire_sync

Sets the timeout of synchronous requests in milliseconds.

Default is 125.

timeout_sync

Maximum time in milliseconds that a task (queue) is serviced after it has been selected.

Default is 124.

max_budget

Upper limit for the number of sectors that are served within timeout_sync. If set to 0, BFQ internally calculates a value based on timeout_sync and an estimated peak rate.

Default is 0 (set to auto-tuning).

strict_guarantees

Enables (1) or disables (0) BFQ-specific queue handling required to give stricter bandwidth sharing guarantees under certain conditions.

Default is 0.

12.2.4 KYBER

KYBER is a latency-oriented I/O scheduler. It makes it possible to set target latencies for reads and synchronous writes, and it throttles I/O requests to try to meet these target latencies.

Table 12.3: KYBER tunable parameters

File

Description

read_lat_nsec

Sets the target latency for read operations in nanoseconds.

Default is 2000000.

write_lat_nsec

Sets the target latency for write operations in nanoseconds.

Default is 10000000.
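Because the KYBER targets are given in nanoseconds, it is easy to be off by a factor of 1000 when thinking in milliseconds. The following sketch converts whole milliseconds to nanoseconds; the helper name `ms_to_ns` is a hypothetical choice for this example.

```shell
# Convert a latency target in whole milliseconds to nanoseconds.
# $1: latency in milliseconds (integer)
ms_to_ns() {
    echo $(( $1 * 1000000 ))
}

# Example: set a 1 ms read target on DEVICE (commented out, requires root
# and KYBER as the active scheduler):
# ms_to_ns 1 | sudo tee /sys/block/DEVICE/queue/iosched/read_lat_nsec
```

With this helper, the defaults listed above correspond to 2 ms for reads and 10 ms for writes.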

12.3 I/O barrier tuning

Some file systems (for example, Ext3 or Ext4) send write barriers to disk after fsync or during transaction commits. Write barriers enforce proper ordering of writes, making volatile disk write caches safe to use (at some performance penalty). If your disks are battery-backed in one way or another, disabling barriers can safely improve performance.

Important: nobarrier is deprecated in XFS

Note that the nobarrier option has been completely deprecated for XFS, and it is not a valid mount option in SUSE Linux Enterprise 15 SP2 and later. Any XFS mount command that explicitly specifies the option fails to mount the file system. To prevent this from happening, make sure that no scripts or fstab entries contain the nobarrier option.
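One way to check fstab entries is to scan for XFS lines whose option field contains nobarrier. The following is a minimal sketch using awk; the function name `find_xfs_nobarrier` and its exit status convention are assumptions made for this example, and scripts that call mount directly still need to be reviewed separately.

```shell
# Print fstab lines that mount XFS with the nobarrier option.
# $1: fstab-style file, defaults to /etc/fstab
# Exit status: 0 if no such line exists, 1 if at least one was found.
find_xfs_nobarrier() {
    awk '
        # Skip comments; field 3 is the file system type, field 4 the options.
        $1 !~ /^#/ && $3 == "xfs" && $4 ~ /(^|,)nobarrier(,|$)/ {
            print FNR ": " $0
            found = 1
        }
        END { exit found ? 1 : 0 }
    ' "${1:-/etc/fstab}"
}

# Example call, commented out so that the block has no side effects:
# find_xfs_nobarrier /etc/fstab
```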

Sending write barriers can be disabled using the nobarrier mount option.

Warning: Disabling barriers can lead to data loss

Disabling barriers when disks cannot guarantee caches are properly written in case of power failure can lead to severe file system corruption and data loss.