How to change Intel Optane P4800X sector size
The Intel Optane P4800X is a beast when it comes to random IO at low queue depth, which makes it a good drive for daily use, if you don’t mind the price.
The LBA Format for this drive is sub-optimal out of the factory. If you run:
nvme id-ns -H /dev/nvme1n1
It will show you:
LBA Format 0 : Metadata Size: 0 bytes - Data Size: 512 bytes - Relative Performance: 0x2 Good (in use)
LBA Format 1 : Metadata Size: 8 bytes - Data Size: 512 bytes - Relative Performance: 0x2 Good
LBA Format 2 : Metadata Size: 16 bytes - Data Size: 512 bytes - Relative Performance: 0x2 Good
LBA Format 3 : Metadata Size: 0 bytes - Data Size: 4096 bytes - Relative Performance: 0 Best
LBA Format 4 : Metadata Size: 8 bytes - Data Size: 4096 bytes - Relative Performance: 0 Best
LBA Format 5 : Metadata Size: 64 bytes - Data Size: 4096 bytes - Relative Performance: 0 Best
LBA Format 6 : Metadata Size: 128 bytes - Data Size: 4096 bytes - Relative Performance: 0 Best
LBA Format 0 is in use, so the sector size is 512 bytes. Formats 3 through 6 use 4096-byte sectors and are rated Best in the Relative Performance field; Format 3, the one without metadata, is the one we want. Fortunately, this is an enterprise drive, so the user can change the LBA Format.
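Rather than reading the table by eye, the in-use entry can be picked out with a small filter. This is a minimal sketch, assuming the `nvme id-ns -H` output has the shape shown above; `lbaf_in_use` is a hypothetical helper name, not part of nvme-cli.

```shell
# Hypothetical helper: reads `nvme id-ns -H` output on stdin and prints
# "<index> <data size in bytes>" for the LBA format marked "(in use)".
lbaf_in_use() {
    sed -n 's/^LBA Format *\([0-9][0-9]*\) *:.*Data Size: *\([0-9][0-9]*\) *bytes.*(in use).*$/\1 \2/p'
}

# Example (needs root and a real device):
#   nvme id-ns -H /dev/nvme1n1 | lbaf_in_use
```

Given the listing above, the filter would print `0 512`, meaning Format 0 with 512-byte sectors.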
The nvme format command can do the job! Note that formatting erases everything on the namespace. All you need is:
nvme format -l 3 /dev/nvme1n1
Right? Not quite.
The command did start, but it timed out after a few minutes. The kernel log showed:
nvme nvme1: I/O 26 QID 0 timeout, reset controller
INFO: task nvme:76177 blocked for more than 122 seconds.
Tainted: P OE 5.12.9-1-MANJARO #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Intel has published a Technical Advisory - NVMe Admin Format Command Failure/Error on Intel Optane SSD DC P4800X Series. The advisory states:
Intel Optane SSD DC P4800X Series format command times are longer than typical kernel timeout settings.
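To see this on your own machine, you can pull the controller timeout lines out of the kernel log and look at the admin command timeout the driver is using. In the log above, QID 0 is the admin queue, which is where the format command is submitted. The following is a rough sketch, assuming a Linux kernel that exposes the nvme_core `admin_timeout` module parameter under /sys/module; the helper names and the `SYSFS_ROOT` variable are hypothetical, added only for illustration and testing.

```shell
# Hypothetical helper: keeps only NVMe controller timeout lines
# from a kernel log supplied on stdin.
nvme_timeouts() {
    grep -E 'nvme[0-9]+: I/O [0-9]+ QID [0-9]+ timeout'
}

# Hypothetical helper: prints the nvme_core admin command timeout in seconds.
# SYSFS_ROOT is an assumption so the function can be pointed at a test
# directory; it defaults to the real /sys.
nvme_admin_timeout() {
    cat "${SYSFS_ROOT:-/sys}/module/nvme_core/parameters/admin_timeout"
}

# Example:
#   dmesg | nvme_timeouts
#   nvme_admin_timeout
```

Both helpers only read state; neither changes any kernel setting.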
So the solution is to extend the timeout! Let’s set it to 1 hour (3600000 ms) with --timeout, and silence the kernel's hung-task warning while the format runs.
echo 0 > /proc/sys/kernel/hung_task_timeout_secs
nvme format --lbaf=3 --timeout=3600000 /dev/nvme1n1
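Once the format returns, it is worth confirming that the kernel now sees 4096-byte sectors. This is a minimal sketch, assuming the usual Linux sysfs layout (/sys/block/<device>/queue/logical_block_size); `logical_block_size` and the `SYSFS_ROOT` variable are hypothetical names used for illustration.

```shell
# Hypothetical helper: prints the logical block size, in bytes, that the
# kernel reports for a block device such as /dev/nvme1n1.
# SYSFS_ROOT is an assumption for testing; it defaults to the real /sys.
logical_block_size() {
    dev=${1##*/}    # strip the leading /dev/ if present
    cat "${SYSFS_ROOT:-/sys}/block/$dev/queue/logical_block_size"
}

# Example:
#   logical_block_size /dev/nvme1n1
```

After a successful switch to LBA Format 3 this should report 4096, and re-running nvme id-ns -H should show Format 3 marked as in use.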
Enjoy.