Latency spikes on an SSD array under ESXi: where to dig?

The problem: as soon as the disk subsystem is put under load, latency spikes and everything stops working.
Figure: 1C load test

What has been done:
Replaced the SSDs with a different model
Disabled caching, buffering, and other RAID controller features
Disabled the generic AHCI driver
The disks are less than half full, but the entire space is allocated to VMFS6

OS: ESXi 6.7
CPU: i7-3770
RAID controller: LSI MegaRAID 9260, 4 ports (512 MB RAM)
2 slots with enterprise HDDs in a mirror
2 slots with SSDs in a mirror (was Intel, now some Clevo 1100)
Server at Hetzner

The cache is enabled for everything; turning it off only makes the situation worse.
A sequential copy of a 50 GB file between datastores is a telling example. After about 5 minutes of copying at 150 MB/s, the speed drops sharply and the copy stalls. Meanwhile the whole system stops responding normally.
The same thing happens during, for example, a 1C load test: if you run 100 users, everything suddenly hangs and latency on the SSD array spikes.
Figure: copying a single 50 GB file between datastores, from HDD to SSD

Here is the previous Intel SSD with the array cache turned off.
The event log shows this message:
Device naa.600605b0057d5c40ff0284b36d816610 performance has deteriorated. I/O latency increased from average value of 4580 microseconds to 210110 microseconds.
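To put numbers like these side by side, here is a minimal Python sketch that pulls the two latency values out of such an event line and converts them to milliseconds. It assumes the message has the form "from average value of X microseconds to Y microseconds"; the function name `parse_latency_event` is made up for illustration and is not part of any VMware tool.

```python
import re

# Pattern built from the wording of the event quoted in the question.
# Assumption: the message reads "... from average value of X microseconds
# to Y microseconds".
LATENCY_RE = re.compile(
    r"Device (?P<device>\S+) performance has deteriorated\. "
    r"I/O latency increased from average value of (?P<avg_us>\d+) microseconds "
    r"to (?P<now_us>\d+) microseconds"
)


def parse_latency_event(line):
    """Return (device, average_ms, current_ms, ratio), or None if no match."""
    m = LATENCY_RE.search(line)
    if m is None:
        return None
    avg_us = int(m.group("avg_us"))
    now_us = int(m.group("now_us"))
    # Guard against a zero average so the ratio never raises.
    ratio = now_us / avg_us if avg_us else float("inf")
    return m.group("device"), avg_us / 1000, now_us / 1000, ratio


event = (
    "Device naa.600605b0057d5c40ff0284b36d816610 performance has deteriorated. "
    "I/O latency increased from average value of 4580 microseconds "
    "to 210110 microseconds."
)
device, avg_ms, now_ms, ratio = parse_latency_event(event)
print(f"{device}: {avg_ms:.2f} ms -> {now_ms:.2f} ms ({ratio:.1f}x)")
```

For the event above this works out to about 4.58 ms rising to about 210.11 ms, roughly a 46-fold increase.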

Meanwhile, the HDD array works exactly as it should (small and large files show the RAID cache at work).
A neighboring server has the same RAID controller, ESXi 6.5, and 2 datacenter-edition SSDs. That should not make a difference, yet latency there never jumps above 10 ms.

Update: the esxtop screen while copying between datastores (HDD -> SSD). Latency ranges from 20 to 50 ms.
What should I do, and where should I look?

2 Answers

Well, that is a perfectly normal situation.
What do you expect from consumer SSDs in a RAID?
I would not be surprised if over-provisioning is not configured either.

"Disks are occupied less than half."
You have a RAID, which means TRIM is not working.
You deleted a file, but on the disk it is still there.
As a result, a disk without TRIM is effectively always 100% full.
And you say half.
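The argument above can be sketched with a toy Python model. It is only an illustration of the reasoning, not a description of how any real SSD firmware or the MegaRAID controller behaves; the class `ToySSD`, the function `simulate`, and all the numbers are hypothetical. The drive treats a block as free only if it was never written or was explicitly trimmed, so without TRIM its own view of fullness can only grow.

```python
class ToySSD:
    """Toy drive: a block is free only if never written or explicitly trimmed."""

    def __init__(self, total_blocks):
        self.total_blocks = total_blocks
        self.mapped = set()  # blocks the drive believes hold live data

    def write(self, blocks):
        self.mapped.update(blocks)

    def trim(self, blocks):
        self.mapped.difference_update(blocks)

    def drive_fullness(self):
        return len(self.mapped) / self.total_blocks


def simulate(trim_supported, total_blocks=1000, file_blocks=400, rounds=10, step=150):
    """Repeatedly delete a file and write a new one at a shifted offset.

    Returns (filesystem_fullness, drive_fullness) after the last write.
    """
    ssd = ToySSD(total_blocks)
    live = set()  # blocks the filesystem currently considers in use
    for i in range(rounds):
        if live:
            # Deleting the file: the filesystem forgets the blocks, but the
            # drive only learns about it if TRIM reaches it.
            if trim_supported:
                ssd.trim(live)
            live = set()
        start = (i * step) % total_blocks
        live = {(start + k) % total_blocks for k in range(file_blocks)}
        ssd.write(live)
    return len(live) / total_blocks, ssd.drive_fullness()


print("with TRIM:   ", simulate(True))
print("without TRIM:", simulate(False))
```

In this model the filesystem reports 40% usage in both runs, while the drive without TRIM ends up treating every block as occupied.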

Try software RAID; it is much more fun.
