Swap is useless
False, even if a system has a lot of memory, swap allows for better memory utilisation by swapping out allocated but rarely used anonymous memory pages.
Swap is going to slow down your system by its mere presence
False, as long as the system has enough memory, there would be very little or no swap-related I/O, so there is no slowdown.
It is really bad if you have some memory swapped out 1
False, the kernel swapped out some unused pages, and the memory can be allocated for something more useful, like file cache.
Swap is going to wear out your SSD 2
False, as long as there is no swap-related I/O, there is no wearing out of the SSD. And modern SSDs have enough resources to handle swap-related I/O anyway.
Swap is an emergency solution for out-of-memory conditions
False, once your working set exceeds actual physical memory, swap makes things worse, causing swap thrashing and a slower OOM trigger.
Swap allows running workloads that exceed the system’s physical memory
False, once your active working set exceeds actual physical memory, you are in for some swap thrashing.
Swap size should be double the amount of the physical memory.3
False, unless the system has megabytes of memory instead of gigabytes. If you allocate more than a few GB of swap, you are in for a long swap thrashing session between running out of memory and the OOM killer being triggered.
Swap use begins based on the vm.swappiness threshold, e.g. when 40% of RAM remains for vm.swappiness=40. 4
False. Before the introduction of the split-LRU design in kernel version 2.6.28 in 2008, there used to be a different algorithm that used the percentage of allocated memory, but it was more complicated: with vm.swappiness=40 it wouldn't start swapping even if all memory was allocated by processes, and with the default vm.swappiness=60 it would start swapping at 80% memory allocation. This algorithm is no longer in use.
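The numbers quoted for the old algorithm can be reproduced with a small sketch. It assumes the pre-2.6.28 heuristic took the commonly described form `mapped_ratio / 2 + distress + swappiness`, with mapped pages becoming reclaim candidates at a value of 100. The function names are illustrative, and `distress` (a 0 to 100 measure of how hard reclaim is struggling) is set to 0 to model a system that is not yet under heavy pressure.

```python
def swap_tendency(mapped_ratio: int, swappiness: int, distress: int = 0) -> int:
    """Simplified model of the pre-2.6.28 heuristic (an assumption, see lead-in).

    mapped_ratio: percentage of memory mapped by processes (0-100).
    distress: 0 when reclaim is easy, growing towards 100 under heavy pressure.
    """
    return mapped_ratio // 2 + distress + swappiness


def would_swap(mapped_ratio: int, swappiness: int, distress: int = 0) -> bool:
    # Mapped (anonymous) pages were considered for reclaim at a tendency >= 100.
    return swap_tendency(mapped_ratio, swappiness, distress) >= 100
```

Under this model `would_swap(80, 60)` is true while `would_swap(100, 40)` is false, which matches the 80% figure for the default setting and the absence of swapping at vm.swappiness=40 unless `distress` rises.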
Swap aggressiveness is configured using vm.swappiness and it is linear between 0 and 100 5
False. vm.swappiness was first described in the kernel documentation in 2009 with the following text:
This control is used to define how aggressive the kernel will swap memory pages. Higher values will increase aggressiveness, lower values decrease the amount of swap. A value of 0 instructs the kernel not to initiate swap until the amount of free and file-backed pages is less than the high water mark in a zone.
It doesn’t say that the relation between vm.swappiness and aggressiveness is linear, but people made assumptions.
This description is still present in some texts on kernel.org (this file isn’t present in the kernel tree anymore, and it hasn’t been updated since 2019).
The documentation was updated in 2020 to a more appropriate description and the values up to 200 were allowed.
With vm.swappiness=0 the kernel won’t swap
False, if the kernel hits the high water mark in any zone, then it is going to swap anyway.
With vm.swappiness=100 the kernel is going to swap out everything from memory right away
False, if there is no memory pressure, the kernel isn’t going to swap anything.
vm.swappiness=60 is too aggressive 6
False, the vm.swappiness value 60 means that anon_prio is assigned the value of 60 and file_prio the value of 200 - 60 = 140. The resulting ratio 140/60 means that the kernel would evict 2.33 times more pages from the page cache than swap out anonymous pages.
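The arithmetic above can be checked for any setting with a short sketch. It mirrors only the priority split described in the text (anon_prio equal to the sysctl value, file_prio equal to 200 minus it); the function names are illustrative, and the real reclaim code weighs further factors that are not modelled here.

```python
def reclaim_priorities(swappiness: int) -> tuple:
    """Return (anon_prio, file_prio) for a given vm.swappiness value."""
    if not 0 <= swappiness <= 200:
        raise ValueError("vm.swappiness must be between 0 and 200")
    anon_prio = swappiness
    file_prio = 200 - swappiness
    return anon_prio, file_prio


def file_to_anon_ratio(swappiness: int) -> float:
    """How many times more page cache eviction is preferred over swapping out."""
    anon_prio, file_prio = reclaim_priorities(swappiness)
    if anon_prio == 0:
        # With anon_prio at 0 the ratio is unbounded in this simplified model.
        return float("inf")
    return file_prio / anon_prio


for value in (10, 60, 100):
    print(value, round(file_to_anon_ratio(value), 2))
```

A value of 100 gives a ratio of 1.0, meaning page cache and anonymous pages are treated with equal priority.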
The default value of 60 was chosen with the assumption that file I/O operations, which tend to be sequential, are more effective than random swap I/O, but this applies only to rotating media like HDDs. For SSDs, vm.swappiness=100 is more appropriate.
As the documentation states:
For in-memory swap, like zram or zswap, as well as hybrid setups that have swap on faster devices than the filesystem, values beyond 100 can be considered
vm.swappiness=10 is just the right setting and makes your system fast
False. This value gives (200 - 10)/10 = 19 times more preference to discarding page cache than to swapping out. Your system is going to have a lot of unused anon pages sitting around while churning through file cache pages, making it less effective.
Swap won’t happen if there is some free RAM.
False. If a process runs within a cgroup with defined memory limits, it can be swapped out, even though the system still has a lot of free memory. Swap and OOM can also be triggered due to memory fragmentation when high-order allocation fails, even though there are a lot of free low-order pages.
Swap happens just randomly, when the kernel has nothing to do
False. Swap happens when memory allocation brings the number of free memory pages below the low watermark specified for a memory zone. See /proc/zoneinfo and this question on Unix.StackExchange.
Swapping over NFS is a good idea. 7
False. It is very slow, and any packet lost/delayed on the network would cause the system to hang.
OOM won’t trigger if there is swap enabled. 8
False. OOM is triggered regardless of swap being enabled or disabled, full or empty.
OOM won’t trigger if there is some free RAM.
False. Swap and OOM can be triggered due to memory fragmentation when high-order allocation fails, even though there are a lot of free low-order pages. 9
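Fragmentation of this kind can be observed in /proc/buddyinfo, which lists the number of free blocks per order for each zone. The sketch below assumes the usual one-line-per-zone layout of that file; the function names are illustrative, and `has_free_block` is a simplification, since the kernel may still compact or reclaim memory before a high-order allocation actually fails.

```python
import re

BUDDY_RE = re.compile(r"^Node\s+(\d+),\s+zone\s+(\S+)\s+(.*)$")


def parse_buddyinfo(text):
    """Map 'nodeN/ZONE' to a list where index i is the count of free order-i blocks."""
    result = {}
    for line in text.splitlines():
        match = BUDDY_RE.match(line)
        if match:
            key = f"node{match.group(1)}/{match.group(2)}"
            result[key] = [int(n) for n in match.group(3).split()]
    return result


def free_pages(counts):
    """Total free pages: an order-i block holds 2**i contiguous pages."""
    return sum(n << order for order, n in enumerate(counts))


def has_free_block(counts, order):
    """True if a free block of the requested order or larger exists right now."""
    return any(n > 0 for n in counts[order:])
```

A zone can report thousands of free pages through `free_pages` while `has_free_block(counts, 3)` is already false, which is the situation described above.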
OOM kills a random process.
The current Linux kernel just kills the process with the largest RSS+swap usage (with the per-process OOM score adjustable through /proc). In v5.1 (2019) it dropped the heuristic that preferred sacrificing a child instead of the parent, and in v4.17 (2018) CAP_SYS_ADMIN processes lost their 3% bonus. Before v2.6.36 (2010) it used to be much more complicated and involved factors like forking, process runtime, and nice values, but at least this is described in the current man 5 proc.
But enabling the vm.oom_kill_allocating_task sysctl can cause a random process to be killed, because a random process can be the last one trying to allocate memory and failing.
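The "largest RSS+swap" selection can be sketched as a small scoring function. This is a simplified model rather than the kernel code: it assumes the score is the sum of resident, swapped and page-table pages, shifted by oom_score_adj in units of one thousandth of total memory (RAM plus swap, in pages), and that an adjustment of -1000 exempts a process entirely. The task dictionaries and function names are hypothetical.

```python
OOM_SCORE_ADJ_MIN = -1000  # processes with this adjustment are never selected


def badness(rss_pages, swap_pages, oom_score_adj, total_pages, pagetable_pages=0):
    """Simplified OOM score in pages; None means the process is exempt."""
    if oom_score_adj == OOM_SCORE_ADJ_MIN:
        return None
    points = rss_pages + swap_pages + pagetable_pages
    # Each adjustment unit is worth 1/1000 of total memory (RAM + swap).
    points += oom_score_adj * (total_pages // 1000)
    return points


def pick_victim(tasks, total_pages):
    """Return the name of the task with the highest score, or None."""
    scored = []
    for task in tasks:
        score = badness(task["rss"], task["swap"], task["adj"], total_pages)
        if score is not None:
            scored.append((score, task["name"]))
    return max(scored)[1] if scored else None
```

In this model a smaller process with a positive oom_score_adj can outrank a larger one, which is why the selection is deterministic rather than random as long as vm.oom_kill_allocating_task stays disabled.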