<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title><![CDATA[Alex L. Demidov]]></title>
  <link href="https://alexeydemidov.com/atom.xml" rel="self"/>
  <link href="https://alexeydemidov.com/"/>
  <updated>2026-02-07T06:47:42+00:00</updated>
  <id>https://alexeydemidov.com/</id>
  <author>
    <name><![CDATA[Alex L. Demidov]]></name>
    
  </author>
  <generator uri="http://octopress.org/">Octopress</generator>

  
  <entry>
    <title type="html"><![CDATA[How to Make a Keenetic Router a Tailscale Exit Node.]]></title>
    <link href="https://alexeydemidov.com/2026/01/14/tailscale-on-keenetic/"/>
    <updated>2026-01-14T07:35:30+00:00</updated>
    <id>https://alexeydemidov.com/2026/01/14/tailscale-on-keenetic</id>
    <content type="html"><![CDATA[<p>Tailscale is available for Keenetic routers as a package from the Entware repository, installed with the OPKG package manager. OPKG support is optional and needs to be enabled, and it is available on routers with a USB port that supports USB flash drives. The supported models and setup instructions are listed in <a href="https://help.keenetic.com/hc/en-us/articles/360021214160-Installing-the-Entware-repository-package-system-on-a-USB-drive">the documentation</a>.</p>

<p>External USB storage is preferred, as internal router memory is limited to a few hundred MB and Tailscale binaries are about 50 MB.</p>

<h2>Prepare USB drive</h2>

<p>The first step is to prepare a USB drive with an ext4 file system: either a single partition or the entire drive can be formatted as ext4. This needs to be done on a separate Linux system (see <a href="https://help.keenetic.com/hc/en-us/articles/115005875145-Using-the-ext4-file-system-on-USB-drives">the documentation</a> on how to format a USB drive on different operating systems).</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
</pre></td><td class='code'><pre><code class=''><span class='line'># check which devices are the USB drives (match TRAN=usb)
</span><span class='line'>lsblk -o NAME,MODEL,TRAN,TYPE,SIZE,MOUNTPOINT
</span><span class='line'>
</span><span class='line'># format the entire USB as an ext4 file system and assign label `USB128GB`
</span><span class='line'># replace sdX with the device name from `lsblk` output
</span><span class='line'>mkfs.ext4 -L 'USB128GB' /dev/sdX</span></code></pre></td></tr></table></div></figure>


<p>The next step is to download the <a href="https://entware.net/">Entware</a> installer and save it to the USB drive. The installer must match the router architecture (<strong>mipsel</strong>, <strong>mips</strong> or <strong>aarch64</strong>); see section 3 of <a href="https://help.keenetic.com/hc/en-us/articles/360021214160-Installing-the-Entware-repository-package-system-on-a-USB-drive">the documentation</a> to find which architecture your router model uses. The example below uses the <strong>mipsel</strong> architecture.</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
</pre></td><td class='code'><pre><code class=''><span class='line'># mount the file system
</span><span class='line'>mkdir /mnt/usb # create a mount point
</span><span class='line'>mount -t ext4 /dev/sdX /mnt/usb # replace sdX with the device name
</span><span class='line'>
</span><span class='line'># create install directory
</span><span class='line'>mkdir /mnt/usb/install
</span><span class='line'>
</span><span class='line'># download the Entware installer for mipsel architecture
</span><span class='line'>curl -O --output-dir /mnt/usb/install \
</span><span class='line'>  https://bin.entware.net/mipselsf-k3.4/installer/EN_mipsel-installer.tar.gz
</span><span class='line'>
</span><span class='line'># unmount the USB drive
</span><span class='line'>umount /mnt/usb</span></code></pre></td></tr></table></div></figure>


<h2>Install Tailscale on the router</h2>

<p>After <code>umount /mnt/usb</code>, it is safe to remove the USB drive and plug it into the router&rsquo;s USB port.
The drive should appear in the <code>USB Drives and printers</code> section of the router&rsquo;s <code>System Dashboard</code> and under the <code>USB Devices</code> section of the <code>Applications</code> page. Make sure it shows the label set earlier (<code>USB128GB</code> in our example). See <a href="https://help.keenetic.com/hc/en-us/articles/360021214160-Installing-the-Entware-repository-package-system-on-a-USB-drive">the documentation</a> for example screenshots.</p>

<p>Go to the <code>OPKG</code> page, select the appropriate drive in the <code>Drive</code> dropdown, and click the <code>Save</code> button.</p>

<p>Go to the <code>Diagnostics</code> page and check the <code>System Log</code> for the <code>"Entware" installed!</code> message and note the default <code>ssh</code> login and password.</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>Dec 7 11:50:02 ndm Opkg::Manager: /opt/etc/init.d/doinstall: Log on to start an SSH session using login - root, password - keenetic.
</span><span class='line'>Dec 7 11:50:02 ndm Opkg::Manager: /opt/etc/init.d/doinstall: [5/5] "Entware" installed!</span></code></pre></td></tr></table></div></figure>


<p>Add a host entry for the router to <code>~/.ssh/config</code> on your workstation (the Entware SSH server listens on port 222; adjust the <code>Hostname</code> to your router&rsquo;s address):</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>cat &gt;&gt; ~/.ssh/config &lt;&lt;'EOF'
</span><span class='line'>
</span><span class='line'>Host keenetic
</span><span class='line'>  Hostname 192.168.1.1
</span><span class='line'>  User root
</span><span class='line'>  Port 222
</span><span class='line'>EOF</span></code></pre></td></tr></table></div></figure>


<p>Optionally, put your <code>ssh</code> public keys into <code>/opt/etc/dropbear/authorized_keys</code> for passwordless login.</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>scp -O ~/.ssh/id_ed25519.pub keenetic:/opt/etc/dropbear/authorized_keys</span></code></pre></td></tr></table></div></figure>


<p>Log in with <code>ssh</code> using the default password (see the <code>System Log</code> above):</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>ssh keenetic</span></code></pre></td></tr></table></div></figure>


<p>Change the default password:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>passwd</span></code></pre></td></tr></table></div></figure>


<p>Update the OPKG packages:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>opkg update</span></code></pre></td></tr></table></div></figure>


<p>Install Tailscale packages:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>opkg install iptables tailscale</span></code></pre></td></tr></table></div></figure>


<h2>Bring the Tailscale node up</h2>

<ul>
<li><code>tailscaled</code> won&rsquo;t be able to modify the system <code>resolv.conf</code>, so use <code>--accept-dns=false</code>.</li>
<li>The system <code>iptables</code> rules are periodically reset by the router software, so we can&rsquo;t rely on <code>tailscaled</code> to maintain them and need to turn its netfilter management off: <code>--netfilter-mode=off</code>.</li>
<li>We want to be able to <code>ssh</code> into the router through the Tailnet, so use <code>--ssh</code> (it works only through the Tailscale web admin panel).</li>
<li>We want to access the subnet connected to the router through the Tailnet, so use <code>--advertise-routes 192.168.1.0/24</code> (replace with your own subnet address).</li>
<li>We want to use the router as an exit node, so use <code>--advertise-exit-node</code>.</li>
</ul>


<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>tailscale up --accept-dns=false --netfilter-mode=off --ssh \
</span><span class='line'>  --advertise-routes 192.168.1.0/24 --advertise-exit-node</span></code></pre></td></tr></table></div></figure>


<h2>Configure netfilter rules</h2>

<p>Without netfilter rules, the exit node and advertised routes won&rsquo;t work. We need to set up custom hooks for Keenetic to configure the netfilter rules: create two files in the hook subdirectory <code>/opt/etc/ndm/netfilter.d</code> (see <a href="https://support.keenetic.com/hero/kn-1011/en/42407-opkg-component-description.html#UUID-1f201cee-728b-cc09-03a3-68a589fe4f08_bridgehead-idm23497206669649">the documentation</a>).</p>

<p><code>/opt/etc/ndm/netfilter.d/tailscale-filter.sh</code> to configure the default <code>filter</code> table:</p>

<p>Set the variable <code>ROUTER_TAILSCALE_IP</code> to the IP returned by the <code>tailscale ip -4</code> command on the router, or look it up in the <code>tailscale status</code> output on another Tailscale host.</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
<span class='line-number'>23</span>
<span class='line-number'>24</span>
<span class='line-number'>25</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>#!/bin/sh
</span><span class='line'>
</span><span class='line'>[ "$type" = "ip6tables" ] && exit 0   # check the protocol type in a backward-compatible way
</span><span class='line'>[ "$table" != "filter" ] && exit 0   # check the table name
</span><span class='line'>
</span><span class='line'>ROUTER_TAILSCALE_IP=100.X.X.X/32 # Tailscale IP address assigned to this router
</span><span class='line'>
</span><span class='line'># Create chain only once; skip if it already exists
</span><span class='line'>if iptables -w -N ts-forward &gt;/dev/null 2&gt;&amp;1; then
</span><span class='line'>  iptables -w -A ts-forward -i tailscale0 -j MARK --set-xmark 0x40000/0xff0000
</span><span class='line'>  iptables -w -A ts-forward -m mark --mark 0x40000/0xff0000 -j ACCEPT
</span><span class='line'>  iptables -w -A ts-forward -s 100.64.0.0/10 -o tailscale0 -j DROP
</span><span class='line'>  iptables -w -A ts-forward -o tailscale0 -m conntrack ! --ctstate RELATED,ESTABLISHED -j DROP
</span><span class='line'>  iptables -w -A ts-forward -o tailscale0 -j ACCEPT
</span><span class='line'>  iptables -w -I FORWARD 1 -j ts-forward
</span><span class='line'>fi
</span><span class='line'>
</span><span class='line'>if iptables -w -N ts-input &gt;/dev/null 2&gt;&amp;1; then
</span><span class='line'>  iptables -w -A ts-input -s $ROUTER_TAILSCALE_IP -i lo -j ACCEPT
</span><span class='line'>  iptables -w -A ts-input -s 100.115.92.0/23 ! -i tailscale0 -j RETURN
</span><span class='line'>  iptables -w -A ts-input -s 100.64.0.0/10 ! -i tailscale0 -j DROP
</span><span class='line'>  iptables -w -A ts-input -i tailscale0 -j ACCEPT
</span><span class='line'>  iptables -w -A ts-input -p udp -m udp --dport 41641 -j ACCEPT
</span><span class='line'>  iptables -w -I INPUT 1 -j ts-input
</span><span class='line'>fi</span></code></pre></td></tr></table></div></figure>
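<p>The <code>ROUTER_TAILSCALE_IP</code> value can also be derived from <code>tailscale ip -4</code> instead of being hardcoded. A minimal sketch (the address below is a placeholder from Tailscale&rsquo;s CGNAT range; on the router, replace the literal with the command substitution shown in the comment):</p>

```shell
# Placeholder address from Tailscale's CGNAT range (100.64.0.0/10).
# On the router, use instead: TS_IP="$(tailscale ip -4)"
TS_IP="100.101.102.103"
ROUTER_TAILSCALE_IP="${TS_IP}/32"
echo "$ROUTER_TAILSCALE_IP"
```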


<p><code>/opt/etc/ndm/netfilter.d/tailscale-nat.sh</code> to configure the <code>nat</code> table:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>#!/bin/sh
</span><span class='line'>
</span><span class='line'>[ "$type" = "ip6tables" ] && exit 0   # check the protocol type in a backward-compatible way
</span><span class='line'>[ "$table" != "nat" ] && exit 0   # check the table name
</span><span class='line'>
</span><span class='line'>if iptables -w -t nat -N ts-postrouting &gt;/dev/null 2&gt;&amp;1; then
</span><span class='line'>  iptables -w -t nat -A ts-postrouting -m mark --mark 0x40000/0xff0000 -j MASQUERADE
</span><span class='line'>  iptables -w -t nat -I POSTROUTING 1 -j ts-postrouting
</span><span class='line'>fi</span></code></pre></td></tr></table></div></figure>


<p>The router software resets the <code>iptables</code> rules frequently, and it can do so even while these hooks are running, so it is normal to see <code>iptables</code> errors in the router logs. Verify that the rules are applied properly with <code>ssh keenetic iptables-save | grep ts-</code>.</p>

<p>The rules above were obtained by running <code>tailscale</code> with <code>--netfilter-mode=on</code> once and saving the resulting rules with <code>iptables-save</code>.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Linux Page Reclaim and OOM in a Cgroup]]></title>
    <link href="https://alexeydemidov.com/2025/09/30/linux-oom-in-cgroup/"/>
    <updated>2025-09-30T11:54:30+00:00</updated>
    <id>https://alexeydemidov.com/2025/09/30/linux-oom-in-cgroup</id>
    <content type="html"><![CDATA[<p>Page reclaim is triggered when the kernel tries to charge a newly allocated page to a cgroup, but the charge would push <code>memory.current</code> above <code>memory.high</code> or <code>memory.max</code>.</p>

<p>If the kernel is unable to reclaim enough pages when <code>memory.current</code> > <code>memory.max</code>, then the OOM killer is invoked, and by default, the largest process in the cgroup is killed. The kernel makes <a href="https://github.com/torvalds/linux/blob/5aca7966d2a7255ba92fd5e63268dd767b223aa5/mm/internal.h#L533"><code>16</code></a> reclaim attempts before invoking the OOM killer.</p>

<p>If the kernel is unable to reclaim enough pages when <code>memory.current</code> > <code>memory.high</code>, then the OOM killer is never invoked, but the allocating process (not the entire cgroup) is throttled proportionally to the number of pages above the <code>memory.high</code> limit.</p>
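<p>All of these knobs are plain files in the cgroup v2 hierarchy. A quick way to inspect the limits that apply to the current shell (a sketch, assuming cgroup v2 is mounted at <code>/sys/fs/cgroup</code>):</p>

```shell
# The "0::" line in /proc/self/cgroup is the cgroup v2 membership entry;
# append its path to the mount point to reach the memory control files.
CG="/sys/fs/cgroup$(awk -F: '$1 == "0" {print $3}' /proc/self/cgroup)"
echo "cgroup: $CG"
for f in memory.current memory.high memory.max; do
  [ -r "$CG/$f" ] && echo "$f = $(cat "$CG/$f")" || true
done
```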

<p>With the classic split-LRU, the reclaim process checks pages at the tails of the inactive file LRU and, if swap is enabled, of the inactive anon LRU.</p>

<ul>
<li>If a page doesn&rsquo;t have its &ldquo;accessed&rdquo; bit set by the CPU, then it is considered inactive and can be reclaimed.</li>
<li>If a page in the inactive file LRU isn&rsquo;t dirty and doesn&rsquo;t need to be written back to disk, then it is discarded and reclaimed. If it is dirty, then it is scheduled for writeback but can&rsquo;t be reclaimed yet in this reclaim pass.</li>
<li>If a page in the inactive anon LRU already has a copy in swap (it was swapped in but wasn&rsquo;t modified by the process), then it is discarded; otherwise, it is scheduled for swap out.</li>
</ul>


<p>The kernel scans the inactive file and inactive anon LRUs in a proportion controlled by the <code>vm.swappiness</code> setting. If the kernel needs to reclaim 200 pages and <code>vm.swappiness</code> has its default value of <code>60</code>, the kernel tries to reclaim 60 pages from the inactive anon LRU and 140 pages from the inactive file LRU. If the OOM killer is about to be invoked, this proportion is ignored and the kernel reclaims whatever it can get.</p>
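<p>The arithmetic in this example can be sketched directly (a simplification: the kernel&rsquo;s actual scan-count logic also weights the split by LRU sizes and recent reclaim cost):</p>

```shell
# swappiness is weighed against (200 - swappiness), so with the default 60
# the anon:file split is 60:140 out of every 200 pages scanned.
swappiness=60
to_reclaim=200
anon=$(( to_reclaim * swappiness / 200 ))
file=$(( to_reclaim - anon ))
echo "anon=$anon file=$file"   # anon=60 file=140
```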

<p>The swap usage for a cgroup is controlled with <code>memory.swap.max</code> and <code>memory.swap.high</code> limits. <code>memory.swap.max</code> is the hard limit for swap usage and exceeding <code>memory.swap.high</code> causes throttling. Default values are <code>max</code>, so the limits are disabled.</p>

<p>When inactive LRUs are reduced by the reclaim process below <a href="https://github.com/torvalds/linux/blob/5aca7966d2a7255ba92fd5e63268dd767b223aa5/mm/vmscan.c#L2291">a specific proportion between active and inactive LRU size</a>, then the kernel shrinks an active LRU by moving pages from its tail almost unconditionally to the corresponding inactive LRU to restore the minimal proportion.</p>

<p>Originally published as an answer on <a href="https://serverfault.com/a/1192826/23022">serverfault.com</a>.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[MariaDB Gets Unexpected OOM]]></title>
    <link href="https://alexeydemidov.com/2025/08/30/mariadb-memory-oom/"/>
    <updated>2025-08-30T07:40:30+00:00</updated>
    <id>https://alexeydemidov.com/2025/08/30/mariadb-memory-oom</id>
    <content type="html"><![CDATA[<p>MariaDB has recently surprised me by getting itself OOM-killed, even though the VM had (or so I thought) a couple of GB of spare memory. At first, I suspected some kind of memory spike, but the VM memory graph showed that it had been sitting just a few MB from the memory limit for some time before the OOM. That was unexpected, too. I started digging around and found that <code>pmm-agent</code> ate 1 GB of RAM, but that still didn&rsquo;t explain the gap. Finally, I checked the OOM task dump in the logs, and, indeed, MariaDB was using about 25% more memory than my calculations estimated. An 11 GB InnoDB buffer pool with otherwise default settings gave an expected 12 GB, and even the <code>memory_used</code> system variable reported the same 12 GB, but the actual process RSS was 15 GB.</p>

<p>Digging through the web, mailing lists and documentation, I discovered that it is a <a href="https://jira.mariadb.org/browse/MDEV-30889">known</a> and <a href="https://mariadb.com/docs/general-resources/community/community/bug-tracking/profiling-memory-usage#system-malloc-is-not-good-if-there-are-a-lot-of-allocations-of-different-size">documented</a> problem with the default system malloc. The solution is to replace it with <code>jemalloc</code> or <code>tcmalloc</code>, which is also <a href="https://mariadb.com/docs/server/server-management/install-and-upgrade-mariadb/installing-mariadb/compiling-mariadb-from-source/compiling-mariadb-with-extra-modulesoptions/using-mariadb-with-tcmalloc-or-jemalloc">documented</a>. In my case, memory consumption dropped by 40%.</p>
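<p>On a systemd distribution, one common way to apply the preload is a drop-in for the service. A sketch (the jemalloc library path is the Debian/Ubuntu x86_64 default and is an assumption; the snippet writes the drop-in to a temporary directory so it is safe to run as-is; on a real host the file belongs in <code>/etc/systemd/system/mariadb.service.d/</code>, followed by <code>systemctl daemon-reload</code> and a service restart):</p>

```shell
# Generate a systemd drop-in that preloads jemalloc for mariadb.service.
# Written to a temp dir here for safety; copy it to
# /etc/systemd/system/mariadb.service.d/ on the real host.
DROPIN_DIR="$(mktemp -d)"
cat > "$DROPIN_DIR/jemalloc.conf" <<'EOF'
[Service]
Environment="LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2"
EOF
cat "$DROPIN_DIR/jemalloc.conf"
```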

<p>Curiously, it had survived without OOMs for months after the last InnoDB buffer pool size change. As I discovered, MariaDB can monitor for memory pressure events through <code>/proc/pressure/memory</code> and, on detecting memory pressure, it drops all non-dirty pages from the InnoDB buffer pool and marks them with <code>MADV_FREE</code>. The hovering just a few MB below the memory limit was the result of this behaviour, as the kernel reclaimed only a small number of those pages. After the OOM, I added a <code>MemoryHigh</code> limit to the MariaDB systemd unit to give MariaDB a bit of early memory pressure, and was surprised that it wasn&rsquo;t able to fill its InnoDB buffer pool. The logs indicated that the buffer pool was reset multiple times per hour. Dropping almost the entire InnoDB buffer pool is a bit of an overreaction, and this behaviour was <a href="https://jira.mariadb.org/browse/MDEV-34863">disabled</a> by default in MariaDB 10.11.12 (not yet in Debian Bookworm).</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Disappointments With AWS Database Migration Service and Macie]]></title>
    <link href="https://alexeydemidov.com/2025/07/05/disappointments-with-aws-dms-and-macie/"/>
    <updated>2025-07-05T06:51:40+00:00</updated>
    <id>https://alexeydemidov.com/2025/07/05/disappointments-with-aws-dms-and-macie</id>
    <content type="html"><![CDATA[<p>AWS Database Migration Service is probably my worst experience with AWS services so far. I wasted half a day just trying to start a replication instance, which appeared to be stuck in the <code>Starting...</code> state. There is no progress indicator and no logs. I was blindly changing security groups, adding VPC endpoints and subnets, tweaking IAM roles and attached policies, and searching CloudWatch and CloudTrail. I read the replication-instance and troubleshooting sections of the AWS DMS documentation through and through, hoping to find any hint on how to turn on logs for the instance creation.</p>

<p>Then I found <a href="https://github.com/hashicorp/terraform-provider-aws/issues/1182">an old GitHub issue</a> for the Terraform AWS provider that suggested increasing default timeouts to 15 minutes. Given that the DMS replication instances use the same classes as EC2, I expected them to start within a few minutes and waited for 10 minutes before assuming that it hung. It actually took 17 minutes to start an instance in my case. The current <a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/dms_replication_instance#timeouts">default timeout</a> for the <code>aws_dms_replication_instance</code> terraform resource is 40 minutes.</p>

<p>When I got the instance running, the fun didn&rsquo;t end. First, it didn&rsquo;t like the database password because of some special characters. The database accepted that password. AWS Secrets Manager also accepted it. DMS - nope, there are some characters I can&rsquo;t handle. Ok, no problem, I just generated a long alphanumeric password. Then I tried to run &lsquo;Pre-migration assessments&rsquo;. One of the tests failed to run at all, with no error logs again. Ok, not important, move on. Try to run the migration task itself. It failed. The CloudWatch logs contain a lot of output but don&rsquo;t say why the task failed. Only when you click on the task status in the console does it show that it ran out of memory - with a database that could fit entirely into memory even on a t3.micro. Ok, another 20 minutes of waiting to change the instance class. Tried again - ran out of memory again, and again nothing about the memory failure in the CloudWatch logs. Another 20 minutes of waiting.</p>

<p>I finally got the tables dumped into S3 in Parquet format and moved on to the final goal of this adventure: running AWS Macie to detect sensitive information. Another disappointment - it detected only two entries, and one was a false positive. In the end, it was more efficient to review the SQL dump manually - I spent half the time I had wasted on DMS and Macie and found more than 30 columns in different tables with sensitive information.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[CrowdSec SQLite to MySQL Migration]]></title>
    <link href="https://alexeydemidov.com/2025/06/30/crowdsec-sqlite-to-mysql/"/>
    <updated>2025-06-30T08:35:30+00:00</updated>
    <id>https://alexeydemidov.com/2025/06/30/crowdsec-sqlite-to-mysql</id>
    <content type="html"><![CDATA[<p>How to convert the CrowdSec Local API database from SQLite to MySQL or MariaDB.</p>

<p>The official CrowdSec <a href="https://docs.crowdsec.net/docs/next/local_api/database/">documentation</a> doesn&rsquo;t provide instructions on data migration from SQLite to MySQL or MariaDB, and expects the user to re-register all machines and bouncers, which is inconvenient.</p>

<p>Let&rsquo;s try a straightforward approach: dump the SQLite database into plain SQL and import it into MariaDB (we are running Ubuntu 24.04 Noble).</p>

<p>We need to create a database for CrowdSec first. We can follow <a href="https://docs.crowdsec.net/docs/next/local_api/database/#mysql-and-mariadb">the instructions</a> from the documentation:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>mysql&gt; CREATE DATABASE crowdsec;
</span><span class='line'>mysql&gt; CREATE USER 'crowdsec'@'%' IDENTIFIED BY '&lt;password&gt;';
</span><span class='line'>mysql&gt; GRANT ALL PRIVILEGES ON crowdsec.* TO 'crowdsec'@'%';
</span><span class='line'>mysql&gt; FLUSH PRIVILEGES;</span></code></pre></td></tr></table></div></figure>


<p>We will need <code>mysql</code> and <code>sqlite3</code> on the host where we have CrowdSec installed:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>sudo apt install mariadb-client sqlite3</span></code></pre></td></tr></table></div></figure>


<p>Check the location of the CrowdSec SQLite database file:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>cscli config show --key "Config.DbConfig.DbPath"</span></code></pre></td></tr></table></div></figure>


<p>Let&rsquo;s try the dump-pipe-import:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
</pre></td><td class='code'><pre><code class=''><span class='line'># sqlite3 `cscli config show --key "Config.DbConfig.DbPath"` '.dump' |\
</span><span class='line'>    mysql -h mysqlhost -u crowdsec -p crowdsec
</span><span class='line'>Enter password:
</span><span class='line'>--------------
</span><span class='line'>PRAGMA foreign_keys=OFF
</span><span class='line'>--------------
</span><span class='line'>
</span><span class='line'>ERROR 1064 (42000) at line 1: You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'PRAGMA foreign_keys=OFF' at line 1</span></code></pre></td></tr></table></div></figure>


<p>Unfortunately, SQLite and MySQL/MariaDB have incompatible syntax.</p>

<p>Let&rsquo;s look for a solution. There is a Python tool, <a href="https://github.com/techouse/sqlite3-to-mysql/"><code>sqlite3-to-mysql</code></a>, which does exactly what we need: transfer data from SQLite3 to MySQL.</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>apt install python3-venv
</span><span class='line'>python3 -m venv .venv
</span><span class='line'>source .venv/bin/activate
</span><span class='line'>pip install sqlite3-to-mysql</span></code></pre></td></tr></table></div></figure>


<p>Now we can convert the data, but first stop CrowdSec so we don&rsquo;t capture inconsistent data.</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>systemctl stop crowdsec</span></code></pre></td></tr></table></div></figure>


<p>Now convert the data:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>sqlite3mysql -f `cscli config show --key "Config.DbConfig.DbPath"` -h mysqlhost -d crowdsec  -u crowdsec -p</span></code></pre></td></tr></table></div></figure>


<p>The data conversion runs without errors, and we get the data into MariaDB.</p>

<p>Now we can update the <code>/etc/crowdsec/config.yaml</code> to use MariaDB (See <a href="https://docs.crowdsec.net/docs/next/configuration/crowdsec_configuration/#db_config">the documentation</a>):</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>db_config:
</span><span class='line'>  log_level: info
</span><span class='line'>  # type: sqlite
</span><span class='line'>  # db_path: /var/lib/crowdsec/data/crowdsec.db
</span><span class='line'>  type: mysql
</span><span class='line'>  db_name: crowdsec
</span><span class='line'>  user: crowdsec
</span><span class='line'>  password: &lt;password&gt;
</span><span class='line'>  host: mysqlhost
</span><span class='line'>  port: 3306</span></code></pre></td></tr></table></div></figure>


<p>But when we try to start CrowdSec, it fails with a fatal error:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>level=fatal msg="unable to create database client: failed creating schema resources: sql/schema: modify \"machines\" table: Error 1833: Cannot
</span><span class='line'> change column 'id': used in a foreign key constraint 'alerts_FK_0_0' of table 'crowdsec.alerts'"</span></code></pre></td></tr></table></div></figure>


<p>There seems to be a difference in CrowdSec database schemas between <code>SQLite</code> and <code>MySQL/MariaDB</code>. Let&rsquo;s try to work around that.</p>

<p>First, recreate the database to clear all the imported data.</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>DROP DATABASE crowdsec;
</span><span class='line'>CREATE DATABASE crowdsec;</span></code></pre></td></tr></table></div></figure>


<p>Start CrowdSec with <code>systemctl start crowdsec</code>. It is going to fail, but it will create the correct database schema. (Back up your <code>local_api_credentials.yaml</code> first - in my case CrowdSec overwrote it with new credentials, although I couldn&rsquo;t reproduce this in a clean environment.)</p>

<p>Now we can try to convert the data again, but without creating the schema (option <code>-K</code>) this time (it also uses a hardcoded path to the SQLite database):</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>sqlite3mysql -K -f /var/lib/crowdsec/data/crowdsec.db -h mysqlhost -d crowdsec  -u crowdsec -p</span></code></pre></td></tr></table></div></figure>


<p>Now we can start CrowdSec with <code>systemctl start crowdsec</code>, and it works without any issues.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Falsehoods People (and LLMs) Believe About Linux Swap and OOM.]]></title>
    <link href="https://alexeydemidov.com/2025/05/15/falsehoods-people-and-LLMs-believe-about-Linux-swap-and-OOM/"/>
    <updated>2025-05-15T09:05:10+00:00</updated>
    <id>https://alexeydemidov.com/2025/05/15/falsehoods-people-and-LLMs-believe-about-Linux-swap-and-OOM</id>
    <content type="html"><![CDATA[<h4>Swap is useless</h4>

<p>False, even if a system has a lot of memory, swap allows for better memory
utilisation by swapping out allocated but rarely used anonymous memory pages.</p>

<h4>Swap is going to slow down your system by its mere presence</h4>

<p>False, as long as the system has enough memory, there would be very little or no swap-related I/O, so there is no slowdown.</p>

<h4>It is really bad if you have some memory swapped out <sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup></h4>

<p>False, the kernel swapped out some unused pages, and the memory can be
allocated for something more useful, like the file cache.</p>

<h4>Swap is going to wear out your SSD <sup id="fnref:2"><a href="#fn:2" rel="footnote">2</a></sup></h4>

<p>False, as long as there is no swap-related I/O, there is no wearing out of the
SSD. And modern SSDs have enough resources to handle swap-related I/O anyway.</p>

<h4>Swap is an emergency solution for out-of-memory conditions</h4>

<p>False, once your working set exceeds actual physical memory, swap makes things
worse, causing swap thrashing and a slower OOM trigger.</p>

<h4>Swap allows running workloads that exceed the system&rsquo;s physical memory</h4>

<p>False, once your active working set exceeds actual physical memory, the performance degradation is exponential. If we assume a 10<sup>3</sup> difference in latency between RAM and SSD with random 4K page access, then we can calculate that a 0.1% excess causes a 2x degradation, 1% - 10x degradation, 10% - 100x degradation.</p>
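<p>A back-of-the-envelope sketch of this arithmetic (the 10<sup>3</sup> RAM-to-swap latency ratio above is an assumption, not a measurement):</p>

```shell
# Average slowdown vs. an all-in-RAM baseline when a fraction f of page
# accesses miss RAM and hit swap, assuming swap is ~1000x slower than RAM
for f in 0.001 0.01 0.1; do
  awk -v f="$f" 'BEGIN { printf "excess %4.1f%% -> ~%.0fx slowdown\n", f * 100, (1 - f) + f * 1000 }'
done
```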

<h4>Swap causes gradual performance degradation</h4>

<p>Under stable workloads, until the active working set exceeds physical memory, swap improves performance by freeing unused memory. Once the working set exceeds physical memory, performance degradation is exponential and the system gets unresponsive very quickly.</p>

<p>On a desktop system, when an inactive application gets swapped out, switching back to it would feel slow.</p>

<h4>Kernel evicts program executable pages, making the system unresponsive.</h4>

<p>Active executable pages are explicitly excluded from reclaim. The kernel reclaims inactive file cache and inactive anonymous pages first, proportionally to <code>vm.swappiness</code> setting. Then it starts cannibalizing active file cache and anonymous pages, but executable pages are explicitly excluded from that. The system becomes unresponsive due to I/O starvation, because all file cache is dropped. Also, when system memory is below the low watermark, any task that needs a free memory page causes the kernel to go into direct reclaim and the task gets blocked until free pages are found.</p>

<h4>Swap size should be double the amount of physical memory.<sup id="fnref:3"><a href="#fn:3" rel="footnote">3</a></sup></h4>

<p>False. Unless the system has megabytes of memory instead of gigabytes. If you
allocate more than a few GB of swap, you are in for a long swap-thrashing
session when you run out of memory, before OOM gets triggered.</p>

<p>The proper rule of thumb is to make the swap large enough to keep all inactive anonymous pages after the workload has stabilized, but not so large that it causes swap thrashing and a delayed OOM kill if a fast memory leak happens. For an average system with 2 GB - 128 GB RAM, start by adding a 256 MB swap file, and if it fills up entirely, increase it by another 256 MB. The step can be increased to 512 MB - 1 GB on larger systems with faster storage.</p>
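<p>For reference, adding swap in such steps is only a few commands on a typical Linux system (the path <code>/swapfile</code> and the size are illustrative; run as root):</p>

```shell
# Create a 256 MB swap file, restrict its permissions, format and enable it
fallocate -l 256M /swapfile    # use dd on filesystems without fallocate support
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
# Persist across reboots
echo '/swapfile none swap sw 0 0' >> /etc/fstab
```

<p>If it fills up entirely, add another 256 MB file the same way (or swapoff, grow the file, mkswap and swapon again).</p>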

<h4>Swap use begins based on the vm.swappiness threshold, e.g. when 40% of RAM remains for <code>vm.swappiness=40</code>. <sup id="fnref:4"><a href="#fn:4" rel="footnote">4</a></sup></h4>

<p>False. Before the introduction of <a href="https://linux-mm.org/PageReplacementDesign">the split-LRU design</a> in kernel version 2.6.28
in <a href="https://github.com/torvalds/linux/commit/4f98a2fee8acdb4ac84545df98cccecfd130f8db">2008</a>, there used to be <a href="https://lwn.net/Articles/83588/">a different algorithm</a> that used the percentage of allocated memory, but it was more complicated than a simple threshold: with <code>vm.swappiness=40</code> it wouldn&rsquo;t start swapping even if all memory was allocated by processes, and with the default <code>vm.swappiness=60</code> it would start swapping at 80% memory allocation. This algorithm is no longer in use.</p>

<h4>Swap aggressiveness is configured using vm.swappiness and it is linear between 0 and 100 <sup id="fnref:5"><a href="#fn:5" rel="footnote">5</a></sup></h4>

<p>False. <code>vm.swappiness</code> was first described in the kernel documentation in
<a href="https://github.com/torvalds/linux/commit/db0fb1848a645b0b1b033765f3a5244e7afd2e3c">2009</a> with the following text:</p>

<blockquote><p>This control is used to define how aggressive the kernel will swap memory pages.  Higher values will increase aggressiveness, lower values decrease the amount of swap.  A value of 0 instructs the kernel not to initiate swap until the amount of free and file-backed pages is less than the high water mark in a zone.</p></blockquote>

<p>It doesn&rsquo;t say that the relation between <code>vm.swappiness</code> and aggressiveness is
linear, but people made assumptions.</p>

<p>This description is still present in <a href="https://www.kernel.org/doc/Documentation/sysctl/vm.txt">some texts on kernel.org</a> (this file
isn&rsquo;t present in the kernel tree anymore, and it wasn&rsquo;t updated since 2019).</p>

<p>The documentation was updated in <a href="https://github.com/torvalds/linux/commit/c843966c556d7370bb32e7319a6d164cb8c70ae2">2020</a> to <a href="https://docs.kernel.org/admin-guide/sysctl/vm.html#swappiness">a more appropriate description</a> and the values up to 200 were allowed.</p>

<h4>With vm.swappiness=0 kernel won&rsquo;t swap</h4>

<p>False, if the kernel hits the low water mark in any zone, then it is going to swap anyway.</p>

<h4>With vm.swappiness=100 kernel is going to swap out everything from memory right away</h4>

<p>False, if there is no memory pressure, the kernel isn&rsquo;t going to swap anything.</p>

<h4>vm.swappiness=60 is too aggressive <sup id="fnref:6"><a href="#fn:6" rel="footnote">6</a></sup></h4>

<p>False, the <code>vm.swappiness</code> value <code>60</code> means that <code>anon_prio</code> is assigned the
value of <code>60</code> and <code>file_prio</code> the value of <code>200 - 60 = 140</code>. The resulting ratio
<code>140/60</code> means that the kernel would evict <code>2.33</code> times more pages from the
page cache than swap out anonymous pages.</p>
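<p>The same arithmetic for a few common values:</p>

```shell
# file_prio = 200 - vm.swappiness; file_prio/anon_prio is how many file-cache
# pages the kernel prefers to reclaim per anonymous page swapped out
for s in 10 60 100; do
  awk -v s="$s" 'BEGIN { printf "vm.swappiness=%-3d -> file:anon reclaim ratio %.2f\n", s, (200 - s) / s }'
done
```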

<p>The default value of <code>60</code> was chosen with the assumption that the file I/O
operations, which tend to be sequential, are more effective than random swap
I/O, but this applies to rotating media like HDDs only. For SSDs,
<code>vm.swappiness=100</code> is more appropriate.</p>

<p>As the documentation states:</p>

<blockquote><p>For in-memory swap, like zram or zswap, as well as hybrid setups that have
swap on faster devices than the filesystem, values beyond 100 can be
considered</p></blockquote>

<h4>vm.swappiness=10 is just the right setting and makes your system fast</h4>

<p>This value gives a 19:1 preference for discarding page cache over swapping
out. Your system is going to have a lot of unused anon pages sitting around
while churning through file cache pages, making the file cache less effective.</p>

<h4>Swap won&rsquo;t happen if there is some free RAM.</h4>

<p>False. If a process runs within a cgroup with defined memory limits, it can be
swapped out, even though the system still has a lot of free memory. Swap and
OOM can also be triggered due to memory fragmentation when high-order
allocations fail, even though there are a lot of free low-order pages.</p>

<h4>Swap happens just randomly, when the kernel has nothing to do</h4>

<p>False. Swap happens when memory allocation brings the number of free memory
pages below the low watermark specified for a memory zone. See <code>/proc/zoneinfo</code>
and <a href="https://unix.stackexchange.com/q/533739/1027">this question on Unix.StackExchange</a>.</p>
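<p>A rough sketch of pulling the free-page counts and watermarks out of <code>/proc/zoneinfo</code> (field layout per recent kernels; all values are in pages):</p>

```shell
# Print each zone's free pages against its min/low/high watermarks
awk '/^Node/                       { sub(",", "", $2); node = $2; zone = $4 }
     $1 == "pages" && $2 == "free" { free = $3 }
     $1 == "min"                   { min = $2 }
     $1 == "low"                   { low = $2 }
     $1 == "high"                  { printf "node %s zone %-8s free %8d min %6d low %6d high %6d\n",
                                     node, zone, free, min, low, $2 }' /proc/zoneinfo
```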

<h4>Swapping over NFS is a good idea. <sup id="fnref:7"><a href="#fn:7" rel="footnote">7</a></sup></h4>

<p>False. It is very slow, and any packet lost/delayed on the network would cause the system to hang.</p>

<h4>OOM won&rsquo;t trigger if there is swap enabled. <sup id="fnref:8"><a href="#fn:8" rel="footnote">8</a></sup></h4>

<p>False. OOM is triggered regardless of swap being enabled or disabled, full or empty.</p>

<h4>OOM won&rsquo;t trigger if there is some free RAM.</h4>

<p>False. Swap and OOM can be triggered due to memory fragmentation when
high-order allocation fails, even though there are a lot of free low-order
pages. <sup id="fnref:9"><a href="#fn:9" rel="footnote">9</a></sup></p>

<h4>OOM kills a random process.</h4>

<p>The current Linux kernel simply kills the process with the largest RSS+swap usage
(with a per-process OOM score adjustable through <code>/proc</code>). In v5.1 (2019), it dropped the heuristic of preferring to sacrifice a child instead of the parent. In v4.17 (2018), CAP_SYS_ADMIN processes lost their 3% bonus. Before v2.6.36 (2010), the heuristic used to be much more complicated and involved factors like forking, process runtime and <code>nice</code> values, but at least all of this is described in the current <a href="https://man7.org/linux/man-pages/man5/proc_pid_oom_score.5.html"><code>man 5 proc</code></a>. Enabling the <code>vm.oom_kill_allocating_task</code> sysctl, however, can effectively kill a random process, because whichever process happens to be the last one trying to allocate
memory and failing gets killed.</p>

<h4>You can predict OOM. <sup id="fnref:10"><a href="#fn:10" rel="footnote">10</a></sup></h4>

<p>People sometimes assume that you can get metrics from <code>/proc/meminfo</code> and elsewhere, make some calculations, and predict OOM. Even Kubernetes does <a href="https://github.com/kubernetes/kubernetes/issues/43916#issuecomment-3173666866">some naive calculations</a> trying to determine the working set size.</p>

<p>But you can&rsquo;t predict OOM. The kernel itself can&rsquo;t predict OOM. There is no precise information readily available to make this prediction. The kernel doesn&rsquo;t know how much memory is reclaimable. It doesn&rsquo;t know the exact working set size or how much memory is active and inactive, despite having appropriate fields in <code>/proc/meminfo</code> (see <a href="https://alexeydemidov.com/2025/05/13/linux-inactive-memory">another blog post</a> for details). The hardware doesn&rsquo;t provide this information, and it is too expensive to track in the kernel from a performance point of view. The kernel has to go through the reclaim process and check whether each memory page has the Accessed flag set before reclaiming it. Only after failing the reclaim process multiple times does the kernel invoke OOM.</p>
<div class="footnotes">
<hr/>
<ol>
<li id="fn:1">
<a href="https://serverfault.com/questions/1179908/will-full-swap-slow-down-the-server-even-though-ram-is-free">a ServerFault post</a><a href="#fnref:1" rev="footnote">&#8617;</a></li>
<li id="fn:2">
<a href="https://serverfault.com/a/1180029/23022">a ServerFault post</a><a href="#fnref:2" rev="footnote">&#8617;</a></li>
<li id="fn:3">
<a href="https://issues.hibernatingrhinos.com/issue/RDoc-1724">These guys show you a warning if your system doesn&rsquo;t have the swap double the size of the memory</a><a href="#fnref:3" rev="footnote">&#8617;</a></li>
<li id="fn:4">
<a href="https://askubuntu.com/questions/969065/why-is-swap-being-used-when-vm-swappiness-is-0/969072">an Ask Ubuntu post</a><a href="#fnref:4" rev="footnote">&#8617;</a></li>
<li id="fn:5">
<a href="https://askubuntu.com/questions/103915/how-do-i-configure-swappiness">Another Ask Ubuntu post</a><a href="#fnref:5" rev="footnote">&#8617;</a></li>
<li id="fn:6">
<a href="https://serverfault.com/questions/1156815/why-swap-usage-is-high-for-influxdb-100-disk-i-o-and-swap-usage-but-only-50-m/#comment1513146_1156815">a ServerFault comment</a><a href="#fnref:6" rev="footnote">&#8617;</a></li>
<li id="fn:7">
<a href="https://unix.stackexchange.com/q/794604/1027">a Unix.StackExchange post</a><a href="#fnref:7" rev="footnote">&#8617;</a></li>
<li id="fn:8">
<a href="https://serverfault.com/questions/1179908/will-full-swap-slow-down-the-server-even-though-ram-is-free#comment1537757_1179917">a ServerFault comment</a><a href="#fnref:8" rev="footnote">&#8617;</a></li>
<li id="fn:9">
<a href="https://serverfault.com/questions/917938/linux-oom-killer-acting-despite-plenty-available-memory">a ServerFault post</a><a href="#fnref:9" rev="footnote">&#8617;</a></li>
<li id="fn:10">
<a href="https://serverfault.com/questions/1192733/actual-sequence-of-events-from-memory-pressure-to-oom-for-cgroups-v2/1192826#comment1544632_1192826">a ServerFault comment</a><a href="#fnref:10" rev="footnote">&#8617;</a></li>
</ol>
</div>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Linux Inactive Memory]]></title>
    <link href="https://alexeydemidov.com/2025/05/13/linux-inactive-memory/"/>
    <updated>2025-05-13T13:17:30+00:00</updated>
    <id>https://alexeydemidov.com/2025/05/13/linux-inactive-memory</id>
    <content type="html"><![CDATA[<p>Since the introduction of <a href="https://linux-mm.org/PageReplacementDesign">the split-LRU design</a> in version 2.6.28, the
kernel maintains 4 (actually 5) LRU (least recently used) lists for memory
pages:</p>

<ul>
<li>Active anon pages</li>
<li>Active file pages</li>
<li>Inactive anon pages</li>
<li>Inactive file pages</li>
<li>(unevictable pages)</li>
</ul>


<p>The total sizes of these pages are reported in <code>vmstat</code> and <code>/proc/meminfo</code>:</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>$ vmstat -s | head -5
</span><span class='line'>    131669088 K total memory
</span><span class='line'>     69020576 K used memory
</span><span class='line'>     56007124 K active memory
</span><span class='line'>     29691936 K inactive memory
</span><span class='line'>      6140772 K free memory
</span><span class='line'>
</span><span class='line'>$ cat /proc/meminfo | grep -i active
</span><span class='line'>Active:         56007676 kB
</span><span class='line'>Inactive:       29691936 kB
</span><span class='line'>Active(anon):   34381776 kB
</span><span class='line'>Inactive(anon): 16134108 kB
</span><span class='line'>Active(file):   21625900 kB
</span><span class='line'>Inactive(file): 13557828 kB</span></code></pre></td></tr></table></div></figure>


<p>These lists are maintained for the purpose of page reclaim. When the kernel needs to allocate some memory pages and there are not enough free pages, the kernel goes through the LRU lists, trying to find pages that can be reclaimed. The pages are organised in LRU lists so the kernel can reclaim those pages that weren&rsquo;t used recently and are supposedly least likely to be used soon.</p>

<p>The kernel goes through both inactive anon and inactive file LRU lists, starting from their tails while maintaining the proportion specified by <code>vm.swappiness</code> between the reclaimed anon and file pages. With the default value <code>vm.swappiness=60</code>, when the kernel needs <code>200</code> pages, it is going to reclaim <code>60</code> pages from inactive anon LRU and <code>140</code> pages from inactive file LRU, but if there are not enough pages available, it won&rsquo;t keep the balance and is going to reclaim whatever it finds.</p>

<p>When a file-backed page is reclaimed, it can be discarded right away if it isn&rsquo;t modified; otherwise it needs to be written back into the file. An anon page always needs to be written to swap when reclaimed, so anon pages are more expensive to reclaim.</p>

<p>So far, so good - we established the purpose of the separation of the memory into active/inactive, so the kernel knows which pages to reclaim.</p>

<p>Now the main question: how exactly does the kernel decide which page is active and which is not?</p>

<p>The problem is that the kernel can&rsquo;t really track all memory accesses, and the hardware provides only a single bit of information - whether the page was ever accessed. It doesn&rsquo;t know when the page was accessed or how many times. The kernel works around this by periodically scanning the pages, clearing the &lsquo;Accessed&rsquo; bit, and looking at this bit again on the next scan. But it doesn&rsquo;t scan all the pages all the time. Only when the kernel needs some memory pages does it scan the tail of the inactive LRU, and it stops the scan when it finds enough pages. So most of the pages don&rsquo;t get their activity information updated.</p>

<p>To illustrate this: a newly allocated page starts its life at the head of the inactive LRU with the &lsquo;Accessed&rsquo; bit reset to 0. As other new pages are allocated, our page gets pushed back towards the tail of the inactive LRU. If the page is accessed by a user space program, the CPU sets the &lsquo;Accessed&rsquo; bit to 1. The page can be accessed many times, but it is still considered inactive until the reclaim scan reaches its position in the inactive LRU from the tail. Pages get promoted to the active LRU on the second scan (some on the first) if they have the &lsquo;Accessed&rsquo; bit set. If a page doesn&rsquo;t get promoted to the active LRU, it gets reclaimed once it reaches the tail end of the inactive LRU and the kernel needs some free pages. Once a page gets to the active LRU, it is considered active even if it isn&rsquo;t accessed there at all. Until the active list grows too large or reclaim shrinks the inactive list too far, the active list doesn&rsquo;t change: its pages are not scanned at all, and more active pages are never moved to the head within the active LRU.</p>

<p>The kernel maintains a specific ratio between the active and inactive LRU, depending on the memory size (since v4.10):</p>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
</pre></td><td class='code'><pre><code class=''><span class='line'> * total     target    max
</span><span class='line'> * memory    ratio     inactive
</span><span class='line'> * -------------------------------------
</span><span class='line'> *   10MB       1         5MB
</span><span class='line'> *  100MB       1        50MB
</span><span class='line'> *    1GB       3       250MB
</span><span class='line'> *   10GB      10       0.9GB
</span><span class='line'> *  100GB      31         3GB
</span><span class='line'> *    1TB     101        10GB
</span><span class='line'> *   10TB     320        32GB
</span></code></pre></td></tr></table></div></figure>
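<p>If I read <code>mm/vmscan.c</code> correctly, this table follows a simple formula: the target ratio is the integer square root of 10x the memory size in GB (minimum 1), and the max inactive size is total/(ratio + 1). A quick check against the table above:</p>

```shell
# target ratio = int_sqrt(10 * memory_in_GB), minimum 1
# max inactive = total / (ratio + 1)
for gb in 1 10 100 1024 10240; do
  awk -v gb="$gb" 'BEGIN { r = int(sqrt(10 * gb)); if (r < 1) r = 1;
    printf "%6d GB -> target ratio %3d -> max inactive %7.2f GB\n", gb, r, gb / (r + 1) }'
done
```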


<p>When the kernel needs to grow the inactive LRU, it moves pages from the tail of the active LRU regardless of their activity status, except for file-backed executable pages, which get promoted back to the head of the active LRU.</p>

<p>Even though the design of active/inactive LRUs is to reference active and inactive pages, the kernel intentionally spends minimal effort to maintain these lists (for performance reasons), so the information about the pages in these lists is outdated most of the time for most of the pages.</p>

<p>What can we actually tell about the pages in the LRUs?</p>

<ul>
<li>The page at the head of the active list was accessed just recently. It may have been accessed once or twice, or a million times.</li>
<li>The page at the tail of the active list was accessed some time ago, when it was added to the head of the list. It could have been accessed a million times or zero times while on the active list.</li>
<li>The page at the head of the inactive list was either just allocated or just pushed out of the active list, where it spent an unknown amount of time and was accessed zero or a million times.</li>
<li>The page in the middle of the inactive list was allocated some time ago, and we don&rsquo;t know if it has been accessed yet. Until it is scanned, it may have been accessed zero times or a million times.</li>
<li>The pages at the tail of the inactive list are the only pages with accurate information, and only at the moment they are scanned. We still don&rsquo;t know if they were accessed more than once, nor whether they are going to be accessed again, but at least that is the expectation.</li>
</ul>


<p>So a more appropriate name for the &lsquo;Inactive&rsquo; LRU would be &lsquo;candidates for reclaim&rsquo;, and for the &lsquo;Active&rsquo; LRU - &lsquo;not considered for reclaim yet&rsquo;. The numbers derived from the lengths of these LRUs and reported as Active/Inactive memory in <code>/proc/meminfo</code> bear little relation to the actual working set size.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[AI Fails With DevOps Tasks]]></title>
    <link href="https://alexeydemidov.com/2025/03/01/ai-devops-fails/"/>
    <updated>2025-03-01T11:54:43+00:00</updated>
    <id>https://alexeydemidov.com/2025/03/01/ai-devops-fails</id>
    <content type="html"><![CDATA[<p>Did a quick evaluation and comparison between ChatGPT and Claude on my typical
task to decide if the Claude subscription discount offer is worth it. All
models failed miserably on a simple and straightforward task of creating a
single Terraform resource. The caveat is that this resource was implemented
relatively recently by AWS (in 2021) and in the Terraform AWS provider (10
months ago).</p>

<p>The initial question is simple: “How to add rDNS for AWS EIP with Terraform”.
All models answered that Terraform doesn’t support it natively and offered a
workaround with “local-exec” and call to “aws ec2 modify-address-attribute”.
Claude gave the correct parameter “--domain-name”; both 4o and o3-mini-high
hallucinated the parameter names “--reverse-dns-name” and “--reverse-dns”.</p>

<p>Given a correction that Terraform does support this natively, the models
started hallucinating by inventing or repurposing “aws_eip” resource
attributes. 4o suggested using the “domain” attribute, which is not related to
DNS. o3-mini-high invented “reverse_dns” block for “aws_eip” resource. Claude
suggested assigning the “reverse_dns” attribute, which doesn’t exist.
Interestingly, with web search enabled, 4o was able to find the correct
“aws_eip_domain_name” resource. Both Claude and o3 went back to suggesting
using “local-exec” and inventing random resource names like “aws_ec2_address”,
“aws_ec2_address_attribute”, “aws_eip_reverse_dns”.</p>

<p>I have noticed that o3 is much more stubborn, and if it goes down a wrong path, it is
almost impossible to correct - a few weeks ago, it tried to convince me that
MySQL 9 doesn’t exist. Not sure if the new Claude works the same way, but at
least it is much more cheerful. Still gave that subscription option a pass, as
there is no improvement, and these tasks are still too challenging for AIs.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[One Infrastructure Migration Equals Two Disaster Recoveries]]></title>
    <link href="https://alexeydemidov.com/2025/02/08/one-migration-equals-two-disaster-recoveries/"/>
    <updated>2025-02-08T10:49:14+00:00</updated>
    <id>https://alexeydemidov.com/2025/02/08/one-migration-equals-two-disaster-recoveries</id>
    <content type="html"><![CDATA[<p>To paraphrase a proverb, &ldquo;One infrastructure migration equals two disaster
recoveries&rdquo; <sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup>. Well, I had to move not two but 25 services. Some are
dockerised, some are running in legacy VMs managed with Chef, and one very old
VM that used to be a dedicated server more than a decade ago, with a jumble of
legacy half-abandoned websites and mail services.</p>

<p>While all the services are internal and not customer-facing, there are some
critical for the team and infrastructure. To avoid much service disruption and
downtime, I decided to move services one at a time so I could deploy and test
the service on the new infrastructure, shut down the old one, re-sync the data
and switch the DNS (almost forgot to lower DNS TTL before the move) with the
option to roll back to the old service if anything went wrong. Beforehand, I
created a table in Notion with all service resource requirements, SLAs,
inter-dependencies, priorities and nice color labels. Had this table open in
front of me for almost two months.</p>

<p>The initial infrastructure preparation was a clean greenfield - made it all
modern, fresh and shiny. While I had time, I started with the most complex and
important services, refreshed them, threw away a lot of organic-growth stuff,
and re-wrote some Chef cookbooks to Ansible. When the deadline started
approaching, it became a kind of a slog - move the service, search all the
configuration spread for IPs to update<sup id="fnref:2"><a href="#fn:2" rel="footnote">2</a></sup>, get the service working, repeat,
preferably early in the morning. The team also dropped two other
projects on me (including a major OS upgrade) with the same deadline to keep up the
pressure. Had a lot of overtime and even broke my anti-burnout rule of not
working full-time on weekends once. Ironically, I managed to follow Parkinson’s
law and finished by retiring the old infrastructure exactly at the deadline,
the last day of the month.</p>

<p>But there are still a lot of small broken parts to fix and old dust and junk to
clear. Still sorting out and prioritising my ToDo list, as it gained more than a
hundred items during the migration. Two major things to re-work - remove as
much as possible from the Chef attributes and re-organize packet filter rules
management.</p>

<p>Also, had some interesting and surprising moments:</p>

<p>The first one was when the hosting provider moved an IP from one server to
another, and I started seeing incoming traffic for the IP on both the old and
the new servers simultaneously.
I was monitoring the traffic with tcpdump on both servers to catch the moment
when the IP switch was about to happen.  At first, I suspected their network
engineer somehow managed to mirror the traffic and even sent them a message, as
I was clearly getting ‘host unreachable’ from the old server public IP, but I
didn’t stop investigating. The IP wasn’t responding even locally, though I
double-checked that it was assigned to a VM and that a route was configured
properly. When I got a ‘host unreachable’ message from an internal IP from the
old server, it all became clear. There was an IPSec tunnel between the old and
the new servers, and the IP was still routed to the old server through the
tunnel.  The confusing part was that the packets coming in through the IPSec
tunnel on the old server were seen as coming through normal ethernet. The irony
is that I had already prepared IPSec reconfiguration, and it was the next step
in my checklist.</p>

<p>The second was a chicken-and-egg problem. To move the Chef server, I needed to
reconfigure load balancers, but the load balancers’ configuration was managed
with Chef. Ended up changing the configuration manually.</p>

<p>The third was a little adventure with a new secondary server at the old hosting
provider. They delayed its delivery for a month. When I got access to its
console, it wasn’t even re-imaged with a fresh Debian; it was a used node from
the provider’s cloud solution. Their netboot image was running Alpine version
from 2019, and their netboot installer didn’t have anything newer than Debian
Buster. Their netboot didn’t support UEFI boot with GPT, and I found that out
the hard way. It also explained why they had failed to re-image the server.
Re-partitioned the disks with MBR, installed Debian 10, and followed with two
immediate upgrades to Bullseye and Bookworm. Almost an entire day was wasted
doing someone else’s work. And then I had to wait 24 hours for them to re-route an IP to
this server. Oh, and the disks had 50K power-on hours on a supposedly new
server. My expectations were adjusted lower with every interaction with these
guys. But in the end, most services were moved from this provider and we got
twice the capacity for a lower price.</p>

<p>Notes:</p>
<div class="footnotes">
<hr/>
<ol>
<li id="fn:1">
I discovered that in the English proverb, two or three moves equal one house fire, but in the Russian version, the proportion is opposite - one move equals two house fires.<a href="#fnref:1" rev="footnote">&#8617;</a></li>
<li id="fn:2">
There is an anti-pattern with Chef in that it accumulates a lot of information in attributes, which are spread along different places - node/role/environments, and they are not versioned and hard to search globally. Luckily, I have a daily dump of all roles/nodes/environments/data bags as JSON files, so I have an option to grep through them.<a href="#fnref:2" rev="footnote">&#8617;</a></li>
</ol>
</div>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Why Is It Always Just a Single SQL Statement Causing a Major Performance Regression?]]></title>
    <link href="https://alexeydemidov.com/2024/10/19/it-is-always-a-single-sql-statement/"/>
    <updated>2024-10-19T07:11:13+00:00</updated>
    <id>https://alexeydemidov.com/2024/10/19/it-is-always-a-single-sql-statement</id>
    <content type="html"><![CDATA[<p>A few weeks ago, I had to investigate a batch job that was taking more than 3
hours to run every night.  The DB was the obvious bottleneck, as the job was
hitting it so hard that I noticed excessive load even before the business
started complaining. Before looking into the details, I assumed that the
application code was doing some loops internally, retrieving mostly the same
data again and again. I started preparing myself to dive into the code for a
week to untangle data flows, imagining the horrors of multi-page SQL
statements.</p>

<p>But “Premature optimisation is the root of all evil”, so first things first:
enable detailed monitoring and collect query statistics for a day. The next
morning, there is some data: the code is mostly hitting a single table with a
simple query, but with a WHERE on a column without an index. Add an index first;
maybe it will improve performance before I have to dive into the code.</p>
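<p>The fix itself is a one-liner. A sketch of the kind of change involved (table, column, and index names here are made up, in PostgreSQL-style SQL):</p>

```sql
-- Before: a WHERE on an unindexed column forces a full table scan.
--   EXPLAIN SELECT * FROM batch_items WHERE status = 'pending';
--   -> Seq Scan on batch_items (Filter: status = 'pending')

-- On a busy production table, CONCURRENTLY avoids holding a write lock:
CREATE INDEX CONCURRENTLY idx_batch_items_status ON batch_items (status);

-- After: the planner can switch to an index scan.
--   EXPLAIN SELECT * FROM batch_items WHERE status = 'pending';
--   -> Index Scan using idx_batch_items_status on batch_items
```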

<p>The next morning, I check the DB graphs, and there is no load at all.  Did the
job run? Did someone disable it? Did I change something, causing the job to
crash?</p>

<p>Checking the application logs. The job did run. And completed successfully. In
10 minutes.  One single index reduced run time from more than 3 hours to 10
minutes. The classic “one hit with the hammer but you need to know where to
hit”.</p>

<p>On the other hand, I’m trying to imagine what someone who “doesn’t know where
to hit” and has no visibility into the database and application performance
would do to solve this issue. Just crank up the instance size? I suspect they
would end up paying ten times more.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Building Vagrant-based Development Environment]]></title>
    <link href="https://alexeydemidov.com/2014/07/02/building-vagrant-based-development-environment/"/>
    <updated>2014-07-02T06:32:42+00:00</updated>
    <id>https://alexeydemidov.com/2014/07/02/building-vagrant-based-development-environment</id>
<content type="html"><![CDATA[<p>Over the course of the last few months I have built three different custom
<a href="http://www.vagrantup.com">Vagrant</a> boxes to create local development environments for two different
applications &mdash; one <a href="http://wordpress.org">WordPress</a>-based and the other a Rails app with a few <abbr>PHP</abbr>
parts.</p>

<p>The problem Vagrant solved was that both applications are too complex to
set up manually. Even on the <a href="http://wordpress.org">WordPress</a> project,
developers didn&rsquo;t work locally: they used to edit files directly on the live
server, and even after we imported all the code into <code>git</code>, they kept using the
integration server for day-to-day development. Their workflow looked
terrible &mdash; change a line, commit, push, wait for the deploy script to run,
check the integration server for results, repeat. Moreover, this workflow left the
<code>git</code> history ugly &mdash; a myriad of one-line commits with no
commit messages, which are painful to merge. For the Rails app we needed some
<abbr>CSS</abbr>/<abbr>HTML</abbr> tweaks, and there is just no way an average
front-end developer can set up a Rails development environment on Windows.</p>

<p>At first I thought about distributing a binary Vagrant box, but I would still need to
distribute the application source code as a <code>git</code> repository plus a Vagrantfile to
configure sharing, and I was too lazy to set up a password-protected directory on the
web server for downloading the binary box and to hand out all the credentials to individual
developers (<a href="https://app.vagrantup.com/">Vagrant Cloud for organizations</a>
wasn&rsquo;t available yet). So I decided to make one single <code>git</code> repo with the
Vagrant configuration and cookbooks, with the source code repo
included as a submodule.</p>
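<p>The layout can be sketched in a few commands (all paths and repo names below are hypothetical, with a local throwaway repo standing in for the real application remote):</p>

```shell
set -e
# Stand-in for the application source repo (in reality a remote git URL):
mkdir -p /tmp/myapp-src && cd /tmp/myapp-src
git init -q
touch index.php
git add .
git -c user.email=dev@example.com -c user.name=dev commit -qm 'initial import'

# The single development repo: Vagrant config plus the app source as a submodule
mkdir -p /tmp/myapp-dev && cd /tmp/myapp-dev
git init -q
git -c protocol.file.allow=always submodule add /tmp/myapp-src www
touch Gemfile Berksfile Vagrantfile INSTALL
```

<p>Developers then only need <code>git clone --recursive</code> and <code>vagrant up</code>.</p>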

<p>It had been a while since I last did Chef cookbook development, so at first I googled
a lot, trying to find the current best approach and tooling.
Cookbook development has completely changed over the last year or two &mdash; there are now
<a href="http://kitchen.ci">test-kitchen</a>, <a href="http://berkshelf.com">berkshelf</a>,
<a href="http://serverspec.org">serverspec</a>, etc., and all these tools are changing so
fast that almost any tutorial older than a few months is obsolete.</p>

<p>So far I have found the following blog posts to be the most current:</p>

<ul>
<li><p><a href="http://misheska.com/blog/2013/08/06/getting-started-writing-chef-cookbooks-the-berkshelf-way-part3/">Getting Started Writing Chef Cookbooks the Berkshelf Way</a> (ignore parts 1 and 2)</p></li>
<li><p><a href="https://micgo.net/2013/12/automating-cookbook-testing-with-test-kitchen-berkshelf-vagrant-and-guard/">Automating Cookbook Testing with Test-Kitchen, Berkshelf, Vagrant and Guard</a></p></li>
</ul>


<p>In my setup I followed the second one and cross-checked it against the first
article. I chose to include <a href="http://kitchen.ci">test-kitchen</a>, <a href="http://berkshelf.com">berkshelf</a>, <a href="http://serverspec.org">serverspec</a>,
<a href="https://github.com/chefspec/chefspec">chefspec</a>, <a href="https://docs.chef.io/workstation/foodcritic/">foodcritic</a>, and
<a href="https://github.com/bbatsov/rubocop">rubocop</a> in my toolbox and wrap everything with <a href="http://guardgem.org/">guard</a> (though I later disabled the <code>test-kitchen</code> run from <code>guard</code> as it kept failing). In the beginning
I started preparing a custom Vagrant base box with <a href="https://github.com/jedi4ever/veewee">veewee</a> but dropped it, as I
didn&rsquo;t really need anything custom and the standard <code>chef/debian</code> box from
<a href="http://vagrantcloud.com">vagrantcloud.com</a> worked well.</p>

<p>My main repo has a very simple structure &mdash; a <code>Gemfile</code> with <code>berkshelf</code>, a
<code>Berksfile</code> with all the necessary cookbooks, a <code>Vagrantfile</code>, and an <code>INSTALL</code> file with
step-by-step instructions for developers. The <code>www</code> subdirectory contains the site source code as a
<code>git</code> submodule, and the <code>cookbooks</code> subdirectory holds all dependent cookbooks vendored
with <code>berks vendor cookbooks</code>. At first I included my own cookbooks as
<code>git</code> submodules too, in <code>site-cookbooks</code>, but as <code>berks vendor</code> retrieves them
anyway, I dropped this. I also decided not to use the
<a href="https://github.com/berkshelf/vagrant-berkshelf">vagrant-berkshelf</a> plugin to
maintain cookbooks, as it is
<a href="https://sethvargo.com/the-future-of-vagrant-berkshelf/">deprecated</a>.</p>
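<p>A Vagrantfile for this layout stays short; a simplified sketch (the box name, paths, and recipe name are illustrative, not the actual project&rsquo;s):</p>

```ruby
# Illustrative Vagrantfile for the repo layout described above
Vagrant.configure('2') do |config|
  config.vm.box = 'chef/debian-7.4'                # stock box from vagrantcloud.com
  config.vm.synced_folder 'www', '/var/www/site'   # the git submodule with the site code

  config.vm.provision 'chef_solo' do |chef|
    chef.cookbooks_path = 'cookbooks'              # vendored via `berks vendor cookbooks`
    chef.add_recipe 'myapp::vagrant'               # the Vagrant-specific recipe
  end
end
```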

<p>For each application I created an individual cookbook, plus one cookbook for common
configuration. Each cookbook has its own <code>git</code> repo and follows the standard layout
created by <code>berks cookbook</code>. I also decided to rely on community
cookbooks for all dependencies like MySQL, <abbr>PHP</abbr>, etc., even though I didn&rsquo;t do much
customization. This decision caused a bit of pain &mdash; I had to fork the cookbooks
for MySQL and <code>monit</code> to support Debian squeeze, and had to use an alternative cookbook
for <abbr>PHP</abbr>, as the <code>phpmyadmin</code> cookbook depends on it. Each cookbook has multiple recipes &mdash; for the
Vagrant setup, the integration server, and the live server &mdash; as there
are some differences between them: <abbr>SSL</abbr> support, and while the integration server
runs <code>php-fpm</code>, the live server still uses <code>mod_php</code>.</p>

<p>At first I followed a fairly strict <abbr>TDD</abbr>/<abbr>BDD</abbr> loop &mdash; create <code>serverspec</code> tests, then
<code>chefspec</code>, and then write the recipe &mdash; but after a while I dropped the <code>chefspec</code> tests, as I
found writing <code>expect(chef_run).to include_recipe('apache2')</code> and then
<code>include_recipe 'apache2'</code> a bit boring. Also, running <code>kitchen converge &amp;&amp;
kitchen verify</code> is quite slow even with a lot of <abbr>RAM</abbr> and an <abbr>SSD</abbr> disk. I
tried to speed things up by switching to <abbr>LXC</abbr>, but <code>kitchen-lxc</code> seems to be broken
and unsupported, and using <code>vagrant-lxc</code> with <code>test-kitchen</code> isn&rsquo;t documented
very well and requires building <abbr>LXC</abbr> base boxes manually using
<a href="http://fabiorehm.com/blog/2013/07/18/crafting-your-own-vagrant-lxc-base-box/">outdated instructions</a>
&mdash; some links to configuration templates are 404, and after you build the base boxes,
recent Vagrant complains about an outdated box format. My attempts to use
<a href="https://github.com/fgrehm/vagrant-lxc-base-boxes">more up-to-date scripts</a> to build a
base box failed, as those scripts just segfaulted on me, and I didn&rsquo;t have time
to fix them since the manually built base boxes were already working. Another issue
was that my Linux Mint box had a <code>sudo</code> configuration setting which caused
<code>vagrant-lxc</code> to fail when used with <code>test-kitchen</code>, and a couple of weeks passed
before <a href="http://stackoverflow.com/questions/23480155/vagrant-lxc-fails-to-start-when-run-from-test-kitchen">I found time to find a solution</a>,
so all the cookbooks were developed slowly using <a href="https://www.virtualbox.org/">VirtualBox</a>.</p>

<p>Overall, development went quite smoothly, except for a few PHP/WordPress
surprises at the end &mdash; e.g., PHP with <code>short_open_tag</code> switched off fails with a
<code>syntax error</code> pointing to the end of a huge 5K LOC .php file without any hint of
the real cause, and WordPress shows just a blank front page, without any error
messages in the error logs, if some plugin fails or is missing. But the real adventure was
still ahead. When all the cookbooks were ready and fully tested locally on Linux
and Mac OS X, it was time to deploy to Windows boxes, where everything failed
at the very beginning &mdash; Vagrant was launching VirtualBox VMs but was unable to
<code>ssh</code> into them. A few days of remote debugging over email, and I had found
that even <code>vagrant init hashicorp/precise</code> failed to work on Windows, so I
got the idea to try a 32-bit OS image, which worked. Later I got RDP
access to the Windows 8 box and launched VirtualBox directly, and it
complained that VT-x was disabled (it needs to be enabled in the BIOS, and the
feature is unavailable on Celeron processors) and that it couldn&rsquo;t launch a 64-bit image.
Once I switched the images to 32-bit, all Windows users were able to use them
without much trouble, except for occasional cases when developers didn&rsquo;t read the
documentation and forgot to use <code>git clone --recursive</code>, and similar issues.</p>

<p>Another quite problematic issue with Windows was that, with default settings, it is
impossible to create symbolic links on the shared file system, while the Rails app
was deployed <code>capistrano</code>-style and relied on symbolic links heavily. I had to
revamp the whole recipe for the Rails app and remove all symbolic links to get it
working in Vagrant under Windows. Another Rails-specific issue is that the <code>rvm</code>
cookbook needs the special recipe <code>rvm::vagrant</code> to be included before any other
recipe when run in a Vagrant VM.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[WordPress Site Performance Optimization]]></title>
    <link href="https://alexeydemidov.com/2014/06/24/wordpress-site-performance-optimization/"/>
    <updated>2014-06-24T09:10:51+00:00</updated>
    <id>https://alexeydemidov.com/2014/06/24/wordpress-site-performance-optimization</id>
<content type="html"><![CDATA[<p>I spent about a week working on optimizing the performance of a
<a href="https://wordpress.org">WordPress</a>-based web application. While the site already had
some optimizations in place, like <a href="https://wordpress.org/plugins/w3-total-cache/">W3 Total Cache</a> backed by
<abbr>APC</abbr> and <code>mod_pagespeed</code> installed, there were still complaints
that the site loaded very slowly.</p>

<p>Before taking any action, I started by measuring actual performance, gathering
metrics with <a href="https://newrelic.com">New Relic</a> and the Chrome Developer Tools Audit
tab. New Relic surfaced a few critical insights into the performance troubles.</p>

<p>The first was two widgets in the footer of every page, each making requests to
external <abbr>API</abbr> services &mdash; one taking ~600 ms on average and the
other ~1500 ms. As the second service was our own custom service, I quickly optimized
it by adding a counter-cache field to the table, instead of running a select over dependent
records on each request, and its request time went down to around ~150 ms. But
these requests were still being made on every page load, so I patched both widgets to
cache responses from the external <abbr>API</abbr>s in memcached for 5 minutes.
Average page generation time went down to around ~800 ms.</p>
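<p>The widget patch boils down to a few lines; a hedged sketch using the WordPress transients <abbr>API</abbr> (the function name and endpoint are hypothetical &mdash; with a memcached object-cache backend, transients land in memcached):</p>

```php
<?php
// Sketch: cache the slow external API response for 5 minutes instead of
// fetching it on every page load. Names and the endpoint are made up.
function footer_widget_data() {
    $data = get_transient('footer_widget_data');
    if (false === $data) {
        $response = wp_remote_get('https://api.example.com/stats');
        $data     = wp_remote_retrieve_body($response);
        set_transient('footer_widget_data', $data, 5 * MINUTE_IN_SECONDS);
    }
    return $data;
}
```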

<p>The next thing to optimize was W3 Total Cache itself. First I turned off its minify
option, as according to New Relic it was sometimes taking 2-3 seconds. Next I
switched its cache storage from <abbr>APC</abbr> to memcached, as <abbr>APC</abbr>
was being reset every minute by some rogue code somewhere on the server:
<code>grep -r apc_clear_cache</code> showed about a hundred matches. This issue also
affected <abbr>PHP</abbr> opcode caching, so I decided to switch to Zend OPcache
for opcode caching. For the page cache I chose to switch from memcached storage
to enhanced disk storage, so that if a page is cached, Apache serves it directly
without hitting any <abbr>PHP</abbr> code. As database queries were taking an
insignificant percentage of page generation time, I switched the DB cache
off completely to avoid the <abbr>PHP</abbr> overhead, and instead cranked up the memory
limits of MySQL&rsquo;s own query cache.</p>

<p>With all these optimizations, page generation time stabilized around ~600 ms,
against ~2500 ms a week earlier, and I called it a day &mdash; I don&rsquo;t think I can squeeze
more performance out of WordPress without analyzing the performance
impact of each individual plugin.</p>

<p>The next step in the site optimization was tuning the <code>mod_pagespeed</code> settings.
At first it looked like it was not working at all. After checking the
<code>mod_pagespeed</code> logs, I found that it doesn&rsquo;t handle SSL by default, and
ours is an https-only site. Another obstacle, which cost me a good half
hour, is that the W3 Total Cache page cache interferes with <code>mod_pagespeed</code> &mdash; the
latter appears to ignore static HTML files. After that I enabled most of the
<code>mod_pagespeed</code> CoreFilters, focusing on CSS and JS file optimization, as we
had about 30-40 assets of each kind, and was able to reduce the number of external
CSS/JS files down to 9-10 per page. I also tried to optimize page loading by
using the <code>defer_javascript</code> filter, which moves JS files down to the page footer, but
had to turn it off later as it broke some JS navigation menus. Overall page
load time went down from around 10 seconds to 5 seconds on average.</p>
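<p>The relevant part of the final configuration looked roughly like this (a sketch with illustrative values, not the exact production config):</p>

```apache
ModPagespeed on
# mod_pagespeed skips https:// resources unless told otherwise:
ModPagespeedFetchHttps enable
# CoreFilters already covers CSS/JS combining and minification:
ModPagespeedRewriteLevel CoreFilters
# Tried and rolled back -- defer_javascript broke some JS navigation menus:
# ModPagespeedEnableFilters defer_javascript
```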

<p><img src="https://alexeydemidov.com/images/newrelic.png" title="New Relic Graph" ></p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Octopress Revival]]></title>
    <link href="https://alexeydemidov.com/2014/06/15/octopress-revival/"/>
    <updated>2014-06-15T09:46:37+00:00</updated>
    <id>https://alexeydemidov.com/2014/06/15/octopress-revival</id>
<content type="html"><![CDATA[<p>Resurrected my standalone blog for the third time, this time again on
<a href="http://octopress.org/">Octopress</a> and still on version 2.0. I didn&rsquo;t intend to
do this, but there is still no good blogging platform with code highlighting
support.</p>

<p>I set it up pretty quickly, but the first problem was that I wanted to keep the old
content and no longer had the source for it. Converting it by hand seemed
tedious, so I thought about hiring someone on oDesk, but then, after a few Google
searches, I found a tool to convert HTML back to markdown &ndash; the <a href="https://github.com/xijo/reverse_markdown">reverse_markdown ruby gem</a>.
On the first attempt it did no
conversion, but after stripping all the HTML around the actual post content (the
most important part is removing the <code>article</code> tags) it produced nice markdown,
which I put back into Octopress.</p>

<p>After the initial import I did some cleanup &ndash; removed the unnecessary <code>/blog</code> prefix
from post permalinks, fixed links in old content pointing to my old-old
MovableType blog, and imported static files into Octopress. To check all the links I
installed the <a href="http://www.ryanalynporter.com/2012/10/06/introducing-the-link-checker-ruby-gem/">link-checker ruby gem</a>
&ndash; it works pretty well but seems to have problems with some <code>https://</code>
links.</p>

<p>Once all the content was in good shape, I tweaked the CSS colors back to my old palette,
added a <a href="https://gist.github.com/AlexeyDemidov/b045b6f5b6a8d0a19e67">Stack Exchange badge</a>,
enabled <a href="https://disqus.com/">Disqus</a> comments, and updated the Google Analytics
JavaScript to the latest universal code.</p>

<p>After comparing the generated HTML with the old blog using <code>diff</code>, I found a bug in
Octopress: the canonical link for category pages is broken by default &ndash; it has a
missing <code>/</code>; see <a href="https://github.com/imathis/octopress/issues/949">Octopress issue #949</a>
for the fix. Once I was satisfied with the content, I deployed it to the server using
<code>rsync</code>.</p>

<p><a href="https://github.com/octopress/octopress">Octopress version 3.0</a>
is currently in development and close to a final release, but it seems to be quite
different from version 2.0 in concept, and as its author
<a href="https://github.com/octopress/octopress/issues/30">says</a>:</p>

<blockquote><p>For those currently using Octopress, it will be a while before the new
releases can compete with the features of the old version.</p></blockquote>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Hunt for the Bug]]></title>
    <link href="https://alexeydemidov.com/2014/06/08/hunt-for-the-bug/"/>
    <updated>2014-06-08T10:41:46+00:00</updated>
    <id>https://alexeydemidov.com/2014/06/08/hunt-for-the-bug</id>
<content type="html"><![CDATA[<p>I spent three days last week hunting a mysterious bug which caused
<code>factory_girl</code> factories to randomly fail with a <code>Trait not registered: class</code>
message during a full test suite run &ndash; yet when you ran all the controller or model
tests separately, everything was fine, and all the tests that failed during the
full run passed perfectly.</p>

<p>At first I ignored the issue &ndash; I had just added two new factories, which
coincidentally used the <code>class</code> parameter in their definitions to specify the generated
class explicitly; I needed these factories to test the code I was working on,
and I thought I&rsquo;d fix or just remove them later.</p>

<p>But, as usually happens, it wasn&rsquo;t as simple as I thought. Suddenly I discovered
that tests had started failing with the same symptoms on the common <code>develop</code> branch,
not only on my topic branch. And I had already broken tests in two other places, so
a cleanup was really needed.</p>

<p>I spent the first two days trying to find out what was happening in the <code>factory_girl</code>
internals, using old-school print logging and later pry-debugger, but without
much success &ndash; except that I was able to locate a single spec file in
<code>spec/workers/</code> which caused all subsequent factory calls to fail. Then I
started looking through the git history, trying to find the commit that introduced the
issue. Luckily, in spite of heavy rebasing and a few backported commits, my
<code>master</code> branch didn&rsquo;t have the issue, and I was able to pinpoint it to a
single commit. At first glance the commit looked almost innocent &ndash; it just
extracted code from a model and moved it to <code>app/workers/</code>. But it added two tests
to the failing spec file, and those were the tests causing the cascading
failure of all the remaining tests in the suite. After reviewing the code under test, I
found that the real culprit was some memory-leak debugging code I had quickly slapped in
without running the tests:</p>

<figure class='code'><figcaption><span></span></figcaption><pre><code>counts = Hash.new { 0 }
ObjectSpace.each_object do |o|
  counts[o.class] += 1
end
counts.reject! { |_k, v| v < 100 }</code></pre></figure>


<p>It turns out that <code>FactoryGirl::DefinitionsProxy</code> undefines all methods, including
<code>class</code>, and its <code>method_missing</code> registers any call as a trait on the factory &ndash;
so walking through <code>ObjectSpace</code> and calling <code>class</code> on every object wreaks
havoc on the factories.</p>
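<p>A minimal self-contained sketch of that mechanism (an imitation for illustration, not FactoryGirl&rsquo;s actual implementation):</p>

```ruby
# A definitions proxy in the spirit of FactoryGirl::DefinitionsProxy:
# almost no real methods exist, and any unknown call is recorded as a trait.
class DefinitionsProxyish < BasicObject
  def initialize
    @traits = []
  end

  def traits
    @traits
  end

  # BasicObject does not define `class`, so even that call lands here
  # and quietly registers a bogus trait
  def method_missing(name, *args)
    @traits << name
  end
end

proxy = DefinitionsProxyish.new
proxy.class          # looks harmless, but...
proxy.traits         # => [:class]
```

<p>So any code that blindly calls <code>class</code> on every live object &ndash; like the <code>ObjectSpace</code> walk above &ndash; silently corrupts the factory definitions.</p>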
]]></content>
  </entry>
  
  <entry>
<title type="html"><![CDATA[Building the Russian Version of Movable Type]]></title>
    <link href="https://alexeydemidov.com/2008/04/01/sborka-russkoj-versii-movable/"/>
    <updated>2008-04-01T20:35:21+00:00</updated>
    <id>https://alexeydemidov.com/2008/04/01/sborka-russkoj-versii-movable</id>
<content type="html"><![CDATA[<p>To build the Russified version of Movable Type we again need the sources from the <a href="https://github.com/movabletype/movabletype">svn repository</a>. Check them out as described in the previous <a href="https://alexeydemidov.com/2008/03/30/rabota-s-failom-perevoda-mt/">article on working with the translation file</a> and apply the <a href="https://alexeydemidov.com/files/patch-rubuild.gz">patch-rubuild.gz</a> patch to the checked-out sources; it adds the ability to build a Russian version.</p>

<p>The next step is to apply to the sources all the changes needed to support the Russian language:</p>

<ul>
<li><a href="https://alexeydemidov.com/files/patch-rudate41.gz">patch-rudate41.gz</a> — adds a Russian date format</li>
<li><a href="https://alexeydemidov.com/files/patch-dirify.gz">patch-rudirify41.gz</a> — adds Russian characters to the tables that convert entry titles into file names (unlike the <a href="https://blog.lexa.ru/files/patch-dirify.gz">patch</a> by <a href="https://blog.lexa.ru/">Alexey Tutubalin</a>, it also changes the JavaScript code, and the transliteration of Russian characters into Latin follows <a href="https://ru.wikipedia.org/wiki/ISO_9#.D0.93.D0.9E.D0.A1.D0.A2_7.79.E2.80.942000">GOST 7.79-2000</a>)</li>
<li><a href="https://blog.lexa.ru/files/patch-nofollow.gz">patch-nofollow.gz</a> — adds support for the <noindex> tag required by Russian search engines (by Alexey Tutubalin)</li>
<li><a href="https://blog.lexa.ru/files/patch-monday-mt41.gz">patch-monday-mt41.gz</a> — makes Monday the first day of the week in the calendar (by Alexey Tutubalin)</li>
</ul>


<p>After the code and templates have been patched, it remains to add the translation file itself, lib/MT/L10N/ru.pm (either the <a href="https://movable-type.ru/forums/viewtopic.php?id=75">translation</a> by the <a href="https://movable-type.ru">movable-type.ru</a> project, or my less complete <a href="https://alexeydemidov.com/files/ru.pm">translation</a>), the stylesheet <a href="https://alexeydemidov.com/files/styles_ru.css">mt-static/styles_ru.css</a>, and two HTML files — index.html.ru and readme.html.ru (both are available in the <a href="https://code.google.com/archive/p/movabletype/source">repository</a> of the Russian Movable Type version on Google Code). The translation file must be processed according to the instructions in the previous <a href="https://alexeydemidov.com/2008/03/30/rabota-s-failom-perevoda-mt/">article</a>. Finally, you can change some of the defaults for the package being built (time zone, encodings, links to the news, portal, and support pages) by editing the file build/mt-dists/ru.mk.</p>

<p>With all the changes in place, build the MTOS package:</p>

<pre><code>env LANG=C ./build/exportmt.pl --local --pack=MTOS --lang=ru --prod
</code></pre>

<p>This command produces two archives with the Russian version of Movable Type — MTOS-4.1-ru.tar.gz and MTOS-4.1-ru.zip.</p>
]]></content>
  </entry>
  
  <entry>
<title type="html"><![CDATA[Working with the Movable Type Translation File]]></title>
    <link href="https://alexeydemidov.com/2008/03/30/rabota-s-failom-perevoda-mt/"/>
    <updated>2008-03-30T09:35:28+00:00</updated>
    <id>https://alexeydemidov.com/2008/03/30/rabota-s-failom-perevoda-mt</id>
<content type="html"><![CDATA[<p>By coincidence, two days after the publication of my <a href="https://alexeydemidov.com/2008/03/24/russifikacija-movable-type-41/">last article</a> on Russifying Movable Type, the <a href="https://movable-type.ru/">movable-type.ru</a> project released its <a href="https://movable-type.ru/forums/viewtopic.php?id=75">translation</a>. Compared to mine it is more complete, but it was apparently made from the French translation file of version 4.0: more than 300 strings are missing, and some fifteen strings are still in French.</p>

<p>To synchronize the translation file with the current release (and future releases) and add the missing strings, you need the full <a href="https://movabletype.org/">Movable Type</a> source tree from the <a href="https://github.com/movabletype/movabletype">svn repository</a> — it includes a set of scripts for manipulating translation files (the scripts require perl and the usual unix utilities: shell, awk, find, and so on). First, <a href="https://movabletype.org/news/2007/12/mtos_subversion_tips.html">check out</a> the sources from the repository:</p>

<pre><code>svn co https://code.sixapart.com/svn/movabletype/latest mtos-latest
</code></pre>

<p>After this command finishes, the mtos-latest directory contains three subdirectories: dev, stable, and release. We need the release subdirectory, which holds the latest Movable Type release we will be working with. First, put our translation file ru.pm into the lib/MT/L10N/ subdirectory. Then, from the release directory, run the generation of the updated translation file:</p>

<pre><code>sh build/l10n/make-l10n ru
</code></pre>

<p>The script generates four files in the /tmp directory:</p>

<ul>
<li>ru-base.pm — all translation strings (including duplicates)</li>
<li>ru-nodupe.pm — translation strings with duplicates removed</li>
<li>ru-old.pm — translation strings that are no longer used</li>
<li>ru.pm — the new translation file</li>
</ul>


<p>The new translation file /tmp/ru.pm contains the previously missing translation strings (marked with a "Translate - New" comment), and strings whose letter case differs from the original are flagged as well — for example, translations like:</p>

<pre><code>'video' =&gt; 'Видео'
</code></pre>

<p>This file is ready to use as-is, but for full Russian-language support I wrote an additional filter, <a href="https://alexeydemidov.com/files/ru-filter.pl">ru-filter.pl</a>, through which the translation file should be piped:</p>

<pre><code>cat /tmp/ru.pm | perl ru-filter.pl &gt; lib/MT/L10N/ru.pm
</code></pre>

<p>The filter does the following:</p>

<ul>
<li>marks untranslated strings, where the original and the translation are identical, with a "# Translate - No translation" comment</li>
<li>marks translation strings that contain no Russian letters with a "# Translate - No russian char" comment</li>
<li>disables strings with an empty translation</li>
<li>adds perl code for correct plural-form support</li>
</ul>


<p>Plural support is implemented by overriding the quant function from <a href="https://metacpan.org/dist/Locale-Maketext">Locale::Maketext</a>. The second and third parameters of this function now take the word forms for numerals ending in 2 through 4 and in 5 through 0 respectively. This way you can write "[quant, _1,минута,минуты,минут]" and the corresponding text will be generated as "1 минута, 2 минуты, 5 минут" instead of "1 минута, 2 минут., 3 минут." (that is, "1 minute, 2 minutes, 5 minutes" with the correct Russian plural forms).</p>
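<p>The overridden quant then reduces to the standard Russian plural rule; a simplified sketch (not the actual ru.pm code):</p>

```perl
# Pick the Russian plural form: 1 минута, 2 минуты, 5 минут.
# $one covers 1, $few covers 2-4, $many covers 5-0 and the teens 11-14.
sub ru_quant {
    my ($n, $one, $few, $many) = @_;
    my ($m10, $m100) = ($n % 10, $n % 100);
    return "$n $one" if $m10 == 1 && $m100 != 11;
    return "$n $few" if $m10 >= 2 && $m10 <= 4 && !($m100 >= 12 && $m100 <= 14);
    return "$n $many";
}
```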
]]></content>
  </entry>
  
  <entry>
<title type="html"><![CDATA[Russifying Movable Type 4.1]]></title>
    <link href="https://alexeydemidov.com/2008/03/24/russifikacija-movable-type-41/"/>
    <updated>2008-03-24T11:59:10+00:00</updated>
    <id>https://alexeydemidov.com/2008/03/24/russifikacija-movable-type-41</id>
<content type="html"><![CDATA[<p><a href="https://alexeydemidov.com/2008/03/20/uskorieniie-movable-types-apache-i-fastcgi/">Continuing</a> the work on Movable Type. Russification took two passes. On the first pass, after some preliminary googling, I applied the <a href="https://blog.lexa.ru/2008/01/26/movabletype_41_i_russkij_jazik.html">recipe</a> for Russifying <a href="https://movabletype.org/">Movable Type</a> version 4 by <a href="https://blog.lexa.ru/">Alexey Tutubalin</a>, with small changes:</p>

<ul>
<li>applied the <a href="https://blog.lexa.ru/files/patch-monday-mt41.gz">patch-monday-mt41.gz</a> patch so that the calendar week starts on Monday</li>
<li>applied the <a href="https://blog.lexa.ru/files/patch-dirify.gz">patch-dirify.gz</a> patch to transliterate Russian characters in entry titles into Latin when generating file names</li>
<li>applied the <a href="https://blog.lexa.ru/files/patch-nofollow.gz">patch-nofollow.gz</a> patch to support the <noindex> tag for Yandex</li>
</ul>


<p>I did not apply the <a href="https://blog.lexa.ru/files/patch-rudate.gz">patch-rudate.gz</a> patch; after looking into it, I made my own. Instead of reusing the Italian format, I added a Russian one to the list of formats by changing lib/MT/Util.pm and tmpl/cms/cfg_entry.tmpl. The display format changed as well — instead of <em>24.03.2008</em> it is now <em>Март, 24 2008</em>. My patch <a href="https://alexeydemidov.com/files/patch-rudate41.gz">patch-rudate41.gz</a> is attached.</p>

<p>Then came the most tedious stage — translating the templates. I put off figuring out <a href="https://blog.lexa.ru/2007/09/23/eksport_templejtov_movable_type_variant_2.html">Alexey Tutubalin's script</a> for exporting/importing templates (it was written before version 4.1 was released and it is unknown how well it works with it; it also requires the TemplateInstaller plugin), so I translated the templates by hand through the Movable Type admin interface. There are about fifty templates, but they contain little text, and you don't have to translate all of them — only the ones actually in use. So the first results appeared fairly quickly, but along the way I got carried away fine-tuning the design and changed the templates too much, so when I set out to create a blog for my SO, I found I had to either strip out my customizations or go back to the stock templates and translate them again from scratch.</p>

<p>After some thought, I chose a different approach. Movable Type ships with a complete localization infrastructure based on perl's <a href="https://metacpan.org/dist/Locale-Maketext">Locale::Maketext</a>, and <a href="https://github.com/movabletype/Documentation/wiki/Translation-and-Localization">the documentation recommends</a> exactly this approach. After version 4.0 came out in August 2007, the <a href="https://movable-type.ru/">movable-type.ru</a> project apparently organized a <a href="https://movable-type.ru/2007/08/localization-mt4.php">collective effort</a> to produce a Russian translation, and a month ago it even <a href="https://movable-type.ru/forums/viewtopic.php?id=66">claimed</a> 90% readiness, but no real results have appeared so far. Of course, a volume of more than 4 thousand strings looks terrifying at first glance, but nothing prevents translating the text in parts — for example, starting with only the strings used in the published blogs themselves rather than in the admin interface.</p>

<p>In a couple of evenings I had a translation of the following components used in a published blog:</p>

<ul>
<li>the default_templates/*.mtml templates used when generating templates for a new blog</li>
<li>the plugins/WidgetManager/default_widgets/*.mtml templates for widgets</li>
<li>the php scripts for dynamic publishing</li>
</ul>


<p>All of this amounts to slightly more than 600 strings (<a href="https://alexeydemidov.com/files/ru-published-only.pm">ru-published-only.pm</a>). Rename this file to ru.pm and put it into lib/MT/L10N/; Russian then automatically becomes available for selection in the user profile. Keep in mind that the phrases in templates are translated into Russian at template generation time, and only if Russian is selected in the user's profile. If the blog already exists, you can regenerate its templates via the "Design" → "Manage Templates" → "Refresh Blog Templates" menu. After regeneration, most of the new Russified templates get Russian names; in my case the Archive Maps and the widget sets broke, and I had to recreate them after deleting all the old templates and widgets.</p>

<p>Everything else is still being translated, although the most frequently used parts of the admin interface are already covered almost completely (you can download the <a href="https://alexeydemidov.com/files/ru-20080402.pm">translation file</a>). This file uses es.pm as its base, so untranslated strings will show up in Spanish. There is also a script, <a href="https://alexeydemidov.com/files/mt_ru.js">mt_ru.js</a>, which contains localized strings for JavaScript and a localized table for dynamically dirifying entry titles into file names. The stylesheet <a href="https://alexeydemidov.com/files/styles_ru.css">styles_ru.css</a> contains a single line: an adjustment to the column widths in the calendar widget.</p>

<p>As the translation progresses I plan to publish updated versions of <a href="https://alexeydemidov.com/files/ru-20080402.pm">ru.pm</a>, so stay tuned for updates.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Speeding Up Movable Type (Apache and FastCGI)]]></title>
    <link href="https://alexeydemidov.com/2008/03/20/movabletypes-fastcgi/"/>
    <updated>2008-03-20T17:44:05+00:00</updated>
    <id>https://alexeydemidov.com/2008/03/20/movabletypes-fastcgi</id>
    <content type="html"><![CDATA[<p>After installing <a href="https://movabletype.org/">Movable Type</a> it turned out that
working with its admin interface (CMS) is rather uncomfortable because of its
considerable sluggishness: the main page takes about 15 seconds to load.
Hardly surprising, since the admin interface is implemented as a fairly
heavy Perl script with many dependencies which, when running as standard
CGI, are loaded on every <acronym>HTTP</acronym> request.</p>

<p>My first thought was to try running it under <a href="https://perl.apache.org/">mod_perl</a>, which I already had, but it turned out that <a href="https://movabletype.org/">Movable Type</a> supports only <a href="https://perl.apache.org/">mod_perl</a> 1.x, while I had moved to <a href="https://httpd.apache.org/">Apache</a> 2.x three years before. So I turned to the alternative, <a href="https://en.wikipedia.org/wiki/FastCGI">FastCGI</a>, which differs from classic CGI precisely in that a script serves not one but many <acronym>HTTP</acronym> requests, minimizing the time spent starting and initializing the script.</p>

<p>First, enable <a href="https://en.wikipedia.org/wiki/FastCGI">FastCGI</a> support in <a href="https://httpd.apache.org/">Apache</a>. This is done with the <a href="https://en.wikipedia.org/wiki/FastCGI">mod_fastcgi</a> module (or the alternative implementation, <a href="https://httpd.apache.org/mod_fcgid/">mod_fcgid</a>). Install the module and enable it in httpd.conf. Settings for <a href="https://en.wikipedia.org/wiki/FastCGI">mod_fastcgi</a>:</p>

<pre><code>LoadModule fastcgi_module libexec/apache22/mod_fastcgi.so
&lt;IfModule mod_fastcgi.c&gt;
  AddHandler fastcgi-script .fcgi
  FastCgiIpcDir /var/run/fcgidsock/
&lt;/IfModule&gt;
</code></pre>

<p>Settings for <a href="https://httpd.apache.org/mod_fcgid/">mod_fcgid</a>:</p>

<pre><code>LoadModule fcgid_module libexec/apache22/mod_fcgid.so
&lt;IfModule mod_fcgid.c&gt;
  AddHandler fcgid-script .fcgi 
  SocketPath /var/run/fcgidsock/
&lt;/IfModule&gt;
</code></pre>

<p><a href="https://movabletype.org/">Movable Type</a> itself supports <a href="https://en.wikipedia.org/wiki/FastCGI">FastCGI</a> out of the <a href="https://movabletype.org/documentation/administrator/maintenance/fastcgi.html">box</a> starting with version 4, so configuring it only requires adding the following directive for the directory containing the mt*.cgi scripts:</p>

<pre><code>AddHandler fcgid-script .cgi
</code></pre>

<p>or</p>

<pre><code>AddHandler fastcgi-script .cgi
</code></pre>

<p>After enabling this option, testing with <a href="https://httpd.apache.org/docs/2.0/programs/ab.html">ApacheBench</a> showed a speedup of more than an order of magnitude on my home machine: from 0.75 requests per second to 15. Yet the admin interface did not get noticeably faster. A textbook case of “premature optimization”, which is the “root of all evil” © <a href="https://en.wikiquote.org/wiki/C._A._R._Hoare">C.A.R. Hoare</a>.</p>
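<p>For reference, a measurement like the one above can be reproduced with a plain ab invocation (the URL below is an assumption; point it at your own mt.cgi):</p>

<pre><code># 100 sequential requests against the admin script; watch "Requests per second"
ab -n 100 -c 1 http://localhost/cgi-bin/mt/mt.cgi
</code></pre>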

<p>So I had to take a closer look at <a href="https://developer.yahoo.com/yslow/">YSlow</a> after all and apply its <a href="https://developer.yahoo.com/performance/rules.html">recommendations</a> one by one. The main problem turned out to be the large number of objects on the page and, consequently, the large number of HTTP requests (about 50 in this case), most of them If-Modified-Since requests for files under mt-static, mainly images. The cure is to add Expires headers for the mt-static directory with <a href="https://httpd.apache.org/docs/2.0/mod/mod_expires.html">mod_expires</a>:</p>

<pre><code>ExpiresActive On
ExpiresDefault "access plus 1 month"
</code></pre>
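<p>Applied globally, these directives affect every response, so it is safer to scope them to the static files only. A minimal sketch, assuming mt-static lives at the hypothetical path below:</p>

<pre><code>&lt;Directory "/usr/local/www/mt-static"&gt;
  ExpiresActive On
  ExpiresDefault "access plus 1 month"
&lt;/Directory&gt;
</code></pre>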

<p>With this option enabled, the browser stopped hitting every image on every page load, which cut the number of requests from about fifty down to exactly one: the request to mt.cgi itself. The total volume of transferred data also dropped from 100 KB (with compression already enabled for text/*) to 10 KB. Overall page load time went from 15 seconds down to 5-6 seconds by the <a href="https://developer.yahoo.com/yslow/">YSlow</a> timer, and to 3-4 seconds by stopwatch with <a href="https://getfirebug.com/">FireBug</a> and <a href="https://developer.yahoo.com/yslow/">YSlow</a> disabled.</p>

<p>This result suits me well enough, so the optimization topic can be set aside. Next up are the Russian translation and the move to a cramped jail on the production server. Meanwhile, the first days of operation revealed a couple of nuances. First, not all <a href="https://movabletype.org/">MT</a> scripts work in <a href="https://en.wikipedia.org/wiki/FastCGI">FastCGI</a> mode, so it is better to use the following settings, spotted in <a href="https://movabletype.org/documentation/administrator/maintenance/fastcgi.html">the Movable Type documentation</a>:</p>

<pre><code>&lt;FilesMatch "^mt(?:-(?:comments|search|tb|view))?\.cgi$"&gt;
     AddHandler fcgid-script .cgi
&lt;/FilesMatch&gt;
</code></pre>

<p>Second, the mt.cgi process can grow over time to a rather indecent size (over 100 MB), so it is better to configure <a href="https://en.wikipedia.org/wiki/FastCGI">mod_fastcgi</a> to restart it periodically:</p>

<pre><code>FastCgiConfig -maxProcesses 10 -killInterval 3600
</code></pre>

<p>and for <a href="https://httpd.apache.org/mod_fcgid/">mod_fcgid</a>:</p>

<pre><code>ProcessLifeTime 3600
MaxProcessCount 10
MaxRequestsPerProcess 500
</code></pre>

<p>And third, running under <a href="https://en.wikipedia.org/wiki/FastCGI">FastCGI</a> makes modifying MT code on the fly inconvenient: mt.cgi has to be restarted every time so that it rereads the changed modules.</p>
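<p>One way to force the persistent processes to pick up changed code is simply to restart them; a sketch, assuming shell access with sufficient privileges (the process-name pattern is an assumption):</p>

<pre><code># gracefully restart Apache together with its FastCGI children
apachectl graceful
# or kill only the mt.cgi processes and let the module respawn them
pkill -f 'mt\.cgi'
</code></pre>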

<p><strong><em><a href="https://alexeydemidov.com/2008/03/24/russifikacija-movable-type-41/">To be continued</a>…</em></strong></p>
]]></content>
  </entry>
  
</feed>
