Channel: How to keep executable code in memory even under memory pressure ? in Linux - Stack Overflow

How to keep executable code in memory even under memory pressure in Linux?

The goal here is to keep every running process' executable code in memory during memory pressure, in Linux.
In Linux, I am able to instantly (within 1 sec) cause high memory pressure and trigger the OOM-killer by running

    stress --vm-bytes $(awk '/MemAvailable/{printf "%d\n", $2 + 4000;}' < /proc/meminfo)k --vm-keep -m 4 --timeout 10s

(code from here) with 24000MB max RAM inside a Qubes OS R4.0 Fedora 28 AppVM.

EDIT4: Perhaps relevant, although I forgot to mention it, is the fact that I have no swap enabled (i.e. CONFIG_SWAP is not set).
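To make the command above easier to read, here is a minimal sketch that splits out the size computation. The function name vm_bytes_arg is made up for illustration, and the script only prints the stress command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: print the --vm-bytes argument used above, i.e.
# MemAvailable plus 4000 kB, with the "k" suffix that stress expects.
# $1 = path to a /proc/meminfo-style file (defaults to /proc/meminfo).
vm_bytes_arg() {
    awk '/^MemAvailable:/ { printf "%dk\n", $2 + 4000 }' "${1:-/proc/meminfo}"
}

# Only print the command here; actually running it triggers the OOM-killer.
if [ -r /proc/meminfo ]; then
    echo "stress --vm-bytes $(vm_bytes_arg) --vm-keep -m 4 --timeout 10s"
fi
```

As far as I understand stress, each of the 4 workers requested with -m 4 allocates the full --vm-bytes amount, which would explain why memory runs out almost immediately.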

dmesg reports:

[  867.746593] Mem-Info:
[  867.746607] active_anon:1390927 inactive_anon:4670 isolated_anon:0
                active_file:94 inactive_file:72 isolated_file:0
                unevictable:13868 dirty:0 writeback:0 unstable:0
                slab_reclaimable:5906 slab_unreclaimable:12919
                mapped:1335 shmem:4805 pagetables:5126 bounce:0
                free:40680 free_pcp:978 free_cma:0

The interesting parts are active_file:94 and inactive_file:72. These counters are page counts (4 KiB pages here, so roughly 376 KiB and 288 KiB), which is very low.
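Since the Mem-Info counters printed by the kernel are page counts, a rough way to turn them into sizes is sketched below. It assumes 4096-byte pages (getconf PAGESIZE reports the actual value), and the helper name pages_to_kib is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: convert a page count to KiB.
# $1 = number of pages, $2 = page size in bytes (defaults to the system page size).
pages_to_kib() {
    pages=$1
    page_size=${2:-$(getconf PAGESIZE)}
    echo $(( pages * page_size / 1024 ))
}

# Counters copied from the dmesg output above, assuming 4096-byte pages.
echo "active_file:   $(pages_to_kib 94 4096) KiB"
echo "inactive_file: $(pages_to_kib 72 4096) KiB"
```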

The problem here is that, during that period of memory pressure, executable code is re-read from disk, causing disk thrashing, which leads to a frozen OS (although in the above case it only lasts for less than 1 second).

I see some interesting code in the kernel's mm/vmscan.c:

        if (page_referenced(page, 0, sc->target_mem_cgroup,
                            &vm_flags)) {
                nr_rotated += hpage_nr_pages(page);
                /*
                 * Identify referenced, file-backed active pages and
                 * give them one more trip around the active list. So
                 * that executable code get better chances to stay in
                 * memory under moderate memory pressure.  Anon pages
                 * are not likely to be evicted by use-once streaming
                 * IO, plus JVM can create lots of anon VM_EXEC pages,
                 * so we ignore them here.
                 */
                if ((vm_flags & VM_EXEC) && page_is_file_cache(page)) {
                        list_add(&page->lru, &l_active);
                        continue;
                }
        }

I think that if someone could point out how to change this so that, instead of giving them one more trip around the active list, we give them infinite trips around the active list, then the job should be done. Or maybe there's some other way?

I can patch and test a custom kernel. I just don't have the know-how as to what to change in the code in order to always keep active executable code in memory (which in effect, I believe, would avoid disk thrashing).

EDIT: Here's what I got working so far (applied on top of kernel 4.18.5):

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 32699b2..7636498 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -208,7 +208,7 @@ enum lru_list {
 
 #define for_each_lru(lru) for (lru = 0; lru < NR_LRU_LISTS; lru++)
 
-#define for_each_evictable_lru(lru) for (lru = 0; lru <= LRU_ACTIVE_FILE; lru++)
+#define for_each_evictable_lru(lru) for (lru = 0; lru <= LRU_INACTIVE_FILE; lru++)
 
 static inline int is_file_lru(enum lru_list lru)
 {
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 03822f8..1f3ffb5 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2234,7 +2234,7 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
 
    anon  = lruvec_lru_size(lruvec, LRU_ACTIVE_ANON, MAX_NR_ZONES) +
        lruvec_lru_size(lruvec, LRU_INACTIVE_ANON, MAX_NR_ZONES);
-   file  = lruvec_lru_size(lruvec, LRU_ACTIVE_FILE, MAX_NR_ZONES) +
+   file  = //lruvec_lru_size(lruvec, LRU_ACTIVE_FILE, MAX_NR_ZONES) +
        lruvec_lru_size(lruvec, LRU_INACTIVE_FILE, MAX_NR_ZONES);
 
    spin_lock_irq(&pgdat->lru_lock);
@@ -2345,7 +2345,7 @@ static void shrink_node_memcg(struct pglist_data *pgdat, struct mem_cgroup *memc
             sc->priority == DEF_PRIORITY);
 
    blk_start_plug(&plug);
-   while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
+   while (nr[LRU_INACTIVE_ANON] || //nr[LRU_ACTIVE_FILE] ||
                    nr[LRU_INACTIVE_FILE]) {
        unsigned long nr_anon, nr_file, percentage;
        unsigned long nr_scanned;
@@ -2372,7 +2372,8 @@ static void shrink_node_memcg(struct pglist_data *pgdat, struct mem_cgroup *memc
         * stop reclaiming one LRU and reduce the amount scanning
         * proportional to the original scan target.
         */
-       nr_file = nr[LRU_INACTIVE_FILE] + nr[LRU_ACTIVE_FILE];
+       nr_file = nr[LRU_INACTIVE_FILE] //+ nr[LRU_ACTIVE_FILE]
+           ;
        nr_anon = nr[LRU_INACTIVE_ANON] + nr[LRU_ACTIVE_ANON];
 
        /*
@@ -2391,7 +2392,8 @@ static void shrink_node_memcg(struct pglist_data *pgdat, struct mem_cgroup *memc
            percentage = nr_anon * 100 / scan_target;
        } else {
            unsigned long scan_target = targets[LRU_INACTIVE_FILE] +
-                       targets[LRU_ACTIVE_FILE] + 1;
+                       //targets[LRU_ACTIVE_FILE] +
+                       1;
            lru = LRU_FILE;
            percentage = nr_file * 100 / scan_target;
        }

The patch is also available here on GitHub, because in the code above tabs got transformed into spaces! (mirror1, mirror2)
I've tested the above patch (now with 4000MB max RAM, yes, 20G less than before!), even with a Firefox compilation that was known to disk-thrash the OS into a permanent freeze, and that no longer happens (the OOM-killer almost instantly kills the offending process(es)). I also tested it with the above stress command, which now yields:

[  745.830511] Mem-Info:
[  745.830521] active_anon:855546 inactive_anon:20453 isolated_anon:0
                active_file:26925 inactive_file:76 isolated_file:0
                unevictable:10652 dirty:0 writeback:0 unstable:0
                slab_reclaimable:26975 slab_unreclaimable:13525
                mapped:24238 shmem:20456 pagetables:4028 bounce:0
                free:14935 free_pcp:177 free_cma:0

That's active_file:26925 inactive_file:76. These counters are in pages, so with 4 KiB pages that is about 105 MiB of active file pages...
So, I don't know how good this is. Am I keeping all active files in memory instead of just executable files? During the Firefox compilation I had around 500 MB of Active(file) (EDIT2: but that's according to cat /proc/meminfo|grep -F -- 'Active(file)', which shows a different value than the active_file: above from dmesg, presumably because /proc/meminfo reports kB while dmesg counts pages), which makes me doubt it was only exes/libs...
Maybe someone can suggest how to keep ONLY executable code (if that's not what's already happening)?
Thoughts?
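To compare the two sources directly, the Active(file) value from /proc/meminfo (reported in kB) can be converted to pages, which is the unit of the active_file: counter in dmesg. This is only a sketch: the helper name active_file_pages is made up, and the page size falls back to whatever getconf PAGESIZE reports.

```shell
#!/bin/sh
# Hypothetical helper: read a /proc/meminfo-style file ($1) and print the
# Active(file) value converted from kB to pages of $2 bytes
# ($2 defaults to the system page size).
active_file_pages() {
    meminfo=$1
    page_size=${2:-$(getconf PAGESIZE)}
    awk -v ps="$page_size" \
        '$1 == "Active(file):" { printf "%d\n", $2 * 1024 / ps }' "$meminfo"
}

# On a live system, print the current value in pages so that it can be
# compared with the active_file: counter from dmesg.
if [ -r /proc/meminfo ]; then
    echo "Active(file) in pages: $(active_file_pages /proc/meminfo)"
fi
```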

EDIT3: With the above patch, it seems it may be necessary to (periodically?) run sudo sysctl vm.drop_caches=1 to free some stale memory(?). If I call stress after a Firefox compilation I get: active_file:142281 inactive_file:0 isolated_file:0 (about 556 MiB, counting 4 KiB pages). If I then drop the file caches (another way: echo 1|sudo tee /proc/sys/vm/drop_caches) and run stress again, I get: active_file:22233 inactive_file:160 isolated_file:0 (about 87 MiB). I am unsure...
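A small sketch of how the effect of dropping the caches could be measured without going through stress and dmesg. Both function names are made up for illustration, and drop_and_compare has to be run as root, so it is defined here but not called.

```shell
#!/bin/sh
# Hypothetical helper: print the Active(file) value, in kB, from a
# /proc/meminfo-style file ($1, defaults to /proc/meminfo).
active_file_kb() {
    awk '$1 == "Active(file):" { print $2 }' "${1:-/proc/meminfo}"
}

# Hypothetical helper: drop the page cache and report Active(file) before
# and after. Needs root; it is only defined here, not called.
drop_and_compare() {
    before=$(active_file_kb)
    sync                                  # write back dirty pages first
    echo 1 > /proc/sys/vm/drop_caches     # 1 = free the page cache only
    after=$(active_file_kb)
    echo "Active(file): ${before} kB -> ${after} kB"
}
```

Running drop_and_compare as root right after a large compilation should show how much of Active(file) was droppable cache.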

Results without the above patch: here
Results with the above patch: here

