A couple months ago I was given a Lenovo laptop that was broken. The owner of the laptop didn’t want to try to fix it and bought a new one. I took the broken laptop to either fix it or scrap it for parts.
background
The previous owner ran Windows and the only issue was occasional blue screens. The blue screens weren’t terribly common but often enough to be annoying.
Because I am a nerd, I decided to use this machine to test out Fedora Sway. I installed the OS no problem and began working on a couple Rust projects. All was dandy for a few days: I was configuring my window manager, coding in Rust, everything was swell! I thought I had just gotten a free laptop. Then I started to see issues.
I started to get cryptic errors when compiling my Rust project every now and then. I wasn’t doing any funny business with linkers, but builds would fail with a linker error. The real red flag was that the errors were happening inconsistently. A rebuild after an error would sometimes succeed. And a rebuild after a build success would sometimes fail :thonk:.
After some thinking (playing Factorio, eating snacks), I realized this was probably connected to the blue screens. It turns out it was, and the likely cause is bad RAM. Hm. Okay, well, RAM is pretty important. But let’s make sure this is actually the issue.
testing the memory
I have never tested memory before so I was excited to learn a new skill. The standard utility to do this is Memtest86+. Memtest86+ is an open-source, stand-alone memory tester for x86/x86-64 computers. Sounds perfect!
It seems to be based on PCMemTest, which is a fork of Memtest86+ v5, which is a fork of MemTest-86. So maybe by the time you read this there will be another fork that is more up to date.
Using it is easy. I added the image to my Ventoy flash drive, but you can also just write it to a USB drive.
After booting, Memtest86+ just starts testing, and it immediately found some bad memory. It is cool that you can see actual vs expected results. In my case it seemed that only one byte in each word was incorrect.
The obvious fix here is to replace the RAM. This is probably smart, but I am not sure how easy it would be on this particular machine as modern laptops are usually pretty hard to repair. Also, that would cost money.
Using F1 to configure the error reporting, I switched to the error summary. This showed me that the errors were confined to the range 0x16f3381b0-0x16f33fe78. That’s great news! That’s only about 32 kB. If I can just ignore those 32 kilobytes then I should have a working system!
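The size estimate is just the difference between the two addresses from the error summary. A quick sketch in shell arithmetic; range_size is a helper name made up for illustration, not part of any tool:

```shell
# Size in bytes of the span between two addresses (hex or decimal).
# range_size is a made-up helper name for this sketch.
range_size() {
    echo $(( $2 - $1 ))
}

bytes=$(range_size 0x16f3381b0 0x16f33fe78)
echo "$bytes bytes"                                   # 31944 bytes
echo "$(( (bytes + 1023) / 1024 )) KiB, rounded up"   # 32 KiB, rounded up
```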
Some surfing the web turned up a Stack Overflow post explaining that you can configure Linux to not use a section of memory.
From the Linux kernel parameters docs:
memmap=nn[KMG]!ss[KMG]
[KNL,X86] Mark specific memory as protected.
Region of memory to be used, from ss to ss+nn.
The memory region may be marked as e820 type 12 (0xc)
and is NVDIMM or ADR memory.
This memmap syntax marks a region as protected so the kernel doesn’t use it as regular RAM. Pretty neat! memmap is usually used to protect memory that other parts of the system not managed by Linux would use. I also saw that it is used for persistent memory, which is a niche technology providing a speed-cost target between DRAM and SSDs. Intel’s brand name for this is Intel Optane, which I had heard of. The idea is you tell Linux to protect your persistent memory and then it shows up as /dev/pmem[n], which you can then format with a filesystem or what have you.
On a traditional Linux distro you can edit the kernel parameters by modifying /etc/default/grub and appending your option to the GRUB_CMDLINE_LINUX_DEFAULT line:
GRUB_CMDLINE_LINUX_DEFAULT="root=UUID=0a3407de-014b-458b-b5c1-848e92a327a3 rw memmap=100M!0x16b3380f0"
Then run update-grub, or use grubby instead:
grubby --update-kernel=ALL --args="memmap=100M!0x16b3380f0"
Or, on an OSTree distro like Fedora Silverblue:
rpm-ostree kargs --append="memmap=100M!0x16b3380f0"
I ended up protecting a much larger portion of memory than the bad range, just to be safe.
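It is worth double-checking that the protected region really contains the bad range. A minimal sketch, assuming the kernel reads the M suffix as MiB (powers of 1024); covers is a helper name made up for this example:

```shell
# Does a region of $2 bytes starting at $1 contain the whole range [$3, $4]?
# covers is a made-up helper name for this sketch.
covers() {
    [ $(( $1 <= $3 && $4 < $1 + $2 )) -eq 1 ]
}

size=$(( 100 * 1024 * 1024 ))   # memmap=100M, assuming M means MiB

if covers 0x16b3380f0 "$size" 0x16f3381b0 0x16f33fe78; then
    echo "bad range is inside the protected region"
else
    echo "bad range is NOT fully covered"
fi
```

After a reboot, cat /proc/cmdline should list the memmap argument if the bootloader picked it up.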
badram
There is actually a utility exactly for this scenario! GRUB’s badram command takes a list of addr,mask pairs and filters them from the memory map that is passed to the Linux kernel. This is super easy because Memtest86+ can output its results in this exact syntax for you to copy over.
To do this, add GRUB_BADRAM="addr,mask,[addr,mask],..." to your GRUB configuration with all the addr,mask pairs given by Memtest86+.
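As I understand the GRUB manual, a page is filtered when every address bit that is set in the mask matches the corresponding bit of the base address. The sketch below shows that rule with a hypothetical pair: the 32 KiB-aligned block that happens to contain my bad range. The pairs Memtest86+ actually prints may look different, and is_filtered is a made-up helper name:

```shell
# True if address $1 is filtered by the badram pair base=$2, mask=$3:
# every bit that is set in the mask has to agree with the base.
# is_filtered is a made-up helper name for this sketch.
is_filtered() {
    [ $(( ($1 & $3) == ($2 & $3) )) -eq 1 ]
}

# Hypothetical pair, derived from the bad range rather than copied from Memtest86+.
block=$(( 32 * 1024 ))
mask=$(( ~(block - 1) ))          # clear the low 15 bits, keep the rest
base=$(( 0x16f3381b0 & mask ))    # start of the block holding the bad range

printf 'GRUB_BADRAM="0x%x,0x%x"\n' "$base" "$mask"

is_filtered 0x16f3381b0 "$base" "$mask" && echo "first bad address: filtered"
is_filtered 0x16f33fe78 "$base" "$mask" && echo "last bad address: filtered"
is_filtered 0x16f340000 "$base" "$mask" || echo "next block: left alone"
```

A single pair works here only because the whole bad range fits inside one aligned 32 KiB block; errors scattered more widely would need several pairs.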
You can also blacklist RAM pages in Windows, although I haven’t tried it.
Big warning: GRUB’s badram command is not allowed when lockdown is enforced, i.e. with UEFI Secure Boot. As far as I can see, the memmap solution works with Secure Boot.
conclusion
I gave the computer to someone with web-browsing workloads and they have used it for a couple weeks with no troubles.
Moral of the story is that a little bit of faulty RAM never hurt anybody, so try this before you toss out the old laptop. All jokes aside, I expect the computer to fail within the next year, and then I will reevaluate the cost-benefit of trying to replace the sticks.