cross-posted from: https://lemmy.sdf.org/post/43150819
I have a Lenovo ThinkCentre M900 Tiny (10FL) which has been working fine. Recently, I had an HDD failure and replaced it with an SSD. This time, I decided to go with ZFS (single drive, kinda pointless, but I get scrubbing). Every once in a while, I get errors during a scrub. It's always 1-2 read or write errors, and they never reappear if I clear the errors and run another scrub.
The data isn't important, and it's backed up, so I'm not too worried about it. But the symptoms make me think it's an issue with the internal SATA port. Is it possible to replace the SATA port in this device? I wasn't able to find anything like a part number online, and it looks like I'd need to replace the whole board. Any help is appreciated.
Isn’t the data port plugged in via a flex cable? Maybe check the connections first, unplug and replug. Make sure they are clean.
Don’t think so. This is what the inside looks like. I don’t see any SATA cable. But I’m not very knowledgeable when it comes to hardware, so I might be wrong.
You are right, this generation does have a direct connection. Have you tried cleaning the connector?
Yeah, but I’ll try doing it again.
It's usually possible, but soldering/desoldering on motherboards can be quite tricky; they often have substantial ground planes that wick away heat.
Why do you think it's the physical port?
soldering/desoldering on motherboards can be quite tricky
Then I guess it's not gonna work for me. I'm not much of a hardware guy, and have only (badly) soldered a few times.
It’s a new SSD, no SMART failures, and it’s always read/write errors. I’m not an expert, but I think if the SSD was faulty, I’d get some checksum errors. I don’t think any other parts like RAM and CPU are involved since there’s another 4-drive DAS attached to the machine over USB, and it’s been running ZFS with RAIDZ2 without any issues for more than a year. So that just leaves the port.
Also, the scrub timings seem to be pretty bad. When I bought the SSD, a scrub would take around 15 minutes. But now it’s taking around 3 hours. The performance is the same in normal use.
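Both observations above come straight from `zpool status`: which counter column the errors land in (READ/WRITE are I/O errors reported while issuing requests to the device; CKSUM means the device returned data that failed ZFS's checksum), and how long the last scrub took. A minimal Python sketch that pulls both out of the text output, so they can be logged after each monthly scrub and compared over time. It assumes the OpenZFS layout with an `HH:MM:SS` scrub duration and plain integer counters; the sample text and the names `ssdpool`/`sda` are made up for illustration and are not the poster's output.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Made-up `zpool status` output for illustration; pool/device names and numbers are hypothetical.
SAMPLE = """\
  pool: ssdpool
 state: ONLINE
  scan: scrub repaired 0B in 03:02:11 with 0 errors on Sun Mar  9 03:26:12 2025
config:

\tNAME        STATE     READ WRITE CKSUM
\tssdpool     ONLINE       0     0     0
\t  sda       ONLINE       1     0     0

errors: No known data errors
"""

# Assumes the "HH:MM:SS" duration form, optionally prefixed by "N days".
SCRUB_RE = re.compile(r"scrub repaired \S+ in (?:(\d+) days? )?(\d+):(\d{2}):(\d{2})")


@dataclass
class VdevErrors:
    name: str
    read: int   # I/O errors while issuing read requests
    write: int  # I/O errors while issuing write requests
    cksum: int  # device returned data that failed ZFS's checksum

    @property
    def io_errors(self) -> int:
        return self.read + self.write


def scrub_seconds(status: str) -> int | None:
    """Duration of the last completed scrub in seconds, or None if not found."""
    m = SCRUB_RE.search(status)
    if m is None:
        return None
    days, hours, mins, secs = (int(g) if g else 0 for g in m.groups())
    return ((days * 24 + hours) * 60 + mins) * 60 + secs


def vdev_errors(status: str) -> list[VdevErrors]:
    """Per-vdev READ/WRITE/CKSUM counters from the config table."""
    rows: list[VdevErrors] = []
    in_table = False
    for line in status.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "NAME":  # header row starts the table
            in_table = True
            continue
        if fields[0] == "errors:":  # table ends before the errors summary
            break
        counts = fields[2:5]
        # Skips rows whose counters are abbreviated (e.g. "1.2K") rather than plain integers.
        if in_table and len(counts) == 3 and all(c.isdigit() for c in counts):
            rows.append(VdevErrors(fields[0], *map(int, counts)))
    return rows


if __name__ == "__main__":
    baseline = 15 * 60  # the ~15-minute scrub time mentioned above
    print(f"last scrub took {scrub_seconds(SAMPLE) / baseline:.1f}x the baseline")
    for v in vdev_errors(SAMPLE):
        print(f"{v.name}: {v.io_errors} I/O error(s), {v.cksum} checksum error(s)")
```

Keeping READ/WRITE separate from CKSUM matters here: a drive that silently returns bad data shows up as CKSUM, while a request that fails outright (drive, link, or controller) shows up as READ/WRITE.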
Could be a few other things: power issues, the motherboard chipset, the SATA cable. I'd try replacing the SATA cable first, but I don't think I'd get too fixated on the port itself.
If it's your first time, don't do it. You'll kill the port with too much heat, or lift a pad. Not worth the risk.
Sorry, but what do I replace here? This is what the inside looks like. I don’t see any SATA cable.
Oh, I see, it's directly attached. That's even worse to try to remove. Do other disks have errors in that slot? Any motherboard firmware updates available?
The last disk (an HDD) failed. It fails outside this machine too, so it wasn't just due to this, though this may have contributed. I don't have any more drives to test with. And this happens once every month or so, so it's pretty hard to test by putting this SSD in another device.
I don’t have any more drives to test this out
You have a drive from a budget-tier vendor. It could very easily be a glitchy drive. The only way to know is to replace it with something else.
Do you have reliable power? An issue that infrequent doesn't sound like a physical problem; those are usually a lot more on/off.
I would think so. I also have another M.2 and a DAS with 4 HDDs attached to it. They run with no issues at all.
Have you checked SMART attribute 199 (0xC7), "UltraDMA CRC Error Count"? It counts CRC errors in transfers over the SATA link between the controller and the drive. If it's higher than zero, your hypothesis is likely correct and there's something bad with the connection; if it's zero, the problem is more likely to be elsewhere.
Seems to be zero. Here's the whole SMART output in case anything pops out.
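Checking attribute 199 can be scripted so it is recorded alongside each scrub. A minimal Python sketch, assuming the classic ATA attribute table printed by `smartctl -A` from smartmontools (RAW_VALUE as the 10th column). It matches on the attribute ID rather than the name, since the label for 199 may differ between drives. The sample text and the device path `/dev/sda` are placeholders, not the poster's data.

```python
from __future__ import annotations

import re
import subprocess

# Made-up excerpt of `smartctl -A` output for illustration (not the poster's drive).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   010    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
"""


def udma_crc_errors(attrs: str) -> int | None:
    """Raw value of SMART attribute 199, or None if the drive doesn't report it."""
    for line in attrs.splitlines():
        fields = line.split()
        # Match on the ID, not the name: the label for 199 can vary by drive.
        if len(fields) >= 10 and fields[0] == "199":
            m = re.match(r"\d+", fields[9])  # RAW_VALUE is the 10th column
            return int(m.group()) if m else None
    return None


def read_attrs(device: str) -> str:
    """Run `smartctl -A` (needs smartmontools and usually root)."""
    # smartctl's exit status is a bit mask that can be non-zero even when the
    # attribute table printed fine, so the return code is not treated as fatal.
    result = subprocess.run(["smartctl", "-A", device], capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    print("UDMA CRC errors:", udma_crc_errors(SAMPLE))
    # On a real machine (device path is a placeholder):
    # print(udma_crc_errors(read_attrs("/dev/sda")))
```

The raw value is a lifetime counter, so a single reading mostly answers "has the link ever misbehaved". For an intermittent issue like this one, the more useful signal is whether the number goes up between a reading taken before a scrub and one taken after a scrub that logs errors.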