For the past 3 days, my server (Fedora) has been stopping all Podman containers after 2-3 hours of uptime. I can start all the containers again, but the same thing happens after a while. I do not know where to look for the problem.

In top, I found an OOM message. I assume the system runs out of memory and stops all services. How can I find the cause? I can’t find anything in the container logs.

I can also see that systemctl status always shows the state as “starting”; it never becomes “running”. But I do not know how to proceed from there.

  • neidu2@feddit.nl
    2 months ago

    The trouble with diagnosing memory problems is that, when one happens, there is usually no memory left to log it.

    I’ve found that the easiest approach is to set up a file as additional swap space, enable it with swapon, and then see whether the problem disappears, either partially or fully.
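
    A minimal sketch of that swap-file approach, in case it helps. The function name `make_swapfile`, the path `/swapfile`, and the 4096 MiB size are placeholders I picked, not anything from the original post. It has to run as root, and it assumes a filesystem where a plain file can be used as swap (on btrfs a swap file needs extra preparation that is not shown here).

```shell
# Sketch: create a swap file and enable it until the next reboot.
# Assumptions: path and size are placeholders; run as root; not btrfs-aware.
make_swapfile() {
    path="${1:-/swapfile}"
    size_mb="${2:-4096}"
    # Write real zero-filled blocks so the file has no holes.
    dd if=/dev/zero of="$path" bs=1M count="$size_mb" || return 1
    chmod 600 "$path" || return 1   # swap must not be readable by other users
    mkswap "$path" || return 1      # write the swap signature
    swapon "$path" || return 1      # enable it (not persistent across reboots)
    swapon --show                   # list active swap areas to confirm
}

# Usage (as root):
#   make_swapfile /swapfile 4096
```

    If the containers then survive noticeably longer, that points at memory pressure rather than something else stopping them. Running `swapon --show` beforehand also tells you whether any swap is already active.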

  • Successful_Try543@feddit.de
    2 months ago

    When I had the issue of the MariaDB daemon being killed, I think there was an entry in either dmesg or syslog reading "Out of memory: Kill process… " or similar.
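
    One way to look for such entries, as a rough sketch: filter the kernel log for OOM-killer lines. The helper name `oom_lines` is made up, and the pattern is only a guess at the common wording, which differs between kernel versions.

```shell
# Sketch: keep only the lines of a kernel log (read from stdin) that look
# like OOM-killer activity. The pattern is an assumption about the wording.
oom_lines() {
    grep -Ei 'out of memory|oom-kill'
}

# Usage on a systemd machine such as Fedora:
#   journalctl -k | oom_lines    # kernel messages from the current boot
#   dmesg | oom_lines            # kernel ring buffer (may need root)
```

    The matching lines normally name the process that was killed, which should tell you which container it belonged to.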

  • just_another_person@lemmy.world
    2 months ago

    If you saw anything mentioning OOM, something is getting OOM-killed by the kernel, which is trying to keep the machine up. Check syslog and dmesg; they should say what was killed, and there’s your problem container. You probably have a memory leak, so check your container stats every so often and see which container’s memory usage is growing out of control.

    Enable swap regardless. It would also help to know what you’re running.
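
    To check the stats "every so often" without sitting in front of the machine, something like the following sketch could append a timestamped snapshot to a file. The function name `snapshot_container_memory` and the log path are hypothetical; `podman stats --no-stream` prints one sample and exits.

```shell
# Sketch: append one timestamped `podman stats` snapshot to a log file.
# Assumption: the default log path is a placeholder; run it as the user
# that owns the containers.
snapshot_container_memory() {
    logfile="${1:-$HOME/podman-mem.log}"
    {
        date '+%Y-%m-%d %H:%M:%S'
        podman stats --no-stream
    } >> "$logfile" 2>&1
}

# Usage: take a snapshot every 60 seconds until interrupted:
#   while true; do snapshot_container_memory; sleep 60; done
```

    Because the samples land in a file, the last few entries before the containers die are still there afterwards, and the container whose memory column keeps climbing is the likely culprit.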

  • iluminae@lemmy.world
    2 months ago

    Are you running them from your user session? If so, your processes will be stopped when you log out, unless you have enabled ‘linger’ mode.
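
    A small sketch for checking and enabling linger with `loginctl`. The wrapper name `ensure_linger` is made up; depending on your setup, enabling linger may ask for authentication.

```shell
# Sketch: enable linger for a user unless it is already on.
# Assumption: defaults to the current user when no name is given.
ensure_linger() {
    user="${1:-$USER}"
    # show-user prints "Linger=yes" or "Linger=no" for the given user.
    if loginctl show-user "$user" --property=Linger | grep -q 'Linger=yes'; then
        echo "linger already enabled for $user"
    else
        loginctl enable-linger "$user" && echo "linger enabled for $user"
    fi
}

# Usage:
#   ensure_linger            # current user
#   ensure_linger someuser   # hypothetical user name
```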

  • Diplomjodler@lemmy.world
    2 months ago

    I would start all containers except one. If everything then keeps running, the container you left out is the cause of the problem. If not, repeat with a different container left out each time.
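
    That leave-one-out test could be scripted roughly like this. The function name `start_all_except` and the container names in the usage lines are hypothetical.

```shell
# Sketch: start every listed container except the one being tested.
# First argument: container to leave stopped. Remaining arguments: all names.
start_all_except() {
    skip="$1"
    shift
    for name in "$@"; do
        if [ "$name" = "$skip" ]; then
            echo "leaving $name stopped"
        else
            podman start "$name"
        fi
    done
}

# Usage with hypothetical container names:
#   start_all_except db db web proxy
# Or against every container Podman knows about:
#   start_all_except db $(podman ps -a --format '{{.Names}}')
```

    Leave each run going for longer than the usual 2-3 hours before moving on to the next container, otherwise a slow leak will not have had time to show up.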