Bootclean Whacked My Sockets

Raven


PostgreSQL and MySQL can both use local (Unix-domain) sockets for communication between the client and the server. An in-depth description of sockets can be found here, but basically they are special files that act like network connections. The difference is that instead of being reachable over the network, they are only available on the local machine. If you only use local connections, make sure you disable network connections for increased security.

PostgreSQL’s local socket looks something like this:
srwxrwxrwx 1 postgres postgres 0 Oct 17 16:57 /tmp/.s.PGSQL.5432

And MySQL’s looks like this:
srwxrwxrwx 1 mysql daemon 0 Aug 17 17:57 /var/lib/mysql/mysql.sock
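The leading "s" in the mode column of these listings marks the file as a socket. After a reboot, a quick way to see whether the socket files are still in place is to stat them. Below is a minimal Python sketch. The two paths are the ones from the listings above and may differ on your system, and the function name socket_status is hypothetical.

```python
import os
import stat

# Paths from the listings above; your installation may use different ones.
SOCKET_PATHS = ["/tmp/.s.PGSQL.5432", "/var/lib/mysql/mysql.sock"]


def socket_status(path):
    """Describe what is at `path`: 'socket', 'missing', 'not a socket',
    or 'permission denied' if a parent directory cannot be searched."""
    try:
        # os.stat follows symlinks, so a link pointing at a socket counts.
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"
    except PermissionError:
        return "permission denied"
    return "socket" if stat.S_ISSOCK(mode) else "not a socket"


if __name__ == "__main__":
    for p in SOCKET_PATHS:
        print(f"{p}: {socket_status(p)}")
```

A result of "missing" for a database that is still running matches the symptom described in this post.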

Since the local sockets are actually files, they can be unlinked (deleted) like any other file. This matters because some init scripts clean out /tmp at boot, and you do not want that to happen after the database has opened its local socket there.
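Here is a small, self-contained Python sketch of why unlinking matters. It is not PostgreSQL or MySQL code. A toy server listens on a Unix socket in a temporary directory. After the socket file is unlinked, the server is still listening on the socket it holds open, but new clients can no longer reach it by path. That is what a database client sees once /tmp has been cleaned out from under the server. The helper names are hypothetical.

```python
import os
import socket
import tempfile


def can_connect(path):
    """Try to open a client connection to the Unix socket at `path`."""
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        client.connect(path)
        return True
    except (FileNotFoundError, ConnectionRefusedError):
        return False
    finally:
        client.close()


def demo():
    """Return (connected_before_unlink, connected_after_unlink)."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "demo.sock")
        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        server.bind(path)  # creates the socket file on disk
        server.listen(1)
        try:
            before = can_connect(path)
            os.unlink(path)  # what a /tmp cleanup does to the file
            # The server socket is still open and listening here,
            # but the path no longer leads to it.
            after = can_connect(path)
        finally:
            server.close()
    return before, after


if __name__ == "__main__":
    print(demo())  # prints (True, False)
```

The second connection attempt fails with "No such file or directory" even though the server process never stopped. This is why restarting the database, which recreates the file, appears to fix the problem.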

On Debian servers, /etc/init.d/bootclean contains the cleanup functions, which are called by mountall.sh and mountnfs.sh. If, for instance, mountnfs.sh is linked into the boot sequence after PostgreSQL, the local socket can be removed right after PostgreSQL creates it. Restarting PostgreSQL will bring the socket back, but the next time the server boots it will be gone again.

We found a system with mountnfs.sh linked in as S99 and postgresql linked in as S20. After reviewing the other startup scripts, we moved mountnfs.sh to S19, and the boot process no longer interferes with PostgreSQL's local socket.
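On Debian's SysV-style init, the S-prefixed links in a runlevel directory are started essentially in sorted name order, so the two-digit number decides which script runs first. The Python sketch below lists the start links in that order and flags the situation described above, where mountnfs.sh (which calls the /tmp cleanup) starts after postgresql. It assumes /etc/rc2.d, the runlevel used by the update-rc.d command quoted in the comments. The helper names are hypothetical, and the check ignores parallel-boot setups.

```python
import os
import re

# Start links look like "S20postgresql" or "S99mountnfs.sh".
START_LINK = re.compile(r"^S(\d{2})(.+)$")


def start_order(names):
    """Return (sequence, script) pairs for S links, in the order they run."""
    links = []
    for name in names:
        m = START_LINK.match(name)
        if m:
            links.append((int(m.group(1)), m.group(2)))
    # Sequence number first, then script name, like a sorted S* glob.
    return sorted(links)


def runs_after(names, later, earlier):
    """True if script `later` starts after script `earlier`."""
    order = [script for _, script in start_order(names)]
    if later not in order or earlier not in order:
        return False
    return order.index(later) > order.index(earlier)


if __name__ == "__main__":
    rc_dir = "/etc/rc2.d"  # assumption: Debian's default multi-user runlevel
    names = os.listdir(rc_dir) if os.path.isdir(rc_dir) else []
    if runs_after(names, "mountnfs.sh", "postgresql"):
        print("warning: mountnfs.sh starts after postgresql and may clean /tmp")
```

With the links from this post (S20postgresql, S99mountnfs.sh), the check fires. After moving mountnfs.sh to S19, it does not.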



  • http://www.sitening.com/ Thomas

    Well, I don’t know (yet) what the Debian maintainers of postgres use as their defaults, but I notice that in contrib/start-scripts/linux, the comments indicate that postgres should use S98, so clearly the default of 20 is too early.

    I agree with your conclusion: trust your distribution and package maintainers and help them implement best practices. Reinventing the wheel for system administration (despite the fun of compiling from source) can be fraught with as much peril as the failure to use CPAN.

  • http://www.sitening.com/ Scott

    After digging a little deeper, I think we’re closer to the root of the problem.

    First, I read this:
    http://www.ida.liu.se/~TDDI05/labs/NFS%20-%20Network%20File%20Systems.pdf

    “There is a bug in the version of UML that we use, that is triggered by mounting NFS volumes too
    early in the boot process. In Debian, the /etc/init.d/mountnfs.sh script is responsible for
    mounting NFS directories. You must reconfigure your system to mount NFS volumes at the latest
    possible moment. The following commands will do the job:
    update-rc.d -f mountnfs.sh remove
    update-rc.d mountnfs.sh start 99 2 .”

    So those exact commands are in our build.sh script. I don’t remember why I did that, but I must’ve run into a problem and Googled (recklessly) around for an answer. It looks like the default for mountnfs.sh is 45, which puts it after networking starts up. The default for update-rc.d is 20, which would be too early for mountnfs.sh, and is really too early for postgres and apache.

    So the real answer to your question is that 99 is too late for mountnfs, and the default of 45 is probably what we want. 20 is way too early for postgres or apache; it looks like the Debian package for apache uses 91. I don't know about postgres, but you can find that out.

    The problem, therefore, was incorrect usage of update-rc.d, specifically not researching the proper boot order for postgres and mountnfs, and screwing them both up. update-rc.d’s default of 20 is arbitrary, and not recommended for anything in particular. I don’t know if update-rc.d supports comments in the header of init scripts the way chkconfig does, but that’s just a hack anyway.

    Ultimately, the best practice would be to always install from packages, and not to screw with update-rc.d on a packaged script (in this case mountnfs.sh). This would require us to either use Debian’s postgres package, or to package up our own keeping as close to the original as possible.

  • http://www.sitening.com/ Thomas

    I’d be interested to know how the S99 symlink was created in the first place, but the more interesting question is: is this an administration problem that can be solved?

    A potential answer lies in the value of TMPTIME in /etc/default/rcS. If this is set to -1 (rather than the default of 0), /tmp is protected from bootup cleansing.

    This problem seems peculiar to Debian packages that use /tmp for important things. It shows up when some of the scripts provided by initscripts get added to /etc/rc.d, either automatically with update-rc.d or manually, with no warning about the two-digit sequence number that orders the boot process. So TMPTIME looks like a safety on the foot-gun of automatic boot-time cleanup.

    On a Debian box with initscripts installed, see man rcS for more info.
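The TMPTIME rule from the last comment can be written as a small predicate. The Python sketch below is a simplified model of the behaviour documented in rcS(5). A file in /tmp is removed at boot if it was last modified more than TMPTIME days ago, a value of 0 removes files regardless of age, and a negative value such as -1 disables the cleanup. It is not the actual bootclean code, which is a shell script with its own exceptions, and the function name removed_at_boot is hypothetical.

```python
def removed_at_boot(age_days, tmptime):
    """Simplified TMPTIME model: would a /tmp file of this age be removed?

    age_days: whole days since the file was last modified.
    tmptime:  the TMPTIME value from /etc/default/rcS, as an integer.
    """
    if tmptime < 0:
        return False  # negative: /tmp is not cleaned at boot
    if tmptime == 0:
        return True  # 0 (the default): removed regardless of age
    return age_days > tmptime  # otherwise only files older than TMPTIME days


if __name__ == "__main__":
    # A socket created moments before the cleanup runs is about 0 days old.
    for t in (0, -1, 7):
        print(t, removed_at_boot(0, t))
```

In this model, a freshly created socket is removed only under the default TMPTIME=0. Setting TMPTIME=-1 protects it, as the comment describes.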