Monday, August 9, 2010

Recovering a corrupted OpenLDAP database

Sometimes OpenLDAP (a key component of Open Directory) will try to launch and find that its database is corrupted. Naturally, when your LDAP server (slapd, which I always imagine as a daemon that constantly slaps you in the face) can't start, everything that relies on it will fail. That's usually bad.

You see errors like this:

8/9/10 12:11:55 PM com.apple.launchd[1] (org.openldap.slapd[547]) Exited with exit code: 1 
8/9/10 12:11:55 PM com.apple.launchd[1] (org.openldap.slapd) Throttling respawn: Will start in 10 seconds


If you run slapd in Tool mode (-Tt tells it to act as slaptest, which checks the configuration and tries to open the databases), you can figure out the exact problem, and it'll probably look something like this:

$ /usr/libexec/slapd -Tt
bdb(dc=example,dc=com): PANIC: fatal region error detected; run recovery
bdb_db_open: Database cannot be opened, err -30978. Restore from backup!
bdb(dc=example,dc=com): DB_ENV->lock_id_free interface requires an environment configured for the locking subsystem
backend_startup_one: bi_db_open failed! (-30978)
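
If you'd rather script that check (say, to run it again after recovery), here's a minimal sketch, assuming a POSIX shell. The function name check_ldap_db and its ok/failed/needs-recovery output are made up for illustration. It runs whatever command you pass it, defaulting to the /usr/libexec/slapd -Tt call above, and looks for the "run recovery" text Berkeley DB prints.

```shell
# Hypothetical helper: run a slapd check and classify the result.
# With no arguments it runs /usr/libexec/slapd -Tt (the OS X Server path
# used in this post; other systems keep slapd elsewhere).
check_ldap_db() {
  if [ "$#" -eq 0 ]; then
    set -- /usr/libexec/slapd -Tt
  fi
  # Capture stdout and stderr together, and remember the exit status.
  if out=$("$@" 2>&1); then
    rc=0
  else
    rc=$?
  fi
  # Berkeley DB asks for recovery explicitly, as in the PANIC line above.
  if printf '%s\n' "$out" | grep -q 'run recovery'; then
    echo "needs-recovery"
    return 2
  fi
  if [ "$rc" -ne 0 ]; then
    echo "failed"
    return 1
  fi
  echo "ok"
}
```

From a root shell, check_ldap_db on its own runs the default slapd check; anything else you pass (check_ldap_db /some/other/slapd -Tt) is run instead.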


The problem is that the Berkeley DB database that holds all of OpenLDAP's information has become corrupted. It happens, and it's exactly what the db_recover command is for.

First things first: make sure slapd is not running before you try to recover the database.

$ sudo launchctl unload /System/Library/LaunchDaemons/org.openldap.slapd.plist

By default, db_recover uses the current working directory as the home for the database environment. You can point it at a different directory with the -h flag, or you can just go there first.

$ cd /var/db/openldap/openldap-data

You should probably back up the contents of this folder before continuing. You can just copy the openldap-data folder to another folder, or use tar, but you should back it up. Just in case.
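
If you go the tar route, a minimal sketch might look like this, assuming a POSIX shell with tar and gzip available. backup_ldap_db is a hypothetical name; you pass it the data directory and an existing directory outside it for the archive, and you should run it as root so tar can read every file.

```shell
# Hypothetical helper: tar up an OpenLDAP data directory with a timestamp.
# Usage: backup_ldap_db SRC_DIR DEST_DIR
# DEST_DIR should be outside SRC_DIR so tar doesn't archive its own output.
backup_ldap_db() {
  src=$1
  dest_dir=$2
  if [ ! -d "$src" ] || [ ! -d "$dest_dir" ]; then
    echo "usage: backup_ldap_db SRC_DIR DEST_DIR" >&2
    return 1
  fi
  archive="$dest_dir/$(basename "$src")-$(date +%Y%m%d-%H%M%S).tar.gz"
  # -C switches to SRC's parent so the archive stores a relative path.
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
  echo "$archive"
}
```

On OS X Server that would be something like backup_ldap_db /var/db/openldap/openldap-data /var/root, which prints the path of the archive it wrote.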

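Now use db_recover to fix whatever is broken. It recreates the environment's shared region files and replays the transaction logs, so committed changes are kept and uncommitted ones are rolled back.

$ sudo /usr/bin/db_recover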
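
If you'd rather not depend on the current directory, db_recover's -h flag takes the environment home directly. Here's a minimal sketch of that, assuming a POSIX shell and the OS X Server paths from this post. recover_ldap_db and the DB_RECOVER override are hypothetical additions so the binary path can be changed on other systems; -v just makes db_recover verbose.

```shell
# Hypothetical helper: run db_recover against an explicit environment home.
# Defaults are the OS X Server paths from this post; set DB_RECOVER if your
# db_recover binary lives somewhere else. Run as root with slapd stopped.
recover_ldap_db() {
  home=${1:-/var/db/openldap/openldap-data}
  recover=${DB_RECOVER:-/usr/bin/db_recover}
  if [ ! -d "$home" ]; then
    echo "no such directory: $home" >&2
    return 1
  fi
  # -h names the environment home; -v asks for verbose output.
  "$recover" -v -h "$home"
}
```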

Once that's done (and it shouldn't take long), run slapd -Tt again to make sure it did the trick. It should just report that config file testing succeeded. Reload the slapd launch daemon, and (hopefully) it will launch.

$ sudo launchctl load /System/Library/LaunchDaemons/org.openldap.slapd.plist

It's worth pointing out that you should absolutely be taking periodic archives of Open Directory. You can do that from Server Admin, or you can script it using the serveradmin command, but you need to do it. The OpenLDAP database isn't the only thing that can get corrupted, and sometimes restoring from an Open Directory archive is the only way you'll get things working again.

2 comments:

  1. Hi, I'm having a problem starting the LDAP service. I've been seeing this for the last 2 days. When I start LDAP, I get this:


    [root@localhost ~]# service ldap start
    /var/lib/ldap/id2entry.bdb is not owned by "ldap"[WARNING]
    /var/lib/ldap/dn2id.bdb is not owned by "ldap"[WARNING]
    /var/lib/ldap/__db.004 is not owned by "ldap"[WARNING]
    /var/lib/ldap/__db.006 is not owned by "ldap"[WARNING]
    /var/lib/ldap/__db.005 is not owned by "ldap"[WARNING]
    /var/lib/ldap/__db.003 is not owned by "ldap"[WARNING]
    /var/lib/ldap/__db.001 is not owned by "ldap"[WARNING]
    /var/lib/ldap/__db.002 is not owned by "ldap"[WARNING]
    Checking configuration files for slapd: bdb_db_open: alock package is unstable
    backend_startup_one: bi_db_open failed! (-1)
    slap_startup failed (test would succeed using the -u switch)
    [FAILED]
    stale lock files may be present in /var/lib/ldap[WARNING]
    [root@localhost ~]#

    Replies
    1. I'm assuming you're running a flavor of Linux, because OS X Server doesn't have a service command. I only bring this up because if you're looking at the commands above and thinking about trying any of them, you should know that the paths are all different.

      I think the warnings are telling you all you need to know. The database files that OpenLDAP uses need to be writable by the user running slapd, which is (I assume, based on the error) the ldap user.

      Something like this:

      $ sudo chown -R ldap /var/lib/ldap

      should get it running again, or at least eliminate one possibility. But I don't have a Linux box running OpenLDAP handy, so I can't vouch for what else is in there. Now's a good time to check your backups. You have backups, right?
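
If you want to see exactly which files are affected before or after the chown, a small sketch like this lists anything under the data directory that a given user doesn't own. The function name is made up, and /var/lib/ldap and ldap are just the defaults suggested by the warnings above, not something verified on a Linux box.

```shell
# Hypothetical helper: print every file or directory under DIR that is
# not owned by USER. Empty output means ownership looks right.
# Defaults come from the warnings in the comment above (assumptions).
unowned_ldap_files() {
  dir=${1:-/var/lib/ldap}
  user=${2:-ldap}
  find "$dir" ! -user "$user" -print
}
```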
