Fixing YaCy?

23 December 2020

A couple of weeks back I decided to upgrade the YaCy installs running on Leandra to the latest supported versions, because they'd been lagging behind for a while. Because they're enterprisey Java web applications and I can't readily get hold of any live chickens to sacrifice, I'd been putting it off for as long as possible.

As it turned out, the lack of sacrificial barnyard fowl wound up being a crucial factor in how things transpired.

The first install I upgraded was built from source code and indexed my personal library. It got re-indexed every night, so I wasn't too concerned about losing that set of databases. The (admittedly minor) problem I ran into was that I could no longer get YaCy to index the library running on Leandra. The formerly working configuration now made the crawler work upwards through the site (not downwards), which resulted in nothing I wanted being indexed and everything I didn't want getting swept up. After a couple of days of tinkering with that install, I kicked it in the head and replaced it, but that's a post for a different day.

The other YaCy install, the one I used to index the Net wherever parts of me go, I'd installed from Arch Linux's AUR repository - the unofficial packages contributed and maintained by the community. The rebuild and upgrade went as expected, and I figured everything was fine. Until I woke up one morning about a week later to discover that every last one of my Huginn job workers was locked up solid. As it turned out, there seems to be a bug in YaCy v1.922 which causes the user-facing webapp to stop responding. It also locks up the API, so no indexing or search requests work. This seems to have hit a corner case in my job workers and caused them to freeze. I had to manually delete those frozen "submit this URL to YaCy's indexer" jobs from the database and then disable the YaCy-related agents to get everything running smoothly again. That procedure is a separate post in itself, though.
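For what it's worth, the failure mode here (a server that accepts the connection and then never answers) is the sort of thing a client-side timeout is meant to catch. Here's a minimal sketch in Python of a "is it even awake?" check. It is not what Huginn does internally, and the function name `is_responsive` and the example URL are my own assumptions for illustration.

```python
import http.client
import urllib.error
import urllib.request


def is_responsive(url: str, timeout: float = 5.0) -> bool:
    """Return True if `url` sends back any HTTP response within `timeout` seconds.

    Hypothetical helper for illustration; not part of YaCy or Huginn.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just with an error status (say, a 401).
        # That still counts as awake rather than hung.
        return True
    except (OSError, http.client.HTTPException):
        # Timeouts, refused connections, and dropped or garbled replies
        # all land here: treat the service as unresponsive.
        return False


if __name__ == "__main__":
    # Assumed address of a local YaCy instance; adjust to taste.
    print(is_responsive("http://localhost:8090/", timeout=5.0))
```

A worker could run a check like this before submitting a URL and skip the job if it comes back False, rather than blocking forever on a request that will never finish.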

Off to the bug tracker. A couple of folks had mentioned having similar problems on different platforms, and I was able to figure out, roughly, what was going on. Unfortunately, none of the suggestions anyone made worked. After some failed attempts at downgrading, swearing, and head scratching, I managed to get Leandra's existing copy of YaCy running and responsive enough to trigger a database dump for safekeeping. Truth be told, it was a matter of having everything I needed ready in advance, starting up that YaCy install, and being fast enough to log in and hit every page that needed to be hit to start the index backup. Thankfully, the index backup ran to completion even after the front-end and API webapps locked up again.

Once I was sure the backup was complete, I decided to do a brand-new, unpackaged, me-built installation of YaCy using the latest and greatest clone of the repository on GitHub. The build itself was nothing to write home about. The front-end web application no longer seems to go comatose. However, now the API doesn't work: it ignores any and all authentication attempts, so I still can't submit URLs for indexing. There is an existing ticket for this bug, but the person who opened it closed it a few days later with no explanation or solution. I'm probably going to open a new one to see if I can get some traction on it.

I'm limping along as best I can right now. I can still run searches on the index, but growing it is on hold until I can get everything worked out.