Technomancer Tools: YaCy

Oct 28 2017

If you've been squirreling away information for any length of time, chances are you tried to keep it all organized for a while and then gave up once the volume got out of hand.  Everybody has their limit to how hard they'll struggle to keep things organized, and past that point there are really only two options: Give up, or bring in help.  And by 'help' I mean a search engine of some kind that indexes all of your stuff and makes it searchable so you can find what you need.  The idea is to let the software do the work while the user just runs queries against its database to find documents on demand.  Practically every search engine parses HTML to get at the content, but there are others that can also read PDF files, Microsoft Word documents, spreadsheets, plain text, and occasionally even RSS or Atom feeds.  Since I started offloading some file downloading duties to yet another bot, my ability to rename files sanely has... let's be honest... it's been gone for years.  Generally speaking, if I need something I have to search for it or it's just not getting done.  So here's how I fill that particular niche in my software ecosystem.

Building your own Google Alerts with Huginn and Searx

Sep 30 2017

A Google feature that doesn't ordinarily get a lot of attention is Google Alerts, a service that periodically sends you links to things that match certain search terms.  Some people use it for vanity searching because they have a personal brand to maintain, some people use it to keep on top of a rare thing they're interested in (anyone remember the show Probe?), some people use it for bargain hunting, some people use it for intel collection... however, this is all predicated on Google finding out what you're interested in, and interested enough in that you have it send you the latest search results on a regular basis.  Not everybody's okay with that.

A while ago, I built my own version of Google Alerts using a couple of tools already integrated into my exocortex, which I use to periodically run searches, gather information, and compile reports to read when I have a spare moment.  The advantage to this is that the only entities that know what I'm interested in are other parts of me, and it's as flexible as I care to make it.  The disadvantage is that I have some infrastructure to maintain, but as I'll get to in a bit, there are ways to reduce the amount of effort required.  Here's how I did it...