Organizing a data hoard with YaCy.

Feb 02 2019

It should come as little surprise to anyone out there that I have a bit of a problem with hoarding data: books, music, and of course files of all kinds that I download and read or use in some project.  Legal briefs, research papers (arXiv is the bane of my existence), stuff people ask me to review, the odd Humble Bundle...  So much so that a scant few years ago I rebuilt Leandra to better handle the volume of data in my library.  However, it's taken me this long to both figure out how to make it easier to find anything in all that mess and get around to doing it.  If I can't find it, I can't do anything with it, or even figure out what I do or don't have.  I also don't often have console access, so it's not as if I can SSH in and grep for what I need.  I use Nginx as a web server on Leandra, so actually getting access to files when I need them is trivial.


Sometimes the old ways may be best.

Feb 02 2019

A couple of weeks back, I found myself in a discussion with a couple of friends about searching on the Internet and how easy it is to get caught up in a filter bubble and not realize it.  Not to put too fine a point on it, because the big search engines (Google, Bing, and so forth) profile users individually and tailor search results to analyses of their search histories (and other personal data they have access to), it's very easy to forget that there are other things out there that you don't know about, for the simple reason that the search engines don't show you stuff outside of the profile they've built up.  If you're a hardcore code hacker you might find it very difficult to find poetry or the name of a television show you saw once unless you take fairly drastic action.  The up-side of this profiling is that, inside of your statistical profile, search results are great.  You can find what you need, when you need it.  But outside of that?  Good luck.

The point of the discussion was that there were ways that we could escape this filter bubble through application of self-hosted software and a little cooperation.

Ironically, searching through my conversation history I can't seem to find the thread in question so I'm relying entirely upon on-board storage (as it were).  So, go ahead and laugh while I geek out.  First, a little bit of Internet history.


A friendly introduction to the Fediverse.

Jan 04 2019

If you've been kicking around on the Net for the past year or so, you've probably come across a thinkpiece or two about Mastodon, an open source social network that's kind of like Twitter, kind of like Facebook, and kind of like... well, nobody's really sure what else would fit there.  It's a bit of a wildcard.  That seems to throw a lot of people, and because this is the Internet we're talking about, that means a lot of "this could never possibly work" posts, never mind a busy network of several thousand instances and several hundred thousand users doing everything from venting their spleens to asking for (and surprisingly often receiving) assistance, collaborating on projects, goofing around, and mourning their fallen...

This ambiguity and confusion makes it hard to understand why you'd even want to consider joining yet another social network.  Let me see if I can help a little.


The Doctor's favorite podcasts of 2018.ev.

Jan 04 2019

I know this is kind of late, but I thought I'd put together a list of the podcasts I enjoyed listening to in 2018.ev, in the hope of introducing folks to the work of some really talented people:

Weird Things

Roleplaying Public Radio's Actual Plays

The Neo-Anarchist Podcast, an in-character ongoing series set in the world of Shadowrun.

On Her Majesty's Secret Podcast.  More about James Bond than you thought it was possible to know.

The Black Vault

The Secret Broadcast



Jan 01 2019

Happy New Year, everyone.

Systembot: Adventures in system monitoring.

Dec 28 2018

If you've been following the development activity of Systembot, the bot I wrote to monitor my machines (physical as well as virtual), you've probably noticed that I changed a number of things around pretty suddenly.  This is because the version of Systembot in question had some pretty incorrect assumptions about how things should work.  For starters, I thought I was being clever when I wrote the temperature monitoring code and decided to use what the drivers thought were high or critical values for sending "something is wrong" alerts.  No math (aside from a Centigrade-to-Fahrenheit conversion), just a couple of values helpfully supplied by the drivers by way of psutil (which is a fantastic module, by the way; I don't play with it enough).  This was hunky-dory until Leandra started running a backup job and her CPU temperature spiked to 125 degrees Fahrenheit while encrypting the data.  125 degrees isn't terribly hot as servers go, but the lm_sensors drivers seem to disagree.  Additionally, my assumptions of how often to send the "high temperature" alerts (after every four cycles through the "do stuff" loop) were... naive? Optimistic?

Let's go with optimistic.

What it boiled down to was that I was getting hammered with "temperature is too high!" warning messages roughly six times a second.  Some experiments with changing the delay were equally optimistic and futile.  I bit the bullet and made the delay-between-alerts configurable.  What I have yet to do is make the frequency of different kinds of warning events configurable, because right now they all use the same delay (defined in time_between_alerts).  Setting this value to 0 disables sending warnings entirely.  This is suboptimal at best, but it's not waking me up every few seconds, so I think it'll hold for a couple of days until I can break this logic out a little.
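For the curious, here's roughly what a configurable delay between alerts could look like.  This is a minimal sketch, not Systembot's actual code: the AlertThrottle class, the should_send() method, and the per-alert-type keys are all made up for illustration.  Only the time_between_alerts name and the "0 disables warnings" behavior come from the description above.

```python
import time


class AlertThrottle:
    """Hypothetical helper: limit how often the same kind of alert goes out."""

    def __init__(self, time_between_alerts, clock=time.monotonic):
        # Seconds that must pass between two alerts of the same kind.
        self.time_between_alerts = time_between_alerts
        # Injectable clock so the behavior can be tested without sleeping.
        self.clock = clock
        # Alert kind -> time the last alert of that kind was sent.
        self.last_sent = {}

    def should_send(self, kind):
        # A delay of 0 disables sending warnings entirely.
        if self.time_between_alerts <= 0:
            return False
        now = self.clock()
        last = self.last_sent.get(kind)
        if last is not None and now - last < self.time_between_alerts:
            # Still inside the quiet period for this kind of alert.
            return False
        self.last_sent[kind] = now
        return True
```

Every kind of alert still shares the one delay here, but each kind keeps its own timer, so a noisy temperature sensor wouldn't swallow a disk warning.  Giving each kind its own delay would just mean turning time_between_alerts into a dictionary.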

The second assumption that came back to bite me (hardcoding values until something like this happened aside) was that alerting on 80% of a disk being in use without any context isn't necessarily a good idea.  My media server at home was also chirping several times a second because one of the hard drives is currently at 85% of capacity.  This seems reasonable at first glance but when you dig a little deeper it's not.  85% of capacity in this case means that there are "only" 411 gigabytes of space left on a 4 terabyte hard drive.  Stuff doesn't get added to that drive very often, so that 400+ gigs will last me another couple of months, at least.  There's no reason to alert on this, so making this value a parameter in the config file buys me some time before I have to buy another hard drive.
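A disk check with the threshold passed in rather than hardcoded might look something like the following.  Again, this is a sketch under assumptions: percent_used(), disk_over_threshold(), and the max_percent_used parameter are names I'm inventing here, and it uses shutil.disk_usage() from the standard library rather than whatever Systembot really calls.

```python
import shutil


def percent_used(used_bytes, total_bytes):
    """Return how full a disk is, as a percentage of its total size."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def disk_over_threshold(path, max_percent_used):
    """Return True if the filesystem holding path is fuller than the threshold.

    max_percent_used stands in for a value read from the config file.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return percent_used(usage.used, usage.total) > max_percent_used
```

With the threshold set to 80, a drive at 85% of capacity trips the alert; bump the threshold to 90 in the config and the same drive stays quiet.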
