Maybe we should start over.

28 December 2023

There is a conspiracy theory online called the Dead Internet Theory. So the story goes, some years ago people - actual, organic people sitting at keyboards or holding phones - stopped posting anything, anywhere online. Depending on who you talk to (and this includes credentialed folks who study various aspects of the Net, not just denizens of image boards or random users on forums), the proliferation of spambots, botnets, folks who use bots to age Twitter accounts to sell for various purposes (like astroturfing), and SEO shenanigans effectively pushed organics out through sheer numbers. One person can use custom software to run thousands of accounts simultaneously with relatively simple algorithmic text generation. I would be remiss if I didn't mention one of the cypherpunks' theories (one that was eventually revealed as true): automated perception management systems for the purpose of social engineering on a large scale. More recently, LLMs (large language models) like ChatGPT have been incorporated into the Dead Internet Theory because people really are using them to generate entire websites for SEO purposes.

The problem is this: If it was a conspiracy theory before, it isn't now.

Since ChatGPT and its work-alikes went into general use there has been an explosion of websites that look professionally developed, are chock full of text, and are all over the first two pages of Google results (and who will really admit to looking at more than the first five hits?)... and that text is all useless. Jokes about some entries in Lenovo's online glossary aside, it's gotten very difficult to find websites with actual hard information on them. One of the problems with LLMs is that they're not actually intelligent as people ordinarily use the term. They are the product of gargantuan amounts of fancy math done on terabytes of data gathered from the Net, more fancy math done in realtime to match what a user types to what is likely to have something to do with some of the data in the LLM's core, and a not-so-clever euphemism for using a random number generator to shake things up a little bit (the setting is called "temperature", or colloquially "heat"). This combination of statistics and randomness can result in something that might be useful but is equally likely to be complete and total jetwash because LLMs aren't great at knowing what they don't know. The term for this phenomenon that gets used a lot today is 'hallucination' but I think 'confabulation' is a better fit because that implicitly includes pulling something out of one's ass. Couple that with the general pushback against adblocking, and we're looking at nothing good.
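For the curious, here's roughly what that random number generator trick looks like. This is a toy sketch, not how any particular LLM is actually implemented: the function name, the token scores, and the three-word "vocabulary" are all made up for illustration. The idea is that raw scores get divided by the temperature before being turned into odds, so a low setting almost always picks the top-scoring word and a high setting lets the long shots through.

```python
import math
import random


def sample_with_temperature(scores, temperature, rng):
    """Pick one token from {token: raw score} after temperature scaling.

    Hypothetical helper for illustration only.
    """
    if temperature <= 0:
        # No randomness at all: always take the highest-scoring token.
        return max(scores, key=scores.get)

    # Divide every score by the temperature. Small temperatures stretch
    # the gaps between scores, large ones flatten them out.
    scaled = {tok: s / temperature for tok, s in scores.items()}

    # Softmax weights, shifted by the max for numerical stability.
    top = max(scaled.values())
    weights = {tok: math.exp(s - top) for tok, s in scaled.items()}

    tokens = list(weights)
    return rng.choices(tokens, weights=[weights[t] for t in tokens], k=1)[0]


if __name__ == "__main__":
    rng = random.Random(1234)  # seeded so runs are repeatable
    scores = {"the": 5.0, "a": 3.0, "jetwash": 0.5}  # made-up numbers
    for temperature in (0.1, 1.0, 5.0):
        picks = [sample_with_temperature(scores, temperature, rng) for _ in range(10)]
        print(temperature, picks)
```

Nothing in there checks whether the chosen word is true. It only checks whether the word is statistically plausible, which is the whole problem.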

Something I've learned over the years is that learning about history provides context for why things are the way they are. To put it another way, the now came from then, and learning how then became now can suggest changes or alternatives. I've said before that perhaps we should consider looking at the old ways to figure a way out of this situation. So, a little history lesson.

September of 1993.ev went down in history as the September that never ended. That was when AOL opened its links to the rest of the Net and unleashed millions of customers who had no idea how to act online because there were no resources to tell them that there were social norms it would be really nice for them to follow.1 It wasn't really associated with Eternal September but not long after that, Yahoo! was founded. Before it turned into the world's largest provider of spamtrap e-mail addresses, Yahoo! provided the Net with a gargantuan directory of websites, curated by users and organized into a hierarchy of topics, and a bit later the Yahoo! search engine came online. A few years after that, in 1998.ev, something called DMOZ (a contraction of directory.mozilla.org, its original domain name)2 appeared. During a time when personal websites were still a thing3 DMOZ was in many ways like a personal website's link page on steroids. The kicker at DMOZ was its carefully organized ontology - its formal categories and evaluation criteria thereof, a hierarchy of categories and sub-(sub-sub-sub-...)categories, and the relationships between them. The idea was that you could wander around in the directory of links and stumble across stuff you didn't know existed. Unsurprisingly, DMOZ no longer exists (though what is in many ways a re-implementation, called Curlie, exists and is fairly active these days).

That catches us up to the present. Things kind of suck right now. What to do about it?

Lori was thinking about this a couple of months back and made an observation: We could do this by just bringing back link pages. She called the idea New Yahoo and the concept is simplicity itself. To have a directory of useful web pages, you first need curated links to a lot of web pages - this is where personal sites' link pages come in. Then, and this is the important bit, don't over-engineer it. In fact, don't engineer it at all, just let it come together.4

It is a truism in software development that the data suggests the algorithm. If you know what the data looks like - how it's structured, what the data types of the constituent elements are, how it's intended to be accessed most of the time - actually writing the code that does stuff with and to the data is much more straightforward. It doesn't make much sense to try to architect a distributed New Yahoo directory from the get-go so let's see what shakes out before we try anything serious. If a webring winds up being the most effective way of building a directory, then why not? If the #NewYahoo hashtag makes for a good data source, let's start with that. Maybe a Shaarli plugin will come out of it.
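In the spirit of looking at the data first, here's a minimal sketch of what "let's see what shakes out" could mean in practice: take a handful of personal link pages, pull out the links, and note who lists what. None of this is Lori's design or anybody's spec. The class and function names are mine, the input format (a dict of page URL to HTML) is an assumption, and it uses only the Python standard library.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkPageParser(HTMLParser):
    """Collect (absolute URL, link text) pairs from one link page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # URL of the <a> tag we are currently inside
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page they came from.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def build_directory(pages):
    """pages is {link_page_url: html}. Hypothetical helper.

    Returns {linked_url: {"titles": set of link texts,
                          "listed_by": set of link pages}}.
    """
    directory = defaultdict(lambda: {"titles": set(), "listed_by": set()})
    for page_url, html in pages.items():
        parser = LinkPageParser(page_url)
        parser.feed(html)
        for url, text in parser.links:
            # Keep only web links; skip mailto:, javascript:, and so on.
            if urlparse(url).scheme not in ("http", "https"):
                continue
            directory[url]["listed_by"].add(page_url)
            if text:
                directory[url]["titles"].add(text)
    return dict(directory)
```

Feed it a few link pages and sort the result by the size of each "listed_by" set, and the sites that several actual people bothered to link to float to the top. Whether that turns out to be a useful shape for the data is exactly the sort of thing worth finding out before architecting anything.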

Maybe it's time to start over.

  1. There were really only two ways to learn this: Reading newsgroups' FAQs and the hard way. 

  2. The corporate history of DMOZ (purchased by Netscape, which was then acquired by America On-Line) isn't really relevant here. 

  3. If you haven't noticed that my website's been a lot more snappy lately, I moved all of my stuff to A2 Hosting a few months ago. While A2 Hosting has been pretty awesome they at first thought I was using them for cloud storage or backups or something because my website is so big. It took some time to convince them that no, my website really is several dozen gigabytes in size because I've had a website for 23 years as I write this article (my site started off as a bunch of photographs I took at H2k). This is not to throw static at A2 - that is how uncommon having your own website is these days. Uncommon enough that a company that specializes in hosting websites doesn't expect someone... to have an actual website. 

  4. "We reject: Kings, presidents and voting. We believe in: Rough consensus and running code."