Antarctica Starts Here.

A while back I wrote an article about web applications that can live wherever you can store a file and not necessarily on a web server out of your control. I probably should have posted a link to the Google Group dedicated to unhosted applications, but that's neither here nor there. To recap briefly, the previous article discussed what are called unhosted communications applications, like social networking or instant messaging software. This raises a crucial question: Assuming that you're running an unhosted application in your web browser, how do you tell other people how to connect to you with their own copies? If an application is running on a server someplace then everybody who visits that server can potentially communicate through it with everyone else connected to it. This is how garden variety web applications operate. But what if running a server won't work for your use cases? I talk a lot about decentralized applications because they're harder to shut down. Central points of failure are too easy to find and take out, but peer to peer services are harder to contain by their very nature.

(Disclaimer: I hope I got this bit right. If I didn't, expect future edits.) Let's take everyone's favorite example, BitTorrent. Applications like BitTorrent have two modes of operation: They can connect to one or more centrally located servers called trackers (whose addresses are listed in the .torrent file) and advertise their participation in one or more torrents so that other clients partaking of the same torrents can contact them. The other mode of operation involves so-called trackerless torrents, where a distributed hash table takes the place of a tracker. In the BitTorrent DHT every node picks a random 160-bit ID to identify itself. Each torrent is identified by a hash in the same space (the infohash), and by asking the nodes whose IDs are closest to that hash a client can find other nodes on the public Net which are either running the same torrent or possess addressing information of nodes that are. Nodes which are aware of one another can also ask one another for addressing information which is then locally cached and possibly replicated to other nodes later.
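To make the "closest to that hash" idea a little more concrete, here is a minimal sketch of how a DHT node might rank the nodes it already knows about. It assumes the XOR distance metric used by Kademlia-style tables; the function names and the tiny example IDs are mine, invented purely for illustration, and no real DHT client is this simple.

```python
import hashlib
import os

ID_BYTES = 20  # 160 bits, the same size as a SHA-1 digest


def random_node_id():
    """Pick a random 160-bit identifier, as a node might on first start."""
    return int.from_bytes(os.urandom(ID_BYTES), "big")


def infohash(torrent_info):
    """Hash a torrent's info bytes into the same 160-bit space as node IDs."""
    return int.from_bytes(hashlib.sha1(torrent_info).digest(), "big")


def xor_distance(a, b):
    """Kademlia-style distance: the XOR of two IDs, read as an integer."""
    return a ^ b


def closest_nodes(target, known, k=3):
    """Return the k known nodes whose IDs are nearest the target hash.

    `known` maps node ID -> (ip, port); it stands in for a routing table.
    """
    ranked = sorted(known, key=lambda node_id: xor_distance(node_id, target))
    return [(node_id, known[node_id]) for node_id in ranked[:k]]
```

A client would ask the nodes this returns for peers on the torrent (or for nodes closer still) and repeat until it stops getting closer.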
Then there are application protocols which try to be helpful by embedding the IP address of the machine in packets where NAT can't do anything about it because it operates at a lower OSI layer. Other clients tend to throw errors because they think they can't reach you. BitTorrent clients and some instant messaging applications are notorious for these kinds of problems, which is why many have a configuration option for entering your public IP address. There is also a protocol called UPnP, which among other things can ask your local firewall to temporarily punch holes through itself so network applications can operate normally. Unfortunately, sometimes the implementation you've got works, and sometimes it doesn't. Smarter applications have ways of detecting the routable IP address they're behind and can configure themselves appropriately. Tor does this, for example.
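As a rough illustration of the "which address do I advertise?" problem, the sketch below picks an address in the order a cautious client might: a public IP the user typed in, then whatever address a remote peer says it sees, then the local address if it happens to be routable. The function and parameter names are hypothetical; this is not how any particular client does it.

```python
import ipaddress


def is_publicly_routable(address):
    """True if the address is globally routable (not RFC 1918 and friends)."""
    return ipaddress.ip_address(address).is_global


def address_to_advertise(local, observed=None, configured=None):
    """Choose the address to embed in protocol messages, or None if unknown.

    local      -- address of the local network interface
    observed   -- address a remote peer or service reports seeing, if any
    configured -- public address entered by the user, if any
    """
    for candidate in (configured, observed, local):
        if candidate is not None and is_publicly_routable(candidate):
            return candidate
    return None  # behind NAT with no way to learn the outside address
```

For example, `address_to_advertise("192.168.1.10", observed="8.8.4.4")` skips the private address and returns the observed one.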

I think these examples illustrate the point that no matter how you cut it unhosted applications need ways to find one another when they don't have any pre-existing knowledge of where to start looking. They also need ways to find one another when IP addresses periodically change (thus invalidating local caches) or cannot be reached directly for whatever reason. The solution employed by TorChat relies upon Tor hidden services so clients can locate one another. Every TorChat user is identified with a .onion address which users can then add human readable aliases to (thus squaring Zooko's Triangle). The .onion hostnames of Tor hidden services are derived from public keys, so by communicating exclusively over the Tor network it becomes possible for applications to find one another and dodge firewall problems as well as confirming the identity of the client on the other side (because the hidden service has to have the matching private key for the connection to complete). However, not everybody wants to (or can) install the Tor Browser Bundle.
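To show why a name tied to a key is self-authenticating, here is a small sketch. As I understand the older 16-character hidden service scheme, the name is the base32 encoding of the first 80 bits of the SHA-1 hash of the service's DER-encoded public key; treat that as my assumption rather than gospel. The key bytes below are placeholders, not real keys, and the real proof of identity involves the service signing with its private key, which this does not attempt.

```python
import base64
import hashlib


def onion_name(public_key_der):
    """Derive a 16-character .onion name from DER-encoded public key bytes."""
    digest = hashlib.sha1(public_key_der).digest()
    # The first 10 bytes (80 bits) encode to exactly 16 base32 characters.
    return base64.b32encode(digest[:10]).decode("ascii").lower() + ".onion"


def key_matches_name(public_key_der, name):
    """Check that a presented public key really belongs to the given name."""
    return onion_name(public_key_der) == name.lower()
```

Anyone handed a name and a public key can recompute the name and reject an impostor's key, with no directory or certificate authority involved.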

The BitTorrent tracker model could be used by unhosted apps to find one another - a bunch of services could be set up across the Net that our hypothetical unhosted communication apps use to find one another. Clients would bootstrap themselves by connecting to a known hub (or small group of hubs), collect IP addresses of clients and other hubs and bootstrap their connection networks. Clients could also cross-pollinate the URLs of newly discovered hubs which would then begin swapping directory information using either RESTful APIs, reading and parsing feeds like RSS or Atom, or using a protocol developed for just this purpose like DSNP. All it would need to be is a single PHP script and a database, and maybe a cron job to run a script or two; these are basic features of practically every web hosting package out there, so the barrier to entry is actually quite low. But those services can be blocked or shut down like any other website. If the URLs they use look sufficiently unique it wouldn't be too hard to write a couple of rules to block them en masse. Also, this model relies upon lots of people setting up these hub applications on their web servers and telling people about them. This is definitely possible but grassroots efforts are hard to start.
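The hub idea can be sketched as a small crawl: start from the hubs you know, collect peer addresses, and follow links to hubs you haven't seen yet. Everything here is hypothetical, including the directory format (a dict with "peers" and "hubs" lists) and the example URLs; the fetch function is passed in so the sketch runs without a network.

```python
from collections import deque

# A stand-in for what two hypothetical hubs might publish.
FAKE_DIRECTORIES = {
    "https://hub-a.example/dir": {
        "peers": ["198.51.100.7:4000"],
        "hubs": ["https://hub-b.example/dir"],
    },
    "https://hub-b.example/dir": {
        "peers": ["198.51.100.9:4000", "198.51.100.7:4000"],
        "hubs": ["https://hub-a.example/dir", "https://hub-c.example/dir"],
    },
}


def fake_fetch(url):
    """Pretend to download a hub's directory; unknown hubs are unreachable."""
    if url not in FAKE_DIRECTORIES:
        raise OSError("hub unreachable: " + url)
    return FAKE_DIRECTORIES[url]


def bootstrap(seed_hubs, fetch_directory, max_hubs=10):
    """Visit hubs breadth-first; return (peers found, hubs tried)."""
    peers, tried = set(), set()
    queue = deque(seed_hubs)
    while queue and len(tried) < max_hubs:
        hub = queue.popleft()
        if hub in tried:
            continue
        tried.add(hub)
        try:
            directory = fetch_directory(hub)
        except OSError:
            continue  # blocked or shut down; move on to the next hub
        peers.update(directory.get("peers", []))
        queue.extend(h for h in directory.get("hubs", []) if h not in tried)
    return peers, tried
```

Starting from hub-a alone, `bootstrap(["https://hub-a.example/dir"], fake_fetch)` discovers hub-b through hub-a, picks up both peers, and shrugs off the unreachable hub-c.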

Bitmessage uses a technique called DNS bootstrapping to put clients in touch with one another. Essentially, the application in question does some DNS lookups on a known hostname (like bootstrap.application.org) when it tries to contact the network. Every time it does so the site's authoritative DNS server sends it a different IP address which happens to belong to a client that is already part of the network. At the end of this process it has five or six IP addresses which it can contact. Those IP addresses are also cached locally for a certain period of time so it doesn't have to go through this process again the next time it is started. However, this doesn't work when there isn't a central service - somebody has to set up a DNS server for this service along with the requisite support software which updates the zone records, and like any other centralized service it would be fairly easy to block. That hasn't stopped this technique from seeing a decent amount of use but it is something to keep in mind.
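Here is a minimal sketch of DNS bootstrapping with a local cache. It is not Bitmessage's code: the cache class, the one-hour lifetime, and the port in the usage note are assumptions of mine, and for simplicity it does a single lookup and keeps every address returned rather than querying repeatedly. The resolver is a parameter so the logic can be exercised without touching the network.

```python
import socket
import time


class PeerCache:
    """Remember bootstrap addresses for a limited time."""

    def __init__(self, ttl_seconds=3600, clock=time.time):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.addresses = []
        self.stored_at = None

    def get(self):
        """Return cached addresses, or an empty list if missing or expired."""
        if self.stored_at is None:
            return []
        if self.clock() - self.stored_at > self.ttl_seconds:
            return []
        return list(self.addresses)

    def store(self, addresses):
        self.addresses = list(addresses)
        self.stored_at = self.clock()


def resolve_seed(hostname, port, resolver=socket.getaddrinfo):
    """Look up the seed hostname and return its unique IP addresses."""
    addresses = []
    for _family, _type, _proto, _canon, sockaddr in resolver(
        hostname, port, proto=socket.IPPROTO_TCP
    ):
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


def bootstrap_peers(hostname, port, cache, resolver=socket.getaddrinfo):
    """Use cached addresses when fresh; otherwise ask DNS and cache the answer."""
    cached = cache.get()
    if cached:
        return cached
    try:
        fresh = resolve_seed(hostname, port, resolver)
    except socket.gaierror:
        return []  # seed hostname blocked or gone
    cache.store(fresh)
    return fresh
```

A client would call something like `bootstrap_peers("bootstrap.application.org", 8444, PeerCache())` at startup, where the port number is only an example.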

Ideally some applications could piggyback on top of other services which are harder to censor. For years certain newsgroups (like alt.anonymous.messages on Usenet) have been used to exchange encrypted messages; the idea is that everyone can see the ciphertext but only one person has the matching private key to decrypt it. Bitcoin can use IRC channels on one of the larger networks to connect clients when it is started for the first time. The Bitcoin client can send specially formatted messages to the channel which catch the attention of other Bitcoin nodes and the responses are used to build local caches of IP addresses. This has the fringe benefit of telling the Bitcoin client what its public IP address really is in the event of NAT, because once connected to the IRC network it can then ask it what its IP address seems to be. In theory just about any public service could be used as a communications channel, from Twitter or other microblogs to semi-anonymous public sites like Pirate Pad or Pastebin. Public profile sites like about.me and Coderwall could become clearinghouses for client connection info because their APIs allow them to exchange information with other sites (and possibly clients that speak HTTP). Wikis would also work for this - case in point, The Hidden Wiki. While it's not actually an unhosted app, Jappix uses the "any user can contact any other user on any other server" property of XMPP to piggyback on top of instant messaging services.
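A "specially formatted message" for a public channel could be as simple as a marker followed by a compactly encoded address. The sketch below is a made-up format of my own (the `UNHOSTED1:` prefix and the base32 token are assumptions, not what Bitcoin or anything else actually sends), and it handles IPv4 only.

```python
import base64
import ipaddress
import struct

PREFIX = "UNHOSTED1:"  # hypothetical marker so clients can spot announcements


def encode_announcement(ip, port):
    """Pack an IPv4 address and port into a short channel-safe token."""
    packed = ipaddress.IPv4Address(ip).packed + struct.pack("!H", port)
    return PREFIX + base64.b32encode(packed).decode("ascii").rstrip("=")


def decode_announcement(word):
    """Return (ip, port) if the word is a valid announcement, else None."""
    if not word.startswith(PREFIX):
        return None
    token = word[len(PREFIX):]
    padded = token + "=" * (-len(token) % 8)  # restore stripped padding
    try:
        raw = base64.b32decode(padded)
    except ValueError:
        return None
    if len(raw) != 6:
        return None
    return str(ipaddress.IPv4Address(raw[:4])), struct.unpack("!H", raw[4:])[0]


def peers_from_channel(lines):
    """Scan chat lines (or posts, or pastes) for announcements."""
    found = []
    for line in lines:
        for word in line.split():
            peer = decode_announcement(word)
            if peer is not None and peer not in found:
                found.append(peer)
    return found
```

The same scanning function would work on a pastebin page or a microblog timeline, since all it needs is text with announcements scattered through it.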

Social networking sites could also be used by unhosted apps to post addressing information. An application could post specially formatted updates (maybe similar to magnet links) that tell clients where to contact one another. The problem with this technique is that it would just make more sense to use the social network like everybody else rather than use it to post contact information for your own app. All things being equal, people tend to prefer the path of least resistance. Blogs and content management software like WordPress, Drupal, and PivotX have plugin frameworks that let the user add additional functionality. It is conceivable that each user of the unhosted application could install a plugin on their homepage that other clients connect to, effectively making the URL to your website part of your user ID. The user's client, when it starts up, pings a certain page on their website which includes some unique information similar to an API key that gives it write access to a table in the database rather than read-only access. Additionally every instance of the unhosted client knows to visit a certain page on the website of every person in their contact list (like /unhostedclient) whenever it starts up to look for your current location. For purposes of privacy read-only keys could be distributed whenever someone is friended ("Click on this link to my homepage to add me to your friends list.") I thought I was pulling this out of my ear but it seems that work is underway on such a thing for WordPress already. A potential downside of this is that people who opt to use services like Blogger or wordpress.com are usually not able to install plugins, so unless our hypothetical CMS plugin is offered as a feature chances are they'd be out of luck.
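The write-key versus read-only-key arrangement might look something like this on the plugin side. This is an in-memory sketch with invented names, standing in for whatever a real CMS plugin would do with its database table; it is not based on any existing plugin.

```python
import hmac
import secrets


class LocationPage:
    """Hypothetical plugin state behind a page like /unhostedclient."""

    def __init__(self):
        self.write_key = secrets.token_urlsafe(16)  # held only by the owner's client
        self._read_keys = set()
        self._location = None

    def issue_read_key(self):
        """Create a read-only key to hand to a newly added friend."""
        key = secrets.token_urlsafe(16)
        self._read_keys.add(key)
        return key

    def update(self, key, location):
        """Record the owner's current address; requires the write key."""
        if not hmac.compare_digest(key, self.write_key):
            raise PermissionError("write key required")
        self._location = location

    def lookup(self, key):
        """Return the owner's last known address; requires a read key."""
        if not any(hmac.compare_digest(key, k) for k in self._read_keys):
            raise PermissionError("unknown read key")
        return self._location
```

The owner's client calls `update()` at startup, and each friend's client calls `lookup()` with the key it was given when friended. Keys are compared with `hmac.compare_digest` so that checking them doesn't leak timing information.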

It also may be the case that the IP addresses of some users of unhosted applications do not change very often. Every unhosted communications app can and should store the last known IP addresses of every other instance it knows about and try those before falling back to some combination of other measures to find additional clients. This is what many of the new generation of distributed applications do, and it seems to work well for them. It also seems reasonable (though it would also constitute a potential privacy leak) for clients that are one degree of separation apart to exchange the IP addresses of clients they have in common to strengthen the network. For example, if Alice is on the friends lists of Bob and Charlie, it would make sense for Alice to give Bob Charlie's address, and Charlie Bob's address, so that they could contact one another directly or pass their IP addresses along to other clients. The former (direct contact) is advisable precisely because Alice is on the friends lists of both Bob and Charlie, meaning it is conceivable that each would see reposts of and comments by the other anyway. The latter (passing addresses along) would make the network stronger by making it easier for clients to find one another on a "just in case" basis, but also leaks actionable information to passive observers in the form of who seems to know whom.
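The "try what you remember first, then fall back" strategy can be sketched in a few lines. The function and parameter names are hypothetical: `cached` maps an address to the time it was last seen, `fallbacks` is a list of discovery functions (DNS bootstrapping, hubs, and so on), and `can_connect` stands in for an actual connection attempt.

```python
def find_peers(cached, fallbacks, can_connect, wanted=3):
    """Return up to `wanted` reachable addresses, preferring the local cache."""
    live = []

    # Most recently seen addresses are the most likely to still be valid.
    for address in sorted(cached, key=cached.get, reverse=True):
        if can_connect(address):
            live.append(address)
            if len(live) >= wanted:
                return live

    # Only bother the outside world if the cache wasn't enough.
    for discover in fallbacks:
        for address in discover():
            if address not in live and can_connect(address):
                live.append(address)
                if len(live) >= wanted:
                    return live
    return live
```

Because the fallbacks are only called when the cache comes up short, a client with a healthy cache never has to touch a hub or a seed hostname at all, which also gives an observer less to watch.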

Anyway, these are some of the things that I think about when I can't sleep at night. Bootstrapping networks is a hard problem when you can't even begin to guess where other nodes might be. There are ways to do it, but fewer ways to do it efficiently, and just a handful of methods appropriate to each type of application being discussed. The Bitcoin method probably isn't suitable for an unhosted social network, and the BitTorrent techniques seem pretty wasteful for an instant messaging application, to name some examples. Not all techniques are good for every kind of application, and it is entirely possible that some of them aren't feasible in the context of unhosted applications due to the limitations inherent in the implementations of JavaScript in modern web browsers. There may also be size limitations in the persistent storage functions of modern browsers which may render them unsuitable for some applications.