Searching Bookstack with SearxNG.

13 November 2024

Note: I used the tag 'searx' for this post even though I've been using SearxNG for quite a while. There's enough compatibility between the two that the stuff I've written (so far) will work. However, I haven't decided if it's worth the hassle of changing the tag and possibly making things harder to find.

A constant problem when you have a sizeable external memory is finding what you need when you need it. It's a problem that I've been poking at for a while and, while I probably don't have optimal solutions, I've found a couple that work well enough for me and that will hopefully help a few other folks.

A while ago I started using Bookstack for my personal wiki, and I've quite fallen in love with it. It has everything I need, just about nothing that I don't, and it runs nicely on shared hosting. I've also been using Searx, and later SearxNG, as both a unified search engine and a search API. I've written a bit about how to interface things with Searx over the years; a certain amount of trial and error is involved because, while SearxNG has excellent facilities for connecting to stuff, they're not always well documented. So, let's start at the top:

Bookstack's REST API includes access to its internal search function. You have to set up an API token and secret per the documentation. You then send the two together in an HTTP Authorization header from your code (whatever that happens to be). On the command line it would look something like this:

curl -XGET -vv -H "Authorization: Token <API token>:<API secret>" "https://bookstack.example.com/api/search?query=thing%20you%27re%20searching%20for"
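If you'd rather do the same thing from code, here's a minimal Python sketch of that request using only the standard library. The function name build_search_request is made up for illustration, and the hostname and credentials are placeholders. It only builds the request; passing the result to urllib.request.urlopen() would actually send it.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_search_request(base_url, token_id, token_secret, query, page=1):
    """Build (but do not send) a GET request for Bookstack's search API.

    Hypothetical helper: the name and signature are assumptions, not part
    of Bookstack or SearxNG.
    """
    # quote_via=quote encodes spaces as %20 (as in the curl example) and
    # also takes care of apostrophes and other special characters.
    params = urlencode({"query": query, "page": page}, quote_via=quote)
    url = f"{base_url.rstrip('/')}/api/search?{params}"

    # Same header as the curl example: the word "Token", a space, then
    # the API token and the API secret joined by a colon.
    headers = {"Authorization": f"Token {token_id}:{token_secret}"}
    return Request(url, headers=headers, method="GET")


if __name__ == "__main__":
    request = build_search_request(
        "https://bookstack.example.com",
        "<API token>",
        "<API secret>",
        "thing you're searching for",
    )
    print(request.full_url)
```

Letting the library do the encoding means you don't have to worry about what your shell or your search term does with quotes and spaces.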

Note that your search term has to be URL encoded. A successful search returns a JSON document that looks like this:

{
    "data": [
      {
        "id": 985,
        "name": "Shaarli Bot",
        "slug": "shaarli-bot",
        "book_id": 8,
        "chapter_id": 0,
        "draft": false,
        "template": false,
        "priority": 184,
        "created_at": "2024-02-18T22:19:55.000000Z",
        "updated_at": "2024-07-08T22:38:17.000000Z",
        "url": "https:\/\/bookstack.example.com\/books\/projects\/page\/shaarli-bot",
        "type": "page",
        "tags": [],
        "preview_html": {
            "name": "<strong>Shaarli<\/strong> Bot",
            "content": "...tamp of the JWT.  UTC.  time_t format.  Only good for nine (9) minutes.\n{\n&quot;iat&quot;: time_t datestamp\n}\n\n\n* Base64 of  +\n* A single period (.) + \n* Base64 of \n\nSeriously, just use a library to do it.  It&#039;s easier.\n\nhttps:\/\/<strong>shaarli<\/strong>.github.io\/api-documentation\/\n"
        }
      },
    ...
    ]
}

As search results go, this is really straightforward to parse. We can send this to SearxNG's json_engine, which doesn't have the greatest docs but is pretty easy to figure out with a little trial and error. Basically, you give it a JSON document, tell it where the essentials live in the JSON, and make sure it's enabled. Here's the config block that I use (plus, I should add, a few directives that I don't use, included for documentation purposes):

# The name is arbitrary and must be unique.
- name: bookstack

  # Use the json_engine to do the thing.
  engine: json_engine

  # This is also arbitrary and must be unique.
  shortcut: bs

  # Time in seconds before SearxNG gives up.
  timeout: 120

  # If anything goes wrong, display the error in your browser.
  display_error_messages: true

  # Categories this search will appear in.
  # More than one category must appear in a [ Python-style list ], as below.
  # Single and double quotes can be used but aren't mandatory.
  categories: [ external memory ]

  # Search results can be returned one page at a time.
  paging: true

  # URL of the Bookstack instance's search API.
  #   {query} == URL encoded search term
  #   {pageno} == page number of search results.  Defaults to 1.
  search_url: https://bookstack.example.com/api/search?query={query}&page={pageno}

  # In the JSON document returned from the API, the key or path the search
  # results can be found under.  For Bookstack, this is 'data'.  For APIs
  # where it doesn't apply, this directive can be left out.
  # This corresponds to the JSONpath
  #     $.data
  results_query: data

  # Where inside the JSON document returned by Bookstack the URL to a search
  # hit can be found.
  # Mandatory.
  url_query: url

  # Where inside the JSON document returned by Bookstack the title of a page
  # with a search hit can be found.
  # Mandatory.
  title_query: name

  # Where inside the JSON document returned by Bookstack the matching text
  # of a search hit can be found.  Tracing a path to a key is done in the
  # form key_1/key_2/.../key_n
  # The example below is the equivalent of the JSONpath
  #     $.data.*.preview_html.content
  # or, relative to $.data.*,
  #     $.preview_html.content
  content_query: preview_html/content

  # The maximum number of search results per page.
  number_of_results: 20

  # Is this thing on?
  disabled: false

  # An optional set of HTTP request headers.
  # In this case, authentication to Bookstack's REST API.
  headers:
    # Name of header: Authorization
    # Value of header:
    #   The word "Token"
    #   Your Bookstack API token
    #   A colon (":")
    #   Your Bookstack API secret
    Authorization: Token <API token>:<API secret>

  # Optional information about the thing being searched.  You can leave this
  # stuff out if you want but it does help document things for later.
  about:
    # URL to the thing being searched, or the thing's homepage.
    website: https://bookstack.example.com/

    # Wikidata ID code for the thing being searched.
    # https://www.wikidata.org/wiki/Wikidata:Main_Page
    wikidata_id: Q107122654

    # URL to the official search API documentation of the thing.
    official_api_documentation: https://demo.bookstackapp.com/api/docs#search-all

    # Using an official API or something else?
    use_official_api: true

    # Does the API require a key?
    require_api_key: true

    # file format of the results.
    results: JSON
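To make the key_1/key_2/.../key_n notation a little more concrete, here's a rough Python sketch of how those four *_query settings map onto the response shown earlier. This is not SearxNG's actual code; lookup and extract_results are names I made up for illustration, and the sample document is a trimmed copy of the Bookstack response with the preview text cut short.

```python
# Trimmed copy of the Bookstack search response shown earlier.
SAMPLE = {
    "data": [
        {
            "name": "Shaarli Bot",
            "url": "https://bookstack.example.com/books/projects/page/shaarli-bot",
            "preview_html": {"content": "(preview text trimmed)"},
        }
    ]
}


def lookup(node, path):
    """Follow a key_1/key_2/.../key_n path down through nested dicts."""
    for key in path.split("/"):
        node = node[key]
    return node


def extract_results(doc, results_query, url_query, title_query, content_query):
    """Pull url, title and content out of every search hit.

    Illustrative only: a stand-in for what the json_engine settings
    describe, not a copy of the engine itself.
    """
    # results_query is optional; without it the document itself is
    # treated as the list of hits.
    hits = lookup(doc, results_query) if results_query else doc
    return [
        {
            "url": lookup(hit, url_query),
            "title": lookup(hit, title_query),
            "content": lookup(hit, content_query),
        }
        for hit in hits
    ]


if __name__ == "__main__":
    for hit in extract_results(
        SAMPLE, "data", "url", "name", "preview_html/content"
    ):
        print(hit["title"], "->", hit["url"])
```

The arguments in the usage example are the same values as results_query, url_query, title_query, and content_query in the config block above.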

You can create new categories of search engine by just using them, but they won't show up on the SearxNG page unless you add them under categories_as_tabs: in searxng/searx/settings.yml.

The upshot of all of this? Search for "!bookstack nfc rfid":

  • Search engine: bookstack
  • Search terms: "nfc rfid"

...and what you should see (if you have anything about NFC or RFID in your wiki) is a list of pages in your wiki that mention RFID and NFC.
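Since I also use SearxNG as a search API, the same search can be run from code. Here's a small Python sketch that builds the URL for it. The hostname searxng.example.com and the function name build_searxng_url are placeholders of mine, and it assumes the JSON output format has been enabled on your instance (it may not be by default).

```python
from urllib.parse import urlencode


def build_searxng_url(instance_url, engine, terms):
    """Build a SearxNG search URL that asks for JSON output.

    Hypothetical helper.  Assumes the instance allows format=json.
    """
    # "!bookstack nfc rfid" is the same query you would type into the
    # search box: a bang with the engine name, then the search terms.
    params = urlencode({"q": f"!{engine} {terms}", "format": "json"})
    return f"{instance_url.rstrip('/')}/search?{params}"


if __name__ == "__main__":
    print(build_searxng_url("https://searxng.example.com", "bookstack", "nfc rfid"))
```

Fetching that URL should hand back the same hits as a JSON document instead of a web page.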