Optimizing Searx with uWSGI

Long-time readers have probably read about some of the stuff I do with Searx, and I hope that some of you have given it a try on your own. If you have, you're probably wondering how I get the performance I do, because there are some limitations of Searx that have to be worked around. Most of those limitations come down to the global interpreter lock (GIL) in the standard Python interpreter, a problem which hasn't been completely solved yet. What this basically adds up to is that multithreading in Python doesn't make great use of systems that have more than one processor core in them (which is most of them these days), because only one thread can execute Python code at a time. The easiest way to work around this limitation is to start up multiple copies of the process so that they can run in parallel. It's not always obvious how to do that, though.
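The difference is easy to see in miniature. Here's a quick sketch (the names are mine, nothing Searx-specific): the same CPU-bound function run under threads and then under processes. Only the latter actually uses more than one core.

```python
import multiprocessing
import threading

def count_down(n):
    # Pure-Python, CPU-bound busywork.  Threads running this all contend
    # for the one GIL, so they execute one at a time.
    while n > 0:
        n -= 1
    return n

if __name__ == "__main__":
    # Four threads: concurrent, but not parallel -- the GIL serializes them.
    threads = [threading.Thread(target=count_down, args=(1_000_000,))
               for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Four processes: each one has its own interpreter and its own GIL, so
    # the work actually runs in parallel on a multi-core machine.
    with multiprocessing.Pool(4) as pool:
        results = pool.map(count_down, [1_000_000] * 4)
    print(results)  # [0, 0, 0, 0]
```

This is exactly the strategy uWSGI applies to Searx below: one worker process per unit of parallelism you want, instead of threads inside one process.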

A brief digression: There is a thing called WSGI (the Web Server Gateway Interface), a protocol by which a web server on the front end can relay requests to a web application on the back end much faster than just proxying them over HTTP. Not all web servers speak WSGI natively, but there is a nifty workaround: some implementations of WSGI have a super-optimized, dedicated-purpose HTTP server built in, so that reverse-proxying web servers can be used without extensive reconfiguration or additional software. Or, in this case, without needing a front-end web server at all to communicate with the application.
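For the curious, the interface itself is tiny. This is a complete (if useless) WSGI application; a server like uWSGI calls `application` once per request:

```python
def application(environ, start_response):
    # environ is a dict describing the incoming request; start_response
    # sends the status line and headers back to the server.
    body = b"Hello from WSGI!\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    # The return value is an iterable of bytestrings making up the
    # response body.
    return [body]
```

Searx's searx/webapp.py module exposes exactly such a callable (by way of Flask), which is why uWSGI can load it directly.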

I use an implementation called uWSGI alongside Searx to get the best of both worlds (as well as web searches that run like a ferret on meth). No modifications have to be made to Searx, so I don't have to maintain my own fork. Because some folks I've run into haven't been able to get the official installation instructions working, I've decided to write up my own, which I use regularly to good effect (command output elided for clarity):

{17:56:22 @ Sat Feb 13}
[drwho @ windbringer ~] () $ git clone https://github.com/asciimoo/searx
Cloning into 'searx'...

{17:56:33 @ Sat Feb 13}
[drwho @ windbringer ~] () $ cd searx
{17:57:01 @ Sat Feb 13}
[drwho @ windbringer searx] () $ python3 -mvenv env
{17:57:06 @ Sat Feb 13}
[drwho @ windbringer searx] () $ . env/bin/activate

(env) {17:57:41 @ Sat Feb 13}
[drwho @ windbringer searx] () $ pip install --upgrade pip
...

(env) {17:57:13 @ Sat Feb 13}
[drwho @ windbringer searx] () $ pip install -r requirements.txt
...

# Here's where you configure Searx.
(env) {17:58:04 @ Sat Feb 13}
[drwho @ windbringer searx] () $ vi searx/settings.yml
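The one setting you absolutely must change before exposing this to anything is the secret key. The relevant part of settings.yml looks roughly like this (key names are from the version I checked out; yours may differ slightly):

```yaml
server:
    port : 8888
    bind_address : "127.0.0.1"
    # Replace this with a long random string, e.g. the output of
    # `openssl rand -hex 32`.
    secret_key : "ultrasecretkey"
```

Note that when Searx runs under uWSGI, the http-socket line in the searx.ini file we'll write below is what actually determines the listening address and port.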

I know the conventional wisdom is to install an OS package of uWSGI, but I have never, ever gotten it to work. What I do instead is install the uwsgi package from PyPI:

# First install some dependencies.  The Python headers have to match the
# python3 interpreter the virtualenv was built with.
# Yes, I know this is on Windbringer.  I'm working from my notes.
(env) {18:03:27 @ Sat Feb 13}
[drwho @ windbringer searx] () $ sudo apt-get install build-essential python3 python3-dev
...

(env) {18:03:27 @ Sat Feb 13}
[drwho @ windbringer searx] () $ pip install uwsgi
...

Now we need one more thing: A config file for uwsgi. In the ~/searx/ directory create a file called searx.ini:

[uwsgi]

; Spin up 12 copies, 2 for each processor.
workers = 12

; Each uwsgi subprocess starts a single Python interpreter.
single-interpreter = true

; Start a master uwsgi process that manages the subprocesses.
master = true

; Enable Python support.  uWSGI itself is written in C, so Python apps are
; run through a plugin.  (The pip-built uwsgi compiles the plugin in, so
; this line is mostly belt-and-suspenders.)
plugin = python

; Load a full copy of Searx in each worker after it forks, rather than
; loading it once in the master and forking afterward.
lazy-apps = true

; Enable threads in the subprocesses.  This seems to make some of Searx's
; engines more stable.
enable-threads = true

; This is the Python module path to the code to start up, relative to where
; this file is found.  In this case searx.ini is in ~/searx so the path is
; searx/webapp.py.  For more information, read up on how Python generates
; its module paths.
module = searx.webapp

; Where the virtualenv lives.
virtualenv = /home/drwho/searx/env/

; Where Searx was checked out to.
pythonpath = /home/drwho/searx/

; The directory where the guts of Searx are.
chdir = /home/drwho/searx/searx/

; Start the HTTP server, listen on the loopback interface, port 8888.  This
; is what your web browser (or agents) connect to.
http-socket = 127.0.0.1:8888

; Don't log anything.
disable-logging = true

; Always route requests to workers without condition checking.
; Sets the script name without altering the path info.
route-run = fixpathinfo:

; Later commits of Searx require that you configure an in-memory cache.
; This is the uWSGI new generation cache subsystem.
; Name the cache 'searxcache'.  Store a maximum of 2000 items in the cache.
; Set the number of blocks in the cache to 2000, each block is 4k in size.
; Create an internal bitmap that keeps track of whether or not a block has
; been used to store something.
cache2 = name=searxcache,items=2000,blocks=2000,blocksize=4096,bitmap=1

Let's give it a test drive:

(env) {19:14:37 @ Sat Feb 13}
[drwho @ windbringer searx] () $ uwsgi --ini searx.ini 
[uWSGI] getting INI configuration from searx.ini
...

If you open http://localhost:8888/ in a web browser, you should see the Searx front page. Run a couple of test searches and you should get not only search results, but really snappy ones. With this setup I typically run a couple of dozen automated searches per minute with Huginn, without much in the way of slowdown. But now we need an easy way to start up Searx. Here's my shell script:
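Incidentally, automated clients like Huginn don't scrape the HTML; Searx can return results in machine-readable formats. Building a query URL is just string assembly; here's a sketch (the function name is mine, and on some Searx versions the json format has to be enabled in settings.yml first):

```python
from urllib.parse import urlencode

def searx_query_url(query, base="http://localhost:8888"):
    # Build a Searx search URL asking for JSON output instead of HTML.
    params = urlencode({"q": query, "format": "json"})
    return f"{base}/search?{params}"

print(searx_query_url("ferret husbandry"))
# http://localhost:8888/search?q=ferret+husbandry&format=json
```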

#!/usr/bin/env bash

SEARX=/home/drwho/searx

# Change to the Searx installation directory; bail out if it's missing.
cd "$SEARX" || exit 1

# Initialize the Python virtual environment.
. env/bin/activate

# Start up Searx.
uwsgi --ini searx.ini

And to automate startup and shutdown, here's my ~/.config/systemd/user/searx.service systemd service file:

[Unit]
Description=Searx

[Service]
Type=simple
WorkingDirectory=%h
ExecStart=%h/searx.sh
Restart=on-failure

[Install]
WantedBy=default.target

Happy hacking!