schneiderbox


Erastus on Fly

November 27, 2022

I deployed Erastus to Fly. You can play it at https://erastus-ai.fly.dev.

It’s running on a single shared-cpu-1x, their smallest compute VM. Erastus runs at 40K iterations with 1 worker without issue.

The process was pretty simple. Here’s my fly.toml:

app = "erastus-ai"

[build]
  dockerfile = "Dockerfile-fly"

[env]
  MAX_ITERATIONS = 40000
  MAX_WORKERS = 1
  UI_PATH = "/ui/index.html"
  GUNICORN_WORKERS = 1

[[statics]]
  guest_path = "/opt/erastus/ui"
  url_prefix = "/ui"

[[services]]
  internal_port = 5000
  protocol = "tcp"

  [[services.ports]]
    force_https = true
    handlers = ["http"]
    port = 80

  [[services.ports]]
    handlers = ["tls", "http"]
    port = 443

  [[services.http_checks]]
    interval = "10s"
    grace_period = "5s"
    method = "get"
    path = "/actions/0000000000000000000000000xxxxxxxx1"
    protocol = "http"
    timeout = "2s"

[services.concurrency]
  type = "requests"
  hard_limit = 1
  soft_limit = 1

The file is pretty straightforward. The [build] section says the app is deployed using an image built from Dockerfile-fly. This is a slightly tweaked version of the Dockerfile that builds the API, modified to use Fly’s static file feature to serve the UI (instead of relying on a separate Nginx app, as in the Docker Compose deployment).

The [env] block defines the environment variables that limit the playing strength. The [[statics]] block defines a mapping between the URL and filesystem for serving static files (double brackets indicate a TOML array, i.e. you can have multiple blocks with the same name).
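For illustration, here’s roughly how the API side might consume those limits. This is a hedged sketch: the environment variable names come from fly.toml above, but the clamping function and its name are hypothetical.

```python
import os

# Read the strength limits Fly injects via the [env] block.
# The names match fly.toml; the clamping logic is a hypothetical sketch.
MAX_ITERATIONS = int(os.environ.get("MAX_ITERATIONS", "40000"))
MAX_WORKERS = int(os.environ.get("MAX_WORKERS", "1"))

def clamp_search_params(iterations: int, workers: int) -> tuple:
    """Cap a client-requested search at the server's configured limits."""
    return min(iterations, MAX_ITERATIONS), min(workers, MAX_WORKERS)
```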

I had to tweak a few things to support the environment variable-based limits. I initially set up an entrypoint script that updated the UI’s config.js with the limits when the app started. However, that doesn’t work with Fly’s static file serving: those files are apparently pulled separately from the running image, so the entrypoint’s changes weren’t present in the files being served. I ended up creating a /limits endpoint in the API that the UI requests during setup, which is a better solution anyway.
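A minimal sketch of what such an endpoint could look like, written as a bare WSGI app since gunicorn serves WSGI callables. The framework, callable name, and response shape of the real API are assumptions here.

```python
import json
import os

def limits_app(environ, start_response):
    """Hypothetical /limits endpoint: report the env-based strength limits."""
    if environ.get("PATH_INFO") == "/limits":
        body = json.dumps({
            "max_iterations": int(os.environ.get("MAX_ITERATIONS", "40000")),
            "max_workers": int(os.environ.get("MAX_WORKERS", "1")),
        }).encode("utf-8")
        start_response("200 OK", [("Content-Type", "application/json"),
                                  ("Content-Length", str(len(body)))])
        return [body]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]
```

The UI can then fetch /limits once at startup instead of depending on a baked-in config.js.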

The [[services]] block and its sub-blocks map Erastus' internal port to external ports. Fly has a built-in system for managing certificates with Let’s Encrypt and redirecting HTTP to HTTPS.

The [[services.http_checks]] block tells Fly to use the /actions endpoint (with the starting position) to check whether the app is healthy. The API uses the erastus binary to determine the available actions, so a passing check vouches that the full Python+C stack is working. It’s also computationally inexpensive.
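In effect, the check boils down to an HTTP GET against that path every 10 seconds. A quick local equivalent might look like this (the base URL and function name are assumptions; the path is copied from fly.toml):

```python
import urllib.request

# The starting-position encoding from the health check path in fly.toml.
START_POSITION = "0000000000000000000000000xxxxxxxx1"

def is_healthy(base_url: str = "http://localhost:5000") -> bool:
    """Mimic Fly's http check: GET /actions/<start> and expect a 200."""
    try:
        with urllib.request.urlopen(
            f"{base_url}/actions/{START_POSITION}", timeout=2
        ) as resp:
            return resp.status == 200
    except OSError:
        return False
```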

I set hard_limit = 1, which means Fly will only send one HTTP request to the app instance at a time, holding others at the proxy level until space opens. This lets us cap iterations based on the VM’s total available RAM, without worrying about multiple erastus processes exhausting memory. Obviously this wouldn’t handle any real traffic very well, but that’s not a concern I have in this case. ;)

I could have mounted a Fly volume to store puzzle submissions, but I decided that the somewhat ad-hoc puzzle system wasn’t worth persisting.

Overall, the entire process was smooth. I like the Dockerfile-based deployment process (fun fact: they’re not actually running Docker under the hood), and the flyctl CLI is enjoyable to use (and open source!).