Zioth

PathAI interview thoughts

My brain doesn't like leaving problems unsolved.

On Friday night, I worked out what I think is a reasonable DB architecture and system design for your interview question, one that did away with my quad-tree rabbit hole and switched to simple image tiles, like Google Maps uses. At that point, I let it go, disappointed that I hadn't been able to solve the problem earlier.

This morning, however, I realized that the problem could be much simpler than I thought.

When PathAI's algorithms analyze an image, they need access to every annotation, but a user only cares about their own annotations. If we assume that the average annotation is 256 bytes, and that one user creates at most 256 annotations, then we're only talking about 64 KB of data at most. This is small enough to be downloaded by the client, offloading almost all of the work.
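The back-of-the-envelope bound is easy to check (the 256-byte and 256-annotation figures are assumptions, not measured values from PathAI's system):

```python
# Assumed upper bounds, not measurements.
AVG_ANNOTATION_BYTES = 256
MAX_ANNOTATIONS_PER_USER = 256

worst_case_bytes = AVG_ANNOTATION_BYTES * MAX_ANNOTATIONS_PER_USER
print(worst_case_bytes)         # 65536 bytes
print(worst_case_bytes / 1024)  # 64.0 KB
```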

As for image tiles, it would not be unreasonable to have the client download the entire set of tile data as a JSON file from a CDN, but we can actually do away with tile data entirely. If you structure URLs on the CDN like this:

/images/:id/:zoom/:x.:y.jpg

then the only data you need to define the entire set of tiles is the base CDN url and the maximum zoom level.
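Under that convention, the client can derive any tile URL on the fly. A minimal sketch (the `tile_url` helper is mine; the example ID matches the flow below):

```python
def tile_url(base_url: str, zoom: int, x: int, y: int) -> str:
    """Build a CDN tile URL following the /images/:id/:zoom/:x.:y.jpg scheme."""
    return f"{base_url}/{zoom}/{x}.{y}.jpg"

print(tile_url("/images/12345", 9, 254, 254))  # /images/12345/9/254.254.jpg
```

No tile manifest is needed: every URL the client will ever request is computable from the base URL and a zoom level no greater than the maximum.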

Example flow:

  • A GET request is made, which comes back with:
    • All annotations created by the user (with zoom level, tile coords and tile-relative coords).
    • The base url /images/12345
    • The maximum zoom level of 12.
  • The base image is loaded from the CDN.
  • You zoom down to level 9, at the center of the image.
  • The client calculates that you're looking at a 512×512 grid (2⁹ tiles per side), and that you want tiles (254,254), (254,255), and so on - enough to fill the screen.
  • The client requests those files from the CDN, by inserting zoom, x and y in the proper positions (for example, /images/12345/9/254.254.jpg). These can be predictively loaded to make panning and zooming faster.
  • The client looks in its local annotation store within ± one zoom level and renders annotations on each tile as appropriate. For annotations at deeper zoom levels, it renders aggregate counts centered where those tiles will appear once you zoom in. These calculations should be relatively simple and fast on the client.
  • The user makes an annotation.
    • The client stores it in local data and renders it.
    • The client sends a PUT with annotation text, zoom level, tile coords and tile-relative coords.
    • The server adds a row to the DB. If this happens too frequently, a periodically-flushed cache can be used.
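The tile-selection step above can be sketched as follows. The viewport size, tile size, and function name are my assumptions, not part of the design; the point is that the client needs nothing but arithmetic to decide which tiles to fetch:

```python
def visible_tiles(center_x, center_y, zoom, viewport_px=(1024, 768), tile_px=256):
    """Return (x, y) tile coordinates needed to cover the viewport at a zoom level.

    center_x/center_y are fractions (0..1) of the full image; the grid at a
    given zoom level is 2**zoom tiles per side.
    """
    grid = 2 ** zoom
    cx, cy = int(center_x * grid), int(center_y * grid)
    # How many tiles are needed on each side of the center tile.
    half_w = viewport_px[0] // tile_px // 2 + 1
    half_h = viewport_px[1] // tile_px // 2 + 1
    return [
        (x, y)
        for x in range(max(0, cx - half_w), min(grid, cx + half_w + 1))
        for y in range(max(0, cy - half_h), min(grid, cy + half_h + 1))
    ]

# Centered at zoom level 9: a 512x512 grid, so the center tile is (256, 256).
tiles = visible_tiles(0.5, 0.5, 9)
print((256, 256) in tiles)  # True
```

Widening `half_w`/`half_h` by an extra tile is also where predictive loading would hook in: fetch one ring beyond the viewport so panning never shows a blank tile.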

The DB structure is simple:

Users

id | other user data

Images

id | cdn | max_zoom

Annotations

id | user_id | image_id | zoom_level | tile_coords | tile_relative_coords | content
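As a sketch, the three tables can be stood up in SQLite. The column types, the split of each coordinate pair into two columns, and the `image_id` foreign key (which an annotation needs to tie it back to its image) are my guesses at a reasonable encoding:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id INTEGER PRIMARY KEY
    -- other user data
);
CREATE TABLE images (
    id INTEGER PRIMARY KEY,
    cdn TEXT NOT NULL,            -- base CDN URL, e.g. /images/12345
    max_zoom INTEGER NOT NULL
);
CREATE TABLE annotations (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    image_id INTEGER NOT NULL REFERENCES images(id),
    zoom_level INTEGER NOT NULL,
    tile_x INTEGER NOT NULL,      -- tile coords
    tile_y INTEGER NOT NULL,
    rel_x REAL NOT NULL,          -- tile-relative coords (0..1)
    rel_y REAL NOT NULL,
    content TEXT NOT NULL
);
-- The common query is "all annotations by one user on one image".
CREATE INDEX idx_annotations_user_image ON annotations (user_id, image_id);
""")

conn.execute("INSERT INTO users (id) VALUES (1)")
conn.execute("INSERT INTO images (id, cdn, max_zoom) VALUES (12345, '/images/12345', 12)")
conn.execute(
    "INSERT INTO annotations (user_id, image_id, zoom_level, tile_x, tile_y, rel_x, rel_y, content) "
    "VALUES (1, 12345, 9, 254, 254, 0.5, 0.5, 'mitotic figure?')"
)
rows = conn.execute(
    "SELECT content FROM annotations WHERE user_id = 1 AND image_id = 12345"
).fetchall()
print(rows)  # [('mitotic figure?',)]
```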

There's one obvious performance issue - fetching ~64 KB of annotation data from the DB on every load is expensive. This can be mitigated with a local cache, or by periodically writing the data out as JSON to a CDN and only querying the DB for rows newer than the last CDN cache update.
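The CDN-snapshot-plus-delta idea can be sketched like this (the function names, the snapshot shape, and the `published_at` field are all my assumptions):

```python
from datetime import datetime

def load_annotations(cdn_snapshot: dict, fetch_delta_since) -> list:
    """Combine a periodically published CDN snapshot with a DB delta query.

    cdn_snapshot: {"published_at": ISO timestamp, "annotations": [...]}
    fetch_delta_since: callable that hits the DB only for rows newer
                       than the given timestamp.
    """
    published_at = datetime.fromisoformat(cdn_snapshot["published_at"])
    # The bulk of the data comes from the cheap CDN fetch; only changes made
    # since the last snapshot publish need a DB round trip.
    delta = fetch_delta_since(published_at)
    return cdn_snapshot["annotations"] + delta

snapshot = {
    "published_at": "2024-01-01T00:00:00+00:00",
    "annotations": [{"id": 1, "content": "old note"}],
}
result = load_annotations(snapshot, lambda since: [{"id": 2, "content": "new note"}])
print(len(result))  # 2
```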

Even if we want to allow everyone to see everyone else's annotations, the total amount of data shouldn't overwhelm the client, but we'd want a strong caching solution so that every image load doesn't trigger a thousand DB calls. If the data grows too large, a less client-focused solution would be required.