My brain doesn't like leaving problems unsolved.
On Friday night, I worked out what I think is a reasonable DB architecture and system design for your interview question, abandoning my quad-tree rabbit hole in favor of simple image tiles, like Google Maps uses. At that point, I let it go, disappointed that I hadn't been able to solve the problem earlier.
This morning, however, I realized that the problem could be much simpler than I thought.
When PathAI's algorithms analyze an image, they need access to every annotation, but a user only cares about their own annotations. If we assume that the average annotation is 256 bytes, and the maximum number of annotations created by one user is 256, then we're only talking about 64 KB of data at most. This is small enough to be downloaded by the client, offloading almost all the work.
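The back-of-the-envelope arithmetic checks out (the 256/256 figures are the assumptions stated above):

```python
# Back-of-the-envelope check of the worst-case annotation payload per user.
avg_annotation_bytes = 256     # assumed average annotation size
max_annotations_per_user = 256  # assumed per-user cap

total_bytes = avg_annotation_bytes * max_annotations_per_user
print(total_bytes)  # 65536 bytes, i.e. 64 KB
```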
As for image tiles, it would not be unreasonable to have the client download the entire tile index as a JSON file from a CDN, but we can actually do away with tile data entirely. If you structure URLs on the CDN like this:
/images/:id/:zoom/:x.:y.jpg
then the only data you need to define the entire set of tiles is the base CDN url and the maximum zoom level.
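To make that concrete, here's a sketch of how a client could derive every tile URL from nothing but the base CDN URL and a zoom level. It assumes Google-Maps-style tiling (2^zoom × 2^zoom tiles per level); the function names and example domain are made up:

```python
# Sketch of client-side tile URL construction for the CDN layout above.
# Assumes 2**zoom x 2**zoom tiles per zoom level, as in Google Maps.

def tile_url(base: str, image_id: int, zoom: int, x: int, y: int) -> str:
    """Build the CDN URL for a single tile."""
    return f"{base}/images/{image_id}/{zoom}/{x}.{y}.jpg"

def tile_urls_for_zoom(base: str, image_id: int, zoom: int) -> list[str]:
    """Enumerate every tile URL at one zoom level."""
    n = 2 ** zoom  # tiles per axis at this zoom level
    return [tile_url(base, image_id, zoom, x, y)
            for x in range(n) for y in range(n)]

print(tile_url("https://cdn.example.com", 42, 2, 1, 3))
# -> https://cdn.example.com/images/42/2/1.3.jpg
print(len(tile_urls_for_zoom("https://cdn.example.com", 42, 2)))  # -> 16
```

No tile table, no tile API calls: the client manufactures every URL it needs locally.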
Example flow:
1. The client asks the API for the image record (base CDN URL, max zoom) and the user's annotations.
2. The client constructs tile URLs itself and fetches tiles straight from the CDN as the user pans and zooms.
3. Annotations are rendered entirely client-side.
The DB structure is simple:
users:       id | other user data
images:      id | cdn | max_zoom
annotations: id | user_id | zoom_level | tile_coords | tile_relative_coords | content
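A minimal sketch of that schema in in-memory SQLite; the column types, the "other user data" stand-in, and the comma-separated coordinate encoding are assumptions for illustration:

```python
import sqlite3

# Sketch of the three tables above. Types and coordinate encoding are
# illustrative assumptions, not a finalized schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id   INTEGER PRIMARY KEY,
    name TEXT                 -- stand-in for "other user data"
);
CREATE TABLE images (
    id       INTEGER PRIMARY KEY,
    cdn      TEXT NOT NULL,   -- base CDN URL
    max_zoom INTEGER NOT NULL
);
CREATE TABLE annotations (
    id                   INTEGER PRIMARY KEY,
    user_id              INTEGER NOT NULL REFERENCES users(id),
    zoom_level           INTEGER NOT NULL,
    tile_coords          TEXT NOT NULL,  -- e.g. "3,7"
    tile_relative_coords TEXT NOT NULL,  -- position within the tile
    content              TEXT NOT NULL
);
""")

conn.execute("INSERT INTO images (id, cdn, max_zoom) VALUES (1, 'https://cdn.example.com', 5)")
row = conn.execute("SELECT cdn, max_zoom FROM images WHERE id = 1").fetchone()
print(row)  # -> ('https://cdn.example.com', 5)
```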
There's one obvious performance issue - fetching 64 KB of annotation data from the DB on every image load is expensive. This can be mitigated with a local cache, or by periodically writing the data as JSON to a CDN and only querying the DB when it holds records newer than the last CDN cache update.
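That freshness check boils down to comparing two timestamps. A sketch, where both names are hypothetical values the server would track:

```python
# Sketch of the CDN-vs-DB freshness check described above.
# Both timestamps are illustrative; in practice the server would track
# the latest annotation write and the time of the last CDN snapshot.

def annotations_source(latest_db_update: float, last_cdn_snapshot: float) -> str:
    """Decide where the client should fetch annotations from.

    Hit the DB only when annotations newer than the last CDN snapshot
    exist; otherwise the cached JSON on the CDN is current.
    """
    return "db" if latest_db_update > last_cdn_snapshot else "cdn"

print(annotations_source(1_700_000_100, 1_700_000_000))  # -> db
print(annotations_source(1_699_999_000, 1_700_000_000))  # -> cdn
```

The common case is a cheap CDN hit; the DB is only touched in the window between a new annotation and the next snapshot.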
Even if we want to allow everyone to see everyone else's annotations, the total amount of data involved shouldn't overwhelm the client, but we'd want a strong caching solution so that we aren't triggering a thousand DB calls with every image load. If the data grew too large, a less client-focused solution would be required.