Permissions

When a namespace contains documents belonging to multiple users or groups, queries should only return documents the user has access to.

Permissions in turbopuffer currently have to be implemented on your side using filters, because turbopuffer doesn't provide built-in row- or document-level RBAC.

Store the user_id or group_ids that have read access directly on each document. At query time, fetch the user's ID and groups from your auth layer and pass them as a filter. This is generally more performant than passing a list of allowed document IDs in a filter.
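A minimal sketch of that pattern, assuming two hypothetical array attributes named allowed_user_ids and allowed_group_ids. The filter is written in a nested-list form with "Or", "Contains" and "ContainsAny" operators; treat that exact shape as an assumption and check the query documentation for what your client accepts. The is_visible helper is a local reference for the semantics the filter is meant to express, not something turbopuffer runs.

```python
from dataclasses import dataclass, field


@dataclass
class Principal:
    """The caller as resolved by your auth layer (hypothetical type)."""
    user_id: str
    group_ids: list[str] = field(default_factory=list)


def permission_filter(p: Principal) -> list:
    # Assumed filter shape: match documents that list the user directly
    # OR share at least one group with the user.
    return [
        "Or",
        [
            ["allowed_user_ids", "Contains", p.user_id],
            ["allowed_group_ids", "ContainsAny", p.group_ids],
        ],
    ]


def is_visible(doc: dict, p: Principal) -> bool:
    # Plain-Python equivalent of the filter above, for illustration only.
    return p.user_id in doc.get("allowed_user_ids", []) or bool(
        set(p.group_ids) & set(doc.get("allowed_group_ids", []))
    )


alice = Principal("u-alice", ["g-eng"])
docs = [
    {"id": 1, "allowed_user_ids": ["u-alice"], "allowed_group_ids": []},
    {"id": 2, "allowed_user_ids": [], "allowed_group_ids": ["g-eng"]},
    {"id": 3, "allowed_user_ids": ["u-bob"], "allowed_group_ids": ["g-sales"]},
]
print([d["id"] for d in docs if is_visible(d, alice)])  # [1, 2]
```

The filter is built from the principal alone, so its size grows with the user's groups rather than with the number of documents they can see, which is why this tends to beat sending a list of document IDs.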

An array attribute can be up to 8 MiB in size, so all user and group identifiers stored on a document have to fit within this limit. We store filterable attributes in an inverted index, which lets us efficiently filter on tens of thousands of user IDs without performance degradation.
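To illustrate why an inverted index handles many IDs well, here is a toy, in-memory model. It is a conceptual sketch only and does not reflect turbopuffer's actual on-disk index: each identifier maps to the set of documents carrying it, and a permission check becomes a union of those posting lists instead of a scan over every document.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy inverted index: attribute value -> ids of documents with that value."""

    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)

    def add(self, doc_id: int, values: list[str]) -> None:
        # Register the document under every identifier in its array attribute.
        for v in values:
            self.postings[v].add(doc_id)

    def lookup_any(self, values: list[str]) -> set[int]:
        # Union of posting lists: documents containing ANY of the values.
        result: set[int] = set()
        for v in values:
            result |= self.postings.get(v, set())
        return result


idx = InvertedIndex()
idx.add(1, ["u-alice"])
idx.add(2, ["g-eng", "u-bob"])
idx.add(3, ["g-sales"])
print(sorted(idx.lookup_any(["u-alice", "g-eng"])))  # [1, 2]
```

The work per query depends on how many identifiers are looked up and how many documents match, not on the total number of documents in the namespace.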

To reduce the storage cost of keeping user and group permissions on every document, encode the identifiers as UUIDs. The uuid type must be specified explicitly in the schema; otherwise the type is inferred as string, which is slower and more expensive.
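The size difference is easy to see with Python's standard uuid module: the canonical text form is 36 characters, while the raw value is 16 bytes. The schema dictionary below uses a hypothetical attribute name (allowed_ids) and an assumed "[]uuid" array type name; confirm the exact type names in the schema documentation. The 8 MiB division is back-of-envelope arithmetic on raw payload bytes only and ignores any encoding overhead.

```python
import uuid

# Assumed schema shape for a hypothetical array attribute of UUIDs.
schema = {"allowed_ids": {"type": "[]uuid"}}

u = uuid.uuid4()
as_string = str(u)  # canonical text form, e.g. "xxxxxxxx-xxxx-...": 36 chars
as_bytes = u.bytes  # raw binary form: 16 bytes

print(len(as_string), len(as_bytes))  # 36 16

# Rough upper bound on identifiers per array under the 8 MiB limit,
# counting raw bytes only (no per-element overhead).
LIMIT = 8 * 1024 * 1024
print(LIMIT // len(as_bytes), LIMIT // len(as_string))  # 524288 233016
```

Under these assumptions, storing identifiers as UUIDs rather than strings cuts their raw size by more than half.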
