Permissions

When a namespace contains documents belonging to multiple users or groups, queries should only return documents the user has access to.

Permissions in turbopuffer currently have to be implemented in your application with filters, as turbopuffer doesn't provide built-in mechanisms for row- or document-level RBAC.

Store the user_id or group_ids that have read access directly on each document. At query time, fetch the user's id and groups from your auth layer and pass them as a filter. Generally this approach is more performant than passing document ids in a filter.
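As a minimal sketch of the query-time step, the helper below turns a user's id and groups into a filter. `build_permission_filter` is a hypothetical name, and the `user_ids` and `groups` attribute names are assumptions that should match whatever you store on each document. It uses only the `Or` and `Contains` operators.

```python
def build_permission_filter(user_id, group_ids):
    """Return a filter matching documents the user can read.

    Hypothetical helper: assumes each document stores its readers in the
    `user_ids` and `groups` array attributes.
    """
    # One clause for the user, plus one clause per group the user belongs to.
    clauses = [('user_ids', 'Contains', user_id)]
    clauses += [('groups', 'Contains', group) for group in group_ids]
    return ('Or', tuple(clauses))


# Values that would normally come from your auth layer.
permission_filter = build_permission_filter(96, ['design', 'eng'])
print(permission_filter)
```

The resulting tuple can be passed as the `filters` argument of a query, so the permission check happens inside the query rather than as a post-processing step in your application.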

An array can be up to 8 MiB in size, so the user and group ids stored on each document have to fit within this limit. We store filterable attributes in an inverted index structure that allows us to efficiently filter on tens of thousands of user ids without performance degradation; the sidebar widget shows representative p90 latency as the number of permission ids in the query grows.

To reduce the storage cost of keeping user and group permissions on each document, encode them as UUIDs. Note that the uuid type needs to be specified explicitly in the schema; otherwise the type will be inferred as string, which is slower and more expensive.
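A sketch of what this could look like follows. The `'[]uuid'` type string is an assumption about how an array of UUIDs is spelled in the schema, so check it against the schema reference before relying on it. `APP_NAMESPACE` and `to_permission_uuid` are hypothetical names, used here to derive stable UUIDs from ids you already have.

```python
import uuid

# Assumed schema fragment: '[]uuid' is taken to be the array-of-UUID type
# string. Verify the exact spelling in the schema reference.
schema = {
    'user_ids': {'type': '[]uuid'},
    'groups': {'type': '[]uuid'},
}

# Hypothetical application namespace, so the same input always maps to the
# same UUID across writes and queries.
APP_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, 'permissions.example.com')


def to_permission_uuid(kind, identifier):
    """Map an existing id (e.g. group 'eng') to a deterministic UUID string."""
    return str(uuid.uuid5(APP_NAMESPACE, f'{kind}:{identifier}'))


eng_group = to_permission_uuid('group', 'eng')
print(eng_group)
```

Deriving the UUID deterministically means the write path and the query path can compute the same value independently, without a lookup table. If your auth system already issues UUIDs, you can store those directly.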

Marking documents readable by everyone

To mark a document as readable by everyone, write a boolean attribute like is_public: true at upsert time and filter on ["is_public", "Eq", true]. Making universal access an explicit signal is safer than inferring it from an empty permission array, which should mean no access. It's also faster: turbopuffer builds the inverted index from the elements of an array, so filtering for an empty array like ["groups", "Eq", []] has no element to narrow on and must post-filter, scanning far more data, whereas the boolean is served by a fast indexed prefilter.
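The intended semantics can be modelled locally as a small sketch. `can_read` is a hypothetical function that mirrors the filter logic in plain Python for illustration only; it is not part of the turbopuffer client.

```python
def can_read(doc, user_id, group_ids):
    """Local model of the permission filter, for illustration only."""
    # Universal access is an explicit signal, never inferred from empty arrays.
    if doc.get('is_public') is True:
        return True
    if user_id in doc.get('user_ids', []):
        return True
    return any(group in doc.get('groups', []) for group in group_ids)


private_doc = {'groups': [], 'user_ids': [], 'is_public': False}
public_doc = {'groups': [], 'user_ids': [], 'is_public': True}

print(can_read(private_doc, 96, ['eng']))  # False: empty arrays grant nothing
print(can_read(public_doc, 96, ['eng']))   # True: explicitly public
```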

import os
import turbopuffer


tpuf = turbopuffer.Turbopuffer(
    region='gcp-us-central1', # choose best region: https://turbopuffer.com/docs/regions
    api_key=os.getenv('TURBOPUFFER_API_KEY'),
)

ns = tpuf.namespace('permissions-example-py')

# write a few sample documents that are permissioned by group and user_ids

ns.write(
    upsert_rows=[
        {
            'id': 1,
            'vector': [1, 1],
            'content': 'changes in the leadership team',
            'groups': [],
            'user_ids': [123, 453, 125, 189],
            'is_public': False
        },
        {
            'id': 2,
            'vector': [2, 1],
            'content': 'simon & nikhil - 1:1 notes',
            'groups': [],
            'user_ids': [123, 125],
            'is_public': False
        },
        {
            'id': 3,
            'vector': [6, 1],
            'content': 'notes on planned Kubernetes migration',
            'groups': ['eng'],
            'user_ids': [96],
            'is_public': False
        },
        {
            # this doc has no group/user restrictions and is visible to everyone.
            # use a boolean attribute so the query can find it with an indexed
            # lookup instead of filtering on an empty array (e.g. groups Eq [])
            'id': 4,
            'vector': [3, 1],
            'content': 'company-wide resources and the latest meeting notes',
            'groups': [],
            'user_ids': [],
            'is_public': True
        }
    ],
    schema={
        'content': {
            'type': 'string',
            'full_text_search': True
        }
    },
    distance_metric='cosine_distance'
)

# now we can query the data passing in the appropriate permissions

result = ns.query(
    rank_by=('content', 'BM25', 'notes'),
    filters=('Or', (
        ('groups', 'Contains', 'design'),
        ('user_ids', 'Contains', 96),
        ('is_public', 'Eq', True))),
    limit=10,
    include_attributes=['content']
)
print(result.rows)

# doc 3 (accessible via user_ids) and doc 4 (public) both match and contain "notes":
# [Row(id=3, vector=None, $dist=0.9686553, content='notes on planned Kubernetes migration'),
#  Row(id=4, vector=None, $dist=0.6209813, content='company-wide resources and the latest meeting notes')]