Connecting to Analytics Buckets
This feature is in Private Alpha. API stability and backward compatibility are not guaranteed at this stage. Fill out this form to request access.
When interacting with Analytics Buckets, you authenticate against two main services: the Iceberg REST Catalog and the S3-compatible storage endpoint.
The Iceberg REST Catalog acts as the central management system for Iceberg tables. It allows Iceberg clients, such as PyIceberg and Apache Spark, to perform metadata operations including:
- Creating and managing tables and namespaces
- Tracking schemas and handling schema evolution
- Managing partitions and snapshots
- Ensuring transactional consistency and isolation
The REST Catalog itself does not store the actual data. Instead, it stores metadata describing the structure, schema, and partitioning strategy of Iceberg tables.
Data storage and retrieval happen through the separate S3-compatible endpoint, which is optimized for reading and writing large analytical datasets stored as Parquet files.
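Both services are exposed under your project's storage URL. As a point of reference (matching the examples below), the two endpoints are derived from your project ref like this:

PROJECT_REF = "<your-supabase-project-ref>"

# Metadata operations (tables, namespaces, schemas, snapshots) go to the REST Catalog
CATALOG_URI = f"https://{PROJECT_REF}.supabase.co/storage/v1/iceberg"

# Data reads and writes (Parquet files) go through the S3-compatible endpoint
S3_ENDPOINT = f"https://{PROJECT_REF}.supabase.co/storage/v1/s3"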
Authentication
To connect to an Analytics Bucket, you will need:
- An Iceberg client (Spark, PyIceberg, etc.) that supports the REST Catalog interface.
- S3 credentials to authenticate your Iceberg client with the underlying S3 bucket. To create S3 credentials, go to Project Settings > Storage; for more information, see the S3 Authentication Guide. Other authentication methods will be supported in the future.
- The project reference and Service key for your Supabase project. You can find your Service key in the Supabase Dashboard under Project Settings > API.
You will now have an Access Key and a Secret Key that you can use to authenticate your Iceberg client.
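As a minimal sketch (the environment variable names here are arbitrary, not a Supabase convention), you can keep these values out of your source code by reading them from the environment:

import os

# Hypothetical variable names; set them to the values from the dashboard
S3_ACCESS_KEY = os.environ["SUPABASE_S3_ACCESS_KEY"]  # Access Key from Project Settings > Storage
S3_SECRET_KEY = os.environ["SUPABASE_S3_SECRET_KEY"]  # Secret Key from Project Settings > Storage
TOKEN = os.environ["SUPABASE_SERVICE_KEY"]            # Service key from Project Settings > API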
Connecting via PyIceberg
PyIceberg is a Python client for Apache Iceberg that you can use to interact with Analytics Buckets.
Installation
pip install pyiceberg pyarrow
Here's a comprehensive example using PyIceberg with clearly separated configuration:
from pyiceberg.catalog import load_catalog
import pyarrow as pa
import datetime

# Supabase project ref
PROJECT_REF = "<your-supabase-project-ref>"

# Configuration for Iceberg REST Catalog
WAREHOUSE = "your-analytics-bucket-name"
TOKEN = "SERVICE_KEY"

# Configuration for S3-Compatible Storage
S3_ACCESS_KEY = "KEY"
S3_SECRET_KEY = "SECRET"
S3_REGION = "PROJECT_REGION"
S3_ENDPOINT = f"https://{PROJECT_REF}.supabase.co/storage/v1/s3"
CATALOG_URI = f"https://{PROJECT_REF}.supabase.co/storage/v1/iceberg"

# Load the Iceberg catalog
catalog = load_catalog(
    "analytics-bucket",
    type="rest",
    warehouse=WAREHOUSE,
    uri=CATALOG_URI,
    token=TOKEN,
    **{
        "py-io-impl": "pyiceberg.io.pyarrow.PyArrowFileIO",
        "s3.endpoint": S3_ENDPOINT,
        "s3.access-key-id": S3_ACCESS_KEY,
        "s3.secret-access-key": S3_SECRET_KEY,
        "s3.region": S3_REGION,
        "s3.force-virtual-addressing": False,
    },
)

# Create namespace if it doesn't exist
catalog.create_namespace_if_not_exists("default")

# Define schema for your Iceberg table
schema = pa.schema([
    pa.field("event_id", pa.int64()),
    pa.field("event_name", pa.string()),
    pa.field("event_timestamp", pa.timestamp("ms")),
])

# Create table (if it doesn't exist already)
table = catalog.create_table_if_not_exists(("default", "events"), schema=schema)

# Generate and insert sample data
current_time = datetime.datetime.now()
data = pa.table({
    "event_id": [1, 2, 3],
    "event_name": ["login", "logout", "purchase"],
    "event_timestamp": [current_time, current_time, current_time],
})

# Append data to the Iceberg table
table.append(data)

# Scan table and print data as pandas DataFrame
df = table.scan().to_pandas()
print(df)
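Once data has been appended, you can read it back selectively instead of scanning the whole table. As a small follow-up using the table object from the example above, PyIceberg's scan API accepts a row filter and a column projection:

# Filter rows and project columns at scan time
purchases = table.scan(
    row_filter="event_name == 'purchase'",
    selected_fields=("event_id", "event_timestamp"),
).to_pandas()
print(purchases)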
Connecting via Apache Spark
Apache Spark lets you run distributed analytical queries against Analytics Buckets.
from pyspark.sql import SparkSession

# Supabase project ref
PROJECT_REF = "<your-supabase-ref>"

# Configuration for Iceberg REST Catalog
WAREHOUSE = "your-analytics-bucket-name"
TOKEN = "SERVICE_KEY"

# Configuration for S3-Compatible Storage
S3_ACCESS_KEY = "KEY"
S3_SECRET_KEY = "SECRET"
S3_REGION = "PROJECT_REGION"
S3_ENDPOINT = f"https://{PROJECT_REF}.supabase.co/storage/v1/s3"
CATALOG_URI = f"https://{PROJECT_REF}.supabase.co/storage/v1/iceberg"

# Initialize Spark session with Iceberg configuration
spark = SparkSession.builder \
    .master("local[*]") \
    .appName("SupabaseIceberg") \
    .config("spark.driver.host", "127.0.0.1") \
    .config("spark.driver.bindAddress", "127.0.0.1") \
    .config('spark.jars.packages', 'org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.6.1,org.apache.iceberg:iceberg-aws-bundle:1.6.1') \
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
    .config("spark.sql.catalog.my_catalog", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.my_catalog.type", "rest") \
    .config("spark.sql.catalog.my_catalog.uri", CATALOG_URI) \
    .config("spark.sql.catalog.my_catalog.warehouse", WAREHOUSE) \
    .config("spark.sql.catalog.my_catalog.token", TOKEN) \
    .config("spark.sql.catalog.my_catalog.s3.endpoint", S3_ENDPOINT) \
    .config("spark.sql.catalog.my_catalog.s3.path-style-access", "true") \
    .config("spark.sql.catalog.my_catalog.s3.access-key-id", S3_ACCESS_KEY) \
    .config("spark.sql.catalog.my_catalog.s3.secret-access-key", S3_SECRET_KEY) \
    .config("spark.sql.catalog.my_catalog.s3.remote-signing-enabled", "false") \
    .config("spark.sql.defaultCatalog", "my_catalog") \
    .getOrCreate()

# SQL Operations
spark.sql("CREATE NAMESPACE IF NOT EXISTS analytics")

spark.sql("""
    CREATE TABLE IF NOT EXISTS analytics.users (
        user_id BIGINT,
        username STRING
    ) USING iceberg
""")

spark.sql("""
    INSERT INTO analytics.users (user_id, username)
    VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Charlie')
""")

result_df = spark.sql("SELECT * FROM analytics.users")
result_df.show()
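Because every write goes through the catalog as a new snapshot, you can also inspect a table's history using Iceberg's built-in metadata tables. For example, continuing with the Spark session above:

# List the snapshots Iceberg has recorded for the table
spark.sql("SELECT snapshot_id, committed_at, operation FROM analytics.users.snapshots").show()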
Connecting to the Iceberg REST Catalog directly
To authenticate with the Iceberg REST Catalog directly, you need to provide a valid Supabase Service key as a Bearer token.
curl \
  --request GET -sL \
  --url 'https://<your-supabase-project>.supabase.co/storage/v1/iceberg/v1/config?warehouse=<bucket-name>' \
  --header 'Authorization: Bearer <your-service-key>'
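The config response tells your client how to address the rest of the catalog. Other routes from the standard Iceberg REST specification work the same way; as a sketch, listing namespaces might look like this (the exact path may need a prefix if the config response returns one):

curl \
  --request GET -sL \
  --url 'https://<your-supabase-project>.supabase.co/storage/v1/iceberg/v1/namespaces' \
  --header 'Authorization: Bearer <your-service-key>'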