DremioFrame & IceFrame: Pythonic interfaces for Dremio and Apache Iceberg
Source: Dev.to
Overview
Modern data teams need simple tools for working with Iceberg tables and Dremio. Two new Python libraries—DremioFrame and IceFrame—provide concise, readable APIs for managing Dremio catalogs and Iceberg tables. Both are currently in alpha, making it an ideal time to try them, share feedback, and report issues.
A free 30‑day Dremio Cloud trial (including $400 in credits) gives you access to a built‑in Apache Polaris‑based Iceberg catalog, so you can experiment with both libraries without any additional setup.
DremioFrame
DremioFrame is a Python client that wraps the Dremio Cloud and Dremio Software REST APIs. It offers high‑level methods for:
- Managing sources, folders, views, tags, and security rules
- Running SQL queries and retrieving results as Pandas/Polars DataFrames
- Administering users, roles, and reflections
The library spares you from building raw request URLs or tracking version tags by hand. Typical usage looks like:
```python
from dremioframe.client import DremioClient

client = DremioClient(
    token="YOUR_DREMIO_CLOUD_PAT",
    project_id="YOUR_PROJECT_ID"
)

# Run a simple query
df = client.sql.run("SELECT 1 AS value")
print(df)

# Create a view in the catalog
client.catalog.create_view(
    path=["Samples", "small_view"],
    sql="SELECT * FROM Samples.samples.Employees"
)
```
Key Features
- Catalog management – create/delete sources, folders, and views.
- Security – assign policies and role‑based access controls.
- Reflections – manage query acceleration objects.
- Automation‑ready – suitable for batch operations or single‑script updates.
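As a rough illustration of the batch use case, the sketch below loops over several view definitions and collects failures instead of stopping at the first one. It relies only on the `client.catalog.create_view(path=..., sql=...)` call shown earlier. The `create_views` helper and the `DryRunClient` stand-in are hypothetical names introduced here so the example runs without a Dremio connection; they are not part of DremioFrame.

```python
from typing import Any, Iterable, List, Tuple

ViewSpec = Tuple[List[str], str]  # (catalog path, SQL text)


def create_views(client: Any, specs: Iterable[ViewSpec]) -> List[Tuple[List[str], str]]:
    """Create each view in turn; return (path, error message) for every failure."""
    failures = []
    for path, sql in specs:
        try:
            # Same call shape as the create_view example in this article.
            client.catalog.create_view(path=path, sql=sql)
        except Exception as exc:
            # Record the problem and continue so one bad view does not stop the batch.
            failures.append((path, str(exc)))
    return failures


class DryRunCatalog:
    """Hypothetical stand-in that records calls instead of contacting Dremio."""

    def __init__(self) -> None:
        self.calls: List[ViewSpec] = []

    def create_view(self, path: List[str], sql: str) -> None:
        if not sql.strip():
            raise ValueError("empty SQL")
        self.calls.append((path, sql))


class DryRunClient:
    """Hypothetical client exposing only the `catalog` attribute used above."""

    def __init__(self) -> None:
        self.catalog = DryRunCatalog()
```

Passing a real `DremioClient` in place of `DryRunClient` should work the same way, assuming `create_view` raises an exception when a request fails; that behavior is an assumption worth checking against the library while it is in alpha.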
IceFrame
IceFrame provides a focused set of tools for interacting with Iceberg tables via PyIceberg and Polars, with native extensions for performance. Core capabilities include:
- Compacting small files and rewriting data
- Evolving partition specs and sorting data
- Cleaning up old snapshots and orphan files
- Managing Iceberg views (when supported by the catalog)
Example usage:
```python
from datetime import datetime

from iceframe import IceFrame

ice = IceFrame(
    {
        "uri": "https://catalog.dremio.cloud/api/iceberg/v1",
        "token": "YOUR_DREMIO_CLOUD_PAT",
        "project_id": "YOUR_PROJECT_ID"
    }
)

# Create a simple Iceberg table
data = [
    {"id": 1, "name": "Ada", "created_at": datetime.utcnow()},
    {"id": 2, "name": "Max", "created_at": datetime.utcnow()}
]
ice.create_table("my_table", data=data)

# Query the table
result = ice.query("my_table").limit(10).execute()
print(result)
```
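To make the "compacting small files" capability more concrete, here is a library-independent sketch of the planning step behind compaction: grouping undersized data files so that each group can be rewritten as one larger file. It does not show IceFrame's API or its actual algorithm. The `plan_compaction` name, the first-fit-decreasing strategy, and the byte sizes are all assumptions chosen for illustration.

```python
from typing import Dict, List


def plan_compaction(file_sizes: Dict[str, int], target_bytes: int) -> List[List[str]]:
    """Group files smaller than target_bytes into bins of at most target_bytes.

    Uses first-fit decreasing: the largest small files are placed first, each
    into the first group that still has room for it.
    """
    small = sorted(
        ((name, size) for name, size in file_sizes.items() if size < target_bytes),
        key=lambda item: item[1],
        reverse=True,
    )
    groups: List[List[str]] = []
    free_space: List[int] = []  # remaining capacity of each group
    for name, size in small:
        for index, free in enumerate(free_space):
            if size <= free:
                groups[index].append(name)
                free_space[index] -= size
                break
        else:
            # No existing group has room, so start a new one.
            groups.append([name])
            free_space.append(target_bytes - size)
    # Rewriting a group that holds a single file would gain nothing.
    return [group for group in groups if len(group) > 1]


if __name__ == "__main__":
    sizes = {"a.parquet": 60, "b.parquet": 50, "c.parquet": 40, "d.parquet": 100}
    print(plan_compaction(sizes, target_bytes=100))
```

A real compaction job also has to respect partition boundaries and commit the rewrite as a new snapshot, which is the part a library such as IceFrame is meant to handle for you.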
AI Assistant
IceFrame includes an AI‑powered assistant that can answer natural‑language questions about table schemas, generate example code, and suggest filters or joins, helping new users explore data quickly.
Getting Started with the Dremio Cloud Trial
- Sign up for the free 30‑day trial (includes $400 in credits) from the Get started page.
- Create a personal access token in Dremio Cloud.
- Install the libraries (see below) and configure the token and project ID as shown in the examples.
The trial provides a hosted Iceberg catalog, so you can create tables and view them instantly from both DremioFrame and IceFrame without provisioning external storage.
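Since both libraries take the same token and project ID, one option is to read them from environment variables rather than pasting them into scripts. The sketch below is one way to do that. The variable names `DREMIO_PAT` and `DREMIO_PROJECT_ID` and the `load_dremio_settings` helper are assumptions made for this example, not conventions defined by either library.

```python
import os
from typing import Dict, Mapping, Optional


def load_dremio_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the token and project ID read from environment variables.

    DREMIO_PAT and DREMIO_PROJECT_ID are names assumed for this sketch.
    """
    env = os.environ if env is None else env
    required = ("DREMIO_PAT", "DREMIO_PROJECT_ID")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {"token": env["DREMIO_PAT"], "project_id": env["DREMIO_PROJECT_ID"]}
```

The returned dictionary uses the same `token` and `project_id` keys as the examples in this article, so it could be passed as `DremioClient(**load_dremio_settings())` or merged with the catalog `uri` when building the IceFrame configuration dictionary.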
Using Both Libraries Together
A typical workflow combines IceFrame for local table creation/maintenance and DremioFrame for catalog registration and federation:
- Create/maintain Iceberg tables with IceFrame.
- Register those tables in the Dremio catalog using DremioFrame.
- Build views or apply governance policies across multiple data sources.
This unified approach eliminates the need to juggle multiple tools or craft lengthy REST requests.
Installation
pip install dremioframe iceframe
Both packages are available on PyPI under the names dremioframe and iceframe.
You can also explore the DremioFrame and IceFrame source repositories if you wish to contribute.
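After installing, a quick way to confirm that both packages are present, and to note the exact alpha versions for bug reports, is to query the installed distribution metadata. This sketch uses only the standard library; the `installed_versions` helper is a name introduced here for illustration.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if it is absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(["dremioframe", "iceframe"]).items():
        print(f"{name}: {version or 'not installed'}")
```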
Full Example
```python
from datetime import datetime

from dremioframe.client import DremioClient
from iceframe import IceFrame

# DremioFrame client
client = DremioClient(
    token="YOUR_DREMIO_CLOUD_PAT",
    project_id="YOUR_PROJECT_ID"
)

# IceFrame client
ice = IceFrame(
    {
        "uri": "https://catalog.dremio.cloud/api/iceberg/v1",
        "token": "YOUR_DREMIO_CLOUD_PAT",
        "project_id": "YOUR_PROJECT_ID"
    }
)

# Create an Iceberg table with IceFrame
data = [
    {"id": 1, "name": "Ada", "created_at": datetime.utcnow()},
    {"id": 2, "name": "Max", "created_at": datetime.utcnow()}
]
ice.create_table("my_table", data=data)

# Expose the table in the Dremio catalog by creating a view over it (example)
client.catalog.create_view(
    path=["Iceberg", "my_table_view"],
    sql="SELECT * FROM iceberg.my_table"
)

# Run a query via DremioFrame
df = client.sql.run("SELECT * FROM Iceberg.my_table_view LIMIT 5")
print(df)
```
This snippet demonstrates how the two libraries complement each other, providing an end‑to‑end path from data creation to query execution.
Feel free to experiment with the libraries during the trial, share your ideas, and report any issues you encounter. Your feedback will help shape the future of DremioFrame and IceFrame.