Basic S3 Example
We recommend running this tutorial, as well as the other tutorials in the Tutorials section, inside TileDB Cloud. By using TileDB Cloud, you can experiment while avoiding all the installation, deployment, and configuration hassles. Sign up for the free tier, spin up a TileDB Cloud notebook with a Python kernel, and follow the tutorial instructions. If you wish to learn how to run tutorials locally on your machine, read the Tutorials: Running Locally tutorial.
This tutorial shows how to use TileDB’s tabular offering to store a table on S3 and query it efficiently without the need to download it locally. For more information on how TileDB works efficiently on object stores, visit the Array Key Concepts: Object Stores section.
Working with tables on S3 differs from working with local tables in only two ways:
- Set the appropriate AWS credentials in environment variables and load them into a configuration object in a TileDB context.
- Use an s3:// URI instead of a local path for the table location.
Other than those differences, the rest of the operations are the same as for local tables.
First, load the appropriate libraries, set the AWS credentials in a context, specify the table S3 URI, and delete any already-created table with the same URI.
# Import necessary libraries
import os
import tiledb
import tiledb.sql
import pandas as pd

# You should set the appropriate environment variables with your keys.
# Get the keys from the environment variables.
aws_access_key_id = os.environ["AWS_ACCESS_KEY_ID"]
aws_secret_access_key = os.environ["AWS_SECRET_ACCESS_KEY"]

# Get the bucket and region from environment variables
s3_bucket = os.environ["S3_BUCKET"]
s3_region = os.environ["S3_REGION"]

# Set the AWS keys and region in the config of the default context.
# This context initialization can be performed only once.
cfg = tiledb.Config(
    {
        "vfs.s3.region": s3_region,
        "vfs.s3.aws_access_key_id": aws_access_key_id,
        "vfs.s3.aws_secret_access_key": aws_secret_access_key,
        "vfs.s3.no_sign_request": True,
    }
)
ctx = tiledb.Ctx(cfg)

# Set table URIs
table_name = "basic_s3"
table_uri = s3_bucket + "/" + table_name
csv_uri = (
    "s3://tiledb-inc-demo-data/examples/notebooks/nyc_yellow_tripdata/taxi_first_10.csv"
)

# Clean up previous data
if tiledb.array_exists(table_uri, ctx=ctx):
    tiledb.Array.delete_array(table_uri, ctx=ctx)
Ingest a CSV file to create the table on S3.
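A minimal sketch of this step, assuming tiledb.from_csv with default ingestion options and the csv_uri, table_uri, and ctx variables from the setup code:
# Ingest the demo CSV into a TileDB table stored at the S3 URI.
# (Sketch: the exact ingestion options used by the tutorial may differ.)
tiledb.from_csv(table_uri, csv_uri, ctx=ctx)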
Query the table with SQL.
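A minimal sketch of this step, assuming the tiledb.sql embedded connector imported in the setup code; the SELECT statement and row limit are illustrative:
# Connect with the embedded SQL engine and query the table by its S3 URI,
# loading the result into a pandas DataFrame.
# (Sketch: connection options may differ in your environment.)
db = tiledb.sql.connect()
df = pd.read_sql(sql=f"SELECT * FROM `{table_uri}` LIMIT 5", con=db)
print(df)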
Finally, clean up by removing the table from S3:
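This reuses the same calls shown in the setup code:
# Remove the table from S3 if it exists.
if tiledb.array_exists(table_uri, ctx=ctx):
    tiledb.Array.delete_array(table_uri, ctx=ctx)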