Installation
TileDB Cloud Self-Hosted edition is available for installation in Kubernetes clusters through a Helm chart. The Helm chart installs all components of TileDB Cloud. The instructions below walk you through getting the Helm chart, gaining access to the private Docker registry, and setting up the installation.
Accounts
To use TileDB Cloud Self-Hosted, you will need access to both the private Docker registry and the private Helm registry. Please contact your TileDB, Inc. account representative for credentials to these services.
Installation
After setting up the necessary prerequisites for installing TileDB Cloud Self-Hosted locally, the installation process involves the following steps:
- Add the Helm repository.
- Create the Kubernetes namespace.
- Create a custom values.yaml.
- Install TileDB Cloud.
- Validate your installation.
Prerequisites
In order to successfully install TileDB Cloud Self-Hosted locally, you need to set up the following:
Kubernetes
A Kubernetes cluster is required for installation. Setting up a Kubernetes cluster is outside the scope of this document. Please contact your account representative if you need assistance with this.
TileDB Cloud has been officially tested with the following Kubernetes environments:
- EKS (AWS)
- AKS (Azure)
- GKE (GCP)
Kubernetes components
The minimum Kubernetes version TileDB Cloud Self-Hosted supports is v1.25.0. If your cluster is older than this, you will need to upgrade. The latest tested version is v1.29. Newer Kubernetes versions are not yet tested for compatibility, and you may encounter issues. Please contact your account representative if you need a newer Kubernetes version.
You will also need the following components configured in your cluster:
- Metric Server for auto-scaling
- Ingress for exposing the service
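As a quick sanity check, you can confirm the cluster version and that these components are present with kubectl. This is a generic sketch: it assumes the Metrics Server is deployed in the kube-system namespace, which is its usual location; adjust names to match your environment.
# Check the cluster's server version (should be between v1.25 and v1.29)
kubectl version
# Confirm the Metrics Server is deployed (assumed to be in kube-system)
kubectl get deployment metrics-server -n kube-system
# List the ingress classes available in the cluster
kubectl get ingressclass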
Kubernetes worker nodes
Your Kubernetes worker nodes should have at least 24GB of memory and at least 8 CPUs. This is based on the default configurations of TileDB Cloud, which allow up to 2GB of RAM per computation request. If you adjust the TileDB Cloud memory configuration, the optimal Kubernetes worker node sizes will likely change.
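To verify your worker nodes meet these requirements, you can inspect each node's allocatable CPU and memory. This is a generic kubectl query, not specific to TileDB Cloud:
kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory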
Helm
TileDB Cloud Self-Hosted uses Helm charts to install TileDB services in the Kubernetes cluster. You will need Helm v3 installed on your local machine to facilitate the installation; Helm v3 does not require any components inside the Kubernetes cluster. Helm v3.6 or later is required.
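You can confirm the Helm version installed on your local machine as follows:
helm version --short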
MariaDB
MariaDB 10.3 or later is required. This is used for persistent storage of user account details, organizations, tasks, and more. While MySQL should be compatible, the TileDB team has not tested it thoroughly enough to fully support it. Thus, only MariaDB 10.3 and later versions are officially supported.
It is strongly recommended to enable SSL and at-rest encryption with MariaDB.
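If you are provisioning the database yourself, it needs a schema and a user matching the Databases.main settings in the values.yaml shown later. Below is a minimal sketch; the hostname is a placeholder, and the schema, username, and password values match the example defaults used throughout this guide (replace "password" with a strong password).
# Run against your MariaDB instance; host and credentials are placeholders
mysql -h mariadb.example.internal -u root -p <<'SQL'
CREATE DATABASE IF NOT EXISTS tiledb_rest;
CREATE USER IF NOT EXISTS 'tiledb_user'@'%' IDENTIFIED BY 'password';
GRANT ALL PRIVILEGES ON tiledb_rest.* TO 'tiledb_user'@'%';
FLUSH PRIVILEGES;
SQL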
Add the Helm repository
To get started with installing TileDB Cloud Self-Hosted, you will need to add the TileDB Helm chart repository. This repository requires authentication. Please use the username/password provided to you by your account representative.
# TileDB Chart is for the TileDB Cloud service itself
helm repo add tiledb https://charts.tiledb.com --username <provided by TileDB>
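After adding the repository, you can refresh the local chart index and confirm the chart is visible:
helm repo update
helm search repo tiledb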
Create the Kubernetes namespace
TileDB Cloud will be installed into a dedicated namespace, tiledb-cloud.
kubectl create namespace tiledb-cloud
Create a custom values.yaml
Before you install TileDB Cloud Self-Hosted, it is important to set up and customize your installation. This involves creating a custom values file for Helm. Below is a sample file you can save and edit.
Save this file as values.yaml. To make sure your installation is successful, you must make several required changes to values.yaml. All sections that require changes are prefixed with a comment of # REQUIRED:. Examples of the changes needed include setting your Docker registry authentication details and updating the domain names to which you would like to deploy TileDB Cloud.
values.yaml
# Default values for tiledb-cloud-enterprise.
# This is a YAML-formatted file.
# Should hosted notebooks be enabled? If you would like to disable them, set this to false
notebooks:
enabled: true
# REQUIRED: Set the docker registry image credentials to pull TileDB Cloud docker images
# The password should be provided to you by your account representative
imageCredentials:
password: ""
##################################
# TileDB Cloud REST API settings #
##################################
tiledb-cloud-rest:
# Service Account to run deployment under
# Change this if you have different RBAC requirements
serviceAccountName: default
# The autoscaling of the service can be adjusted if required
# The following settings are the recommended defaults
autoscaling:
enabled: true
minReplicas: 2
maxReplicas: 300
targetCPUUtilizationPercentage: 80
targetMemoryUtilizationPercentage: 50
# .spec.volumes
#volumes:
# - name: test
# emptyDir: {}
# - name: nfs-volume
# nfs:
# server: nfs.example.com
# path: /nfs/
# .spec.containers[*].volumeMounts
# A volume with the same name declared here
# must exist in volumes.
#volumeMounts:
# - name: test
# mountPath: /test
# readOnly: true
# - name: nfs-volume
# mountPath: /nfs_data
# key:value pairs defined below are configured
# as ENV variables on all rest pod containers
#extraEnvs:
# - KEY1: value1
# - KEY2: value2
# Config ingress, be sure to set the url to where you want to expose the api
ingress:
url:
# REQUIRED: Change this to the hostname you'd like the API service to be at
- api.tiledb.example.com
# optional TLS
tls: []
# - secretName: chart-example-tls
# hosts:
# - chart-example.local
resources:
# REQUIRED:
# Set the resource limits for the REST service.
# We recommend a minimum of 8 cpus and 24 GB of ram on the worker nodes
# We set REST to slightly below this to allow for other pods on the same worker node.
# These settings affect the number of concurrent operations
# limits:
# cpu: 100m
# memory: 128Mi
requests:
# cpu: 16000m
# memory: 16Gi
cpu: 7000m
memory: 17Gi
resourcesDind:
# Set the resources for the Docker-in-Docker pod; this is where the UDFs run
# The resources here directly affect the number of concurrent UDFs that can be run
requests:
memory: 6Gi
restConfig:
# REQUIRED: Set the private dockerhub registry credentials, these are the same as the `imageCredentials` above
ContainerRegistry:
DockerhubPassword: ""
# REQUIRED: Set the initial passwords for the internal users of Rest
# Replace "secret" with a strong password
# This config can be removed after the first run of Rest
ComputationUserInitialPassword: "secret"
PrometheusUserInitialPassword: "secret"
CronUserInitialPassword: "secret"
UIUserInitialPassword: "secret"
DebugUserInitialPassword: "secret"
# REQUIRED: Set the signing secret(s) for api tokens, this should be a secure value
# We recommend creating a random value with `openssl rand -hex 32`
# This is a list of token signing secrets. Zero element of the list is used
# for signing, the rest are used for validation.
# This mechanism provides a way to rotate signing secrets.
# In case there are active tokens signed with a key and this key is removed from
# the list, the tokens are invalidated.
TokenSigningSecrets:
- "Secret"
# REQUIRED: This is needed for the TileDB Jupyterlab Prompt User Options extension
CorsAllowedOrigins:
- "https://jupyterhub.tiledb.example.com"
# REQUIRED: Define supported storage types and locations, if you want to use NFS
# enable "local"
StorageLocationsSupported:
- "s3"
#- "local"
#- "hdfs"
#- "azure"
#- "gcs"
ArraySettings:
# When enabled, AWS credentials will be auto-discovered
# from the Environment, config file, EC2 metadata etc.
AllowS3NoCredentials: false
# Change to false to avoid any region checks for s3 compatible storage.
CheckS3Region: true
# A default location can be set so users creating new arrays
# do not need to know the full storage paths.
# Example usage is with nfs mounted storage
# DefaultLocation: "/nfs_data/tiledb-cloud"
# Set a whitelist of allowed paths. This limits users to writing to the specified mount paths.
# LocationWhitelist:
# - "/nfs_data"
Email:
# Should users be required to confirm their email addresses
# By default email confirmation is disabled as this requires a working SMTP setup
DisableConfirmation: True
# REQUIRED: The UI Server address is used for sending a link to the reset password email
UIServerAddress: "https://console.tiledb.example.com"
# Email Accounts
Accounts:
Noreply: "no-reply@example.com"
Admin: "admin@example.com"
# SMTP settings are used for sending emails, such as email confirmation,
# password reset and notifications
SMTP:
Enabled: False
Host: "smtp.example.local"
Port: 587
#Username: ""
#Password: ""
SSLVerify: True
# REQUIRED: Configure main database. It is recommended to host a MariaDB or MySQL instance outside of the kubernetes cluster
Databases:
# `main` is a required database configuration
main:
Driver: mysql
Host: "{{ .Release.Name }}-mariadb.{{ .Release.Namespace }}.svc.cluster.local"
Port: 3306
Schema: tiledb_rest
Username: tiledb_user
Password: password
# Set log level, 1=Panic, 2=Fatal, 3=Error, 4=Warning, 5=Info, 6=Debug
LogVerbosity: 4
# Configure any default TileDB Open Source settings using key: value mapping
# Example setting to override s3 endpoint
# TileDBEmbedded:
# Config:
# "vfs.s3.endpoint_override": "s3.wasabisys.com"
# LDAP settings. Enable and configure if you wish to allow LDAP for user account login
# Ldap:
# Enable: false
# EnableTLS: false
# Hosts:
# - ldap.example.com
# Port: 389
# HostsTLS:
# - ldap.example.com
# PortTLS: 389
# BaseDN: DC=ldaplab,DC=local
# UserDN: CN=tiledb,CN=Users,DC=ldaplab,DC=local
# UserBaseDN:
# - dc=mylab,dc=local,OU=Employees
# - dc=mylab,dc=local,OU=Contractors
# # can be set via config or env variable (TILEDB_REST_LDAP_PASSWORD)
# # Setting via ENV is recommended.
# #PASSWORD: ""
# CommonNames:
# - Users
# - IT
# - Managers
# # OPENLDAP
# # Attributes:
# # email: mail
# # name: givenName
# # username: uid
# Attributes:
# email: mail
# name: name
# username: userPrincipalName
# Configure TLS settings. If you wish to use TLS inside k8s
#Certificate:
# Absolute path to certificate
#CertFile: ""
# Absolute path to private key
#PrivateKey: ""
# TLS Minimum Version options
# 0x0301 #VersionTLS10
# 0x0302 #VersionTLS11
# 0x0303 #VersionTLS12
# 0x0304 #VersionTLS13
#MinVersion: 0x0304
# TLS 1.0 - 1.2 cipher suites. Leaving empty will enable all
#TLS10TLS12CipherSuites:
# - 0x0005 #TLS_RSA_WITH_RC4_128_SHA
# - 0x000a #TLS_RSA_WITH_3DES_EDE_CBC_SHA
# - 0x002f #TLS_RSA_WITH_AES_128_CBC_SHA
# - 0x0035 #TLS_RSA_WITH_AES_256_CBC_SHA
# - 0x003c #TLS_RSA_WITH_AES_128_CBC_SHA256
# - 0x009c #TLS_RSA_WITH_AES_128_GCM_SHA256
# - 0x009d #TLS_RSA_WITH_AES_256_GCM_SHA384
# - 0xc007 #TLS_ECDHE_ECDSA_WITH_RC4_128_SHA
# - 0xc009 #TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA
# - 0xc00a #TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA
# - 0xc011 #TLS_ECDHE_RSA_WITH_RC4_128_SHA
# - 0xc012 #TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA
# - 0xc013 #TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA
# - 0xc014 #TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
# - 0xc023 #TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256
# - 0xc027 #TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256
# - 0xc02f #TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
# - 0xc02b #TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
# - 0xc030 #TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
# - 0xc02c #TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
# - 0xcca8 #TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
# - 0xcca9 #TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256
# TLS 1.3 cipher suites. Leaving empty will enable all
#TLS13CipherSuites:
# - 0x1301 #TLS_AES_128_GCM_SHA256
# - 0x1302 #TLS_AES_256_GCM_SHA384
# - 0x1303 #TLS_CHACHA20_POLY1305_SHA256
#PreferServerCipherSuites: false
# CurveID is the type of a TLS identifier for an elliptic curve
# Leaving empty will enable all
#CurveID:
# - 23 #CurveP256 CurveID
# - 24 #CurveP384 CurveID
# - 25 #CurveP521 CurveID
# - 29 #X25519
# SSO:
# Okta Service details.
# The Okta domain is required.
# If SCIM support is desired, provide a list of passwords accepted by the
# Okta provider. At least one password is required.
# Okta:
# Domain: "domain-name.okta.com"
# SCIMPasswords:
# - abcdef
# - ghijkl
# - mnopqr
# It is not recommended to run the database inside k8s for production use, but it is helpful for testing
mariadb:
# Set to true if you wish to deploy a database inside k8s for testing
enabled: false
image:
repository: bitnami/mariadb
tag: 10.5.8
pullPolicy: IfNotPresent
auth:
# Auth parameters must match restConfig.Databases.main above
database: tiledb_rest
username: tiledb_user
password: password
rootPassword: changeme
primary:
# Enable persistence if you wish to save the database; again, running in k8s is not recommended for production use
persistence:
enabled: false
# Set security context to user id of mysqld user in tiledb-mariadb-server
podSecurityContext:
enabled: true
fsGroup: 999
containerSecurityContext:
enabled: true
runAsUser: 999
####################################
# TileDB Cloud UI Console settings #
####################################
tiledb-cloud-ui:
# Service Account to run deployment under
# Change this if you have different RBAC requirements
serviceAccountName: default
# The autoscaling of the service can be adjusted if required
# The following settings are the recommended defaults
autoscaling:
enabled: true
minReplicas: 2
maxReplicas: 300
targetCPUUtilizationPercentage: 80
targetMemoryUtilizationPercentage: 50
# REQUIRED: set the url of the jupyterhub server
config:
# REQUIRED: Set a secret here with `openssl rand -hex 32`
SessionKey: "secret"
RestServer:
# REQUIRED: This needs to be set to
# the same value as restConfig.UIUserInitialPassword
Password: "secret"
JupyterhubURL: "https://jupyterhub.tiledb.example.com"
# SSOOkta:
# Domain: "domain-name.okta.com"
# ClientID: "client_id"
# ClientSecret: "secret"
# REQUIRED: Config ingress, be sure to set the hostname to where you want to expose the UI
ingress:
enabled: true
# REQUIRED: Set URL for web console
url:
- console.tiledb.example.com
# optional TLS
tls: []
####################################
# TileDB Cloud Map Server settings #
####################################
tiledb-cloud-mapserver:
enabled: false
serviceAccountName: default
resources:
requests:
cpu: 2000m
memory: 4Gi
autoscaling:
enabled: true
minReplicas: 1
maxReplicas: 2
targetCPUUtilizationPercentage: 80
targetMemoryUtilizationPercentage: 50
requestsPerSecond: 10k
# config:
# Addr: "0.0.0.0"
# Port: 8030
# RestServer:
# Host: "api.tiledb.example.com"
# HostScheme: "http"
# Scheme: http
#########################################
# TileDB Cloud Hosted Notebook Settings #
#########################################
jupyterhub:
# REQUIRED: Set the private registry credentials, these are the same as the `imageCredentials` above
imagePullSecret:
password: ""
proxy:
# REQUIRED: Set a signing secret here with `openssl rand -hex 32`
secretToken: "Secret"
# The pre-puller is used to ensure the docker images for notebooks are pre-pulled to each node
# This can improve notebook startup time, but adds additional storage requirements to the nodes
# If you wish to use dedicated k8s node groups for notebooks, see:
# https://zero-to-jupyterhub.readthedocs.io/en/0.8.2/optimization.html?highlight=labels#using-a-dedicated-node-pool-for-users
prePuller:
hook:
enabled: false
continuous:
# NOTE: if used with a Cluster Autoscaler, also add user-placeholders
enabled: false
scheduling:
# You can enable at least one warm instance for users by enabling the userPlaceholder
userPlaceholder:
enabled: false
replicas: 1
# Disable podPriority, it is only useful if userPlaceholders are enabled
podPriority:
enabled: false
singleuser:
startTimeout: 900
# Set the size of the user's persisted disk space in notebooks
storage:
capacity: 2G
# JupyterHub expects the Kubernetes Storage Class to be configured
# with "volumeBindingMode: Immediate" and "reclaimPolicy: Retain".
# If your default Storage Class does not support this, you can
# create a new one and configure it below.
#dynamic:
# storageClass: "jupyterhub"
hub:
config:
CryptKeeper:
# REQUIRED: Set the jupyterhub auth secret for persistence, this should be a secure value
# We recommend creating a random value with `openssl rand -hex 32`
keys:
- "Secret"
TileDBCloud:
# REQUIRED: Set the oauth2 secret, this should be a secure value
# We recommend creating a random value with `openssl rand -hex 32`
client_secret: "Secret"
# REQUIRED: Set the domain for the jupyterhub and the oauth2 service
# it is likely you just need to replace `example.com` with your own internal domain
# This should match the ingress settings above and the hydra settings below
oauth_callback_url: "http://jupyterhub.tiledb.example.com/hub/oauth_callback"
token_url: "http://oauth2.tiledb.example.com/oauth2/token"
authorize_url: "http://oauth2.tiledb.example.com/oauth2/auth"
userdata_url: "http://oauth2.tiledb.example.com/userinfo"
# Uncomment for any extra settings
# extraConfig:
# Uncomment to disable SSL validation. Useful when testing deployments
# ssl_config: |
# c.Spawner.env_keep.append("TILEDB_REST_IGNORE_SSL_VALIDATION")
# Uncomment to modify the securityContext of JupyterHub pods
# securityContext: |
# c.Spawner.extra_container_config = {
# "securityContext": {
# "runAsGroup": 100,
# "runAsUser": 1000,
# "allowPrivilegeEscalation": False,
# "capabilities": {
# "drop": ["ALL"]
# }
# }
# }
# REQUIRED: Set the domain for the REST API and the oauth2 service
# it is likely you just need to replace `example.com` with your own internal domain
# This should match the tiledb-cloud-rest settings above and the hydra settings below
extraEnv:
OAUTH2_AUTHORIZE_URL: "https://oauth2.tiledb.example.com/oauth2/auth"
OAUTH2_USERDATA_URL: "https://oauth2.tiledb.example.com/userinfo"
TILEDB_REST_HOST: "https://api.tiledb.example.com"
# Uncomment to disable SSL validation. Useful when testing deployments
# TILEDB_REST_IGNORE_SSL_VALIDATION: "true"
ingress:
enabled: true
# REQUIRED: set the ingress domain for hosted notebooks
hosts:
- "jupyterhub.tiledb.example.com"
# optional TLS
tls: []
########################################
# TileDB Cloud Oauth2 Service Settings #
########################################
hydra:
hydra:
# REQUIRED: Set the domain for the jupyterhub
# it is likely you just need to replace `example.com` with your own internal domain
# This should match the ingress settings above and the hydra settings below
dangerousAllowInsecureRedirectUrls:
- http://jupyterhub.tiledb.example.com/hub/oauth_callback
config:
# Optionally set the internal k8s cluster IP address space to allow non-ssl connections from
# This defaults to all private IP spaces
# serve:
# tls:
# allow_termination_from:
# Set to cluster IP
# - 172.20.0.0/12
secrets:
# REQUIRED: Set the oauth2 secret, this should be a secure value
# We recommend creating a random value with `openssl rand -hex 32`
system:
- "secret"
cookie:
- "Secret"
# REQUIRED: Set the MariaDB database connection; this defaults to the in-k8s development settings.
# You will need to set this to the same connection parameters as the tiledb-cloud-rest section
dsn: "mysql://tiledb_user:password@tcp(tiledb-cloud-mariadb.tiledb-cloud.svc.cluster.local:3306)/tiledb_rest?parseTime=true"
urls:
self:
# REQUIRED: Update the domain for the oauth2 service and the web console ui
# It is likely you can just replace `example.com` with your own internal domain
issuer: "https://oauth2.tiledb.example.com/"
public: "https://oauth2.tiledb.example.com/"
login: "https://console.tiledb.example.com/oauth2/login"
consent: "https://console.tiledb.example.com/oauth2/consent"
# Configure ingress for oauth2 service
ingress:
public:
hosts:
# REQUIRED: set the ingress domain for oauth2 service
- host: "oauth2.tiledb.example.com"
paths:
- path: /
pathType: ImplementationSpecific
# optional TLS
tls: []
######################
# Ingress Controller #
######################
ingress-nginx:
# This is provided for ease of testing; it is recommended to establish your own ingress which fits your environment
enabled: false
## nginx configuration
## Ref: https://github.com/kubernetes/ingress/blob/master/controllers/nginx/configuration.md
##
controller:
name: controller
autoscaling:
enabled: true
minReplicas: 2
config:
use-proxy-protocol: "true"
log-format-escape-json: "true"
log-format-upstream: '{ "time": "$time_iso8601", "remote_addr": "$proxy_protocol_addr", "x-forward-for": "$proxy_add_x_forwarded_for", "request_id": "$req_id", "remote_user": "$remote_user", "bytes_sent": $bytes_sent, "request_time": $request_time, "status": $status, "vhost": "$host", "request_proto": "$server_protocol", "path": "$uri", "request_query": "$args", "request_length": $request_length, "duration": $request_time, "method": "$request_method", "http_referrer": "$http_referer", "http_user_agent": "$http_user_agent" }'
# Set timeouts to 1 hour
proxy-send-timeout: "3600"
proxy-read-timeout: "3600"
send-timeout: "3600"
client-max-body-size: "3076m"
proxy-body-size: "3076m"
proxy-buffering: "off"
proxy-request-buffering: "off"
proxy-http-version: "1.1"
ingressClass: nginx
## Allows customization of the external service
## the ingress will be bound to via DNS
publishService:
enabled: true
service:
annotations:
# Enable public facing load balancer
service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
# Set any needed annotations. The default ones we have set are for aws ELB nginx
service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: "*"
# Set aws-load-balancer-internal to allow all traffic from inside
# the vpc only, the -internal makes it not accessible to the internet
service.beta.kubernetes.io/aws-load-balancer: "0.0.0.0/0"
# Set timeout to 1 hour
service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "3600"
## Set external traffic policy to: "Local" to preserve source IP on
## providers supporting it
## Ref: https://kubernetes.io/docs/tutorials/services/source-ip/#source-ip-for-services-with-typeloadbalancer
externalTrafficPolicy: "Local"
type: LoadBalancer
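Before installing, you can render the chart locally to catch YAML or templating mistakes in your values file. This is a generic Helm dry run and does not modify the cluster:
helm template tiledb-cloud tiledb/tiledb-cloud-enterprise \
--namespace tiledb-cloud \
--values values.yaml > /dev/null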
Install TileDB Cloud
Once you have created the values.yaml file, you can install TileDB Cloud by running the following helm command.
helm install \
--namespace tiledb-cloud \
--values values.yaml \
--wait \
--timeout 15m \
tiledb-cloud tiledb/tiledb-cloud-enterprise
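Once the command completes, you can confirm the release status and that the pods have come up:
helm status tiledb-cloud --namespace tiledb-cloud
kubectl get pods --namespace tiledb-cloud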
Validate your installation
After you have installed TileDB Cloud, you can verify the installation works by doing the following:
- Create an account.
- Create your first array.
- View your array in the Web Console.
- Launch a Jupyter notebook.
Create an account
The first step after completing the installation is to try logging in to the web UI. The URL depends on your installation. In the values.yaml file, you should have replaced console.tiledb.example.com with the domain on which to access it. Navigate in your web browser to that URL and create an account.
Completing this step means that both the TileDB Cloud UI and TileDB Cloud REST components are working.
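If the page does not load, a quick reachability check against the hostnames you configured can help isolate DNS or ingress problems. The hostnames below are the example values from values.yaml; replace them with your own:
curl -I https://console.tiledb.example.com
curl -I https://api.tiledb.example.com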
Create your first array
Now that you have an account, you can create your first array. This step will show you that creating, writing to, and reading from an array are functioning, as well as give you an array and a task to view in the UI.
For this section, you will use a Python script. This script will create, write to, and read from an array. Note the two sections where you need to adjust the configuration for your TileDB Cloud instance and set the array storage location.
Prerequisites
This section requires the TileDB-Py API to be installed. You can get this from pip or conda. Once you have installed TileDB-Py, copy the following script to check_installation.py and modify the first few lines as required.
import numpy as np
import sys

import tiledb

# username/password for TileDB Cloud instance
# Note you could also use an api token, which is generally preferred, however
# for simplicity of the example we'll use username/password combo here
username = ""
password = ""

# Where should the array be stored? This can be an object store,
# or a path inside the rest server where a nfs server is mounted
storage_path = "file:///nfs/tiledb_arrays/example"
array_uri = "tiledb://{}/{}/quickstart_sparse".format(username, storage_path)

# Set the host to your TileDB Cloud host
host = "http://api.tiledb.example.com"

ctx = tiledb.Ctx(
    {"rest.username": username, "rest.password": password, "rest.server_address": host}
)


def create_array():
    # The array will be 4x4 with dimensions "rows" and "cols", with domain [1,4].
    dom = tiledb.Domain(
        tiledb.Dim(name="rows", domain=(1, 4), tile=4, dtype=np.int32, ctx=ctx),
        tiledb.Dim(name="cols", domain=(1, 4), tile=4, dtype=np.int32, ctx=ctx),
        ctx=ctx,
    )
    # The array will be sparse with a single attribute "a" so each (i,j) cell can store an integer.
    schema = tiledb.ArraySchema(
        domain=dom,
        sparse=True,
        attrs=[tiledb.Attr(name="a", dtype=np.int32, ctx=ctx)],
        ctx=ctx,
    )
    # Create the (empty) array on disk.
    tiledb.SparseArray.create(array_uri, schema)


def write_array():
    # Open the array and write to it.
    with tiledb.SparseArray(array_uri, mode="w", ctx=ctx) as A:
        # Write some simple data to cells (1, 1), (2, 4) and (2, 3).
        I, J = [1, 2, 2], [1, 4, 3]
        data = np.array(([1, 2, 3]))
        A[I, J] = data


def read_array():
    # Open the array and read from it.
    with tiledb.SparseArray(array_uri, mode="r", ctx=ctx) as A:
        # Slice only rows 1, 2 and cols 2, 3, 4.
        data = A[1:3, 2:5]
        a_vals = data["a"]
        for i, coord in enumerate(zip(data["rows"], data["cols"])):
            print("Cell (%d, %d) has data %d" % (coord[0], coord[1], a_vals[i]))


create_array()
write_array()
read_array()
Run this script with the following:
python check_installation.py
If this script ran and printed out the output without errors, then your installation is working successfully for creating, reading from, and writing to TileDB arrays.
View your array in the web console
The newly created array, quickstart_sparse, should now be viewable in the web console. If you navigate to the arrays page, you will see it listed there.
Launch a Jupyter notebook
It is recommended that you attempt to launch a Jupyter notebook to confirm that the cluster and deployment are correctly configured. You can launch a notebook by going to the Monitor tab in the web console and selecting Launch server.
Upgrades
Visit Upgrades for more information.
Example configurations
This section defines a handful of example configurations for common scenarios. These examples serve as a basis for you to modify for your specific deployment.
Minimum configuration for deployment
The following configuration is a reduced set that brings up the services with minimum features enabled: no email (SMTP) configuration, only local accounts (no SSO/LDAP/AD), no TLS certificates, and so on. This should NOT be used for production deployments, but it is helpful for getting an initial deployment that can then be modified.
Minimum Configuration
# Minimum default values for tiledb-cloud-enterprise.
# This is a YAML-formatted file.
# REQUIRED: Set the docker registry image credentials to pull TileDB Cloud docker images
# The password should be provided to you by your account representative
imageCredentials:
password: ""
##################################
# TileDB Cloud REST API settings #
##################################
tiledb-cloud-rest:
# Config ingress, be sure to set the url to where you want to expose the api
ingress:
url:
# REQUIRED: Change this to the hostname you'd like the API service to be at
- api.tiledb.example.com
# optional TLS
tls: []
# - secretName: chart-example-tls
# hosts:
# - chart-example.local
resources:
# REQUIRED:
# Set the resource limits for the REST service.
# We recommend a minimum of 8 cpus and 24 GB of ram on the worker nodes
# We set REST to slightly below this to allow for other pods on the same worker node.
# These settings affect the number of concurrent operations
# limits:
# cpu: 100m
# memory: 128Mi
requests:
# cpu: 16000m
# memory: 16Gi
cpu: 7000m
memory: 17Gi
resourcesDind:
# Set the resources for the Docker-in-Docker pod; this is where the UDFs run
# The resources here directly affect the number of concurrent UDFs that can be run
requests:
memory: 6Gi
restConfig:
# REQUIRED: Set the private dockerhub registry credentials, these are the same as the `imageCredentials` above
ContainerRegistry:
DockerhubPassword: ""
# REQUIRED: Set the initial passwords for the internal users of Rest
# Replace "secret" with a strong password
# This config can be removed after the first run of Rest
ComputationUserInitialPassword: "secret"
PrometheusUserInitialPassword: "secret"
CronUserInitialPassword: "secret"
UIUserInitialPassword: "secret"
DebugUserInitialPassword: "secret"
# REQUIRED: Set the signing secret(s) for api tokens, this should be a secure value
# We recommend creating a random value with `openssl rand -hex 32`
# This is a list of token signing secrets. Zero element of the list is used
# for signing, the rest are used for validation.
# This mechanism provides a way to rotate signing secrets.
# In case there are active tokens signed with a key and this key is removed from
# the list, the tokens are invalidated.
TokenSigningSecrets:
- "Secret"
# REQUIRED: This is needed for the TileDB Jupyterlab Prompt User Options extension
CorsAllowedOrigins:
- "https://jupyterhub.tiledb.example.com"
# REQUIRED: Define supported storage types and locations, if you want to use NFS
# enable "local"
StorageLocationsSupported:
- "s3"
#- "local"
#- "hdfs"
#- "azure"
#- "gcs"
# REQUIRED: Configure main database. It is recommended to host a MariaDB or MySQL instance outside of the kubernetes cluster
Databases:
# `main` is a required database configuration
main:
Driver: mysql
Host: "{{ .Release.Name }}-mariadb.{{ .Release.Namespace }}.svc.cluster.local"
Port: 3306
Schema: tiledb_rest
Username: tiledb_user
Password: password
# It is not recommended to run the database inside k8s for production use, but it is helpful for testing
mariadb:
# Set to true if you wish to deploy a database inside k8s for testing
enabled: false
image:
repository: bitnami/mariadb
tag: 10.5.8
pullPolicy: IfNotPresent
auth:
# Auth parameters must match restConfig.Databases.main above
database: tiledb_rest
username: tiledb_user
password: password
rootPassword: changeme
primary:
# Enable persistence if you wish to save the database; again, running in k8s is not recommended for production use
persistence:
enabled: false
# Set security context to user id of mysqld user in tiledb-mariadb-server
podSecurityContext:
enabled: true
fsGroup: 999
containerSecurityContext:
enabled: true
runAsUser: 999
####################################
# TileDB Cloud UI Console settings #
####################################
tiledb-cloud-ui:
# REQUIRED: set the url of the jupyterhub server
config:
# REQUIRED: Set a secret here with `openssl rand -hex 32`
SessionKey: "secret"
RestServer:
# REQUIRED: This needs to be set to
# the same value as restConfig.UIUserInitialPassword
Password: "secret"
JupyterhubURL: "https://jupyterhub.tiledb.example.com"
# SSOOkta:
# Domain: "domain-name.okta.com"
# ClientID: "client_id"
# ClientSecret: "secret"
# REQUIRED: Config ingress, be sure to set the hostname to where you want to expose the UI
ingress:
enabled: true
# REQUIRED: Set URL for web console
url:
- console.tiledb.example.com
# optional TLS
tls: []
#########################################
# TileDB Cloud Hosted Notebook Settings #
#########################################
jupyterhub:
# REQUIRED: Set the private registry credentials, these are the same as the `imageCredentials` above
imagePullSecret:
password: ""
proxy:
# REQUIRED: Set a signing secret here with `openssl rand -hex 32`
secretToken: "Secret"
# The pre-puller is used to ensure the docker images for notebooks are pre-pulled to each node
# This can improve notebook startup time, but adds additional storage requirements to the nodes
# If you wish to use dedicated k8s node groups for notebooks, see:
# https://zero-to-jupyterhub.readthedocs.io/en/0.8.2/optimization.html?highlight=labels#using-a-dedicated-node-pool-for-users
hub:
config:
CryptKeeper:
# REQUIRED: Set the jupyterhub auth secret for persistence, this should be a secure value
# We recommend creating a random value with `openssl rand -hex 32`
keys:
- "Secret"
TileDBCloud:
# REQUIRED: Set the oauth2 secret, this should be a secure value
# We recommend creating a random value with `openssl rand -hex 32`
client_secret: "Secret"
# REQUIRED: Set the domain for the jupyterhub and the oauth2 service
# it is likely you just need to replace `example.com` with your own internal domain
# This should match the ingress settings above and the hydra settings below
oauth_callback_url: "http://jupyterhub.tiledb.example.com/hub/oauth_callback"
token_url: "http://oauth2.tiledb.example.com/oauth2/token"
authorize_url: "http://oauth2.tiledb.example.com/oauth2/auth"
userdata_url: "http://oauth2.tiledb.example.com/userinfo"
# REQUIRED: Set the domain for the REST API and the oauth2 service
# it is likely you just need to replace `example.com` with your own internal domain
# This should match the tiledb-cloud-rest settings above and the hydra settings below
extraEnv:
OAUTH2_AUTHORIZE_URL: "https://oauth2.tiledb.example.com/oauth2/auth"
OAUTH2_USERDATA_URL: "https://oauth2.tiledb.example.com/userinfo"
TILEDB_REST_HOST: "https://api.tiledb.example.com"
# Uncomment to disable SSL validation. Useful when testing deployments
# TILEDB_REST_IGNORE_SSL_VALIDATION: "true"
ingress:
enabled: true
# REQUIRED: set the ingress domain for hosted notebooks
hosts:
- "jupyterhub.tiledb.example.com"
# optional TLS
tls: []
########################################
# TileDB Cloud Oauth2 Service Settings #
########################################
hydra:
hydra:
# REQUIRED: Set the domain for the jupyterhub
# it is likely you just need to replace `example.com` with your own internal domain
# This should match the ingress settings above and the hydra settings below
dangerousAllowInsecureRedirectUrls:
- http://jupyterhub.tiledb.example.com/hub/oauth_callback
config:
# Optionally set the internal k8s cluster IP address space to allow non-ssl connections from
# This defaults to all private IP spaces
# serve:
# tls:
# allow_termination_from:
# Set to cluster IP
# - 172.20.0.0/12
secrets:
# REQUIRED: Set the oauth2 secret, this should be a secure value
# We recommend creating a random value with `openssl rand -hex 32`
system:
- "secret"
cookie:
- "Secret"
# REQUIRED: Set the MariaDB database connection; this defaults to the in-k8s development settings.
# You will need to set this to the same connection parameters as the tiledb-cloud-rest section
dsn: "mysql://tiledb_user:password@tcp(tiledb-cloud-mariadb.tiledb-cloud.svc.cluster.local:3306)/tiledb_rest?parseTime=true"
urls:
self:
# REQUIRED: Update the domain for the oauth2 service and the web console ui
# It is likely you can just replace `example.com` with your own internal domain
issuer: "https://oauth2.tiledb.example.com/"
public: "https://oauth2.tiledb.example.com/"
login: "https://console.tiledb.example.com/oauth2/login"
consent: "https://console.tiledb.example.com/oauth2/consent"
# Configure ingress for oauth2 service
ingress:
public:
hosts:
# REQUIRED: set the ingress domain for oauth2 service
- host: "oauth2.tiledb.example.com"
paths:
- path: /
pathType: ImplementationSpecific
# optional TLS
tls: []
######################
# Ingress Controller #
######################
ingress-nginx:
# This is provided for ease of testing; it is recommended to establish your own ingress which fits your environment
enabled: false
## nginx configuration
## Ref: https://github.com/kubernetes/ingress/blob/master/controllers/nginx/configuration.md
##
controller:
name: controller
autoscaling:
enabled: true
minReplicas: 2
config:
use-proxy-protocol: "true"
log-format-escape-json: "true"
log-format-upstream: '{ "time": "$time_iso8601", "remote_addr": "$proxy_protocol_addr", "x-forward-for": "$proxy_add_x_forwarded_for", "request_id": "$req_id", "remote_user": "$remote_user", "bytes_sent": $bytes_sent, "request_time": $request_time, "status": $status, "vhost": "$host", "request_proto": "$server_protocol", "path": "$uri", "request_query": "$args", "request_length": $request_length, "duration": $request_time, "method": "$request_method", "http_referrer": "$http_referer", "http_user_agent": "$http_user_agent" }'
# Set timeouts to 1 hour
proxy-send-timeout: "3600"
proxy-read-timeout: "3600"
send-timeout: "3600"
client-max-body-size: "3076m"
proxy-body-size: "3076m"
proxy-buffering: "off"
proxy-request-buffering: "off"
proxy-http-version: "1.1"
ingressClass: nginx
## Allows customization of the external service
## the ingress will be bound to via DNS
publishService:
enabled: true
service:
annotations:
# Enable public facing load balancer
service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
# Set any needed annotations. The default ones we have set are for aws ELB nginx
service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: "*"
# Set aws-load-balancer-internal to allow all traffic from inside
# the vpc only, the -internal makes it not accessible to the internet
service.beta.kubernetes.io/aws-load-balancer: "0.0.0.0/0"
# Set timeout to 1 hour
service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "3600"
## Set external traffic policy to: "Local" to preserve source IP on
## providers supporting it
## Ref: https://kubernetes.io/docs/tutorials/services/source-ip/#source-ip-for-services-with-typeloadbalancer
externalTrafficPolicy: "Local"
type: LoadBalancer
MinIO usage for deployment
MinIO is commonly used as a self-hosted, S3-compatible object store, and it works well with TileDB Cloud.
Provided that, after installing MinIO, you have created a bucket called minio-tiledb-cloud-intermediate-result-storage, the configuration below shows only the sections that need to be modified for use with MinIO.
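If you have not created the bucket yet, one way to do so is with the MinIO client (mc); the alias name, endpoint, and credentials below are placeholders you should replace with your own:
mc alias set selfminio http://minio.example.com:9999 <access-key> <secret-key>
mc mb selfminio/minio-tiledb-cloud-intermediate-result-storage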
MinIO Configuration
# MinIO configuration values for tiledb-cloud-enterprise.
# This is a YAML-formatted file.
##################################
# TileDB Cloud REST API settings #
##################################
tiledb-cloud-rest:
restConfig:
# REQUIRED: Define supported storage types and locations, if you want to use NFS
# enable "local"
StorageLocationsSupported:
- "s3"
ArraySettings:
# When enabled, AWS credentials will be auto-discovered
# from the Environment, config file, EC2 metadata etc.
AllowS3NoCredentials: false
# Change to false to avoid any region checks for s3 compatible storage.
CheckS3Region: false
ResultStorage:
Config:
# REQUIRED: Configure with MinIO access details for task results storage bucket
- "vfs.s3.aws_access_key_id": "key"
- "vfs.s3.aws_secret_access_key": "secret"
# REQUIRED: Set to a minio bucket location for task results storage
Path: "s3://minio-tiledb-cloud-intermediate-result-storage"
Workflows:
BatchTaskParamsStorage:
S3BatchTaskParamsStorage:
# REQUIRED: Set to a minio bucket location for Task Graph Storage
Bucket: "s3://minio-tiledb-cloud-intermediate-result-storage"
Path: "argo-workflows"
TileDBEmbedded:
Config:
- "vfs.s3.scheme": "http"
- "vfs.s3.region": ""
# REQUIRED: Set to your minio host
- "vfs.s3.endpoint_override": "minio.example.com:9999"
- "vfs.s3.use_virtual_addressing": "false"
##################
# Argo Workflows #
##################
argo-workflows:
useStaticCredentials: true
artifactRepository:
s3:
# insecure will disable TLS. Primarily used for minio installs not configured with TLS
insecure: true
# REQUIRED: The Bucket where the artifacts are stored
bucket: "minio-tiledb-cloud-intermediate-result-storage"
# REQUIRED: Set to MinIO host
endpoint: minio.example.com:9999
# REQUIRED: Configure with minio access details
accessKeySecret:
name: minio-secret
key: MINIO_ACCESS_KEY_ID
secretKeySecret:
name: minio-secret
key: MINIO_SECRET_ACCESS_KEY
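The accessKeySecret and secretKeySecret entries above reference a Kubernetes secret named minio-secret, which you need to create yourself. A minimal sketch, with placeholder credentials:
kubectl create secret generic minio-secret \
--namespace tiledb-cloud \
--from-literal=MINIO_ACCESS_KEY_ID=<access-key> \
--from-literal=MINIO_SECRET_ACCESS_KEY=<secret-key>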