Drug Discovery App


Learn how to build a drug discovery app in TileDB that accepts an AnnData file and produces rich visualizations with Streamlit.

Streamlit turns Python code into shareable web applications, with no front-end knowledge required. With some extra configuration, you can create Streamlit dashboards based on Jupyter notebooks in TileDB.

Here, you will build a drug discovery app that analyzes single-cell RNA-seq data. The app shows a summary of the number of cells and genes, a UMAP visualization of the cells colored by cluster, with clusters computed by either the Leiden algorithm or the Louvain method, and the expression of a marker gene you choose.

Prerequisites

To complete this tutorial successfully, make sure you do the following:

Launch a new notebook server with the Genomics image.
Create the streamlit_jupyter plugin.

Steps

Start by creating a new notebook in TileDB. In the first cell, you’ll load the streamlit_jupyter plugin you created earlier.

Python

%load_ext streamlit_jupyter

Install Streamlit and the louvain package, which scanpy uses for Louvain clustering:

Python

%pip install --quiet --user -U --no-warn-script-location streamlit louvain

In the next cell, start with the %%streamlit cell magic, followed by the code that builds your dashboard:

Python

%%streamlit
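# Imported explicitly so the cell does not rely on `st` being defined elsewhere
import streamlit as st
import scanpy as sc
import pandas as pd
import matplotlib.pyplot as plt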

# App Title
st.title("Single-Cell Drug Discovery App")
st.markdown("""
This app allows you to analyze single-cell RNA-seq data for drug discovery. 
Upload your dataset, visualize cell clusters, and explore marker gene expression.
""")

# Sidebar for user inputs
st.sidebar.header("Upload Options")
uploaded_file = st.sidebar.file_uploader("Upload h5ad File", type=["h5ad"])

# Sidebar parameters for visualization
st.sidebar.header("Visualization Parameters")
cluster_method = st.sidebar.selectbox("Clustering Method", ["leiden", "louvain"])
marker_gene = st.sidebar.text_input("Marker Gene", "GAPDH")
show_umap = st.sidebar.checkbox("Show UMAP", True)

# Main content
if uploaded_file:
    # Load the dataset
    st.write("## Dataset Summary")
    adata = sc.read_h5ad(uploaded_file)
    st.write(f"Dataset contains {adata.n_obs} cells and {adata.n_vars} genes.")

    # Perform clustering and UMAP
    st.write("## Clustering and Visualization")
    sc.pp.neighbors(adata)
    sc.tl.umap(adata)
    # Run the clustering algorithm selected in the sidebar
    if cluster_method == "leiden":
        sc.tl.leiden(adata)
    else:
        sc.tl.louvain(adata)

    # Visualize UMAP
    if show_umap:
        fig, ax = plt.subplots(figsize=(6, 6))
        sc.pl.umap(adata, color=cluster_method, show=False, ax=ax)
        st.pyplot(fig)

    # Marker gene visualization
    st.write("## Marker Gene Expression")
    if marker_gene in adata.var_names:
        fig, ax = plt.subplots(figsize=(6, 6))
        sc.pl.umap(adata, color=marker_gene, show=False, ax=ax)
        st.pyplot(fig)
    else:
        st.error(f"Marker gene '{marker_gene}' not found in dataset.")

    # Export clustered data
    st.write("## Download Clustered Data")
    output_file = f"clustered_{cluster_method}.h5ad"
    adata.write(output_file)
    with open(output_file, "rb") as f:
        st.download_button("Download Clustered Data", f, file_name=output_file)

else:
    st.info("Please upload a `.h5ad` file to start.")
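
The marker gene check in the code above is an exact, case-sensitive match, so an entry such as `gapdh` or a name with a trailing space is reported as missing. The following is a minimal sketch of a more forgiving lookup, using only the Python standard library. The function name `resolve_gene` and its parameters are hypothetical and not part of scanpy, Streamlit, or TileDB.

```python
import difflib


def resolve_gene(query, gene_names, max_suggestions=3):
    """Return (match, suggestions) for a user-typed gene name.

    match is the dataset's own spelling of the gene, or None if nothing
    matches once case and surrounding whitespace are ignored.
    """
    cleaned = query.strip()
    # Map upper-cased names back to the spelling used in the dataset.
    by_upper = {name.upper(): name for name in gene_names}
    match = by_upper.get(cleaned.upper())
    if match is not None:
        return match, []
    # No match: collect the closest names so the user can correct a typo.
    close = difflib.get_close_matches(
        cleaned.upper(), list(by_upper), n=max_suggestions, cutoff=0.6
    )
    return None, [by_upper[name] for name in close]
```

In the app, you could call `resolve_gene(marker_gene, adata.var_names)`, plot the returned match, and list the suggestions in the `st.error` message when there is no match. If a dataset contains two gene names that differ only by case, this sketch keeps just one of them, so the exact check remains the safer choice for such data.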

Convert the notebook to a dashboard, and launch the dashboard in the existing server. Until you upload a file, the app shows its title, the sidebar options, and a prompt to upload a `.h5ad` file.

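Upload an H5AD file. Select the Clustering Method you want to use (leiden or louvain), and enter a Marker Gene (the default is GAPDH). The app then shows the dataset summary, the UMAP colored by cluster, and the UMAP colored by marker gene expression, along with a button to download the clustered data.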
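
Streamlit reruns the script from top to bottom whenever a widget value changes. With the code as written, entering a different marker gene therefore repeats the neighbor graph, UMAP, and clustering computations. The following is a minimal sketch of one way to avoid that: key the clustered result on the uploaded file's contents and the clustering method, and reuse it while neither changes. The helper names `cache_key` and `get_or_compute` are hypothetical, and the `cluster` function in the trailing comments is an assumed wrapper around the read, neighbors, UMAP, and clustering steps from the tutorial code.

```python
import hashlib


def cache_key(file_bytes, cluster_method):
    """Build a key that changes only when the file or the method changes."""
    digest = hashlib.sha256(file_bytes).hexdigest()
    return f"clustered:{cluster_method}:{digest}"


def get_or_compute(cache, key, compute):
    """Return cache[key], calling compute() only if the key is absent."""
    if key not in cache:
        cache[key] = compute()
    return cache[key]


# Possible use inside the app (assumes a hypothetical `cluster` function):
#
#   key = cache_key(uploaded_file.getvalue(), cluster_method)
#   adata = get_or_compute(
#       st.session_state, key, lambda: cluster(uploaded_file, cluster_method)
#   )
```

Passing `st.session_state` as the cache keeps the result for the current browser session only. Streamlit also provides caching decorators such as `st.cache_data`, which may be a better fit depending on your Streamlit version and the size of the dataset.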