What Is Streamlit Caching? How to Use st.cache_data and st.cache_resource

Streamlit caching is how you stop your app from repeating expensive work on every rerun.

What Streamlit caching means

If your app reloads a large CSV, rebuilds a model pipeline, or calls the same API every time a user changes one widget, caching is usually the fix. In modern Streamlit, that fix is split into two tools:

  • st.cache_data for data results
  • st.cache_resource for long-lived shared objects

Streamlit reruns your script from top to bottom whenever a widget changes. Caching tells Streamlit which work can be safely reused instead of recomputed. That is why caching matters much more in Streamlit than in many traditional web frameworks.

Why Streamlit caching matters

Caching matters because Streamlit apps rerun often, and those reruns can make an otherwise simple app feel slow, noisy, or wasteful if expensive work keeps happening over and over.

What Streamlit caching is actually for

Before getting into APIs, it helps to answer the practical question: what does caching improve?

It usually improves four things:

  1. app speed
  2. responsiveness during interaction
  3. resource usage
  4. code clarity around expensive operations

For example, caching can help when you:

  • load the same dataset on every rerun
  • query a database repeatedly with the same parameters
  • call an external API that is slow or rate-limited
  • initialize a machine learning model that should not reload for every user action

After reading this guide, you should be able to decide:

  • whether a slow step belongs in cache at all
  • which cache API fits that step
  • how to avoid stale data and common cache bugs

Which cache API should you use?

Use st.cache_data when the function returns data such as a DataFrame, dict, list, or query result.

Use st.cache_resource when the function returns a reusable object such as a model, database engine, or SDK client.

Do not start new code with legacy @st.cache.

Quick Start

A simple data-caching example

This first example shows the most common caching use case: reading the same data file without reloading it on every rerun.

import pandas as pd
import streamlit as st
 
@st.cache_data(ttl="10m", show_spinner="Loading data...")
def load_sales_data(path: str) -> pd.DataFrame:
    return pd.read_csv(path)
 
df = load_sales_data("sales.csv")
st.dataframe(df)

A simple resource-caching example

This second example shows the other side of caching: reusing a heavyweight object that should be initialized once and then shared.

import streamlit as st
from transformers import pipeline
 
@st.cache_resource
def load_sentiment_model():
    return pipeline("sentiment-analysis")
 
model = load_sentiment_model()
st.write(model("Streamlit caching is fast and predictable."))

Why the old and new cache APIs feel different

Older Streamlit tutorials often use @st.cache. That is no longer the right default mental model, and it is also why many users now search for terms like st.cache deprecated, st.cache warning, or allow_output_mutation errors after upgrading Streamlit.

Modern Streamlit splits caching into two clearer concepts:

  • data caching
  • resource caching

That split is useful because a DataFrame and a database engine are not the same kind of thing. One is returned data. The other is a long-lived object that may be shared.

st.cache_data vs st.cache_resource

| Use case | Recommended API | Why |
| --- | --- | --- |
| DataFrame loaded from disk | st.cache_data | The function returns data that can be reused safely. |
| API responses | st.cache_data | It avoids repeating slow network calls. |
| SQL query results | st.cache_data | Query output is data, not a shared singleton. |
| ML model object | st.cache_resource | The model should be loaded once and reused. |
| Database engine or client | st.cache_resource | It is expensive to initialize and may be shared. |
| Per-user interaction state | st.session_state | That belongs to one session, not the global cache. |

What st.cache_data does

st.cache_data is for functions that return data. Streamlit hashes the function code and inputs. If nothing relevant changed, it returns the cached result instead of running the function again.

That makes it a good fit for:

  • reading files
  • transforming DataFrames
  • calling APIs
  • running queries
  • expensive deterministic calculations

import requests
import streamlit as st
 
@st.cache_data(ttl=900)
def fetch_metrics(api_url: str) -> dict:
    response = requests.get(api_url, timeout=30)
    response.raise_for_status()
    return response.json()

Think of st.cache_data as "remember the output of this data-producing function until something important changes."

Useful st.cache_data parameters

| Parameter | What it does | When to care |
| --- | --- | --- |
| ttl | Expires cached entries after a duration or number of seconds | Use when the source data changes over time |
| max_entries | Limits how many cached entries to keep | Useful when users trigger many unique inputs |
| show_spinner | Controls the loading message on cache miss | Good for slow data loads |
| show_time | Displays execution timing | Helpful while tuning performance |
| persist | Persists cache to disk | Use only when you need cache to survive app restarts |
| hash_funcs | Adds custom hashing behavior | Needed only for advanced types |

What st.cache_resource does

st.cache_resource is for long-lived objects you want to initialize once and reuse, such as:

  • model pipelines
  • database engines
  • API clients
  • reusable service objects

Unlike st.cache_data, the cached object is shared. That means thread safety matters.

import streamlit as st
from sqlalchemy import create_engine
 
@st.cache_resource
def get_engine(database_url: str):
    return create_engine(database_url, pool_pre_ping=True)
 
engine = get_engine("sqlite:///analytics.db")

If a resource should not be shared across the whole app, move it into st.session_state or use session scope instead.

import sqlite3
import streamlit as st
 
@st.cache_resource(scope="session", on_release=lambda conn: conn.close())
def get_connection(path: str):
    return sqlite3.connect(path, check_same_thread=False)

Think of st.cache_resource as "create this expensive object once, then keep reusing it."

Useful st.cache_resource parameters

| Parameter | What it does | When to care |
| --- | --- | --- |
| ttl | Rebuilds a resource after a period of time | Useful for expiring tokens or unstable clients |
| validate | Checks whether the cached resource is still valid | Helpful for stale connections |
| max_entries | Caps the number of cached resources | Useful when many unique configs exist |
| show_spinner | Shows a message when the resource is created | Good for slow model initialization |
| on_release | Cleans up when Streamlit releases the resource | Useful for closing handles cleanly |
| scope | Chooses app-wide or session-scoped caching | Important for non-global resources |

Why caching helps real projects

Caching is not only about shaving milliseconds off toy demos. In real projects it can:

  • make dashboard filters feel instant
  • stop repeated file reads from dominating reruns
  • lower API cost and rate-limit pressure
  • keep heavyweight models from reloading over and over
  • make apps feel more stable under repeated user interaction

In other words, caching often turns a "works, but feels clumsy" Streamlit app into one that feels production-ready.

Replacing legacy @st.cache

If an older codebase still uses @st.cache, translate it into one of the new APIs instead of copying it forward unchanged.

If the old function returns data, move it to st.cache_data.

Legacy data pattern:

@st.cache
def load_data(path):
    return pd.read_csv(path)

Modern replacement:

@st.cache_data
def load_data(path: str):
    return pd.read_csv(path)

If the old function was effectively caching a shared object with allow_output_mutation=True, move it to st.cache_resource.

Legacy model-loading pattern:

@st.cache(allow_output_mutation=True)
def load_model():
    return pipeline("sentiment-analysis")

Modern replacement:

@st.cache_resource
def load_model():
    return pipeline("sentiment-analysis")

Common caching patterns

Cache a DataFrame transform

This pattern is useful when your raw data is large enough that cleaning and aggregation become part of the slow step.

import pandas as pd
import streamlit as st
 
@st.cache_data
def prepare_revenue_table(raw: pd.DataFrame) -> pd.DataFrame:
    cleaned = raw.dropna(subset=["region", "revenue"]).copy()
    cleaned["revenue"] = cleaned["revenue"].round(2)
    return cleaned.groupby("region", as_index=False)["revenue"].sum()

Cache a query while skipping an unhashable connection argument

Arguments prefixed with _ are ignored by Streamlit's hasher.

import pandas as pd
import streamlit as st
 
@st.cache_data(ttl="5m")
def run_query(_engine, sql: str) -> pd.DataFrame:
    return pd.read_sql(sql, _engine)

Clear the cache

When you need a manual reset during debugging or after a data change, use one of the built-in clear methods:

load_sales_data.clear()      # Clear one cached function
st.cache_data.clear()        # Clear all data caches
st.cache_resource.clear()    # Clear all resource caches

Common mistakes and how to fix them

1. Using st.cache_resource for normal data

If a function returns a DataFrame, list, or dict, use st.cache_data. st.cache_resource is for shared long-lived objects.

2. Caching mutable shared objects by accident

If you cache a resource and then mutate it, every user may see the changed object. That is acceptable for some shared resources, but dangerous for user-specific state.

3. Expecting caching to fix every slow app

Caching helps repeated work. It does not fix slow plotting code, poor query design, or oversized page layouts by itself.

4. Expecting TTL to behave the same with disk persistence

When persist="disk" is enabled for st.cache_data, TTL is ignored. Use persistence only when you really need it.

5. Caching code with side effects

Cached functions should not quietly write files, mutate global state, or depend on hidden ambient state. Keep them predictable.

Why am I seeing st.cache deprecation warnings?

This is one of the most common migration questions.

If you upgraded Streamlit and started seeing warnings around st.cache, the short answer is that Streamlit split the old API into two more specific APIs:

  • st.cache_data for data
  • st.cache_resource for shared objects

In practice, the warning usually means your old caching code still runs but maps poorly onto the newer mental model, even if the app has not visibly broken yet. The right fix is usually migration, not suppression.

Troubleshooting

Why is my cached function still rerunning?

Usually one of these changed:

  • the function code
  • one of the function arguments
  • the cache entry expired because of ttl
  • the cache was cleared manually

Why does my app show stale data?

Your cache is doing what you asked it to do. Lower ttl, clear the cache, or include the changing input in the function arguments.

Can I use widgets inside a cached function?

Avoid it. Cached functions should focus on computation or resource setup, not UI flow.

Should I cache async objects?

No. Streamlit warns against caching async objects or objects that depend on event loops.

Is cache data safe?

st.cache_data stores return values in pickled form, so cached values should be treated as trusted application data only.

Caching vs session state

Caching and session state solve different problems.

  • use caching when the goal is to avoid recomputation
  • use st.session_state when the goal is to remember a user's current interaction state

Examples of session state:

  • selected filters
  • wizard progress
  • draft form values
  • current view or step

Frequently Asked Questions

What is Streamlit caching in simple terms?

It is Streamlit's way of reusing expensive work between reruns so your app does not have to recompute everything every time a widget changes.

What replaced @st.cache in Streamlit?

Use st.cache_data for data results and st.cache_resource for shared objects such as models or connections.

Why am I seeing st.cache is deprecated in Streamlit?

Because modern Streamlit split the old cache API into st.cache_data and st.cache_resource. If your code still uses @st.cache, update it based on whether the function returns data or a shared object.

When should I use st.cache_data?

Use it for DataFrames, query results, API responses, and other pure function outputs that should only recompute when the inputs change.

When should I use st.cache_resource?

Use it for expensive shared objects such as model pipelines, clients, and database engines that should be created once and reused.