What Is Streamlit Caching? How to Use st.cache_data and st.cache_resource
Streamlit caching is how you stop your app from repeating expensive work on every rerun.
What Streamlit caching means
If your app reloads a large CSV, rebuilds a model pipeline, or calls the same API every time a user changes one widget, caching is usually the fix. In modern Streamlit, that fix is split into two tools:
- st.cache_data for data results
- st.cache_resource for long-lived shared objects
Streamlit reruns your script from top to bottom whenever a widget changes. Caching tells Streamlit which work can be safely reused instead of recomputed. That is why caching matters much more in Streamlit than in many traditional web frameworks.
Why Streamlit caching matters
Caching matters because Streamlit apps rerun often, and those reruns can make an otherwise simple app feel slow, noisy, or wasteful if expensive work keeps happening over and over.
What Streamlit caching is actually for
Before getting into APIs, it helps to answer the practical question: what does caching improve?
It usually improves four things:
- app speed
- responsiveness during interaction
- resource usage
- code clarity around expensive operations
For example, caching can help when you:
- load the same dataset on every rerun
- query a database repeatedly with the same parameters
- call an external API that is slow or rate-limited
- initialize a machine learning model that should not reload for every user action
After reading this guide, you should be able to decide:
- whether a slow step belongs in cache at all
- which cache API fits that step
- how to avoid stale data and common cache bugs
Which cache API should you use?
Use st.cache_data when the function returns data such as a DataFrame, dict, list, or query result.
Use st.cache_resource when the function returns a reusable object such as a model, database engine, or SDK client.
Do not start new code with legacy @st.cache.
Quick Start
A simple data-caching example
This first example shows the most common caching use case: reading the same data file without reloading it on every rerun.
```python
import pandas as pd
import streamlit as st

@st.cache_data(ttl="10m", show_spinner="Loading data...")
def load_sales_data(path: str) -> pd.DataFrame:
    return pd.read_csv(path)

df = load_sales_data("sales.csv")
st.dataframe(df)
```
A simple resource-caching example
This second example shows the other side of caching: reusing a heavyweight object that should be initialized once and then shared.
```python
import streamlit as st
from transformers import pipeline

@st.cache_resource
def load_sentiment_model():
    return pipeline("sentiment-analysis")

model = load_sentiment_model()
st.write(model("Streamlit caching is fast and predictable."))
```
Why the old and new cache APIs feel different
Older Streamlit tutorials often use @st.cache. That is no longer the right default mental model, and it is also why many users now search for terms like st.cache deprecated, st.cache warning, or allow_output_mutation errors after upgrading Streamlit.
Modern Streamlit splits caching into two clearer concepts:
- data caching
- resource caching
That split is useful because a DataFrame and a database engine are not the same kind of thing. One is returned data. The other is a long-lived object that may be shared.
st.cache_data vs st.cache_resource
| Use case | Recommended API | Why |
|---|---|---|
| DataFrame loaded from disk | st.cache_data | The function returns data that can be reused safely. |
| API response | st.cache_data | It avoids repeating slow network calls. |
| SQL query results | st.cache_data | Query output is data, not a shared singleton. |
| ML model object | st.cache_resource | The model should be loaded once and reused. |
| Database engine or client | st.cache_resource | It is expensive to initialize and may be shared. |
| Per-user interaction state | st.session_state | That belongs to one session, not the global cache. |
What st.cache_data does
st.cache_data is for functions that return data. Streamlit hashes the function code and inputs. If nothing relevant changed, it returns the cached result instead of running the function again.
That makes it a good fit for:
- reading files
- transforming DataFrames
- calling APIs
- running queries
- expensive deterministic calculations
```python
import requests
import streamlit as st

@st.cache_data(ttl=900)
def fetch_metrics(api_url: str) -> dict:
    response = requests.get(api_url, timeout=30)
    response.raise_for_status()
    return response.json()
```
Think of st.cache_data as "remember the output of this data-producing function until something important changes."
Useful st.cache_data parameters
| Parameter | What it does | When to care |
|---|---|---|
| ttl | Expires cached entries after a duration or number of seconds | Use when the source data changes over time |
| max_entries | Limits how many cached entries to keep | Useful when users trigger many unique inputs |
| show_spinner | Controls the loading message on cache miss | Good for slow data loads |
| show_time | Displays execution timing | Helpful while tuning performance |
| persist | Persists cache to disk | Use only when you need cache to survive app restarts |
| hash_funcs | Adds custom hashing behavior | Needed only for advanced types |
What st.cache_resource does
st.cache_resource is for long-lived objects you want to initialize once and reuse, such as:
- model pipelines
- database engines
- API clients
- reusable service objects
Unlike st.cache_data, the cached object is shared. That means thread safety matters.
```python
import streamlit as st
from sqlalchemy import create_engine

@st.cache_resource
def get_engine(database_url: str):
    return create_engine(database_url, pool_pre_ping=True)

engine = get_engine("sqlite:///analytics.db")
```
If a resource should not be shared across the whole app, move it into st.session_state or use session scope instead.
```python
import sqlite3
import streamlit as st

@st.cache_resource(scope="session", on_release=lambda conn: conn.close())
def get_connection(path: str):
    return sqlite3.connect(path, check_same_thread=False)
```
Think of st.cache_resource as "create this expensive object once, then keep reusing it."
Useful st.cache_resource parameters
| Parameter | What it does | When to care |
|---|---|---|
| ttl | Rebuilds a resource after a period of time | Useful for expiring tokens or unstable clients |
| validate | Checks whether the cached resource is still valid | Helpful for stale connections |
| max_entries | Caps the number of cached resources | Useful when many unique configs exist |
| show_spinner | Shows a message when the resource is created | Good for slow model initialization |
| on_release | Cleans up when Streamlit releases the resource | Useful for closing handles cleanly |
| scope | Chooses app-wide or session-scoped caching | Important for non-global resources |
Why caching helps real projects
Caching is not only about shaving milliseconds off toy demos. In real projects it can:
- make dashboard filters feel instant
- stop repeated file reads from dominating reruns
- lower API cost and rate-limit pressure
- keep heavyweight models from reloading over and over
- make apps feel more stable under repeated user interaction
In other words, caching often turns a "works, but feels clumsy" Streamlit app into one that feels production-ready.
Replacing legacy @st.cache
If an older codebase still uses @st.cache, translate it into one of the new APIs instead of copying it forward unchanged.
If the old function returns data, move it to st.cache_data.
Legacy data pattern:
```python
@st.cache
def load_data(path):
    return pd.read_csv(path)
```
Modern replacement:
```python
@st.cache_data
def load_data(path: str):
    return pd.read_csv(path)
```
If the old function was effectively caching a shared object with allow_output_mutation=True, move it to st.cache_resource.
Legacy model-loading pattern:
```python
@st.cache(allow_output_mutation=True)
def load_model():
    return pipeline("sentiment-analysis")
```
Modern replacement:
```python
@st.cache_resource
def load_model():
    return pipeline("sentiment-analysis")
```
Common caching patterns
Cache a DataFrame transform
This pattern is useful when your raw data is large enough that cleaning and aggregation become part of the slow step.
```python
import pandas as pd
import streamlit as st

@st.cache_data
def prepare_revenue_table(raw: pd.DataFrame) -> pd.DataFrame:
    cleaned = raw.dropna(subset=["region", "revenue"]).copy()
    cleaned["revenue"] = cleaned["revenue"].round(2)
    return cleaned.groupby("region", as_index=False)["revenue"].sum()
```
Cache a query while skipping an unhashable connection argument
Arguments prefixed with _ are ignored by Streamlit's hasher.
```python
import pandas as pd
import streamlit as st

@st.cache_data(ttl="5m")
def run_query(_engine, sql: str) -> pd.DataFrame:
    return pd.read_sql(sql, _engine)
```
Clear the cache
When you need a manual reset during debugging or after a data change, use one of the built-in clear methods:
```python
load_sales_data.clear()      # Clear one cached function
st.cache_data.clear()        # Clear all data caches
st.cache_resource.clear()    # Clear all resource caches
```
Common mistakes and how to fix them
1. Using st.cache_resource for normal data
If a function returns a DataFrame, list, or dict, use st.cache_data. st.cache_resource is for shared long-lived objects.
2. Caching mutable shared objects by accident
If you cache a resource and then mutate it, every user may see the changed object. That is acceptable for some shared resources, but dangerous for user-specific state.
3. Expecting caching to fix every slow app
Caching helps repeated work. It does not fix slow plotting code, poor query design, or oversized page layouts by itself.
4. Expecting TTL to behave the same with disk persistence
When persist="disk" is enabled for st.cache_data, TTL is ignored. Use persistence only when you really need it.
5. Caching code with side effects
Cached functions should not quietly write files, mutate global state, or depend on hidden ambient state. Keep them predictable.
Why am I seeing st.cache deprecation warnings?
This is one of the most common migration questions.
If you upgraded Streamlit and started seeing warnings around st.cache, the short answer is that Streamlit split the old API into two more specific APIs:
- st.cache_data for data
- st.cache_resource for shared objects
In practice, the warning usually means your old caching code no longer matches the newer mental model, even if the app has not visibly broken yet. The right fix is usually migration, not suppression.
Troubleshooting
Why is my cached function still rerunning?
Usually one of these changed:
- the function code
- one of the function arguments
- the cache entry expired because of ttl
- the cache was cleared manually
Why does my app show stale data?
Your cache is doing what you asked it to do. Lower ttl, clear the cache, or include the changing input in the function arguments.
Can I use widgets inside a cached function?
Avoid it. Cached functions should focus on computation or resource setup, not UI flow.
Should I cache async objects?
No. Streamlit warns against caching async objects or objects that depend on event loops.
Is cache data safe?
st.cache_data stores return values in pickled form, so only cache values produced by your own application; unpickling data from untrusted sources is unsafe.
Caching vs session state
Caching and session state solve different problems.
- use caching when the goal is to avoid recomputation
- use st.session_state when the goal is to remember a user's current interaction state
Examples of session state:
- selected filters
- wizard progress
- draft form values
- current view or step
Related Guides
- Streamlit Session State
- Streamlit DataFrame
- How to Run a Streamlit App
- Streamlit Upload File
- Streamlit Components
Frequently Asked Questions
What is Streamlit caching in simple terms?
It is Streamlit's way of reusing expensive work between reruns so your app does not have to recompute everything every time a widget changes.
What replaced @st.cache in Streamlit?
Use st.cache_data for data results and st.cache_resource for shared objects such as models or connections.
Why am I seeing st.cache is deprecated in Streamlit?
Because modern Streamlit split the old cache API into st.cache_data and st.cache_resource. If your code still uses @st.cache, update it based on whether the function returns data or a shared object.
When should I use st.cache_data?
Use it for DataFrames, query results, API responses, and other pure function outputs that should only recompute when the inputs change.
When should I use st.cache_resource?
Use it for expensive shared objects such as model pipelines, clients, and database engines that should be created once and reused.