Pandas DataFrame to CSV: Complete Guide to to_csv()
You cleaned the data, transformed every column, and ran your analysis. Now you need to share the results -- and the CSV file you export opens with garbled characters in Excel, an unwanted index column, or a 4 GB file that chokes your colleague's laptop. Exporting a Pandas DataFrame to CSV sounds simple until encoding, separators, compression, and large-file performance turn a one-liner into a debugging session.
These problems are common because to_csv() has over 20 parameters, and the defaults do not always match what downstream consumers expect. A wrong encoding breaks non-ASCII text. A missing index=False adds a mystery column. No compression turns a reasonable dataset into an oversized file attachment.
The DataFrame.to_csv() method handles all of these cases once you know which parameters to set. This guide walks through every practical scenario -- from basic export to chunked writing of million-row DataFrames -- so you can export data correctly on the first try.
Basic Usage
The simplest call writes a DataFrame to a CSV file in the current working directory:
```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [28, 35, 42],
    'salary': [72000, 88000, 95000]
})

df.to_csv('employees.csv')
```

This creates employees.csv with the following content:

```
,name,age,salary
0,Alice,28,72000
1,Bob,35,88000
2,Charlie,42,95000
```

Notice the first column -- that is the DataFrame index. Most of the time you do not want it. Fix this with index=False:

```python
df.to_csv('employees.csv', index=False)
```

Output:

```
name,age,salary
Alice,28,72000
Bob,35,88000
Charlie,42,95000
```

Full Method Signature
Here is the complete to_csv() signature with the parameters you will use most often:
```python
DataFrame.to_csv(
    path_or_buf=None,    # file path or buffer
    sep=',',             # delimiter
    na_rep='',           # string for missing values
    float_format=None,   # format string for floats
    columns=None,        # subset of columns to write
    header=True,         # write column names
    index=True,          # write row index
    index_label=None,    # column label for index
    mode='w',            # write mode ('w', 'a', 'x')
    encoding=None,       # file encoding (default utf-8)
    compression='infer', # compression type
    quoting=None,        # csv.QUOTE_* constant
    lineterminator=None, # newline character
    chunksize=None,      # rows per chunk
    date_format=None,    # format string for datetime
    errors='strict',     # encoding error handling
)
```

Key Parameters Explained
sep -- Custom Delimiters
Not every system reads commas. Use sep to change the delimiter:
```python
import pandas as pd

df = pd.DataFrame({
    'product': ['Widget A', 'Widget B'],
    'price': [19.99, 29.99]
})

# Tab-separated
df.to_csv('products.tsv', sep='\t', index=False)

# Semicolon-separated (common in European locales)
df.to_csv('products_eu.csv', sep=';', index=False)

# Pipe-separated
df.to_csv('products.txt', sep='|', index=False)
```

columns -- Export a Subset
Write only specific columns instead of the entire DataFrame:
```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
    'email': ['a@x.com', 'b@x.com', 'c@x.com'],
    'internal_score': [0.87, 0.92, 0.78]
})

# Export only the columns the client needs
df.to_csv('export.csv', columns=['id', 'name', 'email'], index=False)
```

header -- Control Column Names
Set header=False to omit the column names entirely, or pass a list of replacement names to export cleaner labels without renaming the DataFrame itself:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'score': [85, 92]
})

# No header row
df.to_csv('no_header.csv', header=False, index=False)

# Custom header names
df.to_csv('custom_header.csv', header=['Full Name', 'Test Score'], index=False)
```

na_rep -- Represent Missing Values
By default, NaN values become empty strings. Use na_rep to make them explicit:
```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'sensor': ['A', 'B', 'C'],
    'reading': [23.5, np.nan, 18.2]
})

# Default: empty string for NaN
df.to_csv('readings_default.csv', index=False)
# sensor,reading
# A,23.5
# B,
# C,18.2

# Explicit missing value marker
df.to_csv('readings_marked.csv', na_rep='NULL', index=False)
# sensor,reading
# A,23.5
# B,NULL
# C,18.2
```

float_format -- Control Decimal Precision
Pass a printf-style format string to fix the number of decimal places:

```python
import pandas as pd

df = pd.DataFrame({
    'item': ['Widget', 'Gadget'],
    'price': [19.999999, 42.123456]
})

# Two decimal places
df.to_csv('prices.csv', float_format='%.2f', index=False)
# item,price
# Widget,20.00
# Gadget,42.12
```

date_format -- Format Datetime Columns
Pass a strftime format string to control how datetime columns are written:

```python
import pandas as pd

df = pd.DataFrame({
    'event': ['Launch', 'Review'],
    'date': pd.to_datetime(['2026-03-15 14:30:00', '2026-04-01 09:00:00'])
})

# ISO date only (no time)
df.to_csv('events.csv', date_format='%Y-%m-%d', index=False)
# event,date
# Launch,2026-03-15
# Review,2026-04-01
```

Handling Encoding and Special Characters
Encoding is the number one cause of garbled CSV files. The default encoding in to_csv() is utf-8, which works for most systems. But Excel on Windows expects a BOM (Byte Order Mark) to correctly display non-ASCII characters.
UTF-8 (Default)
```python
import pandas as pd

df = pd.DataFrame({
    'city': ['Zurich', 'Munchen', 'Tokyo'],
    'greeting': ['Gruezi', 'Servus', 'Konnichiwa']
})

# Standard UTF-8
df.to_csv('cities.csv', index=False, encoding='utf-8')
```

UTF-8 with BOM for Excel
If your CSV contains accented characters, CJK text, or special symbols and needs to open correctly in Microsoft Excel:
```python
import pandas as pd

df = pd.DataFrame({
    'city': ['Zurich', 'Munchen', 'Sao Paulo', 'Beijing'],
    'population': [434000, 1472000, 12330000, 21540000]
})

# utf-8-sig adds a BOM that Excel recognizes
df.to_csv('cities_excel.csv', index=False, encoding='utf-8-sig')
```

Other Encodings
```python
import pandas as pd

df = pd.DataFrame({'col': ['data']})

# Latin-1 for legacy Western European systems
df.to_csv('legacy.csv', index=False, encoding='latin-1')

# Shift-JIS for Japanese legacy systems
df.to_csv('japanese.csv', index=False, encoding='shift_jis')
```

| Encoding | When to Use | Excel Compatible? |
|---|---|---|
| utf-8 | Default for most systems, APIs, databases | Partial (no BOM) |
| utf-8-sig | Excel on Windows with non-ASCII text | Yes |
| latin-1 / iso-8859-1 | Legacy Western European systems | Yes |
| shift_jis | Japanese legacy systems | Yes |
| cp1252 | Windows Western European | Yes |
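To see what utf-8-sig actually changes on disk, inspect the first bytes of each file: utf-8-sig prepends the three-byte BOM EF BB BF that Excel looks for. A quick check (file names here are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'city': ['Zurich', 'Munchen']})

df.to_csv('plain.csv', index=False, encoding='utf-8')
df.to_csv('bom.csv', index=False, encoding='utf-8-sig')

with open('plain.csv', 'rb') as f:
    print(f.read(4))  # b'city' -- content starts immediately, no BOM
with open('bom.csv', 'rb') as f:
    print(f.read(3))  # b'\xef\xbb\xbf' -- the BOM Excel recognizes
```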
Large DataFrames: Compression and Chunking
Compression
Compression reduces file size by 60-90% with no data loss. Pandas supports multiple formats:
```python
import pandas as pd
import numpy as np

# Create a large DataFrame
df = pd.DataFrame({
    'id': range(1_000_000),
    'value': np.random.randn(1_000_000),
    'category': np.random.choice(['A', 'B', 'C', 'D'], 1_000_000)
})

# Gzip compression (most common)
df.to_csv('large_data.csv.gz', index=False, compression='gzip')

# Zip compression
df.to_csv('large_data.zip', index=False, compression='zip')

# Bz2 compression (smaller but slower)
df.to_csv('large_data.csv.bz2', index=False, compression='bz2')

# Zstandard (fast, good ratio -- requires the zstandard package)
df.to_csv('large_data.csv.zst', index=False, compression='zstd')
```

| Compression | File Extension | Speed | Size Reduction | Read Back |
|---|---|---|---|---|
| None | .csv | Fastest write | 0% | pd.read_csv('f.csv') |
| gzip | .csv.gz | Moderate | 70-85% | pd.read_csv('f.csv.gz') |
| zip | .zip | Moderate | 70-85% | pd.read_csv('f.zip') |
| bz2 | .csv.bz2 | Slow | 75-90% | pd.read_csv('f.csv.bz2') |
| zstd | .csv.zst | Fast | 70-85% | pd.read_csv('f.csv.zst') |
Pandas automatically infers the compression format from the file extension, so compression='infer' (the default) usually works.
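Because both to_csv() and read_csv() infer compression from the file extension, a compressed round trip needs no compression argument at all. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'value': [0.5, 1.5, 2.5]})

# The .gz extension triggers gzip compression automatically
df.to_csv('data.csv.gz', index=False)

# read_csv infers gzip from the extension as well
restored = pd.read_csv('data.csv.gz')
print(restored.equals(df))  # True
```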
Chunksize -- Write in Batches
For very large DataFrames that strain memory, write in chunks:
```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'id': range(5_000_000),
    'value': np.random.randn(5_000_000)
})

# Write 500,000 rows at a time
df.to_csv('huge_data.csv', index=False, chunksize=500_000)
```

The chunksize parameter does not reduce the final file size -- it controls how many rows are flushed to disk at once. This can help reduce peak memory usage during the write operation.
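Note that chunksize only helps when the DataFrame already exists in memory. If the data is produced in pieces, a common alternative (a sketch, not a to_csv() feature) is to write each batch incrementally, emitting the header only on the first write:

```python
import pandas as pd
import numpy as np

# A generator of batches, standing in for any piecewise data source
batches = (
    pd.DataFrame({'id': range(start, start + 1000),
                  'value': np.random.randn(1000)})
    for start in range(0, 5000, 1000)
)

for i, batch in enumerate(batches):
    batch.to_csv('incremental.csv',
                 mode='w' if i == 0 else 'a',  # create first, append after
                 header=(i == 0),              # header only once
                 index=False)
```

Only one batch is ever held in memory at a time, and the resulting file is identical to a single full write.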
Writing to a String or Buffer
You do not always write to a file. Sometimes you need the CSV as a string for an API call, a database insert, or in-memory processing:
```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'score': [85, 92]
})

# Get CSV as a string
csv_string = df.to_csv(index=False)
print(csv_string)
# name,score
# Alice,85
# Bob,92
```

Writing to StringIO or BytesIO
```python
import pandas as pd
from io import StringIO, BytesIO

df = pd.DataFrame({
    'x': [1, 2, 3],
    'y': [4, 5, 6]
})

# StringIO buffer
buffer = StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# BytesIO buffer (useful for HTTP responses, S3 uploads)
byte_buffer = BytesIO()
df.to_csv(byte_buffer, index=False, encoding='utf-8')
csv_bytes = byte_buffer.getvalue()
```

This pattern is common when uploading CSV data to cloud storage (S3, GCS) without creating a local file.
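The buffer works in both directions: rewind it and read_csv() can consume it directly, which is a quick way to verify the round trip before handing the bytes to an upload client:

```python
import pandas as pd
from io import BytesIO

df = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]})

buffer = BytesIO()
df.to_csv(buffer, index=False, encoding='utf-8')
buffer.seek(0)  # rewind before reading

restored = pd.read_csv(buffer)
print(restored.equals(df))  # True
```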
Appending to an Existing CSV
To add rows to an existing file without overwriting it, use mode='a' and header=False:
```python
import pandas as pd

# First batch -- write with header
df1 = pd.DataFrame({'name': ['Alice'], 'score': [85]})
df1.to_csv('results.csv', index=False)

# Second batch -- append without duplicating the header
df2 = pd.DataFrame({'name': ['Bob', 'Charlie'], 'score': [92, 78]})
df2.to_csv('results.csv', mode='a', header=False, index=False)
```

The resulting file has one header row followed by all three data rows. Omitting header=False would insert a second header line in the middle of the data.
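When the appending code runs repeatedly (a scheduled job, for example), a common pattern is to key the header on whether the file already exists. A sketch, with an illustrative file name:

```python
import os
import pandas as pd

LOG = 'scores_log.csv'
if os.path.exists(LOG):
    os.remove(LOG)  # start fresh for this demo

def append_results(df, path=LOG):
    # mode='a' creates the file if missing; write the header only then
    df.to_csv(path, mode='a', header=not os.path.exists(path), index=False)

append_results(pd.DataFrame({'name': ['Alice'], 'score': [85]}))
append_results(pd.DataFrame({'name': ['Bob'], 'score': [92]}))
# scores_log.csv now holds one header row and two data rows
```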
Quoting and Escaping
Fields containing the delimiter, newlines, or quotes need proper quoting. Pandas handles this automatically, but you can control the behavior:
```python
import pandas as pd
import csv

df = pd.DataFrame({
    'name': ["O'Brien", 'Smith, Jr.', 'Alice "Ace" Wong'],
    'notes': ['Contains apostrophe', 'Contains comma', 'Contains quotes']
})

# Default: quote only when necessary (QUOTE_MINIMAL)
df.to_csv('minimal.csv', index=False)

# Quote all fields
df.to_csv('all_quoted.csv', index=False, quoting=csv.QUOTE_ALL)

# Quote only non-numeric fields
df.to_csv('nonnumeric.csv', index=False, quoting=csv.QUOTE_NONNUMERIC)
```

Export Format Comparison: to_csv() vs to_excel() vs to_parquet()
Choosing the right export format depends on who consumes the file and how large it is:
| Feature | to_csv() | to_excel() | to_parquet() |
|---|---|---|---|
| File format | Plain text | Binary (xlsx) | Binary (columnar) |
| Human readable | Yes | Via Excel/Sheets | No |
| File size (1M rows) | ~50-100 MB | ~30-60 MB | ~5-15 MB |
| Write speed | Fast | Slow | Fast |
| Read speed | Moderate | Slow | Very fast |
| Type preservation | No (all strings) | Partial | Full |
| Requires extra library | No | openpyxl | pyarrow / fastparquet |
| Compression support | gzip, zip, bz2, zstd | Built-in | Built-in (snappy, gzip) |
| Best for | Interop, APIs, quick sharing | Business users, Excel users | Analytics pipelines, big data |
Rule of thumb: Use CSV for sharing data with non-technical users or external systems. Use Parquet for internal pipelines where speed and type fidelity matter. Use Excel when the recipient needs formatting or multiple sheets.
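The type-preservation row is easy to demonstrate with CSV alone: datetime columns come back as plain strings unless you re-parse them on read:

```python
import pandas as pd

df = pd.DataFrame({'ts': pd.to_datetime(['2026-01-15', '2026-02-01'])})
print(df['ts'].dtype)  # datetime64[ns]

df.to_csv('ts_roundtrip.csv', index=False)

# Naive read-back: the datetimes are now just strings
back = pd.read_csv('ts_roundtrip.csv')
print(back['ts'].dtype)  # object

# Restore the type explicitly on read
back = pd.read_csv('ts_roundtrip.csv', parse_dates=['ts'])
print(back['ts'].dtype)  # datetime64[ns]
```

Parquet avoids this entirely by storing the dtype alongside the data.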
Common Pitfalls and How to Avoid Them
1. Unwanted Index Column
The most frequent complaint. Always pass index=False unless your index carries meaningful data (see reset_index() for cleaning up indexes before export):
```python
# Wrong -- adds a mystery "Unnamed: 0" column when read back
df.to_csv('data.csv')

# Right
df.to_csv('data.csv', index=False)
```

2. Unnamed Columns on Read-Back
If you exported with the index and then read the file back with read_csv(), you get an Unnamed: 0 column:
```python
import pandas as pd

# Reading back a CSV that was exported with the index
df = pd.read_csv('data.csv', index_col=0)  # tell read_csv which column is the index
```

3. Losing Datetime Precision
Without date_format, datetime columns get exported as full timestamps including microseconds. Control this explicitly:
```python
import pandas as pd

df = pd.DataFrame({
    'ts': pd.to_datetime(['2026-01-15 08:30:00.123456'])
})

df.to_csv('ts.csv', index=False, date_format='%Y-%m-%d %H:%M:%S')
```

4. Float Rounding Issues
Floating-point numbers can produce unexpected trailing digits:
```python
import pandas as pd

df = pd.DataFrame({'val': [0.1 + 0.2]})

df.to_csv('float.csv', index=False)
# val
# 0.30000000000000004

df.to_csv('float_clean.csv', index=False, float_format='%.2f')
# val
# 0.30
```

5. Mixed Types in a Column
If a column has mixed types (integers and strings), to_csv() converts everything to strings. Verify your data types before exporting:
```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 'three']})
print(df.dtypes)
# id    object   <-- mixed type

# Clean before export
df['id'] = pd.to_numeric(df['id'], errors='coerce')
df.to_csv('clean.csv', index=False)
```

Visualize Your Data Before Export with PyGWalker
Before exporting a DataFrame to CSV, it helps to verify the data looks correct. PyGWalker turns any Pandas DataFrame into an interactive visual interface inside Jupyter Notebook -- drag and drop columns to build charts without writing plot code:
```python
import pandas as pd
import pygwalker as pyg

df = pd.read_csv('raw_data.csv')

# Explore visually before exporting
walker = pyg.walk(df)

# Once satisfied, export
df.to_csv('verified_export.csv', index=False, encoding='utf-8-sig')
```

This workflow catches issues like unexpected nulls, outlier values, or wrong data types before they reach your CSV consumer.
FAQ
How do I export a Pandas DataFrame to CSV without the index?
Pass index=False to to_csv(): df.to_csv('file.csv', index=False). This prevents the row index from being written as the first column. When reading the file back, you will not see an extra Unnamed: 0 column.
How do I handle special characters and encoding when exporting to CSV?
Use encoding='utf-8-sig' if the CSV needs to open correctly in Microsoft Excel with non-ASCII characters (accented letters, CJK text). For standard systems and APIs, the default encoding='utf-8' works. For legacy systems, use the specific encoding they require (latin-1, shift_jis, etc.).
How do I compress a large CSV file when exporting from Pandas?
Add a compression extension to the filename and Pandas handles the rest: df.to_csv('data.csv.gz', index=False) creates a gzip-compressed file. You can also set compression='gzip', 'zip', 'bz2', or 'zstd' explicitly. Compressed files are typically 70-90% smaller and can be read back directly with pd.read_csv().
Can I append data to an existing CSV file instead of overwriting it?
Yes. Use mode='a' and header=False: df.to_csv('file.csv', mode='a', header=False, index=False). The mode='a' opens the file for appending, and header=False prevents writing a duplicate header row in the middle of the file.
What is the difference between to_csv() and to_parquet() in Pandas?
to_csv() produces a human-readable text file but loses type information (everything becomes strings). to_parquet() creates a compact binary file that preserves data types (integers, floats, datetimes), reads 5-10x faster, and is 5-10x smaller. Use CSV for sharing with non-technical users or external systems. Use Parquet for internal data pipelines where performance and type fidelity matter.
Conclusion
DataFrame.to_csv() is the standard way to export Pandas data to CSV files. For clean exports, always pass index=False. Use encoding='utf-8-sig' when Excel compatibility matters. Compress large files with gzip or zstd to reduce size by 70-90%. Control decimal precision with float_format, handle missing values with na_rep, and append to existing files with mode='a'. For data pipelines where type preservation and speed matter, consider to_parquet() instead. Before exporting, verify your data visually with tools like PyGWalker to catch issues early.
Related Guides
- Pandas read_csv: Import CSV Files
- Pandas to_datetime: Convert Strings and Timestamps
- Pandas Rename Column
- Pandas reset_index(): Resetting DataFrame Index