
Pandas DataFrame to CSV: Complete Guide to to_csv()


You cleaned the data, transformed every column, and ran your analysis. Now you need to share the results -- and the CSV file you export opens with garbled characters in Excel, an unwanted index column, or a 4 GB file that chokes your colleague's laptop. Exporting a Pandas DataFrame to CSV sounds simple until encoding, separators, compression, and large-file performance turn a one-liner into a debugging session.

These problems are common because to_csv() has over 20 parameters, and the defaults do not always match what downstream consumers expect. A wrong encoding breaks non-ASCII text. A missing index=False adds a mystery column. No compression turns a reasonable dataset into an oversized file attachment.

The DataFrame.to_csv() method handles all of these cases once you know which parameters to set. This guide walks through every practical scenario -- from basic export to chunked writing of million-row DataFrames -- so you can export data correctly on the first try.


Basic Usage

The simplest call writes a DataFrame to a CSV file in the current working directory:

import pandas as pd
 
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [28, 35, 42],
    'salary': [72000, 88000, 95000]
})
 
df.to_csv('employees.csv')

This creates employees.csv with the following content:

,name,age,salary
0,Alice,28,72000
1,Bob,35,88000
2,Charlie,42,95000

Notice the first column -- that is the DataFrame index. Most of the time you do not want it. Fix this with index=False:

df.to_csv('employees.csv', index=False)

Output:

name,age,salary
Alice,28,72000
Bob,35,88000
Charlie,42,95000

Full Method Signature

Here is the complete to_csv() signature with the parameters you will use most often:

DataFrame.to_csv(
    path_or_buf=None,   # file path or buffer
    sep=',',            # delimiter
    na_rep='',          # string for missing values
    float_format=None,  # format string for floats
    columns=None,       # subset of columns to write
    header=True,        # write column names
    index=True,         # write row index
    index_label=None,   # column label for index
    mode='w',           # write mode ('w', 'a', 'x')
    encoding=None,      # file encoding (default utf-8)
    compression='infer',# compression type
    quoting=None,       # csv.QUOTE_* constant
    lineterminator=None,# newline character
    chunksize=None,     # rows per chunk
    date_format=None,   # format string for datetime
    errors='strict',    # encoding error handling
)
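
For orientation, here is how several of these parameters combine in a single call (a small sketch using an in-memory buffer; the data is made up for illustration):

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'score': [91.5, None],
})

# Combine common parameters: drop the index, mark missing values,
# and fix float precision in one call
buf = StringIO()
df.to_csv(buf, index=False, na_rep='NULL', float_format='%.1f')
print(buf.getvalue())
# name,score
# Alice,91.5
# Bob,NULL
```

Each parameter is independent, so you can start with `index=False` and layer the others on as downstream consumers demand them.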

Key Parameters Explained

sep -- Custom Delimiters

Not every system reads commas. Use sep to change the delimiter:

import pandas as pd
 
df = pd.DataFrame({
    'product': ['Widget A', 'Widget B'],
    'price': [19.99, 29.99]
})
 
# Tab-separated
df.to_csv('products.tsv', sep='\t', index=False)
 
# Semicolon-separated (common in European locales)
df.to_csv('products_eu.csv', sep=';', index=False)
 
# Pipe-separated
df.to_csv('products.txt', sep='|', index=False)

columns -- Export a Subset

Write only specific columns instead of the entire DataFrame:

import pandas as pd
 
df = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
    'email': ['a@x.com', 'b@x.com', 'c@x.com'],
    'internal_score': [0.87, 0.92, 0.78]
})
 
# Export only the columns the client needs
df.to_csv('export.csv', columns=['id', 'name', 'email'], index=False)

header -- Control Column Names

You can also rename columns before exporting to give them cleaner names.

import pandas as pd
 
df = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'score': [85, 92]
})
 
# No header row
df.to_csv('no_header.csv', header=False, index=False)
 
# Custom header names
df.to_csv('custom_header.csv', header=['Full Name', 'Test Score'], index=False)

na_rep -- Represent Missing Values

By default, NaN values become empty strings. Use na_rep to make them explicit:

import pandas as pd
import numpy as np
 
df = pd.DataFrame({
    'sensor': ['A', 'B', 'C'],
    'reading': [23.5, np.nan, 18.2]
})
 
# Default: empty string for NaN
df.to_csv('readings_default.csv', index=False)
# sensor,reading
# A,23.5
# B,
# C,18.2
 
# Explicit missing value marker
df.to_csv('readings_marked.csv', na_rep='NULL', index=False)
# sensor,reading
# A,23.5
# B,NULL
# C,18.2

float_format -- Control Decimal Precision

import pandas as pd
 
df = pd.DataFrame({
    'item': ['Widget', 'Gadget'],
    'price': [19.999999, 42.123456]
})
 
# Two decimal places
df.to_csv('prices.csv', float_format='%.2f', index=False)
# item,price
# Widget,20.00
# Gadget,42.12

date_format -- Format Datetime Columns

import pandas as pd
 
df = pd.DataFrame({
    'event': ['Launch', 'Review'],
    'date': pd.to_datetime(['2026-03-15 14:30:00', '2026-04-01 09:00:00'])
})
 
# ISO date only (no time)
df.to_csv('events.csv', date_format='%Y-%m-%d', index=False)
# event,date
# Launch,2026-03-15
# Review,2026-04-01

Handling Encoding and Special Characters

Encoding is the number one cause of garbled CSV files. The default encoding in to_csv() is utf-8, which works for most systems. But Excel on Windows expects a BOM (Byte Order Mark) to correctly display non-ASCII characters.

UTF-8 (Default)

import pandas as pd
 
df = pd.DataFrame({
    'city': ['Zürich', 'München', 'Tokyo'],
    'greeting': ['Grüezi', 'Servus', 'Konnichiwa']
})
 
# Standard UTF-8
df.to_csv('cities.csv', index=False, encoding='utf-8')

UTF-8 with BOM for Excel

If your CSV contains accented characters, CJK text, or special symbols and needs to open correctly in Microsoft Excel:

import pandas as pd
 
df = pd.DataFrame({
    'city': ['Zürich', 'München', 'São Paulo', 'Beijing'],
    'population': [434000, 1472000, 12330000, 21540000]
})
 
# utf-8-sig adds a BOM that Excel recognizes
df.to_csv('cities_excel.csv', index=False, encoding='utf-8-sig')

Other Encodings

import pandas as pd
 
df = pd.DataFrame({'col': ['data']})
 
# Latin-1 for legacy Western European systems
df.to_csv('legacy.csv', index=False, encoding='latin-1')
 
# Shift-JIS for Japanese legacy systems
df.to_csv('japanese.csv', index=False, encoding='shift_jis')

Encoding             | When to Use                               | Excel Compatible?
---------------------|-------------------------------------------|------------------
utf-8                | Default for most systems, APIs, databases | Partial (no BOM)
utf-8-sig            | Excel on Windows with non-ASCII text      | Yes
latin-1 / iso-8859-1 | Legacy Western European systems           | Yes
shift_jis            | Japanese legacy systems                   | Yes
cp1252               | Windows Western European                  | Yes

Large DataFrames: Compression and Chunking

Compression

Compression reduces file size by 60-90% with no data loss. Pandas supports multiple formats:

import pandas as pd
import numpy as np
 
# Create a large DataFrame
df = pd.DataFrame({
    'id': range(1_000_000),
    'value': np.random.randn(1_000_000),
    'category': np.random.choice(['A', 'B', 'C', 'D'], 1_000_000)
})
 
# Gzip compression (most common)
df.to_csv('large_data.csv.gz', index=False, compression='gzip')
 
# Zip compression
df.to_csv('large_data.zip', index=False, compression='zip')
 
# Bz2 compression (smaller but slower)
df.to_csv('large_data.csv.bz2', index=False, compression='bz2')
 
# Zstandard (fast, good ratio -- requires zstandard package)
df.to_csv('large_data.csv.zst', index=False, compression='zstd')

Compression | File Extension | Speed         | Size Reduction | Read Back
------------|----------------|---------------|----------------|----------
None        | .csv           | Fastest write | 0%             | pd.read_csv('f.csv')
gzip        | .csv.gz        | Moderate      | 70-85%         | pd.read_csv('f.csv.gz')
zip         | .zip           | Moderate      | 70-85%         | pd.read_csv('f.zip')
bz2         | .csv.bz2       | Slow          | 75-90%         | pd.read_csv('f.csv.bz2')
zstd        | .csv.zst       | Fast          | 70-85%         | pd.read_csv('f.csv.zst')

Pandas automatically infers the compression format from the file extension, so compression='infer' (the default) usually works.
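
A quick round-trip confirms that inference works in both directions -- write and read both pick gzip from the `.gz` extension (sketched here with a temporary directory):

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'value': [0.5, 1.5, 2.5]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'data.csv.gz')

    # Extension ends in .gz, so the default compression='infer' picks gzip
    df.to_csv(path, index=False)

    # read_csv infers the compression from the extension the same way
    restored = pd.read_csv(path)
    print(restored.equals(df))  # True
```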

Chunksize -- Write in Batches

For very large DataFrames that strain memory, write in chunks:

import pandas as pd
import numpy as np
 
df = pd.DataFrame({
    'id': range(5_000_000),
    'value': np.random.randn(5_000_000)
})
 
# Write 500,000 rows at a time
df.to_csv('huge_data.csv', index=False, chunksize=500_000)

The chunksize parameter does not reduce the final file size -- it controls how many rows are flushed to disk at once. This can help reduce peak memory usage during the write operation.
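
If memory is the real constraint, the DataFrame is often too large to build in the first place. A common alternative -- sketched below with a toy file -- is to stream the source through `read_csv(chunksize=...)` and append each processed chunk to the output:

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'source.csv')
    dst = os.path.join(tmp, 'processed.csv')

    # Toy source file standing in for one too large to load at once
    pd.DataFrame({'id': range(10), 'value': range(10)}).to_csv(src, index=False)

    # Stream: read a few rows at a time, transform, append to the output.
    # Only the first chunk writes the header.
    for i, chunk in enumerate(pd.read_csv(src, chunksize=4)):
        chunk['value'] = chunk['value'] * 2
        chunk.to_csv(dst, mode='a', header=(i == 0), index=False)

    result = pd.read_csv(dst)
    print(len(result))  # 10
```

This way only one chunk is ever in memory, regardless of the total file size.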

Writing to a String or Buffer

You do not always write to a file. Sometimes you need the CSV as a string for an API call, a database insert, or in-memory processing:

import pandas as pd
 
df = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'score': [85, 92]
})
 
# Get CSV as a string
csv_string = df.to_csv(index=False)
print(csv_string)
# name,score
# Alice,85
# Bob,92

Writing to StringIO or BytesIO

import pandas as pd
from io import StringIO, BytesIO
 
df = pd.DataFrame({
    'x': [1, 2, 3],
    'y': [4, 5, 6]
})
 
# StringIO buffer
buffer = StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
 
# BytesIO buffer (useful for HTTP responses, S3 uploads)
byte_buffer = BytesIO()
df.to_csv(byte_buffer, index=False, encoding='utf-8')
csv_bytes = byte_buffer.getvalue()

This pattern is common when uploading CSV data to cloud storage (S3, GCS) without creating a local file.
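
If the upload target expects compressed bytes, the compression parameter also works with binary buffers (supported since pandas 1.2); a sketch:

```python
from io import BytesIO

import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]})

# Compress straight into an in-memory buffer -- no temp file needed
buf = BytesIO()
df.to_csv(buf, index=False, compression='gzip')
payload = buf.getvalue()

# gzip streams start with the magic bytes 0x1f 0x8b
print(payload[:2] == b'\x1f\x8b')  # True
```

`payload` can then be handed to whatever upload client you use (for S3, the `Body` of a put request, for example).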

Appending to an Existing CSV

To add rows to an existing file without overwriting it, use mode='a' and header=False:

import pandas as pd
 
# First batch -- write with header
df1 = pd.DataFrame({'name': ['Alice'], 'score': [85]})
df1.to_csv('results.csv', index=False)
 
# Second batch -- append without duplicating the header
df2 = pd.DataFrame({'name': ['Bob', 'Charlie'], 'score': [92, 78]})
df2.to_csv('results.csv', mode='a', header=False, index=False)

The resulting file has one header row followed by all three data rows. Omitting header=False would insert a second header line in the middle of the data.
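
A common refinement of this pattern -- `append_rows` below is a hypothetical helper, not a pandas API -- writes the header only when the file does not exist yet, so the same call works for both the first and every subsequent batch:

```python
import os
import tempfile

import pandas as pd

def append_rows(df, path):
    """Append df to path, writing the header only when the file is new."""
    df.to_csv(path, mode='a', header=not os.path.exists(path), index=False)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'results.csv')
    append_rows(pd.DataFrame({'name': ['Alice'], 'score': [85]}), path)
    append_rows(pd.DataFrame({'name': ['Bob'], 'score': [92]}), path)

    n_rows = len(pd.read_csv(path))
    print(n_rows)  # 2
```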

Quoting and Escaping

Fields containing the delimiter, newlines, or quotes need proper quoting. Pandas handles this automatically, but you can control the behavior:

import pandas as pd
import csv
 
df = pd.DataFrame({
    'name': ['O\'Brien', 'Smith, Jr.', 'Alice "Ace" Wong'],
    'notes': ['Contains apostrophe', 'Contains comma', 'Contains quotes']
})
 
# Default: quote only when necessary (QUOTE_MINIMAL)
df.to_csv('minimal.csv', index=False)
 
# Quote all fields
df.to_csv('all_quoted.csv', index=False, quoting=csv.QUOTE_ALL)
 
# Quote only non-numeric fields
df.to_csv('nonnumeric.csv', index=False, quoting=csv.QUOTE_NONNUMERIC)
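
To see the difference, compare the QUOTE_NONNUMERIC output for a small frame -- string fields (including the header) are quoted, numeric fields are not:

```python
import csv

import pandas as pd

df = pd.DataFrame({'name': ['Alice'], 'score': [85]})

out = df.to_csv(index=False, quoting=csv.QUOTE_NONNUMERIC)
print(out)
# "name","score"
# "Alice",85
```

This matters for consumers that use the quoting to infer types: an unquoted field can be treated as numeric on read.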

Export Format Comparison: to_csv() vs to_excel() vs to_parquet()

Choosing the right export format depends on who consumes the file and how large it is:

Feature                | to_csv()                     | to_excel()               | to_parquet()
-----------------------|------------------------------|--------------------------|-------------
File format            | Plain text                   | Binary (xlsx)            | Binary (columnar)
Human readable         | Yes                          | Via Excel/Sheets         | No
File size (1M rows)    | ~50-100 MB                   | ~30-60 MB                | ~5-15 MB
Write speed            | Fast                         | Slow                     | Fast
Read speed             | Moderate                     | Slow                     | Very fast
Type preservation      | No (all strings)             | Partial                  | Full
Requires extra library | No                           | openpyxl                 | pyarrow / fastparquet
Compression support    | gzip, zip, bz2, zstd         | Built-in                 | Built-in (snappy, gzip)
Best for               | Interop, APIs, quick sharing | Business users, Excel workflows | Analytics pipelines, big data

Rule of thumb: Use CSV for sharing data with non-technical users or external systems. Use Parquet for internal pipelines where speed and type fidelity matter. Use Excel when the recipient needs formatting or multiple sheets.

Common Pitfalls and How to Avoid Them

1. Unwanted Index Column

The most frequent complaint. Always pass index=False unless your index carries meaningful data (see reset_index() for cleaning up indexes before export):

# Wrong -- adds mystery "Unnamed: 0" column when read back
df.to_csv('data.csv')
 
# Right
df.to_csv('data.csv', index=False)

2. Unnamed Columns on Read-Back

If you exported with the index and then read the file back with read_csv(), you get an Unnamed: 0 column:

import pandas as pd
 
# Reading back a CSV that was exported with index
df = pd.read_csv('data.csv', index_col=0)  # tell read_csv which column is the index

3. Losing Datetime Precision

Without date_format, datetime columns are exported as full timestamps, including fractional seconds when present. Control this explicitly:

import pandas as pd
 
df = pd.DataFrame({
    'ts': pd.to_datetime(['2026-01-15 08:30:00.123456'])
})
 
df.to_csv('ts.csv', index=False, date_format='%Y-%m-%d %H:%M:%S')

4. Float Rounding Issues

Floating-point numbers can produce unexpected trailing digits:

import pandas as pd
 
df = pd.DataFrame({'val': [0.1 + 0.2]})
df.to_csv('float.csv', index=False)
# val
# 0.30000000000000004
 
df.to_csv('float_clean.csv', index=False, float_format='%.2f')
# val
# 0.30

5. Mixed Types in a Column

If a column has mixed types (integers and strings), to_csv() converts everything to strings. Verify your data types before exporting:

import pandas as pd
 
df = pd.DataFrame({'id': [1, 2, 'three']})
print(df.dtypes)
# id    object  <-- mixed type
 
# Clean before export
df['id'] = pd.to_numeric(df['id'], errors='coerce')
df.to_csv('clean.csv', index=False)

Visualize Your Data Before Export with PyGWalker

Before exporting a DataFrame to CSV, it helps to verify the data looks correct. PyGWalker turns any Pandas DataFrame into an interactive visual interface inside Jupyter Notebook -- drag and drop columns to build charts without writing plot code:

import pandas as pd
import pygwalker as pyg
 
df = pd.read_csv('raw_data.csv')
 
# Explore visually before exporting
walker = pyg.walk(df)
 
# Once satisfied, export
df.to_csv('verified_export.csv', index=False, encoding='utf-8-sig')

This workflow catches issues like unexpected nulls, outlier values, or wrong data types before they reach your CSV consumer.

FAQ

How do I export a Pandas DataFrame to CSV without the index?

Pass index=False to to_csv(): df.to_csv('file.csv', index=False). This prevents the row index from being written as the first column. When reading the file back, you will not see an extra Unnamed: 0 column.

How do I handle special characters and encoding when exporting to CSV?

Use encoding='utf-8-sig' if the CSV needs to open correctly in Microsoft Excel with non-ASCII characters (accented letters, CJK text). For standard systems and APIs, the default encoding='utf-8' works. For legacy systems, use the specific encoding they require (latin-1, shift_jis, etc.).

How do I compress a large CSV file when exporting from Pandas?

Add a compression extension to the filename and Pandas handles the rest: df.to_csv('data.csv.gz', index=False) creates a gzip-compressed file. You can also set compression='gzip', 'zip', 'bz2', or 'zstd' explicitly. Compressed files are typically 70-90% smaller and can be read back directly with pd.read_csv().

Can I append data to an existing CSV file instead of overwriting it?

Yes. Use mode='a' and header=False: df.to_csv('file.csv', mode='a', header=False, index=False). The mode='a' opens the file for appending, and header=False prevents writing a duplicate header row in the middle of the file.

What is the difference between to_csv() and to_parquet() in Pandas?

to_csv() produces a human-readable text file but loses type information (everything becomes strings). to_parquet() creates a compact binary file that preserves data types (integers, floats, datetimes), reads 5-10x faster, and is 5-10x smaller. Use CSV for sharing with non-technical users or external systems. Use Parquet for internal data pipelines where performance and type fidelity matter.
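
The type-loss half of that comparison is easy to demonstrate with a quick in-memory round-trip:

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame({'ts': pd.to_datetime(['2026-01-15', '2026-01-16'])})
print(df['ts'].dtype)  # datetime64[ns]

# Round-trip through CSV: the datetime dtype is lost...
restored_raw = pd.read_csv(StringIO(df.to_csv(index=False)))
print(restored_raw['ts'].dtype)  # object

# ...unless you re-parse explicitly on read
restored_parsed = pd.read_csv(StringIO(df.to_csv(index=False)), parse_dates=['ts'])
print(restored_parsed['ts'].dtype)  # datetime64[ns]
```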

Conclusion

DataFrame.to_csv() is the standard way to export Pandas data to CSV files. For clean exports, always pass index=False. Use encoding='utf-8-sig' when Excel compatibility matters. Compress large files with gzip or zstd to reduce size by 70-90%. Control decimal precision with float_format, handle missing values with na_rep, and append to existing files with mode='a'. For data pipelines where type preservation and speed matter, consider to_parquet() instead. Before exporting, verify your data visually with tools like PyGWalker to catch issues early.
