NumPy Concatenate: Join Arrays with np.concatenate, vstack, and hstack
Combining arrays is one of the most common operations in numerical computing. You split data for processing, then need to reassemble it. You load features from multiple sources into a single matrix. You stack predictions from multiple models for ensembling -- for example, combining outputs from a Random Forest ensemble. Converting arrays to Python lists and joining them with loops or + is slow and creates unnecessary copies, and on NumPy arrays themselves + adds element-wise instead of joining.
NumPy provides specialized functions for array joining that operate at C speed and give you precise control over which axis to concatenate along. This guide covers np.concatenate(), np.vstack(), np.hstack(), np.stack(), and their use cases.
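As a minimal sketch of that contrast (the array names here are illustrative), the snippet below shows what + does on arrays and on lists, next to a single np.concatenate() call:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# On NumPy arrays, + is element-wise addition, not joining
summed = a + b

# On Python lists, + joins, but the result is a list of Python objects
joined_list = a.tolist() + b.tolist()

# np.concatenate joins the arrays and returns a new ndarray
joined = np.concatenate([a, b])

print(summed)       # [5 7 9]
print(joined_list)  # [1, 2, 3, 4, 5, 6]
print(joined)       # [1 2 3 4 5 6]
```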
np.concatenate() -- The Core Function
np.concatenate() joins a sequence of arrays along an existing axis.
1D Arrays
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = np.array([7, 8, 9])
result = np.concatenate([a, b, c])
print(result) # [1 2 3 4 5 6 7 8 9]
print(result.shape) # (9,)
2D Arrays Along Rows (axis=0)
import numpy as np
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
# Stack vertically (add rows)
result = np.concatenate([a, b], axis=0)
print(result)
# [[1 2]
#  [3 4]
#  [5 6]
#  [7 8]]
print(result.shape) # (4, 2)
2D Arrays Along Columns (axis=1)
import numpy as np
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
# Stack horizontally (add columns)
result = np.concatenate([a, b], axis=1)
print(result)
# [[1 2 5 6]
#  [3 4 7 8]]
print(result.shape) # (2, 4)
Shape Requirements
All arrays must have the same shape except along the concatenation axis.
import numpy as np
a = np.ones((3, 4)) # 3 rows, 4 cols
b = np.zeros((2, 4)) # 2 rows, 4 cols
# OK: different row count, same column count, axis=0
result = np.concatenate([a, b], axis=0)
print(result.shape) # (5, 4)
# ERROR: different column counts
c = np.ones((3, 5))
# np.concatenate([a, c], axis=0) # raises ValueError: column counts differ (4 vs 5)
np.vstack() -- Vertical Stacking
np.vstack() stacks arrays vertically (along axis 0). For arrays with two or more dimensions it is equivalent to np.concatenate(arrays, axis=0), and it also accepts 1D arrays, treating each one as a single row.
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
# vstack treats 1D arrays as rows
result = np.vstack([a, b])
print(result)
# [[1 2 3]
#  [4 5 6]]
print(result.shape) # (2, 3)
# With 2D arrays
c = np.array([[1, 2], [3, 4]])
d = np.array([[5, 6]])
result = np.vstack([c, d])
print(result)
# [[1 2]
#  [3 4]
#  [5 6]]
np.hstack() -- Horizontal Stacking
np.hstack() stacks arrays horizontally (along axis 1 for 2D, axis 0 for 1D).
import numpy as np
# 1D arrays: concatenates like np.concatenate
a = np.array([1, 2, 3])
b = np.array([4, 5])
result = np.hstack([a, b])
print(result) # [1 2 3 4 5]
# 2D arrays: adds columns
c = np.array([[1], [2], [3]])
d = np.array([[4], [5], [6]])
result = np.hstack([c, d])
print(result)
# [[1 4]
#  [2 5]
#  [3 6]]
np.stack() -- Creates a New Axis
Unlike concatenate, stack joins arrays along a new axis, increasing the dimension count by 1.
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
# Stack along new axis 0 (default)
result = np.stack([a, b])
print(result)
# [[1 2 3]
#  [4 5 6]]
print(result.shape) # (2, 3)
# Stack along new axis 1
result = np.stack([a, b], axis=1)
print(result)
# [[1 4]
#  [2 5]
#  [3 6]]
print(result.shape) # (3, 2)
All input arrays must have the same shape for np.stack().
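The same-shape rule can be checked with a short sketch. The arrays below are made up for illustration, with deliberately different lengths:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5])  # different length on purpose

try:
    np.stack([a, b])
    stack_failed = False
except ValueError:
    # np.stack requires every input to have exactly the same shape
    stack_failed = True

# np.concatenate accepts these: 1D arrays have no other axes that must agree
joined = np.concatenate([a, b])

print(stack_failed)  # True
print(joined.shape)  # (5,)
```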
Function Comparison
| Function | Axis | Creates New Axis? | 1D Input Handling |
|---|---|---|---|
| np.concatenate | Existing (default 0) | No | Concatenates as-is |
| np.vstack | 0 (vertical) | No | Treats 1D as row |
| np.hstack | 1 (0 for 1D input) | No | Concatenates 1D arrays |
| np.stack | New axis (default 0) | Yes (+1 dimension) | Creates new dimension |
| np.column_stack | 1 | No | Treats 1D as column |
| np.row_stack | 0 | No | Same as vstack (deprecated alias in NumPy 2.0) |
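To see the table's 1D behavior side by side, this sketch feeds the same two illustrative 1D arrays to each function and prints the resulting shape. np.row_stack is left out because it behaves the same as np.vstack.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(np.concatenate([a, b]).shape)   # (6,)
print(np.vstack([a, b]).shape)        # (2, 3)
print(np.hstack([a, b]).shape)        # (6,)
print(np.stack([a, b]).shape)         # (2, 3)
print(np.column_stack([a, b]).shape)  # (3, 2)
```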
Practical Examples
Combine Features for Machine Learning
import numpy as np
# Feature arrays from different sources
ages = np.array([25, 30, 35, 40])
incomes = np.array([50000, 60000, 75000, 90000])
scores = np.array([720, 680, 750, 800])
# Stack as columns to create feature matrix
X = np.column_stack([ages, incomes, scores])
print(X)
# [[   25 50000   720]
#  [   30 60000   680]
#  [   35 75000   750]
#  [   40 90000   800]]
print(X.shape) # (4, 3)
Batch Processing Results
import numpy as np
# Process data in batches, combine results
results = []
for batch_start in range(0, 100, 25):
    batch = np.random.randn(25, 10) # 25 samples, 10 features
    processed = batch * 2 + 1 # Some processing
    results.append(processed)
# Combine all batches vertically
all_results = np.vstack(results)
print(all_results.shape) # (100, 10)
Image Processing -- Combine Channels
import numpy as np
height, width = 100, 100
# Separate color channels
red = np.random.randint(0, 256, (height, width))
green = np.random.randint(0, 256, (height, width))
blue = np.random.randint(0, 256, (height, width))
# Combine into RGB image
rgb_image = np.stack([red, green, blue], axis=2)
print(rgb_image.shape) # (100, 100, 3)
Visualizing Combined Data
After concatenating arrays from different sources, PyGWalker lets you explore the combined dataset interactively in Jupyter:
import pandas as pd
import pygwalker as pyg
# Convert the concatenated feature matrix X (built with np.column_stack above) to a DataFrame
df = pd.DataFrame(X, columns=['age', 'income', 'score'])
walker = pyg.walk(df)
FAQ
What is the difference between np.concatenate and np.stack?
np.concatenate joins arrays along an existing axis without changing the number of dimensions. np.stack joins arrays along a new axis, adding one dimension. For example, stacking two (3,) arrays with concatenate gives (6,), while stack gives (2, 3).
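The same distinction holds for 2D inputs. The sketch below uses two illustrative (2, 3) arrays to show that concatenate keeps the dimension count while stack adds one:

```python
import numpy as np

a = np.zeros((2, 3))
b = np.ones((2, 3))

cat = np.concatenate([a, b], axis=0)  # still 2D, more rows
stk = np.stack([a, b], axis=0)        # 3D: a new leading axis indexes the inputs

print(cat.ndim, cat.shape)  # 2 (4, 3)
print(stk.ndim, stk.shape)  # 3 (2, 2, 3)

# Each original array is recoverable by indexing the new axis
print(np.array_equal(stk[1], b))  # True
```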
What does axis mean in np.concatenate?
The axis parameter specifies which dimension to join along. axis=0 concatenates along rows (adding more rows), axis=1 along columns (adding more columns). For 2D arrays, axis=0 is vertical stacking and axis=1 is horizontal stacking.
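Beyond two dimensions, the "rows" and "columns" picture no longer applies, but the rule is the same: only the chosen axis grows. A small sketch with illustrative 3D arrays follows. The negative axis and axis=None cases are not discussed above; they are included here as standard np.concatenate() behavior.

```python
import numpy as np

a = np.zeros((2, 3, 4))
b = np.ones((2, 3, 4))

print(np.concatenate([a, b], axis=0).shape)     # (4, 3, 4)
print(np.concatenate([a, b], axis=1).shape)     # (2, 6, 4)
print(np.concatenate([a, b], axis=-1).shape)    # (2, 3, 8), -1 means the last axis
print(np.concatenate([a, b], axis=None).shape)  # (48,), inputs are flattened first
```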
When should I use vstack vs hstack vs concatenate?
Use vstack to add rows (stack vertically), hstack to add columns (stack horizontally), and concatenate when you need to specify an arbitrary axis. vstack and hstack are convenience functions; vstack in particular promotes 1D arrays to rows, which concatenate does not do.
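A short sketch of how the three functions treat 1D inputs (the arrays are illustrative). It relies on the fact that NumPy's out-of-bounds axis error is a ValueError subclass, so catching ValueError is enough:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

try:
    np.concatenate([a, b], axis=1)
    axis1_ok = True
except ValueError:
    # 1D arrays have no axis 1, so concatenate cannot join along it
    axis1_ok = False

rows = np.vstack([a, b])  # each 1D input becomes a row
flat = np.hstack([a, b])  # for 1D inputs, same result as np.concatenate([a, b])

print(axis1_ok)    # False
print(rows.shape)  # (2, 3)
print(flat.shape)  # (6,)
```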
Why do I get a ValueError when concatenating arrays?
All arrays must have the same shape except along the concatenation axis. If you're concatenating along axis=0, all arrays must have the same number of columns. Check shapes with array.shape before concatenating.
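One way to diagnose the error, sketched with illustrative arrays: catch the exception, print the shapes, and check whether a different axis is the one you meant.

```python
import numpy as np

a = np.ones((3, 4))
c = np.ones((3, 5))

error_text = None
try:
    np.concatenate([a, c], axis=0)
except ValueError as err:
    # Print the shapes next to the message to see which axis disagrees
    error_text = str(err)
    print("shapes:", a.shape, c.shape)  # shapes: (3, 4) (3, 5)

# Along axis=1 only the row counts must match (3 and 3), so this works
wide = np.concatenate([a, c], axis=1)
print(wide.shape)  # (3, 9)
```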
How do I concatenate arrays of different dimensions?
Reshape the arrays to have compatible dimensions first. Use np.expand_dims(array, axis) to add a dimension, or array.reshape(new_shape). For example, to append a 1D array as a row to a 2D array with np.concatenate, reshape the 1D array with array.reshape(1, -1). np.vstack performs this promotion for you, so it accepts the 1D array directly.
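The reshaping options can be sketched as follows. The array names are illustrative, and the np.newaxis indexing form is an addition not mentioned above, shown as an equivalent alternative:

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
row = np.array([7, 8, 9])                  # shape (3,)

# Three equivalent ways to turn the 1D array into a (1, 3) row
as_row_a = row.reshape(1, -1)
as_row_b = np.expand_dims(row, axis=0)
as_row_c = row[np.newaxis, :]

result = np.concatenate([matrix, as_row_a], axis=0)
print(result.shape)  # (3, 3)

# To append a column instead, reshape the 1D array to (n, 1)
col = np.array([10, 20]).reshape(-1, 1)    # shape (2, 1)
wider = np.concatenate([matrix, col], axis=1)
print(wider.shape)  # (2, 4)
```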
Conclusion
NumPy provides a complete toolkit for joining arrays: np.concatenate() for general-purpose joining along any axis, np.vstack()/np.hstack() for intuitive vertical/horizontal stacking, and np.stack() when you need to create a new dimension. Remember that concatenate joins along existing axes while stack creates new ones. Use np.reshape when you need to fix dimension mismatches before concatenating. Generate the arrays themselves with np.arange or np.linspace. Match dimensions carefully, and use vstack/hstack for the most readable code when working with common row/column operations.
Related Guides
- NumPy reshape -- fix dimension mismatches before concatenating arrays
- NumPy arange -- create evenly spaced arrays to concatenate
- NumPy linspace -- generate precise sequences of values
- Sklearn Pipeline -- use concatenated feature arrays in ML pipelines