How to Easily Select All Columns from Stage in Snowflake

Name: Sebastian Brandt

Updated on 7/24/2023

Snowflake, a break-through cloud data platform, is transforming the landscape of data engineering, data warehouses, data lakes, and data science. One of the key features of Snowflake is its ability to handle stages, a critical component in data warehousing and analytics. This article will take you on a deep dive into the concept of stages in Snowflake, and more importantly, how to select all columns from a stage.

Stages in Snowflake serve as an intermediate space where data files are stored before being loaded into tables. This feature is crucial for bulk data operations, making it an essential topic for data engineers and analysts working with Snowflake. But before we delve into the specifics of selecting all columns from a stage, let's first understand what Snowflake stages are and why they are important.

Want to Easily Visualize Snowflake Data? RATH (opens in a new tab) gives you the easiest solution for AI-powered Data Visualization and a complete package for Automated Data Analysis!

Watch the following video about how to Explore Data Insights easily with RATH:

Want to connect your Snowflake database to RATH now? Check out the RATH Documentation for a step-by-step guide of Snowflake intergration.

Beyond Snowflake, RATH supports a wide range of data sources. Here are some of the major database solutions that you can connect to RATH:

Interested? Give RATH a try right now at RATH Website (opens in a new tab)

(opens in a new tab)

Understanding Snowflake Stages

In Snowflake, a stage is a named database object that specifies where data files are staged to streamline the bulk loading and unloading of data in and out of database tables. Stages play a pivotal role in data warehousing and analytics, acting as a bridge between raw data files and Snowflake tables.

There are two types of stages in Snowflake:

Internal Stages: These stages store data internally in Snowflake tables. They are automatically created for each table and schema, and you can also create them explicitly.
External Stages: These stages store data in an external location, such as Amazon S3 buckets, Google Cloud Storage buckets, or Microsoft Azure Blob storage. External stages are read-only, meaning you cannot perform DML operations on them.

Understanding the difference between these two types of stages is crucial when working with Snowflake, as it influences how you interact with your data.

Working with Snowflake Metadata

Snowflake automatically generates metadata for files in both internal and external stages. This metadata includes file names, version IDs, and associated properties, providing additional information about the files. Metadata in Snowflake acts as a roadmap to locate the contents of a data warehouse, making it an essential aspect of data querying and analysis.

Here are the key metadata columns in Snowflake that you can query or load into tables:

METADATA$FILENAME: This column returns the name of the staged file, the row it belongs to, and the file path.
METADATA$FILE_ROW_NUMBER: This column returns the row number for each record in the staged file.

How to Query Staged Data in Snowflake

Now that we have a solid understanding of Snowflake stages and metadata, let's delve into how to query staged data in Snowflake. We'll walk through a step-by-step guide on how to query metadata from an external stage and an internal stage, providing practical examples to illustrate the process.

Querying Metadata from an External Stage

Let's start with an example where we have a CSV file named 'Cars' staged in an external Amazon S3 bucket. The file has three columns: id, name, and location. Here's how you can query the metadata from this staged file:

Create a Snowflake stage: First, you need to create a stage that points to the Amazon S3 bucket where your CSV file is stored. You can do this using the CREATE STAGE command.
Create a file format: Next, you need to create

a file format that describes the format of the CSV file. You can do this using the CREATE FILE FORMAT command.

Query the metadata: Finally, you can query the metadata and the regular data columns in the staged file using a SELECT statement. The SELECT statement should include the METADATA$FILENAME and METADATA$FILE_ROW_NUMBER columns, as well as the regular data columns.

Here's a sample code snippet that illustrates these steps:

-- Create a stage
CREATE OR REPLACE STAGE cars_stage URL = 's3://your_bucket/path/';
 
-- Create a file format
CREATE OR REPLACE FILE FORMAT cars_format TYPE = 'CSV' FIELD_DELIMITER = ',';
 
-- Query the metadata and data columns
SELECT METADATA$FILENAME, METADATA$FILE_ROW_NUMBER, $1, $2, $3
FROM @cars_stage
(FILE_FORMAT => cars_format);

This query will return the filename, row number, and the data in each column for each record in the staged CSV file.

Querying Metadata from an Internal Stage

Querying metadata from an internal stage follows a similar process, but with a few differences. Let's consider an example where we have a JSON file named 'Movies' staged in an internal Snowflake stage. The file contains objects with two properties: title and year. Here's how you can query the metadata from this staged file:

Create a Snowflake stage: As with the external stage, you first need to create a stage. However, since this is an internal stage, you don't need to specify a URL.
Create a file format: Next, you need to create a file format that describes the format of the JSON file. This is done using the CREATE FILE FORMAT command, with the type set to 'JSON'.
Stage the JSON file: You can stage the JSON file from your local directory to the Snowflake stage using the PUT command.
Query the metadata: Finally, you can query the metadata and the objects in the staged file using a SELECT statement. The SELECT statement should include the METADATA$FILENAME and METADATA$FILE_ROW_NUMBER columns, as well as the PARSE_JSON function to parse the JSON objects.

Here's a sample code snippet that illustrates these steps:

-- Create a stage
CREATE OR REPLACE STAGE movies_stage;
 
-- Create a file format
CREATE OR REPLACE FILE FORMAT movies_format TYPE = 'JSON';
 
-- Stage the JSON file
PUT file://path_to_your_file/movies.json @movies_stage;
 
-- Query the metadata and data columns
SELECT METADATA$FILENAME, METADATA$FILE_ROW_NUMBER, PARSE_JSON($1)
FROM @movies_stage
(FILE_FORMAT => movies_format);

This query will return the filename, row number, and the parsed JSON objects for each record in the staged JSON file.

Advanced Techniques for Querying Staged Data

As you become more comfortable with querying staged data in Snowflake, you can start to explore more advanced techniques. For instance, you can use the COPY INTO command to load the data from a staged file into a Snowflake table. This command allows you to perform bulk data operations, which can be much faster than inserting data row by row.

Here's an example of how you can use the COPY INTO command to load data from a CSV file staged in an Amazon S3 bucket into a Snowflake table:

-- Create a table
CREATE OR REPLACE TABLE cars (id INTEGER, name STRING, location STRING);
 
-- Load data from the staged file into the table
COPY INTO cars
FROM @cars_stage
FILE_FORMAT => cars_format;

This command will load all the data from the staged CSV file into the 'cars' table. You can then query the data in the table using regular SQL commands.

FAQs

1. Can I query staged data in Snowflake without loading it into a table?

Yes, you can query staged data in Snowflake directly without loading it into a table. You can do this using the SELECT statement, as shown in the examples above. However, keep in mind that querying large amounts of staged data directly can be slower than querying data loaded into a table.

2. What types of files can I stage in Snowflake?

Snowflake supports staging a variety of file types, including CSV, JSON, Avro, Parquet, and ORC. You can specify the file type when you create a file format.

3. Can I stage data from a local file in Snowflake?

Yes, you can stage data from a local file in Snowflake using the PUT command. This command uploads the local file to an internal Snowflake stage, from where you can query the data or load it into a table.

Conclusion

Selecting all columns from a stage in Snowflake is a powerful operation that allows you to query and analyze large amounts of data efficiently. By understanding the concepts of stages and metadata in Snowflake, and by mastering the techniques for querying staged data, you can unlock the full potential of your data and gain deeper insights. Whether you're a data engineer, a data analyst, or just someone interested in data science, mastering these skills will undoubtedly be a valuable addition to your toolkit.