Snowflake: Bulk Load with Local Files
The Snowflake connector supports two ways to bulk load data into Snowflake: using AWS S3 or a local file. The article will focus on how to use local files to bulk load data into Snowflake.
Snowflake Configuration
To get started, a file format, a stage, and a table must be created within Snowflake. The file format will define the structure and type of file being sent to the stage. The stage will be used to load the file into Snowflake. The table will be used to store the data.
First, create a file format using the following Snow SQL command. This step should be performed within a Snowflake Worksheet. The file format will be expecting data in a CSV format. It will contain a header, delimited with a comma, can be optionally enclosed by double quotes, and values will be null if they are empty. The name of the file format is ‘boomi_csv’.
-- Create a File Format
CREATE OR REPLACE FILE FORMAT boomi_csv
TYPE = 'CSV'
FIELD_DELIMITER = ','
SKIP_HEADER = 1
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
NULL_IF = '';
This is an example file format that will be used in this example.
ACCOUNT_ID,NAME,STATUS
4,Boomi,Active
2,PanAm,Closed
Next a stage will need to be created as a holding spot for the CSV file. The name of the stage being created is boomi_stage_csv. The file format created above will be associated with the stage.
-- Create a stage for the CSV file.
CREATE OR REPLACE STAGE boomi_stage_csv
FILE_FORMAT = csv;
Once the file format and stage are created, a table also needs to be created to store the data. This table will only be used as a reference for this example.
CREATE OR REPLACE TABLE ACCOUNT.PUBLIC.STG_ACCOUNT_NAME (
ACCOUNT_ID NUMBER(20,0),
NAME VARCHAR(16777216),
STATUS VARCHAR(16777216)
);
Boomi Configuration
Within Boomi, a process will be built to query a database, transform to data into a CSV, and then write to a disk. In a second branch, a Snowflake Bulk Load operation will be used to load the CSV into Snowflake. With the Bulk Load operation, data has to be written to disk beforehand. If using AWS S3, then CSV data can be directly located by sending data into the Snowflake Bulk Load Operation. Boomi uses two commands to load data into Snowflake: PUT and COPY INTO. The PUT command will move the file from the local disk to the stage. The COPY command will move the data from the stage to the table.
Figure 1. Overview of Boomi Process.
On branch 1 of the process, a file will be created and written to a directory. The directory needs to be static, but the file name can be dynamic if using a wildcard (*) within the Internal Source Files Path field on the Snowflake Bulk Load Operation.
When writing data to the directory, ensure that there are no spaces within the file name. Boomi’s Snowflake connector does not currently support spaces within the file name (as of March 2024). Spaces in the directory path are more common in Windows than in Linux.
Example Directory Path Syntax
# Linux
/boomi/share/snowflake/
# Windows
C:\Boomi\snowflake
Figure 2. Set Properties with Linux Directory Defined.
Figure 3. Set Properties with Windows Directory Defined.
Next, create a Snowflake Bulk Load Operation and import the desired table to write to. The Bulk Load Operation will perform a PUT command to move the local file to the Snowflake stage. Then it will execute a COPY INTO command to move the file from the stage to the table that was imported.
Figure 4. Snowflake Bulk Load Operation for Linux.
Figure 5. Snowflake Bulk Load Operation for Windows.
Below is a list of fields used within the Snowflake Bulk Load Operation to be used with local files.
Table 1. Required Fields within Snowflake Bulk Load Operation.
Property | Description |
---|---|
Object | The name of the table that data should be written to. |
Truncate before load | If checked, the table above will be truncated before data is loaded. |
S3 Bucket Name | Leave blank if using local files. |
AWS Region | Leave blank if using local files. |
Internal Stage Name | The name of the stage that was created above (BOOMI_STAGE_CSV). Do not include an @ before the stage name. It will be automatically included within the statement. |
Internal Source Files Path | The name of the directory that the file was written to in branch 1. Do not include single or double quotes around the directory. Asterisk (*) can be used as wildcards and will pick up any file within the directory. |
Overwrite | Overwrite files with the same name within the stage. |
Column Names | This field is only important if the column order in the CSV file is different than the column order in the table. |
File Format Name | The name of the file format that was created above (CSV) |
File Format Type | The type of file format. This is not required if populating File Format Name. CSV is the default. |
Stage Path | The path within the stage that the file is located. |
References:
- Boomi Doc: Snowflake Connector
- Boomi Doc: Snowflake Bulk Load Operation
- Snowflake Doc: Create File Format
- Snowflake Doc: Create Stage
- Snowflake Doc: PUT
- Snowflake Doc: COPY INTO
The article was originally posted at Boomi Community.