Snowflake and Boomi make a wonderful pairing. Boomi can integrate massive amounts of data into Snowflake, and Snowflake can then serve all of your data warehousing needs, such as Power BI dashboards and machine learning datasets. However, Snowflake is not just another relational database. Even though there is a JDBC driver, that does not mean you should treat Snowflake like an ordinary database within your integrations. Snowflake works best when you use files to import and export data and use the JDBC driver as a way to execute commands.
Generally, someone coming from a strong database background will want to load data by issuing INSERT statements. The Snowflake connector does have an Insert operation, and it has its uses, but they are limited: in practice, the Insert operation is extremely slow. One main reason is that the Insert operation does not support batching. The more important reason is that Snowflake itself does not recommend this method; Snowflake is optimized for consuming files, and that is the better way to load data.
Snowflake recommends using either COPY INTO or Snowpipe to load large amounts of data. An INSERT statement works fine if you have fewer than about 100 rows to insert, but beyond that the time tradeoff gets very steep: loading 100k rows can take about 2 hours with INSERT statements, while the same data loads in a few seconds with COPY INTO. Of the various Snowflake operations within Boomi, the most helpful ones for loading data are Bulk Load, SnowSQL, and Execute. Bulk Load gets data into a staging table; SnowSQL or Execute is then used to merge the data from the staging table into the main table. A SnowSQL operation keeps the merging logic within Boomi, while the Execute operation calls a stored procedure, which keeps the merging logic within Snowflake.
COPY INTO is built into Boomi’s Snowflake Bulk Load and Copy Into Table operations. Of the two, Bulk Load will likely be the more useful. Bulk Load performs two tasks: 1) load a CSV file into AWS S3, and 2) perform a COPY INTO command to move the data from S3 into a staging table.
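For reference, the COPY INTO statement that Bulk Load issues behind the scenes looks roughly like the following sketch. The bucket path and credentials are placeholders, the staging table and “csv” file format are created later in this article, and the connector builds the actual statement from the operation settings.
-- Approximation of the COPY INTO that the Bulk Load operation runs
COPY INTO ACCOUNT.PUBLIC.STAGE_ACCOUNT_NAME
FROM 's3://my-bucket/boomi/' -- hypothetical bucket and prefix
CREDENTIALS = (AWS_KEY_ID = '***' AWS_SECRET_KEY = '***')
FILE_FORMAT = (FORMAT_NAME = 'csv');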
Snowflake’s other recommendation is to use Snowpipe. Snowpipe listens for new files added to AWS S3 (or other object storage) and automatically performs the COPY INTO when a new file is found; it is notified of new files via AWS SQS messages (or another queue). In Boomi, this can simplify the integration because the process only needs to write a CSV file to AWS S3. However, it moves some of the complexity into setting up AWS SQS and configuring the Snowflake Snowpipe.
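As a sketch of what that Snowflake-side setup could look like, the following creates a pipe that auto-ingests into the staging table used later in this article. The external stage name my_s3_stage is a hypothetical example that would need to be created and pointed at the S3 bucket, and the SQS notification channel still has to be wired up in AWS.
-- Illustrative Snowpipe definition; assumes an external stage named my_s3_stage exists
CREATE OR REPLACE PIPE ACCOUNT.PUBLIC.ACCOUNT_PIPE
AUTO_INGEST = TRUE
AS
COPY INTO ACCOUNT.PUBLIC.STAGE_ACCOUNT_NAME
FROM @my_s3_stage
FILE_FORMAT = (FORMAT_NAME = 'csv');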
Within Snowflake, a File Format needs to be created. (If you decide to use a file written to the local drive on the server, a Stage will also need to be created, which is not described here.) The file format determines how the CSV file will be parsed. Other formats are supported, but CSV is the easiest to work with.
First, within the Snowflake UI, create a File Format. The following command creates a file format called “csv”. It is comma delimited, expects a header row, allows fields to be optionally enclosed in double quotes, and treats an empty string as a null in the staging table.
-- Create a File Format
CREATE OR REPLACE FILE FORMAT csv
TYPE = 'CSV'
FIELD_DELIMITER = ','
SKIP_HEADER = 1
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
NULL_IF = ('');
In this example, there will be a staging table and a main table. Below are the CREATE statements for both.
create or replace TABLE ACCOUNT.PUBLIC.STAGE_ACCOUNT_NAME (
ACCOUNT_ID NUMBER(20,0),
NAME VARCHAR(16777216),
STATUS VARCHAR(16777216)
);
create or replace TABLE ACCOUNT.PUBLIC.ACCOUNT_NAME (
ACCOUNT_ID NUMBER(20,0),
NAME VARCHAR(16777216),
STATUS VARCHAR(16777216)
);
To get started, the Integration Runtime should be private: either privately hosted or Boomi Managed Cloud Services (MCS). The Snowflake connector is currently not supported on the Boomi Public Cloud because of how the JDBC driver initializes. Once that is confirmed, create a Snowflake Bulk Load Operation. These instructions use AWS S3; another option is to write data locally with a Disk shape and then use the internal stage and internal stage files path.
The overall process has two branches. Branch 1 sends CSV data to Snowflake via AWS S3 using the Bulk Load Operation; a Message shape is used to mock data, and in a production process this would be at least a connector to query data and a map to transform it. In Branch 2, an update and an insert statement merge the data from the staging table into the main table.
Figure 1. Process Overview
Figure 2. Message Shape Data
The Message shape contains mock data. The important part is that the CSV has a header row whose column names match the columns of the STAGE_ACCOUNT_NAME table.
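For example, the Message shape’s mock CSV could look like the following (illustrative values; the header row must match the staging table’s column names):
ACCOUNT_ID,NAME,STATUS
1,Acme Corporation,Active
2,"Globex, Inc.",Inactive
3,Initech,Active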
Next, add a Snowflake connector and create a Bulk Load Operation. The following are the basic fields that need to be populated.
Table 1. Snowflake Bulk Load Operation
Field | Value
---|---
Object | Click on Import Operation to Populate the Stage Table Name |
Option: Truncate Before Load | Optional, but useful to clear the data before loading |
S3 Bucket Name | {S3 Bucket} |
AWS Region | {S3 Region} |
File Format Name | csv |
Figure 3. Snowflake Bulk Load Operation
Finally, for Branch 2, create a Snowflake SnowSQL Operation to perform the merge. The merge is split into two statements: the first updates accounts whose Account IDs match, and the second inserts accounts whose Account IDs do not match. These statements are not required to be executed within Boomi, but they serve as a general outline of what to do once data is loaded into a Snowflake staging table.
Table 2. Snowflake SnowSQL Operation
Field | Value
---|---
Object | Click on Import Operation to create a dummy profile |
SQL Script | Script located below |
Number of SnowSQL Statements | 2 |
UPDATE ACCOUNT.PUBLIC.ACCOUNT_NAME AS target
SET
target.NAME = source.NAME,
target.STATUS = source.STATUS
FROM ACCOUNT.PUBLIC.STAGE_ACCOUNT_NAME AS source
WHERE target.ACCOUNT_ID = source.ACCOUNT_ID;
INSERT INTO ACCOUNT.PUBLIC.ACCOUNT_NAME (ACCOUNT_ID, NAME, STATUS)
SELECT
source.ACCOUNT_ID,
source.NAME,
source.STATUS
FROM ACCOUNT.PUBLIC.STAGE_ACCOUNT_NAME source
LEFT JOIN ACCOUNT.PUBLIC.ACCOUNT_NAME target
ON source.ACCOUNT_ID = target.ACCOUNT_ID
WHERE target.ACCOUNT_ID IS NULL;
Figure 4. Snowflake SnowSQL Operation is used to merge data into the main table
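Alternatively, the same upsert can be written as a single Snowflake MERGE statement, shown below as a sketch. If you use it in place of the two statements above, set Number of SnowSQL Statements to 1.
-- Single-statement alternative to the update/insert pair
MERGE INTO ACCOUNT.PUBLIC.ACCOUNT_NAME AS target
USING ACCOUNT.PUBLIC.STAGE_ACCOUNT_NAME AS source
ON target.ACCOUNT_ID = source.ACCOUNT_ID
WHEN MATCHED THEN UPDATE SET
NAME = source.NAME,
STATUS = source.STATUS
WHEN NOT MATCHED THEN INSERT (ACCOUNT_ID, NAME, STATUS)
VALUES (source.ACCOUNT_ID, source.NAME, source.STATUS);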
Now when the process runs, it loads the data into AWS S3, performs the Snowflake COPY INTO command to load the data into the staging table, and finally the SnowSQL Operation executes the two statements that merge the data from the staging table into the main table.
This article was originally posted on Boomi Community.