Recursively Query Subdirectories with Disk v2

This article reviews how to recursively query files in all subdirectories. By default, the Disk v2 connector queries only one directory at a time. The two methods below first obtain every directory beneath a defined base directory and then issue one query per directory, so that files in the base directory and in all of its subdirectories are returned. The first method uses Boomi’s out-of-the-box features; it should be the first option and will have the most support. The second method uses a Groovy script and requires the developer to be familiar with Groovy.

Method 1 - Use a Disk v2 List Operation with a Loop

The first method uses out-of-the-box tools within Boomi. The demo process is set up to be reusable and requires only a single document with a dynamic document property as input. The directory discovery is placed in a subprocess so that all documents exit it at one time and, therefore, all queries (at the end of the main process) occur at the same time.

Figure 1. Overview of main process.

Reusable Subprocess

Figure 2. Overview of the reusable subprocess: [SUB] Query Disk v2 Recursive.

First, within the main process, a Set Property shape will define a Dynamic Document Property with the base directory to query. DDP_BASE_DIR_QUERY is the Dynamic Document Property to be defined.

Set Property

Figure 3. Set DDP_BASE_DIR_QUERY within Set Property shape.

The subprocess is designed to take a single document with DDP_BASE_DIR_QUERY set, and then recursively find all directories below it to be queried. The output of the subprocess will be a document per subdirectory. Each document will then be used to query each directory. The first step within the subprocess (Figure 2) is to set the Disk v2 Directory with a Set Property shape.

Set Property with Disk Directory

Figure 4. Set Disk v2 Dir Set Property shape within subprocess.

Next, add a Branch shape. Branch 1 leads to a Map, which is needed because the query for subdirectories returns all subdirectories but does not output the initial directory to query (the value of DDP_BASE_DIR_QUERY). Set up Branch 2 first, because Branch 1 uses components that are created in the Disk v2 LIST operation on Branch 2.

On Branch 2, add a Disk v2 connector. Create a Disk v2 connection, create a LIST operation, import a profile, and create a single parameter: isDirectory: EQUALS. The parameter restricts the results to directories and excludes files.

Disk v2 Operation

Figure 5. Disk v2 LIST Operation with isDirectory:EQUAL parameter set.

Once complete, click Save on the operation and go back to the subprocess canvas. Click the Disk v2 connector shape to set its parameters. Then add the isDirectory:EQUALS parameter and set it to Static: true. This will only return directories.

Connector Parameters

Figure 6. Defining isDirectory:EQUAL parameter within the connector configuration panel.

After the Disk v2 connector, add another Branch shape. Branch 1 will go to a Return Document shape. Branch 2 will go to a Set Disk v2 Dir Recursive Set Property shape. This Set Property shape is similar to the first one at the start of the subprocess, but it takes the output of the Disk v2 connector and uses it to build the next query. There are two fields to focus on within the JSON output: fileName and directory. In this query, the fileName element is actually the name of the subdirectory, and the directory element is the absolute path of the directory that contains it. This example uses a Windows-based Atom to execute the process and, therefore, a backslash as the path separator.

Example JSON response from Disk v2 LIST operation.

{
    "createdDate": "2022-06-04T12:52:29.362-04:00",
    "directory": "C:\\Boomi AtomSphere\\TestNewFiles",
    "fileName": "test1",
    "fileSize": 0,
    "isDirectory": true,
    "modifiedDate": "2022-06-04T14:57:31.115-04:00"
}

Within the Set Disk v2 Dir Recursive Set Property shape, concatenate the directory, path separator, and fileName. Pay special attention that there are three values added under Property Value. The path separator is a backslash for Windows and a forward slash for Linux. Connect the output of the Set Property shape back to the Disk v2 connector so that each subdirectory is listed in turn.

Disk v2 Recursive Set Property

Figure 7. Set Disk v2 Dir Recursive Set Property shape.
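
The concatenation performed by this Set Property shape can be sketched outside of Boomi as follows. This is a minimal illustration rather than connector code: the class name DirJoin and the method childPath are hypothetical, and the separator is passed in explicitly to mirror the static value typed into the shape.

```java
final class DirJoin {
    /**
     * Joins the three values in the same order as the Property Value list:
     * the "directory" element, the path separator, and the "fileName" element.
     */
    static String childPath(String directory, String separator, String fileName) {
        return directory + separator + fileName;
    }
}
```

With the values from the example LIST response, childPath("C:\\Boomi AtomSphere\\TestNewFiles", "\\", "test1") yields C:\Boomi AtomSphere\TestNewFiles\test1, which is the directory the next LIST call will inspect.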

Now that Branch 2 is complete, create the map on Branch 1. Within the map, create a flat file profile with a single element as the source profile. For the destination profile, use the Disk v2 Directory LIST Response profile that was created within the Disk v2 LIST operation. Within the map functions area, add a Get Document Property function, set it to the Dynamic Document Property DDP_BASE_DIR_QUERY, and map its output to the directory element of the destination profile. This map ensures that the base directory is also included in the list of directories to be queried later in the process. Once complete, connect the map to the Return Document shape. Click Save on the subprocess and move back to the main process.

Initial Base Directory for Query Map

Figure 8. Get Initial Base Directory for Query Map.

Next, after the subprocess, add another Set Property shape that looks exactly like the Set Disk v2 Dir Recursive Set Property shape (Figure 7) within the subprocess. This sets the Disk v2 Directory for each of the directories that need to be queried. Finally, add a Disk v2 connector to the canvas, reuse the Disk v2 connection created earlier, and create a QUERY operation. Define all parameters required within the operation.
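
For readers who find code easier to follow than a process canvas, the loop that Method 1 builds can be sketched in plain Java. This is only an analogy under stated assumptions, not what Boomi executes: Files.newDirectoryStream with a directory filter stands in for the LIST operation with isDirectory set to true, the work queue stands in for documents looping back to the connector, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class RecursiveDirList {
    /** Returns baseDir followed by every directory beneath it. */
    static List<Path> listAllDirectories(Path baseDir) throws IOException {
        List<Path> result = new ArrayList<>();
        // Analogous to the map on Branch 1: the base directory is added explicitly,
        // because listing its children never returns the base directory itself.
        result.add(baseDir);

        Deque<Path> pending = new ArrayDeque<>();
        pending.add(baseDir);
        while (!pending.isEmpty()) {
            Path current = pending.remove();
            // Stand-in for the LIST operation filtered to directories only.
            // NOFOLLOW_LINKS skips symbolic links so a link cycle cannot loop forever.
            try (DirectoryStream<Path> children = Files.newDirectoryStream(
                    current, p -> Files.isDirectory(p, LinkOption.NOFOLLOW_LINKS))) {
                for (Path child : children) {
                    result.add(child);   // returned to the caller, like Return Document
                    pending.add(child);  // fed back in for another listing, like the loop
                }
            }
        }
        return result;
    }
}
```

Each entry of the returned list corresponds to one document leaving the subprocess, and therefore to one QUERY call in the main process.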

Method 2 - Use a Data Process Shape with a Script

The second method uses a Groovy script within a Data Process shape. This script can be helpful if there are a significant number of nested subdirectories, a situation in which the loop in Method 1 has the potential to cause a stack overflow. The script is designed to be executed on either a Windows-based or a Linux-based operating system. Other operating systems are not supported.

Figure 9. Overview of recursively querying directories using a Groovy script.

A Set Property shape sets a Dynamic Document Property that contains the base directory to query recursively. Dynamic Document Property to set: DDP_BASE_DIR_QUERY

DDP_BASE_DIR_QUERY within Set Property shape

Figure 10. Set DDP_BASE_DIR_QUERY within Set Property shape.

Next, the Data Process shape contains a script that outputs the path of the base directory set in DDP_BASE_DIR_QUERY, followed by the paths of all of its subdirectories, one per line. After the script is executed, a second step within the Data Process shape splits the document by line.

Groovy Script

Figure 11. Add script to Data Process shape.

// Groovy 2.4
import java.util.Properties
import java.io.InputStream

String lineSeparator = System.getProperty("line.separator");
String detectedOS = System.getProperty("os.name", "generic")

for (int i = 0; i < dataContext.getDataCount(); i++) {
    InputStream is = dataContext.getStream(i);
    Properties props = dataContext.getProperties(i);

    String rootDir = props.getProperty("document.dynamic.userdefined.DDP_BASE_DIR_QUERY");
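    Process p = null;

    // Detect the OS. Only valid on Windows and Linux.
    if (detectedOS.contains("Windows")) {
        // /ad = directories only, /b = bare paths, /s = recurse into subdirectories
        p = Runtime.getRuntime().exec("cmd /c dir \"" + rootDir + "\" /ad /b /s");
    } else if (detectedOS.contains("Linux")) {
        // Arguments are passed as an array so a path containing spaces stays one argument.
        // -mindepth 1 leaves rootDir itself out of the listing, because it is prepended below.
        p = Runtime.getRuntime().exec(["find", rootDir, "-mindepth", "1", "-type", "d", "-print"] as String[]);
    } else {
        // Fail with a clear message instead of a NullPointerException further down.
        throw new UnsupportedOperationException("Unsupported operating system: " + detectedOS);
    }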
    
    // Prepend rootDir to output
    List<InputStream> streams = Arrays.asList(
            new ByteArrayInputStream(rootDir.getBytes()),
            new ByteArrayInputStream(lineSeparator.getBytes()),
            p.getInputStream());
    is = new SequenceInputStream(Collections.enumeration(streams));

    dataContext.storeStream(is, props);

}
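
The script above relies on operating-system commands, which is why it is limited to Windows and Linux. As a point of comparison only, the same one-path-per-line listing can be produced with Files.walk from the standard JDK (Java 8 or later). This is not the author's method and has not been run inside a Data Process shape; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class WalkDirs {
    /**
     * Returns rootDir and every directory beneath it, one path per line.
     * Files.walk visits the starting directory first, so nothing has to be prepended.
     */
    static String directoriesAsLines(Path rootDir) throws IOException {
        // try-with-resources closes the directory handles held by the stream
        try (Stream<Path> paths = Files.walk(rootDir)) {
            return paths
                    .filter(p -> Files.isDirectory(p))   // keep directories, drop files
                    .map(Path::toString)
                    .collect(Collectors.joining(System.lineSeparator()));
        }
    }
}
```

Because the result uses the same shape as the script's output (base directory first, then one subdirectory per line), the split step that follows would apply unchanged.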

Split flat file data by line

Figure 12. Split flat file data by line.

Next, with a Set Property shape, set the Disk v2 Directory to a profile element. I find it helpful to use a flat file profile with a single element instead of current data.

Set Property Shape for Disk Query

Figure 13. Create a flat file profile with a single element and use it to parse the data. It can be helpful to remove line breaks from current data.
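
What the split step and the single-element profile accomplish together can be sketched as follows: the multi-line script output becomes one clean directory value per document, with no trailing line break. This is an illustrative stand-in, not Boomi's implementation; the class and method names are hypothetical, and trimming assumes that directory names do not begin or end with whitespace.

```java
import java.util.ArrayList;
import java.util.List;

final class SplitLines {
    /** Splits the script output into one directory path per entry. */
    static List<String> toDirectories(String scriptOutput) {
        List<String> dirs = new ArrayList<>();
        // \R matches any line break: \n, \r\n, or \r
        for (String line : scriptOutput.split("\\R")) {
            String dir = line.trim();      // drop stray whitespace around the path
            if (!dir.isEmpty()) {          // ignore blank lines
                dirs.add(dir);
            }
        }
        return dirs;
    }
}
```

A leftover carriage return or line feed at the end of a value would otherwise become part of the directory name passed to the query, which is the problem the profile element avoids.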

Finally, set up a Disk v2 QUERY operation with a single parameter, fileName:WILDCARD, set to * to query all files. Adjust the parameters to fit your use case.

Disk v2 Query Operation

Figure 14. Set the parameter within the Disk v2 Query Operation.

Disk v2 Query Parameters

Figure 15. Set the parameter to Static = * within the Connector configuration panel.

Article originally posted at Boomi Community.

Published Jun 5, 2022
