Using RegEx Groups in Search/Replace to Extract Data from a Document

Boomi uses Regular Expressions (RegEx) in the Search/Replace step within the Data Process Shape, following all the standard rules of RegEx. This allows for data extraction from a document using RegEx groups. In this article, we will cover how to use RegEx groups, which are patterns wrapped in parentheses. By referencing these groups with their respective numbers, data can be efficiently extracted from the document.

Below is an example document that contains multi-part data. The goal is to extract only the XML data from it.

------=_Part_199271_1149177254.1718988851264
Content-Type: application/xop+xml;charset=UTF-8;type="text/xml"
Content-Transfer-Encoding: 8bit
Content-ID: <123456789>

<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
	<env:Header/>
	<env:Body>
		<ns0:GenericResponse xmlns:ns0="http://www.boomi.com">Boomi</ns0:GenericResponse>
	</env:Body>
</env:Envelope>
------=_Part_199271_1149177254.1718988851264--

Within the Text to Find field of the Search/Replace step, the following RegEx pattern is used to extract the XML data from the document.

(?s)(.*?)(<env:Envelope[\s\S]*?<\/env:Envelope>)(.*)

This pattern treats the current document data as a single string and splits it into three groups, preceded by a flag:

  1. (?s) - This flag allows the dot to match newlines. Although it is written in parentheses, it is not a group, so it is not counted when numbering the groups.
  2. (.*?) - Group 1 grabs all the data before the XML. The ? makes the match non-greedy, so it stops at the first point where the next group can match.
  3. (<env:Envelope[\s\S]*?<\/env:Envelope>) - Group 2 grabs the XML data that starts with <env:Envelope and ends with the first </env:Envelope>.
  4. (.*) - Group 3 grabs the rest of the data after the XML data.
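To make the group numbering concrete, here is a minimal sketch that applies the same pattern to the example document outside of Boomi. It uses Python's re module purely for illustration. It is an assumption that Python's handling of this particular pattern matches the engine behind Boomi's Search/Replace step, and the variable names are hypothetical.

```python
import re

# Example multi-part document, copied from the sample in this article.
document = (
    "------=_Part_199271_1149177254.1718988851264\n"
    'Content-Type: application/xop+xml;charset=UTF-8;type="text/xml"\n'
    "Content-Transfer-Encoding: 8bit\n"
    "Content-ID: <123456789>\n"
    "\n"
    '<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">\n'
    "\t<env:Header/>\n"
    "\t<env:Body>\n"
    '\t\t<ns0:GenericResponse xmlns:ns0="http://www.boomi.com">Boomi</ns0:GenericResponse>\n'
    "\t</env:Body>\n"
    "</env:Envelope>\n"
    "------=_Part_199271_1149177254.1718988851264--\n"
)

# Same pattern as the Text to Find field.
pattern = re.compile(r"(?s)(.*?)(<env:Envelope[\s\S]*?<\/env:Envelope>)(.*)")
match = pattern.match(document)

# (?s) is a flag, so only three groups are counted.
print(pattern.groups)

# Group 1: headers before the XML, group 2: the XML, group 3: the trailer.
for number in (1, 2, 3):
    print(f"--- group {number} ---")
    print(match.group(number))
```

Printing each group separately is a quick way to confirm which number to reference before configuring the Replace With field.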

Next, in the Replace With field, the group number is used to extract the XML data. Because the Text to Find pattern matches the entire document, the whole document is replaced with the XML data from group 2. Group 2 corresponds to item 3 in the list above, since the first set of parentheses is only used to set a flag. Additionally, the search should be set to Entire document at once.

$2
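The replace step can be sketched the same way outside of Boomi. The example below is an illustration under assumptions, not Boomi's implementation: it uses Python's re module, where the group reference is written \g<2> instead of $2, and the function name extract_envelope and the shortened sample document are hypothetical.

```python
import re

# Same pattern as the Text to Find field in this article.
PATTERN = re.compile(r"(?s)(.*?)(<env:Envelope[\s\S]*?<\/env:Envelope>)(.*)")


def extract_envelope(document: str) -> str:
    # Replace the whole match with group 2. Python writes the reference
    # as \g<2>; the Replace With field in this article uses $2.
    return PATTERN.sub(r"\g<2>", document)


# Shortened, hypothetical sample with the same shape as the example document.
sample = (
    "--boundary\n"
    "Content-ID: <1>\n"
    "\n"
    "<env:Envelope><env:Body>Boomi</env:Body></env:Envelope>\n"
    "--boundary--\n"
)

result = extract_envelope(sample)
print(result)

# With no envelope present the pattern does not match, so the text
# comes back unchanged.
unchanged = extract_envelope("plain text, no envelope")
print(unchanged)
```

Note that when the pattern does not match, a regex replace leaves the input as it was, so it may be worth checking that the output actually starts with <env:Envelope before passing it downstream.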

Figure 1. Using RegEx Groups in Search/Replace to Extract Data from a Document.

References

The article was originally posted at Boomi Community.

Published Jul 21, 2024
