In a program like SDL Trados Studio, regular expressions can be used to:
1) Filter on segments that match a certain regex
2) Find text that matches a regex
3) Create verification settings
4) Add new segmentation rules to a TM
Let's use the new line and tab special characters to look at a few examples of these applications.
1) Filtering on segments that contain a new line break
This is achieved by using the regular Display Filter (found in the Review tab), which has regex enabled by default, or the Advanced Display Filter, where regular expressions must be enabled by checking a box.
Tip: For even more powerful filtering, download the Community Advanced Display Filter from the SDL app store.
So, if we have a document that looks like this:
Entering the "new line" regex character in the Display Filter search box produces the following filtered results:
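For readers who want to try the idea outside Studio, here is a minimal sketch in Python. It is only an illustration: Studio uses its own regex engine, the sample segments are made up, and the names `segments`, `pattern` and `filtered` are hypothetical.

```python
import re

# Hypothetical sample segments; two of them contain a line break.
segments = [
    "First line\nSecond line",
    "A segment without a line break",
    "Tab\tseparated",
    "Another\nsoft return",
]

# The "new line" regex character, as typed in the filter box.
pattern = re.compile(r"\n")

# Keep only the segments in which the pattern is found.
filtered = [s for s in segments if pattern.search(s)]
```

Only the first and last sample segments survive the filter, which mirrors what the Display Filter does with the segments of a document.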
2) Finding a tab
To do this, the regular expressions checkbox in the Find dialog box must be checked. This example shows the results of the search.
Once we learn that the Find dialog box accepts regex, a natural question is whether the same can be done in a replace operation. The answer is a bit disappointing. While the Find field accepts all kinds of regular expressions, the regex syntax accepted by the Replace field is very limited. In short, you can't use regex to replace a tab character with a new line character, for example. If you enter "\n" in the Replace field, it is interpreted literally as "a backslash followed by an n", and that's exactly what is used in the replacement.
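The difference between a literal and an interpreted replacement can be sketched in Python. This is an analogy under stated assumptions, not Studio's actual implementation: the sample text is invented, and the function replacement is simply a way to make Python use the replacement text as-is.

```python
import re

# Hypothetical sample text containing one tab character.
text = "Name\tValue"

# Find: the pattern \t is read as a regex and matches the tab.
match = re.search(r"\t", text)
tab_position = match.start()

# Literal replacement, as described above: the result contains a
# backslash followed by an n. Returning the text from a function makes
# Python skip escape processing in the replacement.
literal_result = re.sub(r"\t", lambda m: r"\n", text)

# Interpreted replacement (what one might have hoped for): the tab
# becomes a real new line character.
newline_result = re.sub(r"\t", "\n", text)
```

Printing `literal_result` shows `Name\nValue` on a single line, while `newline_result` is split over two lines.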
3) Creating verification settings
SDL Trados Studio's out-of-the-box verification options include the ability to add regex patterns to flag potential errors. In the example below, a rule has been created to tell Studio that when a new line character is found in the source, it should also be present in the target.
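The logic behind such a rule can be sketched in a few lines of Python. This is a hedged illustration only: the function name `check_newlines` and the warning text are hypothetical, and Studio's own verifier is configured through its settings dialog, not through code.

```python
import re

NEWLINE = re.compile(r"\n")

def check_newlines(source, target):
    """Return a warning when the source has a line break and the target does not."""
    if NEWLINE.search(source) and not NEWLINE.search(target):
        return "Line break found in source but missing from target"
    return None

# Hypothetical segment pairs: the first should be flagged, the second should not.
flagged = check_newlines("Line one\nLine two", "Línea uno Línea dos")
clean = check_newlines("Line one\nLine two", "Línea uno\nLínea dos")
```

Here `flagged` holds the warning message and `clean` is `None`.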
4) Adding new segmentation rules to a TM
There are some cases where creating new pattern-based segmentation rules is desirable. A new segmentation rule for line breaks (soft returns), for example, would look like this (there's a dot in the "After break" section, even though it's hard to see):
After the rule has been added to the TM, files that are added to the project will be segmented at every line break, in addition to the usual segmentation. So, for our example above, if we remove the file from the project and add it back after the rule has been added, the new segmentation would look like this:
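To make the effect of such a rule concrete, here is a rough Python sketch. It assumes that the "Before break" pattern is the new line character and the "After break" pattern is a dot (any character); the sample text and the name `split_at_line_breaks` are hypothetical. Splitting on an empty match requires Python 3.7 or later.

```python
import re

# Break after a new line character ("Before break") when it is
# followed by any character ("After break" is a dot).
BREAK = re.compile(r"(?<=\n)(?=.)")

def split_at_line_breaks(text):
    """Split text into segments at every line break that has text after it."""
    return BREAK.split(text)

# Hypothetical paragraph containing two soft returns.
paragraph = "Line one\nLine two\nLine three"
new_segments = split_at_line_breaks(paragraph)
```

With this sample, `new_segments` contains three segments instead of one. The usual sentence-based segmentation is not modelled here.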
While the examples in this article use only the tab and new line characters, all kinds of complex regex patterns can be used in the four Studio features that support regular expressions: display filter, search, verification and segmentation. Linguists don't need to be computer programmers, but investing some time in learning the basics of regex will help them save time and work more efficiently.