Thursday, October 24, 2019

Regular Expressions for Translators: Four Applications in SDL Trados Studio

In regular expressions (regex), a new line break (or soft return) and a tab are represented by the special characters \n and \t, respectively.
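To see these escapes at work outside Studio, here is a minimal sketch in Python. It assumes Python's re module as a stand-in for Studio's regex engine: \n and \t mean the same thing in both, but other details of the syntax may differ. The names NEW_LINE, TAB and sample are purely illustrative.

```python
import re

# Assumption: Python's re module stands in for Studio's regex engine.
NEW_LINE = re.compile(r"\n")  # matches a line break (soft return)
TAB = re.compile(r"\t")       # matches a tab character

# The sample text contains real tab and line-break characters.
sample = "Name:\tNora\nRole:\ttranslator"

print(len(NEW_LINE.findall(sample)))  # 1
print(len(TAB.findall(sample)))       # 2
```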


In a program like SDL Trados Studio, regular expressions can be used to:

1) Filter on segments that match a certain regex
2) Find text that matches a regex
3) Create verification settings
4) Add new segmentation rules to a TM

Let's use the new line and tab special characters to look at a few examples of these applications.

1) Filtering on segments that contain a new line break

This is achieved by using the regular Display Filter (found in the Review tab), which has regex enabled by default, or the Advanced Display Filter, where regular expressions must be enabled by checking a box.

Tip: For even more powerful filtering, download the Community Advanced Display Filter from the SDL app store.
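Conceptually, the filter keeps only the segments whose text matches the pattern. The Python sketch below models that idea; the display_filter helper and the sample segments are hypothetical and are not part of Studio.

```python
import re


def display_filter(segments, pattern):
    """Return only the (number, text) pairs whose text matches the regex."""
    regex = re.compile(pattern)
    return [(number, text) for number, text in segments if regex.search(text)]


# Hypothetical segments: (segment number, segment text)
segments = [
    (1, "Product name"),
    (2, "Address:\nMain Street 10"),
    (3, "Price\tQuantity"),
    (4, "Contact us\nby phone or email"),
]

# Filter on the new line character, as in the Display Filter example.
for number, text in display_filter(segments, r"\n"):
    print(number, repr(text))
```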

So, if we have a document that looks like this:



Entering the new line regex character (\n) in the Display Filter search box produces the following filtered results:



2) Finding a tab

To do this, check the regular expressions checkbox in the Find dialog box and search for \t. This example shows the results of the search.
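A regex search for \t simply reports where each tab occurs. In this Python sketch (an illustration under the same assumption as before, not Studio's own code), the hypothetical find_all helper returns the start and end offset of every match, much like stepping through the results with Find Next:

```python
import re


def find_all(text, pattern):
    """Return the (start, end) offsets of every regex match in text."""
    return [(m.start(), m.end()) for m in re.finditer(pattern, text)]


segment = "Item\tQty\tPrice"
print(find_all(segment, r"\t"))  # [(4, 5), (8, 9)]
```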





Once we learn that we can use regex in the Find dialog box, a natural question is whether the same can be done in a replace operation. The answer is a bit disappointing: while the Find field accepts all kinds of regular expressions, the Replace field supports only very limited regex syntax. So, in short, no: you can't use regex in the Replace field to, for example, replace a tab character with a new line character. If you enter "\n" in the Replace field, it will be interpreted literally as a backslash followed by an n, and that's exactly what will be inserted as the replacement.
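The difference between a literal "\n" and a real line break can be modelled in Python. This only illustrates the behavior described above and is not Studio's implementation: passing the replacement through a function makes re.sub insert it verbatim, the way the Replace field treats "\n", while the second call inserts an actual line-break character.

```python
import re

segment = "Name\tAddress"

# Replacement used verbatim: a backslash followed by an n.
literal = re.sub(r"\t", lambda match: "\\n", segment)

# Replacement that is an actual line-break character.
real_break = re.sub(r"\t", lambda match: "\n", segment)

print(repr(literal))     # 'Name\\nAddress' (backslash and n, still one line)
print(repr(real_break))  # 'Name\nAddress' (a genuine line break)
```

Note that Python's re.sub, unlike Studio's Replace field, would expand "\n" in a plain replacement string; the lambda is there precisely to switch that expansion off.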

3) Creating verification settings

SDL Trados Studio's out-of-the-box verification options include the ability to add regex patterns to flag potential errors. In the example below, a rule has been created to tell Studio that when a new line character is found in the source, it should also be present in the target.



With the rule in place, once the verification is run, the program will identify any instances where there is a new line character in the source but not in the target.
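The logic of this rule amounts to comparing source and target. The Python below is a hypothetical model of the check, with made-up segment pairs; in Studio the rule is configured in the verification settings, not written as code.

```python
import re

NEW_LINE = re.compile(r"\n")


def missing_new_lines(segment_pairs):
    """Return numbers of segments whose source has a line break but whose target does not."""
    return [
        number
        for number, source, target in segment_pairs
        if NEW_LINE.search(source) and not NEW_LINE.search(target)
    ]


# Hypothetical (segment number, source, target) triples
pairs = [
    (1, "Address:\nMain Street", "Dirección:\nCalle Principal"),
    (2, "Phone:\n555-0100", "Teléfono: 555-0100"),
    (3, "Thank you", "Gracias"),
]

print(missing_new_lines(pairs))  # [2]
```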




4) Adding new segmentation rules to a TM

There are some cases where creating new pattern-based segmentation rules is desirable. A new segmentation rule for line breaks (soft returns), for example, would look like this (there's a dot in the "After break" section, even though it's hard to see):



After the rule has been added to the TM, files that are added to the project will be segmented at every line break, in addition to the usual segmentation. So, for our example above, if we remove the file from the project and add it back after the rule has been added, the new segmentation would look like this:
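The effect of such a rule can be approximated with a zero-width split in Python. This sketch assumes that the Before break pattern is \n and that the After break pattern is the dot mentioned above; the names LINE_BREAK_RULE and apply_rule are hypothetical. It needs Python 3.7 or later, which allows re.split on empty matches.

```python
import re

# Assumed rule: "Before break" = \n and "After break" = . (any character).
# The lookbehind and lookahead match an empty position, so no text is removed.
LINE_BREAK_RULE = re.compile(r"(?<=\n)(?=.)")


def apply_rule(segment):
    """Split a segment at every position where the assumed rule allows a break."""
    return LINE_BREAK_RULE.split(segment)


print(apply_rule("Main Street 10\nSpringfield\nMexico"))
# ['Main Street 10\n', 'Springfield\n', 'Mexico']
```

In this sketch, requiring a character after the break means that a line break at the very end of a segment does not produce an empty segment.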



Final words
While the examples in this article use only the tab and new line characters, much more complex regex patterns can be used in all four Studio features that support regular expressions (display filter, search, verification and segmentation). Linguists don't need to be computer programmers, but investing some time in learning the basics of regex will help them save time and work more efficiently.


 Ask about special SDL Trados Studio pricing for Mexico!




6 comments:

  1. Great article as usual, Nora... but perhaps a small bit of useful information: you can actually replace tabs with a new line character. The way to do it is this:

    1. add a new line char into a segment
    2. copy the new line char to your clipboard
    3. Ctrl+H to bring up your replace dialog
    4. Search (using regex) for \t
    5. Replace by pasting the new line char into the replace field

    You can't see it, of course, but it is there, and it will act as the replacement character.

    1. Thank you, Paul! This is very helpful. I was trying to say that you can't use a regular expression per se in the replace field, but you're absolutely right!

  2. Thank you for the article!

    Is there any way to save custom verification settings created with regex in order to use them in other projects?

    1. Yes! There's an export/import option for QA settings in general (Verification - QA Checker - QA Checker profiles) and for regular expressions in particular (Verification - QA Checker - Regular Expressions - Action - Export/import items).

    2. Thank you very much Nora!

  3. Thank you Nora, your regex tips and suggestions are always very welcome!
