In a program like SDL Trados Studio, regular expressions can be used to:
1) Filter on segments that match a certain regex
2) Find text that matches a regex
3) Create verification settings
4) Add new segmentation rules to a TM
Let's use the new line and tab special characters to look at a few examples of these applications.
1) Filtering on segments that contain a new line break
This is achieved by using the regular Display Filter (found in the Review tab), which has regex enabled by default, or the Advanced Display Filter, where regular expressions must be enabled by checking a box.
Tip: For even more powerful filtering, download the Community Advanced Display Filter from the SDL app store.
So, if we have a document that looks like this:
Entering the "new line" regex character, \n, in the Display Filter search box produces the following filtered results:
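Outside Studio, the same idea can be tried in a few lines of Python. This is a minimal sketch, not a description of how Studio works internally. The `segments` list and the `filter_segments` helper are hypothetical, and Python's `re` engine stands in for Studio's own regex engine. The sketch keeps only the segments in which `\n` matches, numbering them from 1 the way a segment list is usually shown.

```python
import re

# Hypothetical segments standing in for a Studio document; the second and
# fourth contain a soft return (new line character).
segments = [
    "Product overview",
    "Warning:\nDo not open the cover.",
    "Press Start to continue.",
    "Line one\nLine two\nLine three",
]

def filter_segments(segments, pattern):
    """Return (number, text) pairs for segments in which the pattern matches anywhere."""
    regex = re.compile(pattern)
    # Numbering starts at 1, like the segment numbers in an editor view.
    return [(n, s) for n, s in enumerate(segments, start=1) if regex.search(s)]

for number, text in filter_segments(segments, r"\n"):
    print(number, repr(text))
```

Only segments 2 and 4 are printed, because they are the only ones containing a new line character.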
2) Finding a tab
To do this, check the regular expressions checkbox in the Find dialog box and search for the tab character, \t. This example shows the results of the search.
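The same search can be sketched in Python, assuming a hypothetical segment text with two tabs. `find_tabs` is an illustrative helper, not a Studio feature. It reports the position of every match of `\t`, a bit like stepping through the hits in the Find dialog.

```python
import re

def find_tabs(text):
    """Return the character position of every tab in the text."""
    # r"\t" is the same tab pattern typed into the Find box.
    return [match.start() for match in re.finditer(r"\t", text)]

# Hypothetical segment text containing two tab characters.
segment_text = "Name\tQuantity\tPrice"
print(find_tabs(segment_text))  # [4, 13]
```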
Once we learn that we can use regex in the Find dialog box, a natural question is whether the same can be done in a replace operation. The answer is a bit disappointing: while the Find field accepts all kinds of regular expressions, the Replace field accepts only a very limited regex syntax. In short, you can't use regex in the Replace field to, for example, replace a tab character with a new line character. If you enter "\n" in the Replace field, it is interpreted literally as a backslash followed by an "n", and that is exactly what will be inserted as the replacement.
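The difference between typing "\n" in the Replace field and pasting a real new line character there (the copy-and-paste workaround described in the comments) can be modeled in Python. This is only a model of the behavior described above, not Studio's engine. The hypothetical `replace_verbatim` helper passes the replacement through a function so that Python inserts it exactly as given, mimicking the literal treatment of "\n".

```python
import re

def replace_verbatim(pattern, replacement, text):
    """Replace every match with `replacement` exactly as given, with no escape processing."""
    # Returning the string from a function stops re.sub from interpreting
    # backslash escapes in the replacement.
    return re.sub(pattern, lambda match: replacement, text)

text = "Name\tQuantity"

# Typing \n in the Replace field: the two characters backslash and n are inserted.
typed = replace_verbatim(r"\t", "\\n", text)
# Pasting a real new line character into the Replace field.
pasted = replace_verbatim(r"\t", "\n", text)

print(repr(typed))   # 'Name\\nQuantity'
print(repr(pasted))  # 'Name\nQuantity'
```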
3) Creating verification settings
SDL Trados Studio's out-of-the-box verification options include the ability to add regex patterns to flag potential errors. In the example below, a rule has been created to tell Studio that when a new line character is found in the source, it should also be present in the target.
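The logic of such a rule can be sketched as a check on each source/target pair. This is a hypothetical illustration of the condition described above, not Studio's QA Checker configuration: `check_new_line` and the sample pairs are made up for the example.

```python
import re

NEW_LINE = re.compile(r"\n")

def check_new_line(source, target):
    """Return a warning if the source contains a new line and the target does not, else None."""
    if NEW_LINE.search(source) and not NEW_LINE.search(target):
        return "New line in source is missing from target"
    return None

# Hypothetical English-French segment pairs.
pairs = [
    ("Warning:\nDo not open the cover.", "Avertissement :\nNe pas ouvrir le capot."),
    ("Warning:\nDo not open the cover.", "Avertissement : Ne pas ouvrir le capot."),
    ("Press Start.", "Appuyez sur Start."),
]
for number, (source, target) in enumerate(pairs, start=1):
    message = check_new_line(source, target)
    if message:
        print(number, message)
```

Only the second pair is flagged: the first keeps the line break in the target, and the third has no line break in the source.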
4) Adding new segmentation rules to a TM
There are some cases where creating new pattern-based segmentation rules is desirable. A new segmentation rule for line breaks (soft returns), for example, would look like this (there's a dot in the "After break" section, even though it's hard to see):
After the rule has been added to the TM, files that are added to the project will be segmented at every line break, in addition to the usual segmentation. So, for our example above, if we remove the file from the project and add it back after the rule has been added, the new segmentation would look like this:
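The effect of the rule can be imitated in Python by splitting at every position that is preceded by a new line and followed by any character. This sketch assumes the "Before break" pattern is \n (only the dot in "After break" is described above), and the `segment` helper is hypothetical. As in most regex flavors, the dot in Python does not match a new line by default. Splitting on a zero-width match requires Python 3.7 or later.

```python
import re

# Assumed rule: Before break = \n, After break = . (any character except a new line).
# Lookbehind and lookahead make the split point zero-width, so no text is lost.
BREAK_AFTER_NEW_LINE = re.compile(r"(?<=\n)(?=.)")

def segment(paragraph):
    """Split a paragraph after every line break, keeping the break at the end of its segment."""
    return BREAK_AFTER_NEW_LINE.split(paragraph)

print(segment("Warning:\nDo not open the cover.\nUnplug first."))
```

This prints three segments, each of the first two ending in its line break. Keeping the break with the preceding segment is a choice made for the sketch; Studio may place it differently, and the sketch covers only this one rule, not the usual sentence segmentation.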
While the examples in this article use only the tab and new line characters, all kinds of complex regex patterns can be used in the four Studio features that support regular expressions (display filter, search, verification and segmentation). Linguists don't need to be computer programmers, but investing some time in learning the basics of regex will help them save time and work more efficiently.
Great article as usual, Nora... but here is perhaps a small bit of useful information. You can actually replace tabs with a new line character. Here's how:
1. Add a new line char into a segment.
2. Copy the new line char to your clipboard.
3. Press Ctrl+H to bring up the Replace dialog.
4. Search (using regex) for \t.
5. Replace by pasting the new line char into the Replace field.
You can't see it, of course, but it is there, and it will act as the replacement character.
Thank you, Paul! This is very helpful. I was trying to say that you can't use a regular expression per se in the replace field, but you're absolutely right!
Thank you for the article!
Is there any way to save custom verification settings created with regex in order to use them in other projects?
Yes! There's an export/import option for QA settings in general (Verification - QA Checker - QA Checker profiles) and for regular expressions in particular (Verification - QA Checker - Regular Expressions - Action - Export/import items).
Thank you very much, Nora!
Thank you, Nora! Your regex tips and suggestions are always very welcome!