In previous articles, we learned about metacharacters, anchors and special characters. In this article, we’ll talk about character classes.
First, let’s understand which characters are included in or excluded from each character class.
Let’s consider the following strings containing English and Spanish characters. This is not meant to be an exhaustive list of characters, but rather something we can use to understand the concept of character classes.
abcdefghijklmnñopqrstuvwxyz ABCDEFGHIJKLMNÑOPQRSTUVWXYZ
áéíóúÁÉÍÓÚ 1234567890-!"#$%&/()=?¡[¨*]{}´+,.;:_
To clearly visualize which characters are matched when each character class is used, I will use the above strings and a regex tester called Regex Hero, which alternates between yellow and orange highlighting to show each subsequent match.
As shown below, each of the 10 digits in the sample text is matched when we use \d.
Using \D will match anything that is not a digit, so all of the other characters, including white spaces, are matched here.
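\w will match any word character, which includes all the letters, digits and the underscore, as shown below. Its counterpart, \W, matches anything that is not a word character. Likewise, \s matches any white space character (such as a space or a tab) and \S matches anything that is not white space.
Note that each of these regexes matches a single character. To match more than one character at a time we need quantifiers, which we will cover in a future article.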
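For readers who want to reproduce the highlighting outside a regex tester, here is a minimal sketch in Python. It is an approximation only: Python's re module is a different engine from the one in Studio or Regex Hero, although it treats \d, \D and \w the same way for these sample strings. The helper name matches is made up for this illustration.

```python
import re

# The article's two sample strings, joined by a newline.
SAMPLE = (
    "abcdefghijklmnñopqrstuvwxyz ABCDEFGHIJKLMNÑOPQRSTUVWXYZ\n"
    "áéíóúÁÉÍÓÚ 1234567890-!\"#$%&/()=?¡[¨*]{}´+,.;:_"
)


def matches(pattern, text=SAMPLE):
    """Return every non-overlapping match of pattern, in order (hypothetical helper)."""
    return re.findall(pattern, text)


digits = matches(r"\d")      # the ten digits, one match per digit
non_digits = matches(r"\D")  # everything else, including spaces and the newline
word_chars = matches(r"\w")  # letters (accented ones too), digits and "_"

print("".join(digits))
print("".join(word_chars))
# \d and \D are complements: together they account for every character.
print(len(digits) + len(non_digits) == len(SAMPLE))
```

Every item in these lists is one character long, which is the "one instance" behavior described above.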
The video below shows the regexes being used with the Find operation in SDL Trados Studio.
Application
Now let’s use these character classes combined with the start of segment and end of segment anchors and escaped metacharacters to see some possible use cases with the display filter in SDL Trados Studio.
Here's the unfiltered text I will use for these examples:
For this demonstration, instead of adding actual translations on the target side, I have just copied the source over and have added some superfluous spaces and tabs.
Example 1: Filter on target segments that start with a white space (space or tab)
Regex: ^\s
Example 2: Filter on target segments that end in a white space
Regex: \s$
Example 3: Filter on target segments that end in a number
Regex: \d$
Example 4: Filter on target segments that start with a word character
Regex: ^\w
Example 5: Filter on target segments that don't end in a word character
Regex: \W$
Example 6: Filter on target segments that end with a space followed by a period
Regex: \s\.$
Example 7: Filter on target segments that end in a "not white space" character followed by a question mark
Regex: \S\?$
Example 8: Filter on target segments that end in a digit followed by a "not word" character followed by a period
Regex: \d\W\.$
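The eight filters above can also be tried out with a short Python sketch. The segments below are invented for illustration and are not the text from the screenshots, display_filter is a hypothetical stand-in for Studio's display filter, and Python's re module is assumed to behave like Studio's engine for these simple patterns.

```python
import re

# Hypothetical target segments, invented for this illustration only.
SEGMENTS = [
    " Leading space.",
    "\tLeading tab.",
    "Trailing space. ",
    "Chapter 7",
    "Space before period .",
    "Is this correct?",
    "Is this correct ?",
    "See page 12).",
]

# The regexes from Examples 1 to 8, keyed by example number.
FILTERS = {
    1: r"^\s",       # starts with a white space
    2: r"\s$",       # ends in a white space
    3: r"\d$",       # ends in a digit
    4: r"^\w",       # starts with a word character
    5: r"\W$",       # does not end in a word character
    6: r"\s\.$",     # ends with a space followed by a period
    7: r"\S\?$",     # ends in a non-white-space character plus "?"
    8: r"\d\W\.$",   # ends in digit, non-word character, period
}


def display_filter(pattern, segments=SEGMENTS):
    """Keep only the segments in which the pattern is found (hypothetical helper)."""
    return [s for s in segments if re.search(pattern, s)]


for number, pattern in FILTERS.items():
    print(f"Example {number}: {pattern} -> {display_filter(pattern)}")
```

Note how Example 7 keeps "Is this correct?" but drops "Is this correct ?", because the space before the question mark is not matched by \S.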
As we can see, by combining character classes, anchors and escaped metacharacters, we can start enhancing our use of SDL Trados Studio's display filter.
Remember that in addition to the display filter, SDL Trados Studio accepts regular expressions in the Find and Replace dialog box, the segmentation rules and the verification settings.