Sunday, May 17, 2020

Two Computers, One Project

Have you ever wondered how you can seamlessly switch from your desktop computer to your laptop and vice versa when working on SDL Trados Studio projects?

This video explains how to do it!








Saturday, May 9, 2020

Reorganize your Termbase to Improve Term Recognition in SDL Trados Studio

I first wrote about termbase reorganization in 2013, after the upgrade from SDL Trados Studio 2011 to SDL Trados Studio 2014 alerted users to the need to reorganize termbases.


Seven years later, and almost two years after SDL Trados Studio 2019 first came out, we no longer see this reminder, but may still need to run a termbase reorganization now and then to improve term recognition.

If you know your termbase contains certain terms but they are not being recognized, try reorganizing your termbase. It may be just what you need to get things working as they should.



Below are the steps to quickly reorganize your termbases.

1. Open your termbase in MultiTerm.

If you haven't installed MultiTerm yet, go to your SDL account, download the program and install it. Once installed, double-clicking a termbase file (*.sdltb) in Windows Explorer will open the termbase in MultiTerm.


2. Reorganize the termbase

With the termbase(s) open in MultiTerm, go to Termbase Management, make sure the Home tab is selected, click Reorganize and select the termbases you want to reorganize. Note that you can run the process on several termbases at once.


The reorganization process is pretty fast, so you should be done in only a few minutes, or maybe even seconds, if the termbase is small.


After the process is complete, you can close MultiTerm and go back to SDL Trados Studio, where you should start seeing improved term recognition.







Saturday, April 11, 2020

Streamlining Written Communication in the Booth: AutoHotkey Text Expansion for Interpreters

Interpreters must often juggle a number of tasks while in the booth, and communicating with others shouldn't be a cause for distraction, whether the interpreter is working on site or in a remote interpretation environment.

This post provides an option to simplify written communication between the interpreter and a boothmate, a technician or anybody else.

The list of phrases is fully customizable, and so are the abbreviations (hotstrings).

Here's a look at how it would work in WhatsApp (on a computer):





Follow the steps below to start using AutoHotkey text expansions right away.

Download AutoHotkey (www.autohotkey.com) and install it. Once installed, you won't see anything open. That's normal. AutoHotkey runs in the background and allows you to run your own scripts (macros), which are created in plain text editors, such as Notepad or Notepad++.

To use an existing script (download a sample script here):

1. Save the *.ahk file to a folder on your computer.


2. Double-click to activate it. Once the script is active, you will see a green square with a white H in it in your system tray. If you hover your mouse over it, you will see the name of the active script.



You are now ready to start using the expansions contained in that file. The sample file above, for example, contains the following:


The text inside the two pairs of colons is the abbreviation, or hotstring, that you would use to trigger the full phrase, or expansion, which appears after the last colon.

In this example, if you type "tp" in any program or chat box followed by a space, tab, punctuation mark or Enter, the abbreviation is replaced by the expansion "I'm having technical problems". Using the same procedure replaces each of the abbreviations above with their corresponding expansion.
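To make the syntax concrete, here is a minimal Python sketch that reads lines in the "::abbreviation::expansion" shape and imitates the expansion. It is only an illustration of the format, not how AutoHotkey works internally. The "tp" entry comes from the sample above; the "brb" entry and the function names are made up for this example.

```python
import re

# Hotstring definitions in the "::abbreviation::expansion" shape described above.
# Only "tp" comes from the post; "brb" is a made-up entry for illustration.
SCRIPT = """\
::tp::I'm having technical problems
::brb::I'll be right back
"""

# Two colons, the abbreviation, two more colons, then the expansion.
HOTSTRING_LINE = re.compile(r"^::([^:]+)::(.+)$")


def parse_hotstrings(script):
    """Return a dict mapping each abbreviation to its expansion."""
    table = {}
    for line in script.splitlines():
        match = HOTSTRING_LINE.match(line.strip())
        if match:
            table[match.group(1)] = match.group(2)
    return table


def expand(text, table):
    """Replace whole-word abbreviations, roughly imitating a text expander."""
    words = "|".join(re.escape(abbr) for abbr in table)
    return re.sub(rf"\b({words})\b", lambda m: table[m.group(1)], text)


table = parse_hotstrings(SCRIPT)
print(expand("tp, please wait", table))
```

The word boundaries in `expand` keep the abbreviation from firing inside longer words, which mirrors the idea that a hotstring is triggered only when it is typed on its own.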


You can see the full list included in the file by right-clicking on the ahk file in Windows Explorer and opening it with a plain text editor (Notepad++ was used for the screenshot above). You don't need to memorize all the abbreviations; you just need to keep them handy.

Note that if you would like to change the abbreviations (hotstrings), you can easily do so, and you can also add your own phrases, so keep reading if you're interested in doing that.


To edit an existing script:

Scripts are edited in plain text editors, so in order to edit the script, follow these steps.

1. Open the script in a plain text editor.

2. Edit the script as desired, changing the existing abbreviations or expansions, or adding new ones. When adding new abbreviations, follow the syntax in the examples, placing your abbreviation inside two pairs of colons and your expansion after the last colon.

3. Save the file.

4. In Windows Explorer, double-click the name of the file to either load it, if not active yet, or reload it, if already active, to enable the changes.

Alternatively, after saving your changes, you can right-click on the name of the script in the system tray and select Reload This Script.


This context menu also offers you options to pause or exit the script.

To create a new script:

1. Go to a folder in Windows Explorer where you would like to save your script (I have a folder called AutoHotkey Scripts just to keep them all in one place).

2. Right-click on an empty space in the folder and select New, then AutoHotkey Script. Give your script a name and save it.

So far, you have the empty "skeleton" of a script. Now you need to enter the actions you want it to execute.

3. Right-click on the script and select Open, then open it with a text editor, such as Notepad++.

4. Once the file is open, you will see that there's already some text in it. Paste your script code in a new line below it.

5. Save the file. Now double-click the file, and this will load the script. Look for a green square with a white H in it in your system tray, which indicates that the script is active.

6. Now that the script is active, type your hotstring in any program or chat box to trigger the expansion in the script.

Text expansion is only one of the many things you can do with AutoHotkey, so I hope this short intro will help you get started and pique your interest to learn more about this great tool.

And lastly, if you would also like to enjoy text expansions on your smartphone or tablet, check out apps such as Texpand for Android or TextExpander for iOS.

Sunday, April 5, 2020

Importing termbases into TMs

The process to import an existing termbase into a translation memory in SDL Trados Studio involves the following steps:

1. Converting the sdltb file to an Excel file, which is easily achieved with the Glossary Converter app, available for free in the SDL App Store

2. Adding the Excel file to a project, making sure the Bilingual Excel filter is enabled

3. Opening the file, changing segment status to Translated and running an Update Main TM batch task

See the process in action here:


Wednesday, February 26, 2020

Regex for Non-Latin Alphabets

A guest post by Salvador Virgen


Not long ago, after a RegEx workshop I taught along with Nora Díaz, I was approached by one of the students who asked: "How can you look for words in other alphabets?" I answered that for single characters there is the Unicode escape sequence, but I had no idea how to do it for entire alphabets. I did a little research, and this article answers the question.

When you look for (or filter for) a single character, Unicode is pretty straightforward. For example, if you filter for \u00a9 in Trados, you will see only the segments containing the copyright symbol. Entire alphabets take a little more work.

Greek


Fortunately, the Unicode designers put the letters in a contiguous block. Lowercase Greek letters are in positions 0x03B1 (α) through 0x03C9 (ω) and uppercase letters are in positions 0x0391 (Α) through 0x03A9 (Ω). In Trados you could just filter for

[\u03b1-\u03c9\u0391-\u03A9]

or

[α-ωΑ-Ω]

just like you would look for [a-zA-Z] in the Latin alphabet.

One nifty thing about Unicode is that it includes the two forms of the lowercase sigma: ς and σ. In the uppercase, which has only one form of sigma, there is a “hole”, an undefined character, between rho and sigma, so the lowercase and uppercase ranges are the same size.

Be advised that this method cannot find letters with diacritics, which fall outside these two ranges. So you cannot look for, say, alpha with a circumflex accent, but if you are looking for the resistivity symbol (lowercase rho, ρ) or the cosmological constant (uppercase lambda, Λ), you will be covered.
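As a quick check outside Trados, the same range can be tried in Python, whose `re` module also understands `\uXXXX` escapes. This is only a sketch: the sample sentence is made up, and Python's regex flavor is not the one Trados uses, although this particular pattern behaves the same way in both.

```python
import re

# The Greek range from the text: lowercase alpha-omega plus uppercase Alpha-Omega.
GREEK_LETTERS = re.compile(r"[\u03b1-\u03c9\u0391-\u03A9]")

# Made-up sample: rho (U+03C1), Lambda (U+039B) and alpha with tonos (U+03AC).
sample = "Resistivity \u03c1, cosmological constant \u039b, accented alpha \u03ac"

found = GREEK_LETTERS.findall(sample)
print(found)  # the accented alpha is not in the list

# Both lowercase sigma forms (U+03C2 and U+03C3) are inside the range.
sigmas = GREEK_LETTERS.findall("\u03c2\u03c3")
print(len(sigmas))
```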

Hebrew


The Hebrew alphabet has 22 letters, and 5 of them have a different form when written at the end of a word. The letters are in Unicode in positions 0x05d0 (alef, א) through 0x05ea (tav, ת). Hebrew has no uppercase and lowercase forms.

So, you could filter for

[\u05d0-\u05ea]

or

[א-ת]

Note that the first letter, alef (א), appears to the right of the hyphen, seemingly against the rule that the lower limit of a range goes on the left. This is because Hebrew is written from right to left: the alef is still typed first and comes before the hyphen in reading order.

Again, be advised that this method cannot find letters with diacritics.
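The escaped form of the range is easy to try in Python without having to type any right-to-left text. The sketch below is an illustration only (the sample string is invented); it also counts the code points in the range, which comes to 27 because the final forms are included.

```python
import re

# The Hebrew range from the text, written with escapes so that the
# pattern itself never switches direction in an editor.
HEBREW_WORD = re.compile(r"[\u05d0-\u05ea]+")

# Every code point from alef (U+05D0) through tav (U+05EA), final forms included.
letters = [chr(cp) for cp in range(0x05D0, 0x05EA + 1)]
print(len(letters))  # 27

# Made-up sample: Latin text followed by shin, lamed, vav, final mem.
sample = "Hello, \u05e9\u05dc\u05d5\u05dd!"
words = HEBREW_WORD.findall(sample)
print(words)
```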

Cyrillic


Cyrillic is one of the few widely used alphabets whose origin is traditionally traced to identified creators. Cyrillic is used for Russian, Bulgarian, Belarusian and Ukrainian, among many others; its users number in the hundreds of millions and live in many countries. The bulk of the Cyrillic letters are in Unicode positions 0x0400 through 0x044F (uppercase and lowercase).

If you want to look for uppercase letters, search for

[\u0410-\u042F] or [А-Я]

For lowercase, filter for

[\u0430-\u044F] or [а-я]

And for the whole alphabet

[\u0410-\u044F] or [А-я]

However, if you want to play it safe, filter for

[\u0400-\u04ff] or [Ѐ-ӿ]

This covers the whole gamut, from ye with grave (Ѐ) through kha with stroke (ӿ).
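One practical reason to "play it safe" can be seen in a short Python sketch: the Russian letter Ё (U+0401) sits just outside the А-я range, so only the wider range catches it. The sample word is chosen for illustration, and Python's `re` stands in here for the Trados filter.

```python
import re

BASIC_RANGE = re.compile(r"[\u0410-\u044F]+")   # А through я
WHOLE_BLOCK = re.compile(r"[\u0400-\u04ff]+")   # Ѐ through ӿ

# "Ёлка" spelled with escapes: U+0401, U+043B, U+043A, U+0430.
word = "\u0401\u043b\u043a\u0430"

basic_hits = BASIC_RANGE.findall(word)
block_hits = WHOLE_BLOCK.findall(word)

print(basic_hits)  # the initial Ё is missing
print(block_hits)  # the whole word is matched
```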

Conclusions


Looking for strings of non-Latin characters could appear to be an intimidating task, but thanks to ingenious Unicode design and to a clever Regex implementation, building and understanding these regexes is not difficult. The only difficult part is that many programs switch directions upon detecting a right-to-left language character, and moving the cursor around can be tricky, but a workaround for this is writing the range limits as Unicode sequences, which never change direction themselves.



Sunday, November 17, 2019

Regular Expressions for Translators: Replacements




One of the first things that I wanted to learn when I first started looking into regular expressions was how to do replacements. In this article we will look at how regex replacements work and how we can use them in SDL Trados Studio.

As with any replacement operation, we must first find the string that we want to replace. To do this, we can use any regex built with metacharacters, anchors, character classes, special characters, quantifiers, and groups and ranges. In the replacement part, however, none of these regular expression elements are supported, and we can only use literal characters and substitutions consisting of a dollar sign followed by a number. The $ is the only special character that can appear both in a regex pattern and in a substitution, although with different meanings. In a regex, $ is an anchor that indicates the end of a string. In a replacement pattern, it indicates the beginning of a substitution.

Substitution elements such as $1 or $3 represent the capturing groups in the regular expression matched in the "Find" part of the operation, with groups being assigned consecutive numbers from left to right, starting with 1.

Let's look at a few examples.





Regex pattern: (\w+)our
Replacement pattern:  $1or

In our first example, we want to replace the British word ending "-our" with the American spelling "-or".



In the example above, the regex pattern matches any word character, one or more times, followed by the literal characters "our". This pattern will match each of the words in the sample string: behaviour, colour, humour, labour, neighbour and flavour, and their plural forms, and will allow us to replace them. The pattern to the left of "our" is in parentheses, indicating that it is a group. Since regex groups are automatically assigned consecutive numbers from left to right, this would be group 1. We place this part of our regex in a group for the specific purpose of using the matched contents in the replacement, by using its corresponding substitution element.

Now, in the replacement, we will use a substitution element, $1, which means that the contents of group 1 will be transferred to the replacement, plus the literal characters "or", the replacement for "our", to change the spelling from British to American.
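The same replacement can be tried outside Studio. The Python sketch below is only an approximation: Python writes the substitution element as `\1` where Studio uses `$1`, and the sample strings are made up. It also shows why it pays to review each match before replacing, since the pattern happily matches words like "hour" too.

```python
import re

# Same "Find" pattern as in the article: one or more word characters, then "our".
BRITISH_OUR = re.compile(r"(\w+)our")


def americanize(text):
    # Python's \1 plays the role of $1 in the Studio replacement pattern.
    return BRITISH_OUR.sub(r"\1or", text)


print(americanize("behaviour, colours, neighbour"))

# Caveat: the pattern knows nothing about spelling rules.
print(americanize("one hour"))
```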

The animated gif below shows how each word is matched by the Find operation and then replaced.









Regex pattern: (\d+th)(\s)(October|November|December)
Replacement pattern:  $3 $1



In our second example, we want to change dates such as 20th November to November 20th. We will use a regex that has three groups:

Group 1: (\d+th)
One or more digits followed by the literal characters "th"
Substitution element: $1 

Group 2: (\s) 
A space
Substitution element: $2

Group 3: (October|November|December)
Any of these words
Substitution element: $3

In the replacement pattern, we need to place Group 3 ($3) at the beginning, followed by a space, followed by Group 1 ($1). Since the space is in its own capturing group, it could be represented by $2, but in this example I chose to enter a literal space, by pressing the spacebar, between $3 and $1, which works just as well. Note that what we can't do is use \s in the replacement pattern to enter a space. If we use \s in the replacement pattern, the literal characters "\s" will appear in the replacement text.
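For readers who want to experiment, this is roughly how the same operation looks in Python. Note the assumption: Python writes substitutions as `\3` and `\1` rather than `$3` and `$1`, and the sample sentence is invented.

```python
import re

DATE = re.compile(r"(\d+th)(\s)(October|November|December)")

sample = "Due on 20th November and 5th December."

# Group 3, a literal space, then group 1.
with_literal_space = DATE.sub(r"\3 \1", sample)

# Same result, reusing the captured space (group 2) instead.
with_group_two = DATE.sub(r"\3\2\1", sample)

print(with_literal_space)
```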

Here's the replacement operation in action.








Regex pattern: (\d+th)(?:\s)(October|November|December)
Replacement pattern:  $2 $1

In the previous example, we used 3 capturing groups in the regex. If instead we place the space inside a non-capturing group*, then the numbers assigned to the groups would change, and the replacement pattern would be different.


Here, we also have 3 groups, but one is a non-capturing group, indicated by the ?: inside the parentheses.

Group 1: (\d+th)
One or more digits followed by the literal characters "th"
Substitution element: $1 

Group 2: (?:\s) 
A space
Substitution element: None, this is a non-capturing group, so its contents are not saved to be used later on.

Group 3: (October|November|December)
Any of these words
Substitution element: $2



Note: In this example, the space is placed inside a non-capturing group only to give an example of how non-capturing groups work, but actually we could use a regex that doesn't place the space in a group, with the same effect: (\d+th)\s(October|November|December).
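The renumbering can be made visible with a small Python sketch that prints what each version of the regex captures. Python is used here only as a convenient tester; its substitution syntax (`\2`) differs from Studio's (`$2`).

```python
import re

CAPTURING = re.compile(r"(\d+th)(\s)(October|November|December)")
NON_CAPTURING = re.compile(r"(\d+th)(?:\s)(October|November|December)")

sample = "20th November"

three_groups = CAPTURING.search(sample).groups()
two_groups = NON_CAPTURING.search(sample).groups()

print(three_groups)  # the space is captured as group 2
print(two_groups)    # the month has moved up to group 2

# With the non-capturing version, the month is referenced as group 2.
swapped = NON_CAPTURING.sub(r"\2 \1", sample)
print(swapped)
```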






We have said before that capturing groups are assigned consecutive numbers, starting with 1, that can be used later on in the replacement pattern. But what if we want to use the entire matched string in the replacement operation? In that case, we use $0. You may be wondering when you would need to do this. Consider the following example.


Regex pattern: (\d+,)?\d+\.\d+\scash
Replacement pattern: $$$0



In this example, we want to add a dollar sign in front of any instances of an amount followed by a space and the word cash.

The regex pattern matches the various amounts: one or more digits followed by a comma (this first part is made optional by placing it in a group and adding the ? quantifier, which means 0 or 1 times), followed by one or more digits, a point, one or more digits, a space and the word "cash".

Since the dollar sign is a special character in the replacement pattern, when we want to enter a literal dollar sign in the replacement, we must use $$. Thus, $$$0 means a dollar sign ($$) followed by the entire string ($0) that matches the regex.
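Here is a rough Python equivalent for experimenting. The syntax differs from Studio's: in Python the whole match is written `\g<0>` instead of `$0`, and a dollar sign in the replacement is already literal, so no doubling is needed. The sample sentence is made up.

```python
import re

CASH_AMOUNT = re.compile(r"(\d+,)?\d+\.\d+\scash")


def add_dollar_sign(text):
    # "$" is a plain character in a Python replacement; \g<0> is the whole match.
    return CASH_AMOUNT.sub(r"$\g<0>", text)


print(add_dollar_sign("Paid 1,250.00 cash and 75.50 cash."))
```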

See the replacement in action below.


For this article, we've looked at replacements in the Editor view, but SDL Trados Studio also accepts regex replacements in the Translation Memories view and regex replacement syntax in the Verification settings.

With this, we have come to the end of this article. If you'd like to have a copy of my cheat sheet, you can download it here:




 Ask about special SDL Trados Studio pricing for Mexico!







Tuesday, November 12, 2019

Regular Expressions for Translators: Groups and Ranges

In this article, we'll talk about groups and ranges in regular expressions and how they can be used by translators in CAT Tools such as SDL Trados Studio. 


Let's have a look at these regex components and some of their applications. 





The previous post about quantifiers ends with a brief introduction of the function of the dot in regular expressions: a wildcard that represents any character.


Regex example: .*?,

Using a single dot will match any one character. Combining the dot with a quantifier will match more than one. This regex will match all the text up to and including the first comma, as shown below:



A word of caution about the dot in regex
While it may seem tempting to use the dot wildcard frequently, one must be aware of potential undesired results.

For example, imagine that we want to find all the text that comes between straight quotation marks so we can later replace the straight quotation marks with curly quotation marks. Using a regex such as ".+" (a straight quotation mark followed by anything, one or more times, followed by a straight quotation mark) would seem like an easy solution, but look at what can happen below:


Instead of getting two matches: "I will see you there" and "don't be late", we get a single match, from the very first quotation mark in the segment to the very last one.

These undesired results are not always evident when using a regular expression in the SDL Trados Studio display filter, for example, so it's always a good idea to test the regex in a regex tester such as regexstorm.net/tester, which I will use for the examples in this article.

Bonus tip: A better regex to find each separate instance of text inside straight quotes is "[^"]*".
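The difference is easy to reproduce in any tester. The Python sketch below uses a made-up segment built around the two quotations mentioned above and compares the greedy dot with the negated character set.

```python
import re

segment = 'She said "I will see you there" and "don\'t be late".'

# Greedy dot: runs from the first quotation mark to the very last one.
greedy = re.findall(r'".+"', segment)

# Negated set: stops at the next quotation mark, so each quote is separate.
separate = re.findall(r'"[^"]*"', segment)

print(len(greedy))    # 1
print(len(separate))  # 2

# The lazy quantifier from the first example stops at the first comma.
first_chunk = re.search(r".*?,", "one, two, three").group()
print(first_chunk)
```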








Regex example: col(o|ou)r

In regular expressions, the vertical bar or pipe character | indicates alternation. Using the pipe tells the regex to match everything to the left or everything to the right of the pipe, as shown here:


Here, the strings that match the regex colo|our are "colo" and "our". If we want to match "color" and "colour" instead, we need to use parentheses:



Look at the example below to see how alternation can be used to match any of the days of the week.


Now, you may notice that in this example, all the text to the left or right of the pipe is matched, whether it's a whole word or not. If what we want to do is match whole words only, we can use parentheses to create a group and then apply word boundaries to the entire group.
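The alternation behavior described above can be reproduced with a short Python sketch. The sample strings are invented for illustration, and the day-of-the-week pattern is written out in full as an assumption about what such a regex would look like.

```python
import re


def matches(pattern, text):
    """Return the full text of every match, ignoring group contents."""
    return [m.group(0) for m in re.finditer(pattern, text)]


# Without parentheses the pipe splits the whole pattern in two.
print(matches(r"colo|our", "color colour our"))

# With parentheses only the "o" / "ou" part alternates.
print(matches(r"col(o|ou)r", "color colour our"))

DAYS = "Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday"
text = "Mondays are busy, so see you Monday"

# Plain alternation also matches inside "Mondays".
loose = matches(DAYS, text)

# Grouping the alternatives and adding \b on both sides keeps whole words only.
whole_words = matches(rf"\b({DAYS})\b", text)

print(loose)
print(whole_words)
```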


I use this regex in a verification rule in SDL Trados Studio to alert me about segments where the day of the week is not present in the source but is present in the target:








Regex example: (\d+,)+

Parentheses are used to create groups in regular expressions. Look at this example:


Here, the regex \d+, is matched twice, once by "123," and once by "456,". If instead we want to include both instances in a single match, we need to add parentheses to the expression and then add the + quantifier to the group.



A group can also be a single character, as in the example below, where the ? quantifier (which means 0 or 1) is used to make the s character optional.
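A small Python sketch shows both effects. The string "123,456,789" follows the example above; the "book(s)?" pattern and its sample text are made up to stand in for the optional s example.

```python
import re


def matches(pattern, text):
    """Return the full text of every match, ignoring group contents."""
    return [m.group(0) for m in re.finditer(pattern, text)]


# Without a group, each "digits plus comma" chunk is its own match.
separate = matches(r"\d+,", "123,456,789")

# Quantifying the group joins consecutive chunks into a single match.
joined = matches(r"(\d+,)+", "123,456,789")

# A one-character group made optional with "?".
optional_s = matches(r"book(s)?", "one book, two books")

print(separate)
print(joined)
print(optional_s)
```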



Lastly, groups have two purposes in regular expressions: to organize information and to capture the contents of the group. The captured information is "remembered" by the regex engine and can later be used for backreferences or substitutions.

Consider this example:


Here, the regex pattern has been organized into five groups, as shown in the table above, at the bottom of the regex tester window. Each group is assigned a consecutive number, so group 1 captures the character sequence 123, group 2 captures the first comma, group 3 captures the character sequence 456, group 4 captures the second comma and group 5 captures the character sequence 789.

In the replacement pattern, we can rearrange the groups by representing each group with the dollar sign followed by the group number. In our example, the groups have been rearranged to $5$2$3$4$1, resulting in the replacement string 789,456,123. 

Note: The same result can be achieved by using commas instead of the numbered groups $2 and $4, which would make the replacement pattern $5,$3,$1.
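To try this outside the regex tester, here is a Python sketch. The pattern is an assumption reconstructed from the description of the five groups, and Python writes the substitutions as `\5` and so on rather than `$5`.

```python
import re

# Assumed pattern: digits, comma, digits, comma, digits, each in its own group.
FIVE_GROUPS = re.compile(r"(\d+)(,)(\d+)(,)(\d+)")

sample = "123,456,789"

captured = FIVE_GROUPS.match(sample).groups()
print(captured)

# Rearranging all five groups.
rearranged = FIVE_GROUPS.sub(r"\5\2\3\4\1", sample)

# Same result with literal commas instead of groups 2 and 4.
with_commas = FIVE_GROUPS.sub(r"\5,\3,\1", sample)

print(rearranged)
```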





In addition to regular groups, there is a less commonly used type of group, called passive or non-capturing. The only difference between a non-capturing group and a regular group is that a non-capturing group organizes the information contained in the group, but doesn't capture it, that is, the information in the group is not assigned a group number. 

Let's use the same example we used before to understand how this works. Instead of having five regular groups, the commas will now be placed inside non-capturing groups by using the following syntax: (?:,).


While we still have 5 groups organizing the information, two of them are non-capturing, so the number of groups available for substitutions (replacement operations) is reduced to three, as shown in the table at the bottom of the regex tester window.

With this, the replacement pattern to achieve the same result as in the previous example is now $3,$2,$1.

While there aren't many use cases that come to mind for using non-capturing groups, they can come in handy when one wants to keep the number of capturing groups down to avoid having to keep track of too many group numbers.





Regex example: \d+[abz?*]

Placing characters inside square brackets means that any one of the characters in that set can be matched in that position, in no particular order. Have a look at this example:


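Note that inside a character set, most metacharacters (such as ? and * in this example) don't need escaping. The exceptions are the backslash, the closing bracket, a caret in first position and a hyphen placed between two characters, which keep a special meaning there.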
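The character set from this example can be tested with a few lines of Python. The sample string is invented; the second pattern is an extra illustration of escaping a closing bracket and a hyphen inside a set.

```python
import re

# Digits followed by any ONE of: a, b, z, ? or *
with_set = re.findall(r"\d+[abz?*]", "12a 34? 56* 78x")
print(with_set)  # "78x" is not matched because x is not in the set

# Inside the set, ? and * are literal, but ] and - are safest when escaped.
brackets_and_hyphens = re.findall(r"[\]\-]", "a]b-c")
print(brackets_and_hyphens)
```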





Regex example: \d+[^abz?*]

Adding a caret (^) inside the square brackets means that the characters included inside the square brackets should be excluded.


This is how the quotation mark regex mentioned earlier works: "[^"]*" matches a quotation mark, followed by zero or more characters that are not quotation marks, followed by a closing quotation mark.
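A negated set still has to match one character, which can produce results that look odd at first. In the Python sketch below (sample string made up), "12a" yields the match "12": the engine gives back the digit 2 so that it can stand in as the "character that is not a, b, z, ? or *".

```python
import re

negated = re.findall(r"\d+[^abz?*]", "12a 34? 56* 78x 9a")
print(negated)

# "9a" has only one digit, so there is nothing to give back and no match.
```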








Lastly, let's have a look at these character ranges: lowercase letters, uppercase letters and digits.

Regex example: [A-Z][a-z]+

Lowercase and uppercase letters can be helpful when we need to specify case, for instance, when we want to find words that begin with a capital letter.



The sample regex here means one uppercase letter followed by one or more lowercase letters. But if this is the case, then how come the words "Añoranzas" and "Épicas" are not matched? The reason is that the ranges [A-Z] and [a-z] include only the letters of the English alphabet. A solution to include non-English letters is to add them to the character set:


While these examples use the full range of letters in the English alphabet, it's also possible to limit the range. In the example below, by limiting the uppercase range to "A-I", the words "The" and "La" are excluded from the matches.
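The three variations can be compared side by side in Python. The sample sentence and the exact accented letters added to the sets are assumptions made for this sketch; add whichever letters your languages need.

```python
import re

text = "The Épicas and Añoranzas in La Paz"

# English-only ranges miss words that start with or contain accented letters.
english_only = re.findall(r"[A-Z][a-z]+", text)

# Adding Spanish letters to both sets picks them up.
with_spanish = re.findall(r"[A-ZÁÉÍÓÚÑ][a-záéíóúñ]+", text)

# Limiting the uppercase range to A-I (plus Á, É, Í) drops "The", "La" and "Paz".
limited = re.findall(r"[A-IÁÉÍ][a-záéíóúñ]+", text)

print(english_only)
print(with_spanish)
print(limited)
```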




Regex example: [0-9.,/"'-]+

While we could say that [0-9] is basically the same as \d, the [0-9] character range offers a bit more flexibility, as we can easily throw in a few other characters into the range to help us cover a variety of number formats.


In this example, the regex matches numbers with decimals, commas, fractions, and dashes, without having to come up with any complex expressions. While this may not be the most elegant solution for someone writing code for a program, it certainly can be a time-saver for a translator wanting to filter segments.
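A quick Python test with a made-up sentence shows both the convenience and the price of this shortcut: the set also matches punctuation standing on its own, such as the final period.

```python
import re

# Digits plus a few characters that commonly appear in number formats.
NUMBER_LIKE = re.compile(r"""[0-9.,/"'-]+""")

text = "Pay 1,250.50 for 3/4 of 2019-11-12 stock."

found = NUMBER_LIKE.findall(text)
print(found)  # the last item is the sentence-final period
```

For filtering segments this extra noise rarely matters, but it is worth keeping in mind before using the same pattern in a replacement.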

With this, we have come to the end of this article. If you'd like to have a copy of my cheat sheet, you can download it here:


Happy regexing!


