Today, let’s look at a simple but effective regex pattern for filtering segments that start with a Spanish verb in the infinitive. A typical use case is a review job with inconsistent translations, where some segments begin with the infinitive and others with the imperative. If we decide to change all the infinitives to imperatives, it helps to be able to filter the file and see only the segments that fit this criterion.
The Regex Pattern
Here is the regex pattern:
^\b(?:[a-záéíóúüñ]+ar|[a-záéíóúüñ]+er|[a-záéíóúüñ]+ir)\b
Let’s break it down:
- ^ : Ensures the pattern matches from the beginning of the segment.
- \b : Marks a word boundary, so that only whole words are captured.
- (?: ... ) : Groups the three alternatives without creating a separate capture group (a non-capturing group).
- [a-záéíóúüñ]+ : Matches the root of the verb: one or more lowercase letters, including the accented vowels, ü, and ñ found in Spanish. Because only lowercase letters are listed, a segment that starts with a capital letter is matched only if the filter is not case-sensitive.
- ar, er, ir : The endings of Spanish infinitives; each one follows its own copy of the character class.
- \b : Ensures the word ends right after the ending.
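To see these pieces at work outside a CAT tool, here is a minimal Python sketch using the standard re module. It compiles the pattern unchanged and reports which word, if any, it matches at the start of a segment. Two assumptions: the sample segments are illustrative, and re.IGNORECASE is passed because the character class lists only lowercase letters. For str patterns, Python treats accented letters and ñ as word characters, so \b behaves as described above.

```python
import re
from typing import Optional

# The pattern from this post, copied unchanged.
PATTERN_TEXT = r"^\b(?:[a-záéíóúüñ]+ar|[a-záéíóúüñ]+er|[a-záéíóúüñ]+ir)\b"

# Case-insensitive version: the lowercase class also matches capitals such as "V".
INFINITIVE_START = re.compile(PATTERN_TEXT, re.IGNORECASE)

# Case-sensitive version, for comparison: only lowercase first letters match.
CASE_SENSITIVE = re.compile(PATTERN_TEXT)


def first_infinitive(segment: str) -> Optional[str]:
    """Return the leading word matched by the pattern, or None if there is none."""
    match = INFINITIVE_START.match(segment)
    return match.group(0) if match else None


for segment in ["Vincular la información.", "Vincule la información.", "Añadir"]:
    print(repr(segment), "->", first_infinitive(segment))

print(CASE_SENSITIVE.match("Vincular la información."))  # None: "V" is uppercase
```

The last line shows why the case setting matters: with a case-sensitive search, the capitalized segments in the examples below would not be found.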
What Does This Regex Do?
This regex identifies segments that start with verbs in the infinitive form. For example, it will match segments like:
- "Vincular la información."
- "Escribir un informe detallado."
- "Responder de manera oportuna."
However, it will ignore other types of segments, such as:
- "Vincule la información."
- "Escriba un informe detallado."
- "Responda de manera oportuna."
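The sketch below runs the six example segments above through the same pattern, imitating what a regex-based filter does: it keeps only the segments whose first word ends in -ar, -er, or -ir. As before, case-insensitive matching is an assumption, needed here because every example starts with a capital letter.

```python
import re

PATTERN = re.compile(
    r"^\b(?:[a-záéíóúüñ]+ar|[a-záéíóúüñ]+er|[a-záéíóúüñ]+ir)\b",
    re.IGNORECASE,  # the segments start with a capital letter
)

segments = [
    "Vincular la información.",
    "Escribir un informe detallado.",
    "Responder de manera oportuna.",
    "Vincule la información.",
    "Escriba un informe detallado.",
    "Responda de manera oportuna.",
]

# Keep only the segments whose first word has an infinitive-like ending.
infinitive_segments = [s for s in segments if PATTERN.search(s)]

for s in infinitive_segments:
    print(s)
```

Only the three infinitive segments are printed; the imperative forms (Vincule, Escriba, Responda) do not end in -ar, -er, or -ir and are filtered out.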
Screenshots: a Trados Studio file with no filtering, and a Trados Studio file with regex-based filtering.
A word of warning: the regex doesn't really match infinitive verbs. It matches words that end in -ar, -er, or -ir, which is the shape of Spanish infinitives. This means it will also match non-verbs with the same ending, such as ayer, tapir, or hogar, whenever they appear at the start of a segment.
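One possible way to reduce these false positives, sketched here in Python, is to put a negative lookahead in front of the pattern so that a list of known non-verbs is rejected. The NOT_VERBS list is hypothetical: it contains only the three words mentioned above, and in practice you would extend it with whatever false positives show up in your own files. Case-insensitive matching is again assumed.

```python
import re

# Hypothetical exclusion list: non-verbs that look like infinitives.
NOT_VERBS = ["ayer", "tapir", "hogar"]

BASE = r"\b(?:[a-záéíóúüñ]+ar|[a-záéíóúüñ]+er|[a-záéíóúüñ]+ir)\b"

# The original pattern, anchored at the start of the segment.
ORIGINAL = re.compile(r"^" + BASE, re.IGNORECASE)

# Same pattern, preceded by a negative lookahead that rejects the listed words.
excluded = "|".join(map(re.escape, NOT_VERBS))
EXCLUDING = re.compile(r"^(?!(?:" + excluded + r")\b)" + BASE, re.IGNORECASE)

for segment in ["Ayer llovió mucho.", "Configurar el dispositivo."]:
    # Prints: segment, matched by the original?, matched with exclusions?
    print(segment, bool(ORIGINAL.search(segment)), bool(EXCLUDING.search(segment)))

print(EXCLUDING.pattern)  # the combined regex, e.g. for pasting into a filter
```

Lookahead is widely supported, but check that your CAT tool's regex flavor accepts it before relying on the combined pattern.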
Why Is This Useful for Translators?
As translators, we often need to apply specific rules or filters to certain types of text. In the case of Spanish, infinitives are frequently used for:
- Instructional text (e.g., in manuals or guides): "Llenar el formulario."
- Headings or titles: "Comprar boletos."
- General-purpose commands: "Configurar el dispositivo."
However, some clients prefer the imperative, and then we need a quick way to identify the segments that require an edit.
By using this regex in Trados Studio, or any other CAT tool that supports regex, you can quickly locate and isolate these segments for editing, consistent formatting, terminology application, or quality assurance checks.
How to Use the Regex in Trados Studio
- Open your document in Trados Studio.
- Go to the "Review" tab.
- Paste the regex into the filter field, making sure you have Source or Target selected, as appropriate.
- Apply the filter, and the tool will display only the segments starting with an infinitive.
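The filter itself runs inside Trados Studio, but the effect of the Source/Target choice can be imitated on plain text. The Python sketch below is only an illustration, not Trados code: the (number, source, target) rows and the filter_segments helper are hypothetical, and case-insensitive matching is assumed.

```python
import re
from typing import List, Tuple

PATTERN = re.compile(
    r"^\b(?:[a-záéíóúüñ]+ar|[a-záéíóúüñ]+er|[a-záéíóúüñ]+ir)\b",
    re.IGNORECASE,
)

# Hypothetical bilingual rows: (segment number, source, target).
Segment = Tuple[int, str, str]
rows: List[Segment] = [
    (1, "Fill in the form.", "Llenar el formulario."),
    (2, "Buy tickets.", "Compre boletos."),
    (3, "Set up the device.", "Configurar el dispositivo."),
]


def filter_segments(rows: List[Segment], side: str = "target") -> List[Segment]:
    """Keep rows whose chosen side starts with an -ar/-er/-ir word."""
    if side not in ("source", "target"):
        raise ValueError("side must be 'source' or 'target'")
    index = 1 if side == "source" else 2
    return [row for row in rows if PATTERN.search(row[index])]


for number, _source, target in filter_segments(rows, side="target"):
    print(number, target)
```

Filtering on the target keeps segments 1 and 3; filtering on the English source finds nothing, which is why selecting the right side matters.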
Leveraging GenAI for Regex
Does writing a regex look too complicated? No need to worry. With GenAI, getting a regex like the one above is as simple as going to ChatGPT (or your chatbot of choice) and saying "Give me a regex that will find segments that begin with a Spanish verb in the infinitive form".
While learning regex continues to be a valuable skill for a translator, being able to describe the pattern you need to match can be just as useful when using a GenAI tool.
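Whatever its origin, a regex is worth testing on a few segments you know should and should not match before you filter a real file. Here is a minimal Python sketch of such a check. The check_regex function and the sample segments are hypothetical, and case-insensitive matching is assumed as in the earlier examples.

```python
import re
from typing import Iterable, List


def check_regex(pattern: str, should_match: Iterable[str],
                should_not_match: Iterable[str]) -> List[str]:
    """Return a list of problems; an empty list means every sample behaved."""
    compiled = re.compile(pattern, re.IGNORECASE)
    problems = []
    for text in should_match:
        if not compiled.search(text):
            problems.append(f"missed: {text}")
    for text in should_not_match:
        if compiled.search(text):
            problems.append(f"wrongly matched: {text}")
    return problems


candidate = r"^\b(?:[a-záéíóúüñ]+ar|[a-záéíóúüñ]+er|[a-záéíóúüñ]+ir)\b"
problems = check_regex(
    candidate,
    should_match=["Vincular la información.", "Escribir un informe detallado."],
    should_not_match=["Vincule la información.", "Ayer llovió mucho."],
)
print(problems)
```

Here the check reports "Ayer llovió mucho." as wrongly matched, the same limitation described in the word of warning above.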
Screenshot: ChatGPT can help you write regexes.
Learn More
Regex can seem intimidating at first, but with a few practical examples, it quickly becomes an indispensable tool. To dive deeper into how regex can enhance your work as a translator, check out The Translator's Tool Box book. It offers clear explanations, examples, and step-by-step instructions tailored to translators.