Thursday, November 8, 2012

Multiterm 2011: Avoiding Duplicate Entries When Importing Glossaries

Reimporting the same glossary into Multiterm with the standard import process (for example, after the glossary has grown) results in duplicate entries. The steps below help prevent this problem.

For the basics on importing glossaries into Multiterm, read this blog entry: Importing Excel Glossaries into Multiterm.

When you're ready to import your glossaries, you will see two standard import definitions in Multiterm, called "Default import definition" and "Synchronize on Entry Number". Neither one of these can be used to avoid duplicate entries, but they can be used as the basis to create a new one.


1.  Right-click on Synchronize on Entry Number and choose Duplicate. This will make a copy of the import definition.

2.  Right-click on Copy of Synchronize on Entry Number and choose Edit. This will open the Import Wizard, which lists the steps the wizard will go through. The change we need to make to avoid duplicate entries is on Step 4 (Exclude invalid entries from the import), so click Next.


3. Give a more user-friendly name to your new import definition. I have chosen "Avoid Duplicates" for this example. You can also add a description. Click Next.


4.  Choose the file that contains the entries you want to import. Alternatively, you can leave these fields blank and choose your import files later. Click Next. When the wizard asks for an Exclusion File name under Validation Settings, type any name. 



5. This takes us to the key step, Step 4. The default here is "Synchronize entries on entry number", but choosing this will result in duplicate entries if for some reason your glossaries were rearranged. Select "Synchronize entries on index term". Click Next.



6.  Now we choose our Advanced Options for synchronizing on index terms. To keep things simple, I leave the Index as Source. Under "Index term does not exist in the target termbase", leave "Add import entry as new" selected; otherwise the new entries will not be added to the termbase. For "Index term exists in the target termbase", change the Action to "Omit import entry" if you want to skip the entry altogether, or choose one of the other options based on your needs.



Click Next a couple more times and the wizard closes, leaving you with a new import definition like this:



Finally, to make sure you can use this new import definition with any of your termbases, be sure to save it by right-clicking it and choosing Save. This will save a file with an xdi extension, which you can later load into any other termbase by right-clicking the Import Definition area and choosing Load.

Now, whenever you need to reimport a glossary and you want to avoid getting duplicate entries in your termbase, run this new import definition by right-clicking it and choosing Process.
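
To make the effect of these settings concrete, here is a minimal Python sketch of the decision the import makes when it synchronizes on the index term. It is a conceptual model only, not Multiterm code: the function name import_entries, the dictionary layout, the "Source"/"Target" field names, and the exact-match comparison are all assumptions made for illustration.

```python
# Conceptual model only: none of these names are Multiterm APIs.

def import_entries(termbase, incoming, index="Source", on_match="omit"):
    """Return a new entry list after importing `incoming` into `termbase`.

    termbase, incoming: lists of dicts mapping a field name to a term.
    on_match: "omit" skips an entry whose index term already exists;
              "add" appends it anyway, which creates duplicates.
    Matching is an exact string comparison (an assumption of this sketch).
    """
    result = list(termbase)
    existing = {entry[index] for entry in result if entry.get(index)}
    for entry in incoming:
        key = entry.get(index, "")
        if key and key in existing and on_match == "omit":
            continue  # index term exists in the target: omit import entry
        result.append(entry)  # index term does not exist: add as new
        if key:
            existing.add(key)
    return result


termbase = [{"Source": "invoice", "Target": "factura"}]
glossary = [
    {"Source": "invoice", "Target": "factura"},  # already in the termbase
    {"Source": "receipt", "Target": "recibo"},   # new since the last import
]

print(len(import_entries(termbase, glossary, on_match="omit")))  # 2
print(len(import_entries(termbase, glossary, on_match="add")))   # 3
```

With "omit", only the entry that is new since the last import is added; with "add", the entry that was already present is added a second time.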


8 comments:

  1. Thank you for this post! And what if I already have some entries imported which are repeated? Is there any solution to this?

    1. Hi, sorry for the long delay in answering. My solution for this is always to export the whole termbase, clean it up (using Excel, I'll try to write a post about it), then import this new file to create a fresh termbase that doesn't have any duplicates.

  2. Hi Nora!
    We tried the procedure you described above several times before reading your post, but it does not suit our needs. At step 6 (Advanced Options for synchronizing on index terms), none of the options available under "Index term exists in the target termbase" gives the expected result.
    Background:
    We have a term base. We send it to a translator and ask him to use the termbase while translating the file. The translator is allowed (encouraged) to add new terms and/or edit existing terms if needed. At the end of the translation, the translator is asked to deliver only edited and/or newly added terms (default export to XML).
    When we import this XML file, using your procedure, we have the following options at step 6:
    - Add import entry as new -> we don't want this, as this will create duplicates.
    - Omit Import entry -> we don't want this, because we want the newly added terms to be added.
    - Omit Import entry and write to output file -> we don't want this because we want the newly added terms to be added.
    - Merge entries -> we don't want this, because merging does not delete possible obsolete information in the existing entry.
    - Overwrite existing entry with import entry -> we don't want this, because the import entry might have fewer fields (including indexes) than the existing one.

    So what we end up doing is accepting duplicates (“Add import entry as new” option), then filtering the duplicated entries, checking them (the entry number tells us which entry is new and which is old) and merging them (deleting obsolete fields in the process if needed).
    But this is manual work, so it is time-consuming and error-prone. Besides, that work can only be done by an in-house native speaker with enough knowledge of the customer in question. We don’t have such people for all our term bases.

    Do you have hints on how we could proceed?
    Thanks in advance!

    1. Hi,

      The first thing that comes to mind is, why not have the translators export the full termbase, rather than only newly added or edited terms and use that as the new/updated termbase? Unless of course, you want to retain some control over the termbase.

      Another thought: if your translators are using Studio, how about having them use a fresh termbase to add terms? They would set your master termbase as a secondary termbase in their Studio settings, so they could use it for reference without making changes directly in it. Then, when the project is over, you would simply import the new termbase into the master, overwriting on index match. Or would that be a problem because of the possibility of having fewer fields, as you explain in your post?

      Editing termbases in Multiterm to delete duplicates is not ideal, so whenever I have duplicates I export the termbase to Excel (lately I've been using the Glossary Converter app and I'm very pleased with it). After exporting the file to Excel, the latest entries appear at the end of the list, so it is easy to highlight them in a specific color or change the font color. Then sort for duplicates and clean up the termbase in Excel, with the newly added entries easily identifiable because of the highlighting, and reimport to Multiterm. You could even add a "marker entry" to your termbase before sending it to the translator (for example, the translator's name), and all new entries made by the translator would appear after that name in the Excel list.

      This is much faster than using Multiterm to remove the duplicates, but it still implies some manual work.
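
As a rough illustration of the "marker entry" idea, the Python sketch below lists the index terms that occur more than once and tags each copy as old or new depending on whether it comes before or after the marker. It assumes the Excel sheet has been saved as CSV with a header row; the column names, the sample terms, the marker text TRANSLATOR-JANE and the function find_duplicates are all hypothetical.

```python
import csv
import io
from collections import defaultdict


def find_duplicates(rows, index_column, marker):
    """Group rows that share an index term and tag each as 'old' or 'new'.

    Rows listed after the row whose index term equals `marker` are tagged
    'new'. The marker row itself is left out of the result.
    """
    groups = defaultdict(list)
    seen_marker = False
    for row in rows:
        term = row[index_column].strip()
        if term == marker:
            seen_marker = True
            continue
        groups[term].append(("new" if seen_marker else "old", row))
    # Keep only the terms that occur more than once.
    return {term: tagged for term, tagged in groups.items() if len(tagged) > 1}


# Hypothetical export: the marker row separates existing entries from new ones.
sample = """English,Spanish
invoice,factura
receipt,recibo
TRANSLATOR-JANE,TRANSLATOR-JANE
invoice,factura comercial
refund,reembolso
"""

rows = list(csv.DictReader(io.StringIO(sample)))
duplicates = find_duplicates(rows, "English", "TRANSLATOR-JANE")
for term, tagged in duplicates.items():
    for status, row in tagged:
        print(term, status, row["Spanish"])
```

The sketch only reports the duplicates; deciding which copy to keep, or how to merge the two, is still a manual step.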

      I've been planning for some time to write a post about using Excel to clean up duplicates from termbases, but haven't gotten around to it. I'll try to find some time today; hopefully it will help.

  3. Hi,

    Thanks for the quick answer.
    Indeed, it would be easier to replace our master term base with the new/updated term base from the translator. But since we work with multilingual term bases, we cannot do that. At the end of the project, we would receive term bases from different translators, and we really need to find a way to merge all of them (or import terms for different language pairs into the master term base).

    Also, because our term bases have several fields, we don't want new/edited entries to overwrite existing entries. Moreover, we cannot use the Glossary Converter (only converts indexes).

    I guess we will have to make do with MultiTerm for the time being, until we find a tool with better import options.

    Thank you!

  4. Thank you so much for this post! Without expert contributions like yours, I would be completely unable to use Multiterm to consolidate my termbases!

  5. Hello Nora:
    My name is Carlos and I have a problem with Multiterm 2011.
    I converted a DE-ES-EN glossary from Multiterm 5 to MT 2011, and when I load the entries in MT 2011 the Spanish records show up with Chinese characters. The German and English ones display correctly.
    Could you help me?
    Thanks in advance
