Showing posts with label duplicates. Show all posts

Friday, January 15, 2016

Removing Duplicates from a Multiterm Termbase

Duplicates in a Multiterm termbase can clutter up the term recognition list in Studio and make files larger than they need to be, so it makes sense to keep our termbases as duplicate-free as possible. Here's a quick and easy how-to for termbase maintenance.

Step 1. Convert the termbase to an Excel file

The easiest and fastest way to do this is to use the Open Exchange Glossary Converter app. It's a simple matter of dragging and dropping the termbase onto the app, and just like that, an Excel file will be created in the same folder where the exported termbase is stored.



Step 2. Remove duplicates in Excel

Open the converted file in Excel, and go to Data > Data Tools > Remove Duplicates. Excel will tell you how many duplicates were removed and how many entries are still left.
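If you'd rather script this step than click through Excel, the same idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows are made-up sample data, `remove_duplicate_rows` is a hypothetical helper, and ignoring case and surrounding spaces is my own choice, not necessarily how Excel compares cells.

```python
def remove_duplicate_rows(rows):
    """Return rows with repeats dropped, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for row in rows:
        # Assumption: ignore case and surrounding spaces when comparing cells,
        # so "Termbase " and "termbase" count as the same term.
        key = tuple(cell.strip().casefold() for cell in row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Made-up glossary rows: (source term, target term)
glossary = [
    ("termbase", "base terminológica"),
    ("glossary", "glosario"),
    ("Termbase ", "base terminológica"),
    ("glossary", "glosario"),
]

cleaned = remove_duplicate_rows(glossary)
removed = len(glossary) - len(cleaned)
print(f"{removed} duplicates removed; {len(cleaned)} unique rows remain.")
```

Like Excel, the sketch reports how many rows were dropped and how many are left, which is a handy sanity check before converting the file back to a termbase.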





Step 3. Convert the Excel file back to a termbase

Once again, drag and drop the file (the Excel file this time) onto the Glossary Converter and let it work its magic. You can either overwrite the existing termbase or save it under a new name.




And that's all there is to it. The whole process doesn't take more than a few minutes. Of course, all the standard data back-up warnings apply, and it's advisable to make a copy of the termbase before starting the process, just in case.

Thursday, November 8, 2012

Multiterm 2011: Avoiding Duplicate Entries When Importing Glossaries

If you reimport the same glossary into Multiterm using the standard import process (for example, after the glossary has grown), you will end up with duplicate entries. The steps below help prevent this problem.

For the basics on importing glossaries into Multiterm, read this blog entry: Importing Excel Glossaries into Multiterm.

When you're ready to import your glossaries, you will see two standard import definitions in Multiterm, called "Default import definition" and "Synchronize on Entry Number". Neither of these can be used to avoid duplicate entries, but either one can serve as the basis for a new definition that does.


1. Right-click on Synchronize on Entry Number and choose Duplicate. This will make a copy of the import definition.

2. Right-click on Copy of Synchronize on Entry Number and choose Edit. This will open the Import Wizard, which lists the steps the wizard will go through. The change we need to make to avoid duplicate entries is on Step 4 (Exclude invalid entries from the import), so click Next.


3. Give a more user-friendly name to your new import definition. I have chosen "Avoid Duplicates" for this example. You can also add a description. Click Next.


4.  Choose the file that contains the entries you want to import. Alternatively, you can leave these fields blank and choose your import files later. Click Next. When the wizard asks for an Exclusion File name under Validation Settings, type any name. 



5. This takes us to the key step, Step 4. The default here is "Synchronize entries on entry number", but choosing this will result in duplicate entries if for some reason your glossaries got rearranged. Select "Synchronize entries on index term". Click Next.



6.  Now we choose our Advanced Options for synchronizing on index terms. To keep things simple, I leave the Index as Source, and under "Index term does not exist in the target termbase", leave "Add import entry as new", because otherwise the new entries will not be added to the termbase. For "Index term exists in the target termbase", change the Action to "Omit import entry" if you want to skip it altogether, or choose one of the other options based on your needs.



Click Next a couple more times and the wizard closes, leaving you with a new import definition like this:



Finally, to make sure you can use this new import definition with any of your termbases, be sure to save it by right-clicking it and choosing Save. This will save a file with an .xdi extension, which you can later load into any other termbase by right-clicking the Import Definition area and choosing Load.

Now, whenever you need to reimport a glossary and you want to avoid getting duplicate entries in your termbase, run this new import definition by right-clicking it and choosing Process.
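To picture what the "Avoid Duplicates" settings do, here is a minimal Python sketch of the logic: an incoming entry is added only if its index term (the source term here) is not already in the termbase, and is omitted otherwise. It is a simplified model, not Multiterm's actual implementation; the function name, the dictionary layout, and the exact-match comparison are all assumptions made for illustration.

```python
def import_avoiding_duplicates(termbase, incoming, index="source"):
    """Add incoming entries to termbase unless their index term already exists.

    Returns (added, omitted) counts. Matching is an exact string comparison,
    which is an assumption of this sketch.
    """
    known_terms = {entry[index] for entry in termbase}
    added = omitted = 0
    for entry in incoming:
        if entry[index] in known_terms:
            omitted += 1                   # "Omit import entry"
        else:
            termbase.append(entry)         # "Add import entry as new"
            known_terms.add(entry[index])
            added += 1
    return added, omitted


# Made-up data: the glossary has grown and its rows are in a different order.
termbase = [{"source": "glossary", "target": "glosario"}]
incoming = [
    {"source": "termbase", "target": "base terminológica"},
    {"source": "glossary", "target": "glosario"},
]

first_run = import_avoiding_duplicates(termbase, incoming)
second_run = import_avoiding_duplicates(termbase, incoming)
print(first_run, second_run, len(termbase))
```

Because the match is made on the term itself rather than on the entry's position or number, the order of the rows in the glossary doesn't matter, and importing the same file a second time adds nothing.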