Luis Diego Farias

Microsoft Teams is Getting an Upgrade

Luis Diego Farias — Fri, 22 Nov 2024 21:31:06 +0000

Soon, you can be “speaking” nine languages in Teams, or at least your AI voice can. Microsoft is set to launch an AI agent in Teams that will allow you to clone your voice and translate your speech in real time.

The ‘Interpreter’ agent will be available to Microsoft 365 subscribers in early 2025 and can simulate your voice in nine languages.

The post Microsoft Teams is Getting an Upgrade appeared first on Globalization Partners International.

Pseudo Localization and Why It’s Important

Luis Diego Farias — Wed, 24 May 2023 21:39:28 +0000

What is Pseudo Localization?

Let’s start this article by defining the term “pseudo localization”.

“Pseudo” means pretended or not real. Therefore, pseudo localization means localization that is not real.

Pseudo localization is a way of testing the content of an application or software to demonstrate that it is suitable to be internationalized and localized.

All translatable strings in an application are replaced with text strings that resemble the target language. Instead of containing an actual translation, however, they contain text with multiple alterations, such as banned characters, longer text strings, or text with a different direction.

Pseudo Localization Process

The pseudo-localization process aims to find errors in localizable elements before it’s too late.

It also serves to simulate the final result of the localized product and to estimate the additional effort needed to fix it before delivery.

Some problems that pseudo-localization can detect are:

Corrupted characters: here you can see which character sets (e.g., ASCII or ANSI) your application engine supports.
Text that does not fit in the graphical interface and needs to be shortened or trimmed.
Interface issues with right-to-left languages.
Problems with languages whose characters may differ from those in the source language. Swedish, for example, contains characters (Å, Ä, and Ö) that are not in the English alphabet and can therefore cause problems if the application is not prepared for them.

Additionally, pseudo-localization is very useful for detecting text that has been translated but should not be translated.

Where is it Used?

During the development of some applications and operating systems, such as Windows, various builds allowed the use of language packs with pseudo-locales.

These pseudo-locales contained text strings identical to those in English, except that the characters were replaced with English-like characters carrying accents or taken from other alphabets. This is very useful because the text is not gibberish but remains perfectly readable, which makes it a very good way of detecting localization problems while still being able to read the text strings.

Can you read this?

[Шěđлеśđαỳ !!!], 21 ōf [Děcěmßëŕ !!] ōf 2022.
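
A pseudo-locale like the one above can be approximated with a short script. The following is only a minimal sketch, not the method used by Windows or any specific tool: the character map, the 30% expansion factor, the bracket markers, and the placeholder patterns ({name} and %s/%d) are all assumptions chosen for illustration.

```python
import re

# Hypothetical character map: ASCII letters -> accented look-alikes.
ACCENTED = str.maketrans({
    "a": "à", "e": "é", "i": "î", "o": "õ", "u": "ü",
    "c": "ç", "n": "ñ", "s": "š", "y": "ý",
    "A": "À", "E": "É", "I": "Î", "O": "Õ", "U": "Ü",
    "C": "Ç", "N": "Ñ", "S": "Š", "Y": "Ý",
})

# Assumed placeholder styles that must never be altered: {name}, %s, %d.
PLACEHOLDER = re.compile(r"(\{[^{}]*\}|%[sd])")


def pseudo_localize(text: str, expansion: float = 0.3) -> str:
    """Return an accented, padded, bracketed version of text."""
    # re.split with one capturing group keeps the placeholders
    # at the odd indexes of the resulting list.
    parts = PLACEHOLDER.split(text)
    converted = "".join(
        part if index % 2 else part.translate(ACCENTED)
        for index, part in enumerate(parts)
    )
    # Padding simulates text expansion in longer target languages.
    padding = "!" * max(1, round(len(text) * expansion))
    # Brackets make truncated strings easy to spot in the interface.
    return f"[{converted} {padding}]"


result = pseudo_localize("Save {count} files")
print(result)
```

If the closing bracket is missing on screen, the string was truncated; if a string shows up without accents, it is probably hard-coded and never reached the localization layer.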

Conclusion

Pseudo-localization allows you to verify that your product is localizable without having to localize it into an actual language. Many potential localization issues can be discovered without the need for an actual translation.

Let’s imagine what would happen if placeholders or tags get translated. If these are unintentionally “translated,” the build could fail, or the app could crash during use.

Pseudo localization results are useful not only in the software localization process but also in the document localization process, where they help estimate DTP effort and detect text that has not been included for translation.

Pseudo localization also aids in the identification and rectification of hard-coded text strings, fostering a more robust, concrete, and maintainable code base, allowing developers to check for language support, text expansion, truncated text, and formatting without the need for actual translations.

The post Pseudo Localization and Why It’s Important appeared first on Globalization Partners International.

Neural versus Phrase-Based Machine Translation

Luis Diego Farias — Wed, 30 Nov 2022 22:50:22 +0000

What is Machine Translation?

Machine Translation is the process of translating content from one language to another, without the intervention of any human being.

There has long been interest in automatic translation without human intervention. The first public experiment in machine translation dates back to 1954, when IBM, in collaboration with Georgetown University, translated more than 60 sentences from Russian to English.

About 60 Russian phrases related to political, legal, mathematical, or scientific topics were entered into the machine, which automatically translated them into English.

It wasn’t until the early 2000s that the necessary hardware and software for more consistent translation became available.

Why is MT hard?

Israeli mathematician and machine translation pioneer Bar-Hillel presented a problem of how a translation system would deal with the phrase “The Box is in the Pen.”

The problem here is clear: The word “pen” has more than one meaning. It can mean “pen,” a writing tool, and at the same time, it can mean “playpen” for children.

To make a correct translation from one language to another, the system must determine which of the two uses of pen is the most appropriate.

«Little John was looking for his toy box. Finally, he found it. The box was in the pen. John was very happy»

A human reader will understand that the word “pen” refers to the playpen and not to a pen for writing.
The first sentence establishes that the “box” is a box that contains toys. The reader already knows that such a box is much larger than a writing pen, so that interpretation is excluded automatically, without the reader having to think about it.

Where do we stand today?

The development of the Internet, together with globalization, produced a great demand for translation services and machine translation.

Global businesses, as well as economic growth in emerging markets, fueled the need for practical and decent business products that allow content to be translated into different language pairs.

Neural Machine Translation, Statistical Machine Translation, and a Little Bit of History

Neural Machine Translation (NMT) uses artificial intelligence to learn the rules of different languages and constantly improve. It is built on an artificial neural network that learns from training material and can predict the probability of a sequence of words.
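
To make “the probability of a sequence of words” concrete, here is a small sketch. It deliberately swaps in a count-based bigram model, which is far simpler than a neural network and is not how NMT engines work internally; it only illustrates what it means to score a word sequence. The three-sentence corpus and the function names are made up for this example.

```python
from collections import Counter

# Tiny, made-up training corpus.
CORPUS = [
    "the box is in the pen",
    "the pen is in the box",
    "the box is big",
]


def train_bigrams(sentences):
    """Count how often each word follows each other word."""
    previous_counts, pair_counts = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()  # <s> marks the sentence start
        previous_counts.update(words[:-1])
        pair_counts.update(zip(words, words[1:]))
    return previous_counts, pair_counts


def sequence_probability(sentence, previous_counts, pair_counts):
    """Multiply P(word | previous word) over the whole sentence."""
    words = ["<s>"] + sentence.split()
    probability = 1.0
    for previous, word in zip(words, words[1:]):
        if previous_counts[previous] == 0:
            return 0.0  # the previous word was never seen in training
        probability *= pair_counts[(previous, word)] / previous_counts[previous]
    return probability


previous_counts, pair_counts = train_bigrams(CORPUS)
seen = sequence_probability("the box is big", previous_counts, pair_counts)
unseen = sequence_probability("the box is pen", previous_counts, pair_counts)
print(seen > unseen)
```

A word order that appeared in the training material scores higher than one that never did. A neural model learns a much richer version of the same idea from far larger amounts of text.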

Why is NMT so popular?

Improvements in learning algorithms, the ease of obtaining data to train the translation engine, and the computational power needed to train on a massive amount of information have popularized NMT in recent years, to the point that it is becoming a standard in MT, adopted by companies such as Google and Microsoft.

In many scenarios, NMT performs better, yields better results, and is much easier to maintain than a rule-based engine.

Statistical Machine Translation (SMT)

This model was promoted by IBM in the early 1990s. It evolved from word-level translation to phrase-based translation.

Its training is based on building a model from sentences in a source language paired with their corresponding translations in the target language, creating a multilingual database.

Some MT advantages to think about…

Some CAT (Computer-Assisted Translation) Tools allow the major MT providers to be integrated into the tool, either through a plugin or API (Application Programming Interface).
Being able to translate many words across many language pairs in a matter of minutes can drive down costs and shorten delivery time.
Machine translation is very fast. Like really fast. Thousands of words can be translated into multiple language pairs in a matter of minutes.
In an MT workflow, the human translator does not disappear, but rather participates in the post-editing process, allowing the result obtained from the MT to be refined.

Conclusion:

Machine translation has come a long way, from the first experiments in the 1950s to being able to translate a large volume of text in a matter of seconds using an engine that imitates human neurons.
Even so, MT is still in constant evolution, with improved algorithms and greater computational power.

GPI’s Machine Translation (MT) implementations make sure that NMT is a good candidate for the client’s needs. This starts with a test project to determine the human translation effort required to edit the output, followed by the creation of a custom engine based on the content and the desired language pairs. GPI’s NMT solutions ensure savings and greater productivity.

The post Neural versus Phrase-Based Machine Translation appeared first on Globalization Partners International.

Tips for Working from Home Securely

Luis Diego Farias — Wed, 06 Jul 2022 11:41:13 +0000

When you work remotely, you are solely responsible for your security and that of your information. In your home, you do not have the same benefits that you have in an office, such as access control or security cameras. This makes working remotely from home risky and an easy target for property theft, competitors, or criminals.

Risks of Working Remotely

Working remotely without good security habits can result in the loss of data and property. Your laptop can be stolen, resulting in not only the loss of physical property, but also confidential and proprietary information, client information and intellectual property, or company plans that could be especially useful to a competitor.

Some tips for working from home securely

Do not leave personal effects (thumb drives, computers, external hard drives) lying around.
Discard papers or files securely, the same way you would in the office.
Beware of unknown open networks in your neighborhood. Make sure you are connected to your home Wi-Fi network.
Get a privacy screen for your laptop which makes it difficult to see the data on the screen.
If you decide to work outside your home (for example, in a café or on a plane), be aware of your personal belongings and your surroundings. In a public setting, social engineering attacks, such as shoulder surfing, can occur. There are two ways to conduct this type of attack: (a) by looking over the victim’s shoulder to see the data directly; and (b) by using binoculars or espionage devices that allow the perpetrator to observe from a distance.

The most common scenarios in which identity theft can occur through shoulder surfing are the following:

Entering an email on a web page.
Registering in an application.
Withdrawing cash from your bank account.
Accessing corporate applications or reading confidential corporate information.

How to be Secure while Working Remotely

Whenever we enter confidential information, we must ensure that no one around us is attentive to what we do, seeking the greatest possible privacy. As for our computer, make sure that it is safe. Not only is the computer valuable but also the information it contains.

Many people have been victims of computer or hardware (think USB drive) theft, which resulted in the loss of sensitive information, including employee information and corporate secrets. We must make sure that our computer is up to date, with all security patches installed.

Additionally, if available, it is recommended to activate BitLocker to protect the information on the hard drive. With BitLocker activated, if the computer is stolen, the information on it will be inaccessible.

Make frequent backups so that if the computer is stolen, the problem can be instantly mitigated by restoring the latest backup.

Be alert for any bulletins from your company about new policies, procedures, security issues, and other valuable information about working remotely. Communication between you, your office, and your co-workers is one of the most important parts of remote work. Without this, it is easy to lose touch with critical changes and miss valuable information.

Conclusion

Being at home means not having the same security benefits that we have in the office. Although our home is a controlled environment, we must pay special attention not only to physical security but also to the security of our information. Our home networks may be vulnerable to external attacks or may not have encryption strong enough to protect us against threats.

Always be aware of your personal effects (your computer) and information.
Be cautious when working in a public environment.
Pay attention and always follow your IT team’s recommendations for operating system updates and patches.

Working at home securely is important for you, your valuables, and for your information.

At Globalization Partners International, data security is our utmost priority. From providing security awareness training to our employees, to using only accredited software and proprietary GPMS, our clients can benefit from a seamless workflow of our language services.

The post Tips for Working from Home Securely appeared first on Globalization Partners International.

The Future of Blockchain in the Localization Industry

Luis Diego Farias — Thu, 12 May 2022 10:46:42 +0000

The term “blockchain” is talked about a lot these days, thanks to the growth of Bitcoin as a digital currency.

What is blockchain and how does it relate to localization?

A blockchain is a database made up of a series of items known as blocks. Any attempt to modify one of the records changes the hashes that link the blocks together, so the alteration affects the entire chain and is easy to detect. The data structure keeps track of the blocks in the sequence in which they were deposited. As a result, blockchain is ideal for storing financial records that must be audited, such as Bitcoin transactions.
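
The way one modified record affects the whole chain can be sketched in a few lines. This is a simplified illustration only: it leaves out everything that makes a real blockchain distributed (networking, consensus, mining), and the block layout and function names are assumptions made for the example.

```python
import hashlib
import json


def block_hash(index, data, previous_hash):
    """Hash a block's content together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "data": data, "prev": previous_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(records):
    """Store records in order, linking each block to the one before it."""
    chain, previous_hash = [], "0" * 64  # placeholder hash for the first block
    for index, data in enumerate(records):
        current_hash = block_hash(index, data, previous_hash)
        chain.append(
            {"index": index, "data": data, "prev": previous_hash, "hash": current_hash}
        )
        previous_hash = current_hash
    return chain


def is_valid(chain):
    """Recompute every hash and check every link."""
    previous_hash = "0" * 64
    for block in chain:
        expected = block_hash(block["index"], block["data"], block["prev"])
        if block["prev"] != previous_hash or block["hash"] != expected:
            return False
        previous_hash = block["hash"]
    return True


chain = build_chain(["Alice pays Bob 5", "Bob pays Carol 2", "Carol pays Dan 1"])
valid_before = is_valid(chain)

chain[0]["data"] = "Alice pays Bob 500"  # tamper with the oldest record
valid_after = is_valid(chain)

print(valid_before, valid_after)
```

Because each block's hash covers the previous block's hash, hiding the change would require recomputing every later block as well, which is what makes the record auditable.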

Bitcoin was the first widely recognized use of a blockchain for financial records. Because crypto users have their own languages and needs, localization and translation of cryptocurrency products are critical.

How can blockchain technology be used in L10n?

Blockchain is completely transparent and peer-to-peer, which could mean a sea change in the way language services are paid for. The blockchain would make it possible for a professional to charge a certain price for their work, instead of charging by words.

Let’s suppose that we want to translate a project into several languages. We are not always certain whether the chosen translators have done a good job in the past. We may not even know who the translators are.

There are many actors involved in the middle, and the translation team may even be located in a different country than those of the language pair. One of the advantages of blockchain is that it acts as a “truth machine”: it provides traceability, so you know who did what.

It would be possible to have a ranking of translation professionals, who, based on their previous work, can rank differently so that buyers can decide whether to contract their services or not. It could even change the way of paying for these services, perhaps allowing you to pay by the hour because, like any other professional expert in the field, it is worth it and it is easy to prove. Another advantage would be the possibility of paying for millions of transactions very easily, free of charge, and all over the world with so-called smart contracts.

What are smart contracts?

Smart contracts are computer programs that are activated when certain circumstances are fulfilled. Smart contracts on the blockchain are intended to improve efficiency and transparency, particularly in situations where anonymous parties trade with each other without the use of a middleman.

The earliest example of a smart contract can be found in vending machines.
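
The “activated when certain circumstances are fulfilled” idea can be illustrated with a plain Python class. This sketch does not run on any blockchain and does not use a real smart contract language; the escrow scenario, the class name, and the token amount are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TranslationEscrow:
    """Hypothetical escrow that pays once work is delivered and approved."""

    price_tokens: int
    delivered: bool = False
    approved: bool = False
    translator_balance: int = 0

    def deliver(self) -> None:
        self.delivered = True
        self._settle()

    def approve(self) -> None:
        self.approved = True
        self._settle()

    def _settle(self) -> None:
        # Pay out automatically, and only once, when both conditions hold.
        if self.delivered and self.approved and self.translator_balance == 0:
            self.translator_balance = self.price_tokens


escrow = TranslationEscrow(price_tokens=50)
escrow.deliver()
balance_before_approval = escrow.translator_balance
escrow.approve()
balance_after_approval = escrow.translator_balance
print(balance_before_approval, balance_after_approval)
```

No middleman decides when to pay: the payment is a consequence of the conditions being met, much like a vending machine releasing a product once enough coins are inserted.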

Distributed collaboration

Blockchain could make possible the idea of creating a collaborative MT engine, where everyone owns a part of it and therefore everyone can contribute equally. Whatever translation work you’re doing, each time you update the engine, you get paid for it in tokens.

Tokens vs. Coins: What’s the difference between the two?

Cryptocurrencies that have their own ecosystem are known as coins. Ethereum and Bitcoin, for example, are coins because they have their own blockchain and do not rely on other blockchains.

Tokens, on the other hand, are currencies that run on the blockchain of another digital currency. USDT, for example, is issued on the Ethereum blockchain (among others), making it a token of the Ethereum network.

Conclusion

Without a doubt, blockchain technology will revolutionize the language services industry in the years to come.

Both transactions and localization quality could benefit from blockchain. All elements, from translation technology to payment processing and price negotiation, may be covered in one place, in a transparent manner, while people get rewarded for their contributions.

The post The Future of Blockchain in the Localization Industry appeared first on Globalization Partners International.

CAT Tools Alternatives for Mac

Luis Diego Farias — Wed, 30 Mar 2022 10:22:15 +0000

You have a Mac and you want to start translating, but there are no compatible software translation tools. Do not fret, there are numerous options to be able to translate under macOS without having to give up your well-treasured device. Here are some CAT tools alternatives for Mac.

Use the Heartsome Translation Studio

This CAT Tool is currently discontinued and no longer receives updates, although the manufacturer has decided to make it open source and release it to the public. It’s free to download and works great on macOS.

With Heartsome Translation Studio, it is possible to open Trados files (SDLXLIFF) up to version 2021 and translate them without hassle. The interface is very clean and allows the user to translate files in a grid, just as is done in Trados.

Heartsome Translation Studio has some other cool features like:

CSV to TMX converters
.properties file viewer
RTF file cleaner, etc.
Supports a large number of language pairs
Supports many formats, including popular ones like Office documents and Trados files

OmegaT for macOS

OmegaT is very simple to use. While the interface may seem a bit overwhelming at first, the software itself offers a quick start guide allowing you to get started in only 5 minutes. It works on macOS, can be carried on a flash drive, supports many file formats, and is compatible with Trados files.

What if I still need Trados?

If you still need Trados, another convenient option is to install a native Windows environment. Keep in mind that any of these options requires a genuine Windows license.

Windows can be installed by:

– The installation of a virtual machine: A virtual machine (VM) is software that emulates a computer and can run programs just like a physical computer does. Setting up a virtual machine is quick and easy. There are many options, but the one I can recommend is VirtualBox. It’s free, plus the interface is really simple.

If you want to go for paid software, you can choose Parallels. It is fast, easy, and lets you run Windows inside macOS.

To run a Virtual Machine you will need:

Mac Computer with an Intel Core i3/i5/i9 or Apple M1 Processor
4GB of RAM (Minimum) / 16 GB (Recommended)
Additional disk space for the guest operating system (Windows 10 requires at least 16 GB)
Recommended, but not needed: an SSD Drive for better performance
A genuine Windows 10 license

– Dual boot (Windows / macOS) using Boot Camp

The dual boot option will let you have a second operating system (in this case, Windows) installed on the same computer as macOS. This option will run Windows natively instead of running it as a virtual machine. Dual boot does require restarting your Mac in order to switch between the two operating systems.

Presently, Boot Camp is only available for Mac computers based on Intel processors. Whether you run a CAT tool natively or on a virtual machine, both options are effective ways to start translating on a Mac without the need to give up your macOS hardware.

The post CAT Tools Alternatives for Mac appeared first on Globalization Partners International.

What’s XPath and What’s it Used for?

Luis Diego Farias — Wed, 10 Nov 2021 10:00:22 +0000

XML Path Language (XPath) is used to identify, address and navigate through parts of an XML document. An XPath expression can be used to search an XML document and extract information from anywhere in the document.

Currently, you can use different versions of XPath:

XPath 1.0 was released in November 1999 and is the most widely implemented and used specification.
XPath 2.0 was released in January 2007 with a revision in 2010. This specification contains many more expressions than XPath 1.0.
XPath 3.0 was published in April 2014.
XPath 3.1 was released in March 2017; it supports JSON as well as XML and added maps and arrays. This latest version is specified in the W3C Recommendation of March 21, 2017.

How Does XPath Work?

XPath interprets the XML document as a set of elements arranged in a tree structure. Each of the elements present in the structure of a file is called a node. The categorization of the nodes is defined both by the order of appearance in the document and by the relationships between the XML elements.

The XPath data model identifies seven types of nodes with different functions:

Element node
Document node (root node)
Attribute node
Text node
Namespace node
Processing instruction node
Comment node

XPath Syntax:

An XPath expression is a text string that represents a path in the document tree. The simplest of the expressions look like file paths as seen in Windows Explorer or the Linux shell.

Evaluating an XPath expression means looking for nodes in the document that conform to the path defined in the expression. The result of the evaluation is all the nodes that fit the expression. In order to evaluate an XPath expression, the document must be well-formed.

Axis:

The axis allows us to select a subset of document nodes and corresponds to paths in the document tree. Element nodes are indicated by the element name.

/: if it is at the beginning of the expression, it indicates the root node; if not, it indicates “child.” It must be followed by the name of an element. Example: /xliff/file/body/group/trans-unit/target
//: indicates “descendant.” Example: //target
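
The two steps above can be tried out with Python's standard library. This is a minimal sketch under some assumptions: the XLIFF-like sample is made up for illustration, and xml.etree.ElementTree implements only a subset of XPath 1.0. Because its paths are evaluated relative to the element you call them on, /xliff/file/... is written as ./file/... from the root <xliff> element, and //target as .//target.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal XLIFF-like document used only for illustration.
XLIFF = """<xliff>
  <file>
    <body>
      <group>
        <trans-unit id="1"><source>Hello</source><target>Hola</target></trans-unit>
        <trans-unit id="2"><source>Thanks</source><target>Gracias</target></trans-unit>
      </group>
      <trans-unit id="3"><source>Welcome</source><target>Bienvenido</target></trans-unit>
    </body>
  </file>
</xliff>"""

root = ET.fromstring(XLIFF)  # root is the <xliff> element

# Child steps only: equivalent to /xliff/file/body/group/trans-unit/target
child_targets = [
    target.text for target in root.findall("./file/body/group/trans-unit/target")
]

# Descendant step: equivalent to //target
all_targets = [target.text for target in root.findall(".//target")]

print(child_targets)  # ['Hola', 'Gracias']
print(all_targets)    # ['Hola', 'Gracias', 'Bienvenido']
```

The child path misses the third target because that trans-unit sits directly under body rather than inside a group, while the descendant step finds it at any depth.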

Predicate:

The predicate is enclosed in brackets, following the axis.

[@attribute]: selects the elements that have the attribute.

[number]: if there are several results, selects one of them by its position; last() selects the last of them.

[condition]: selects the nodes that meet the condition.

The following operators are allowed:

Logical operators: and, or, not()
Arithmetic operators: +, -, *, div, mod
Comparison operators: =, !=, <, >, <=, >=

Comparisons can be made between node and attribute values or with text or numeric strings.
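
Here is a short sketch of the predicate forms, again with Python's standard xml.etree.ElementTree and a made-up sample document. ElementTree supports attribute, position, last(), and simple equality predicates, but not the logical or arithmetic operators listed above; for those you would need a full XPath 1.0 engine such as the one in the third-party lxml library.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: three translation units, one of them locked.
DOC = """<body>
  <trans-unit id="1" translate="no"><target>GPI</target></trans-unit>
  <trans-unit id="2"><target>Hola</target></trans-unit>
  <trans-unit><target>Gracias</target></trans-unit>
</body>"""

body = ET.fromstring(DOC)


def targets(path):
    """Return the <target> text of every trans-unit matched by path."""
    return [unit.findtext("target") for unit in body.findall(path)]


with_id = targets("./trans-unit[@id]")              # [@attribute]
second = targets("./trans-unit[2]")                 # [number]
last = targets("./trans-unit[last()]")              # last()
locked = targets("./trans-unit[@translate='no']")   # [condition]

print(with_id)  # ['GPI', 'Hola']
print(second)   # ['Hola']
print(last)     # ['Gracias']
print(locked)   # ['GPI']
```

A condition like [@translate='no'] is the kind of expression a custom file type can use to keep locked content away from translators.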

Where is XPath Used in Localization?

XPath can be used in the localization process for creating custom file types in Trados. With the appearance of new formats, which sometimes aren’t compatible with Trados, the creation of new filters is sometimes indispensable for processing files. With XPath, it’s possible to create new filters for XML files without the need to learn regular expressions. Trados supports the XPath specification 1.0, so you must use expressions from this version of XPath.

You can play around with XPath and its expressions in an online XPath tester. The HTML strip XPath tester has examples and supports XPath 1.0 and 2.0, so it’s a good one to try.

If you prefer to explore something locally, you can use the Notepad++ plugin called “XPatherizer,” which allows you to analyze multiple XPath queries, verify and improve XML documents, and more.

Conclusion

XPath is a useful tool in the localization industry as well as for developers and authors working in XML.

The post What’s XPath and What’s it Used for? appeared first on Globalization Partners International.

Skills for a Localization Engineer

Luis Diego Farias — Wed, 27 Oct 2021 01:41:48 +0000

Have you ever thought about being a localization engineer? Localization engineers facilitate the entire localization process. They act as bridges between the different stakeholders: clients, business development teams, project managers, and vendors.

And what exactly do they do? The role of a localization engineer varies depending on the company’s size and scope. At the basic level, the role could be running QA checks on localized files before delivering them to the client to make sure that they’re error-free. Sometimes the role extends to supporting project managers and the sales and business development teams, thanks to the engineer’s knowledge of computer-aided translation (CAT) tools, content management systems (CMSs), and other relevant localization information. Finally, on a higher level, the localization engineering role could involve developing a new tool or plugin that supports localization processes, which requires a solid background in development and programming.

Through a series of blogs, I’ll share some ideas and challenges facing the localization industry. In this blog, I’ll talk about localization engineering skills through the last decade and in the future with the era of big data and automation.

Skills for a Localization Engineer

To be recognized as a good localization engineer, you need to have particular skills and develop them regularly. Here are some examples of important skills to have and hone.

Problem Solver

In the localization industry, only a small number of projects go according to the original plan. Skilled localization engineers can work analytically to understand the problems that they’re facing, the available solutions, and the optimum solution to get the required result.

Knowledgeable About Trends and Research-Oriented

There’s always something new in the localization industry, so a successful localization engineer will keep up-to-date and learn about trends in the industry. In addition, he or she will research ways of doing things and handling problems in multiple ways.

Adaptable

An adaptable localization engineer who can think outside of the box is usually good about finding creative solutions for challenges. When one is adaptable, one can better face challenges.

Detail-Oriented

The localization industry is full of details, so being a detail-oriented localization engineer means completing each task, assignment, or phase as flawlessly as possible before moving on to the next one.

Good Teamwork Spirit

The localization industry requires an entire team effort; it’s not a one-person show. Since a localization engineer is at the center of all the stakeholders (project managers, sales team, translators, vendors, etc.), a successful localization engineer must know how to work with everyone.

Strong Communication Skills

As we just talked about, a localization engineer knows that localization requires teamwork and that the localization engineering role is the center of all the stakeholders. Therefore, a localization engineer must have very strong communication skills with all stakeholders.

The Role of the Localization Engineer in the Future

Without a doubt, the COVID-19 pandemic has accelerated and intensified the digital transformation of all activities and industries around us, which has had a huge and direct impact on the localization industry. The pandemic and other factors forced most localization companies to automate repetitive tasks, procedures, processes, and workflows that follow the same pattern, which means more involvement of machine translation (MT) and artificial intelligence (AI) in daily localization tasks such as translation, transcription, text to speech, speech to text, and more.

MT and AI play a big role in the localization industry now, and they will continue to play a greater role in the future. However, this doesn’t mean that the localization engineering position will go away—it’ll just change, and the changes will be something to embrace. So while the skills that we talked about previously were enough in the previous decade, nowadays you need to learn some additional skills with technology to meet the new needs and challenges of the industry.

Capability to Work on Various Types of Source Files

Currently, many companies from different industries work with a variety of file types and schemes. Simple file formats like those of Microsoft applications (Word, Excel, and PowerPoint) and of more professional tools like Adobe InDesign, Illustrator, or FrameMaker are relatively easy to localize. However, many other file formats like .OP, .HTM, .YAML, and .JSON require a higher level of technical knowledge to make sure that final localized files will be error-free and to avoid any unnecessary back and forth between clients and linguists.

Capability to Work on Multiple Tools and Platforms

In the age of automation, it’s crucial to have excellent knowledge and understanding of tools and platforms. This is important so that you select the most appropriate tool to extract translatable text, analyze its word count, and prepare the localization package. If you don’t choose the most appropriate tool or finalize these steps correctly, you’ll disrupt the workflow process while also increasing the overall project cost.
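
As a small illustration of extracting translatable text and analyzing its word count, the sketch below walks a JSON resource file. It is not how any particular CAT tool or TMS does it: the sample content, the path notation, and the rule that every string value is translatable are assumptions, and whitespace-based counting is only a rough approximation of a tool's word count.

```python
import json

# Hypothetical JSON resource file.
SAMPLE = """{
  "title": "Welcome back",
  "menu": {"open": "Open file", "items": ["Save", "Save as"]},
  "version": 3
}"""


def extract_strings(node, path=""):
    """Yield (path, text) for every string value in a parsed JSON tree."""
    if isinstance(node, str):
        yield path, node
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from extract_strings(value, f"{path}/{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from extract_strings(value, f"{path}[{index}]")
    # Numbers, booleans, and null are not translatable and are skipped.


strings = dict(extract_strings(json.loads(SAMPLE)))
word_count = sum(len(text.split()) for text in strings.values())

for path, text in strings.items():
    print(path, "->", text)
print("Words:", word_count)  # Words: 7
```

Keeping the path next to each string makes it possible to write the translations back to the same place in the file after localization.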

Skilled Coder

Last but not least, you must be able to read code and be familiar with many different programming languages. This has become mandatory, not just nice to have as it was previously, because a localization engineer will work with many different types of files that have different formats and schemes. Sometimes you need to deal with customized CMSs or translation management systems (TMSs) that require you to create ways or tools to overcome challenges, such as connectors for CMSs or macros and scripts for TMSs.

Localization Engineering at GPI

Our localization engineering team at GPI works hand-in-hand with all our stakeholders to increase efficiencies by automating any repetitive task while still ensuring a high-quality final deliverable. We’re also always looking for ways to make things easier for our clients.

The post Skills for a Localization Engineer appeared first on Globalization Partners International.

Arabic Support in Early Computers

Luis Diego Farias — Thu, 01 Jul 2021 23:30:27 +0000

The first microcomputers supported ASCII. Occasionally, some accented letters and symbols, and sometimes characters from the Russian or Armenian alphabets, were included. However, the Arabic script is a little more complicated: it is written from right to left, uses different characters, and relies on diacritics to be understood.

What Were the First Computers To Support Arabic?

Xerox Star 8010

In the late 1970s, a team at Xerox began work on a vision: the “office of the future”. Much of the technology we use today (windows, the mouse, clickable elements, icons, etc.) originated with the Xerox Star 8010.

This system had interesting multilingual capabilities. Unlike other systems of the time, which used 8-bit character encoding, the Star 8010 used 16-bit codes to allow a wider range of characters and input systems, as well as fonts, which had to be designed to support other writing systems like Arabic. It was not easy: intelligent rendering algorithms had to be written to work with Arabic characters.
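
The difference between 8-bit and 16-bit codes can be seen with modern encodings in Python. These are stand-ins chosen for illustration, not the Star 8010's own character codes, which this article does not detail: Windows-1256 represents an 8-bit Arabic code page and UTF-16 a 16-bit encoding form.

```python
import unicodedata

letter = "\u0639"  # ARABIC LETTER AIN

# An 8-bit code page has only 256 slots, so it can cover Arabic
# only by giving up other characters.
eight_bit = letter.encode("cp1256")

# A 16-bit code unit has room for 65,536 values.
sixteen_bit = letter.encode("utf-16-be")

# Plain 7-bit ASCII has no slot for Arabic at all.
try:
    letter.encode("ascii")
    fits_in_ascii = True
except UnicodeEncodeError:
    fits_in_ascii = False

# "AL" is the Unicode bidirectional class for right-to-left Arabic letters.
direction = unicodedata.bidirectional(letter)

print(len(eight_bit), len(sixteen_bit), fits_in_ascii, direction)  # 1 2 False AL
```

The direction property hints at why rendering is harder than storage: besides choosing the right code, the system has to lay the text out from right to left and pick the correct shape for each letter according to its position in the word.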

“Arapple” Arabic Apple II (1979)

The Apple II had rudimentary Arabic versions, which had specialized chips inside to produce and understand Arabic characters.

Sinclair ZX Spectrum

The Sinclair ZX Spectrum had a rare Arabic version with a switch at the front, to select between the original and the Arabic ROM.

The Arabic ROM includes support for right-to-left writing, and diacritics.

MSX Home Computer

MSX systems were popular in Japan, South Korea, Argentina, and Brazil. They were also popular in the Gulf region, where Arabic characters are used. Classrooms with Yamaha MSX computers were common for teaching informatics in some Arab countries.

Variations of MSX were designed specifically for Arab countries, including the MSX Al Fateh 100, Al Fateh 123, and Sakhr AH-200.

What About Operating Systems?

Microsoft DOS and Windows

In 1988, Microsoft introduced MS-DOS 3.3 with support for the Arabic language. Programs running in this version of DOS worked normally, as in any other version, although it consumed more memory than the English version.

The operating system had an Arabic interface and allowed the user to map shortcut keys to switch between English and Arabic. In theory, any well-built program could run in bilingual mode, allowing right-to-left typing, with the character formation required by the Arabic language.
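To make “character formation” concrete: an Arabic letter changes shape depending on whether it stands alone or at the start, middle, or end of a word. The sketch below is only an illustration using Python's standard `unicodedata` module and the letter beh as a sample; it is not how MS-DOS implemented shaping.

```python
import unicodedata

beh = "\u0628"  # ARABIC LETTER BEH, as it is stored in text

# Unicode keeps legacy "presentation forms" for each positional shape of the letter.
forms = ["\uFE8F", "\uFE91", "\uFE92", "\uFE90"]  # isolated, initial, medial, final
for form in forms:
    print(f"U+{ord(form):04X}", unicodedata.name(form))

# All four shapes normalize back to the same underlying letter.
same_letter = all(unicodedata.normalize("NFKC", f) == beh for f in forms)
print(same_letter)  # True

# Arabic letters carry the right-to-left class "AL"; diacritics such as fatha
# are combining marks (category "Mn") attached to the preceding letter.
print(unicodedata.bidirectional(beh))   # AL
print(unicodedata.category("\u064E"))   # Mn
```

Software that displays Arabic has to pick the right shape for every letter and lay the result out from right to left, which is the extra work the bilingual mode had to do.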

Microsoft gradually incorporated Unicode support in Windows, beginning with Windows NT and Windows 95, influencing the “Arabization” of software products. Microsoft also added support for Unicode to some of their productivity applications, for example Office 97.

Mac OS

Arabic is fully supported on macOS now, although this was not always the case. Apple added support for basic Arabic in System 4.1 and started selling the “Arabic and Persian Language Kit” commercially in 1993. This kit allowed the user to switch between the computer’s main language and Arabic or Persian.

It also allowed the user to enter, edit, or print Arabic or Persian text, and to mix it with other languages in the same document, or even in the same sentence.

When Mac OS X 10 was introduced, it did not include the Arabic support that Mac OS 9 had.

Mac OS X 10 supported a handful of language support packs, but Arabic was not included. Mac OS X 10.2 and later versions re-introduced some support for Arabic, with full official support arriving in Mac OS X 10.4 and onwards.

Conclusion

Over the years, technology and operating systems have developed, as seen above. Now, in most operating systems and applications, Arabic is fully supported. Arabic is the official language in 26 countries, with around 300 million native speakers, making it the fifth most spoken language worldwide. Your translation provider must have the tools and capabilities to work with languages such as Arabic and Russian.

The post Arabic Support in Early Computers appeared first on Globalization Partners International.

Mojibake: Question Marks, Strange Characters and Other Issues.

Luis Diego Farias — Thu, 03 Jun 2021 13:56:37 +0000

Have you ever found strange characters like these �� when viewing content in applications or websites in other languages? What are these and where do they come from?

These strange, apparently misplaced characters are most likely the result of encoding issues, and they can be a headache. This happens often in the localization field, as content crosses borders, platforms and languages.

Character Encoding

Everything on a computer is a number. If we want to have letters on computers, we must all agree on which number corresponds to which letter. This is called “character encoding”.

Character encoding refers to how the “character sets” of different languages are represented on computers. The mappings are defined in “code pages” or “character maps”: tables that pair each character with a specific sequence of ones and zeros.
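As a minimal sketch of the “which number corresponds to which letter” idea, Python's built-in `ord` and `chr` expose the number behind each character, and `str.encode` shows the bytes a given encoding assigns to it. The sample word is just an illustration.

```python
# Every character is stored as a number; an encoding decides which bytes represent it.
word = "Año"

# ord() gives the number (code point) behind each character; chr() reverses it.
code_points = [ord(ch) for ch in word]
print(code_points)                            # [65, 241, 111]
print("".join(chr(n) for n in code_points))   # Año

# The same text becomes different bytes under different encodings.
latin1_bytes = word.encode("latin-1")
utf8_bytes = word.encode("utf-8")
print(latin1_bytes)   # b'A\xf1o'
print(utf8_bytes)     # b'A\xc3\xb1o'
```

The letter ñ is one byte in ISO 8859-1 and two bytes in UTF-8, so software has to know which table was used before it can turn the bytes back into text.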

The simplest encoding is called ASCII. ASCII characters are stored using only 7 bits, which means that there are only 2⁷ = 128 characters possible.

ASCII works great for encoding basic characters in English/Latin, but there are far more than 128 characters in the world!
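A quick way to see this limit is to ask Python to encode text as ASCII. This is only a sketch; `fits_in_ascii` is a hypothetical helper name, not part of any library.

```python
# ASCII uses 7 bits, so it only has room for 2**7 = 128 characters.
ASCII_SIZE = 2 ** 7

def fits_in_ascii(text: str) -> bool:
    """Return True if every character in text has an ASCII code (0-127)."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False

print(ASCII_SIZE)                      # 128
print(fits_in_ascii("Signaling"))      # True
print(fits_in_ascii("Señalización"))   # False: ñ and ó are outside ASCII
```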

Things got complicated when Asian languages and computers met. Some languages, such as Chinese, have up to 60,000 different characters. This is where 16-bit encoding schemes appeared, with room for 2¹⁶ = 65,536 characters.

Japan, for example, could not rely on ASCII and created its own encodings, with up to four in use simultaneously, all of them incompatible with each other. So if, for example, you sent a document from one Japanese computer to another that assumed a different encoding, the text would be broken.
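The incompatibility is easy to reproduce. The sketch below uses three Japanese encodings that ship with Python's standard codecs (Shift JIS, EUC-JP and ISO-2022-JP), chosen here only as examples: the same word produces three different byte sequences, and reading one as another breaks the text.

```python
text = "文字化け"  # "mojibake" written in Japanese

# The same text produces different bytes under each encoding.
names = ("shift_jis", "euc_jp", "iso2022_jp")
encoded = {name: text.encode(name) for name in names}
for name, data in encoded.items():
    print(name, data)

# Write the text with one encoding, then read it back assuming another.
sent = text.encode("shift_jis")
received = sent.decode("euc_jp", errors="replace")
print(received)          # no longer the original text
print(received == text)  # False
```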

The Japanese have a term for this phenomenon: Mojibake.

Some languages affected by mojibake:

Arabic
French
English: It may appear in some punctuation characters, such as em dashes (—) or en dashes (–). It rarely affects the characters of the alphabet.
Japanese
Chinese: In Chinese, this phenomenon is called 亂碼(Luàn mǎ) or ‘chaotic code’.
Languages based on the Cyrillic alphabet: Mojibake also affects languages such as Russian, Ukrainian, Belarusian or Tajik. In Bulgarian, mojibake is also called “monkey’s alphabet”, and in Serbian it is known as “garbage”.
Polish
Nordic languages: Mojibake affects Nordic languages, although it is not common. Finnish and Swedish use the same alphabet as English, plus three additional letters: å, ä and ö.
Spanish: Mojibake in Spanish can be translated as “deformation”, and the situation is the same as in the Nordic languages: Spanish uses the 26 standard Latin letters but also includes ñ, accented vowels and sometimes ü. Because these characters are not available in ASCII, they are displayed incorrectly when the encoding is misread.

Spanish example (source text: Señalización)
File encoding	Setting in browser	Result
UTF-8	Windows-1256	Seأ±alizaciأ³n
ISO 8859-1	Mac Roman	SeÒalizaciÛn
UTF-8	ISO 8859-1	SeÃ±alizaciÃ³n
UTF-8	Mac Roman	Se√±alizaci√≥n
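Mojibake like this can be reproduced by encoding text one way and decoding it another. The sketch below uses Python's standard codecs; `mojibake` and `repair` are hypothetical helper names, and the repair only works when no bytes were lost along the way.

```python
def mojibake(text: str, file_encoding: str, assumed_encoding: str) -> str:
    """Encode text as the file would store it, then decode it the wrong way."""
    return text.encode(file_encoding).decode(assumed_encoding, errors="replace")

def repair(broken: str, assumed_encoding: str, real_encoding: str) -> str:
    """Undo the wrong decoding by reversing the two steps."""
    return broken.encode(assumed_encoding).decode(real_encoding)

word = "Señalización"
print(mojibake(word, "utf-8", "latin-1"))      # SeÃ±alizaciÃ³n
print(mojibake(word, "utf-8", "mac_roman"))    # Se√±alizaci√≥n
print(mojibake(word, "latin-1", "mac_roman"))  # SeÒalizaciÛn
print(mojibake(word, "utf-8", "cp1256"))       # ñ and ó each become an Arabic letter plus a symbol

broken = mojibake(word, "utf-8", "latin-1")
print(repair(broken, "latin-1", "utf-8"))      # Señalización
```

Here `latin-1` is Python's name for ISO 8859-1 and `cp1256` is Windows-1256.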

Unicode to the rescue

Finally, someone had enough and decided to create a standard to unify all encoding standards. This standard is called Unicode, and it is not an actual encoding but a character set.

Unicode has a code space of 1,114,112 possible positions, which is sufficient for Arabic, Russian, Japanese, Korean, Chinese, European languages, etc., with room to spare for characters that have not been added yet. By using Unicode, you can write a document in any language.
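The size of the code space can be checked from Python, which exposes the highest code point as `sys.maxunicode`. The sample characters below are arbitrary picks from a few scripts, shown only as an illustration.

```python
import sys
import unicodedata

# Code points run from 0 to 0x10FFFF, which gives 1,114,112 positions.
CODE_SPACE = sys.maxunicode + 1
print(f"{CODE_SPACE:,}")   # 1,114,112

# Characters from very different scripts each have their own code point.
samples = ["A", "ñ", "ع", "Я", "日", "한"]
for ch in samples:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```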

UTF-8 or UTF-32?

UTF-32 is a Unicode character encoding that uses a 32-bit number for each character. This is simple, but it wastes a lot of space: an English document would occupy four times as much space as it would in ASCII.

Why do we need such a large number for every character, when Unicode has only 1,112,064 valid values to encode (the 1,114,112 positions minus 2,048 reserved for surrogates) and most of the time we will only use the first 128 of them?

UTF-8 is a “variable-length encoding”. That means the most common characters (the ASCII range) take up only 8 bits each, while other characters expand to 16, 24 or 32 bits as necessary. This system can support any Unicode character without wasting space, which has made it the most popular character encoding.
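The difference in size is easy to measure. In this sketch the sample characters and the word “Localization” are arbitrary choices, and `utf-32-le` is used so that Python does not add a byte-order mark to the count.

```python
samples = ["A", "ñ", "亂", "😀"]

for ch in samples:
    utf8_len = len(ch.encode("utf-8"))
    utf32_len = len(ch.encode("utf-32-le"))
    print(ch, utf8_len, utf32_len)
# A 1 4
# ñ 2 4
# 亂 3 4
# 😀 4 4

english = "Localization"
utf8_size = len(english.encode("utf-8"))
utf32_size = len(english.encode("utf-32-le"))
print(utf8_size, utf32_size)   # 12 48

# For plain English text, the UTF-8 bytes are identical to the ASCII bytes.
print(english.encode("utf-8") == english.encode("ascii"))   # True
```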

Conclusion

Today the most used encoding is UTF-8, since it can encode any character and is backward compatible with ASCII. It is also relatively space-efficient, which makes it the best choice for most cases.

Making sure your application is “localization ready” is key to developing applications that can be easily localized. Supporting different character encodings and choosing the right encoding are two crucial steps in localization projects, whether for website or software localization.
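In practice, choosing the right encoding often comes down to stating it explicitly instead of relying on a platform default. The following is a small sketch in Python; the file name `strings.txt` and the sample text are made up for the example.

```python
import tempfile
from pathlib import Path

text = "Señalización, 亂碼, مرحبا"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "strings.txt"   # hypothetical file name
    # State the encoding explicitly when writing and when reading.
    path.write_text(text, encoding="utf-8")
    restored = path.read_text(encoding="utf-8")

print(restored == text)   # True
```

When both sides agree on UTF-8, Spanish, Chinese and Arabic text survive the round trip in the same file.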

The post Mojibake: Question Marks, Strange Characters and Other Issues. appeared first on Globalization Partners International.