GameSpot may receive revenue from affiliate and advertising partnerships for sharing this content and from purchases through links.

Kinect speaks a global language

GDC 2011: Microsoft's Lief Thompson and Yumiko Murphy are joined by Englobe's Kate Edwards to discuss the trials of bringing voice- and gesture-based gaming to worldwide audiences.


Who was there: With the introduction of the Kinect in November, Microsoft opened itself up to new ways to offend cultures around the globe. In a GDC 2011 session titled "New Technology and New Interfaces: Localizing Video Games for Kinect," Microsoft's Lief Thompson and Yumiko Murphy were joined by Englobe's Kate Edwards to discuss the many challenges of releasing gesture-based games worldwide.

This tiger speaks the international language of cute.

What they talked about: Thompson opened the presentation with a look at what Microsoft has accomplished since the Kinect's launch. The hardware sold 8 million units through the end of 2010, and of the 14 launch titles, five went through Microsoft's internal localization department: Kinect Adventures, Kinect Joy Ride, Kinectimals, Dance Central, and Kinect Sports.

In Redmond, Washington, the localization team establishes the scope, engineering, and schedule. Microsoft also has localization teams in Dublin and Tokyo, which manage vendor costs and schedules, perform translations and testing, coordinate with regional operations, and arrange regional age-rating submissions. The goal of these global operations, he said, is to ensure simultaneous launches worldwide and to keep costs down.

There were a number of challenges that accompanied releasing the Kinect globally, Thompson said. Time was at the forefront of these, as there were really only two years between the Kinect's early design and prototype phase and its global launch. In this brief period, the team was challenged with catering to 12 languages.

On the software side, localization challenges started with security, as Microsoft was very sensitive about hardware and software leaks. Localization teams therefore had to do their testing in secure facilities, out of public view, which meant bringing testers into Microsoft's own facilities. Space soon became a serious constraint, so the team split the work: 9 of the 12 languages were tested in Dublin and the remaining three in Tokyo.

Another issue the team ran into was the sheer human logistics of testing a Kinect game. A single tester can't be expected to play a Kinect game for 8-10 hours per day, Thompson said, purely from a physical standpoint. Bug capture was also a problem, as players can't log a bug while interacting with the title. The solution was to pair testers: one plays while the other logs bugs, and the two periodically swap roles.

Thompson also noted that localization testing serves a specific purpose, separate from functional testing. Because testers did not need to verify gameplay itself, the team saved time by using a variety of debug hardware cheats, such as entering codes to skip through or pass content.

Edwards then took the podium to speak about gestural issues and accounting for cultural nuance. The goal is to avoid forcing players to use potentially offensive gestures to control the device or to play a game, she said. Hand and body gestures are particularly problematic for localization, because certain gestures, such as a peace sign or a thumb between the fingers, can mean different things to different people.

Dance Central is the type of Kinect game that is particularly problematic, she said, before showing an image of an in-game character making what appears to be a rock-and-roll hand gesture. In the US, this gesture simply means "rock on," she said. In Texas, however, it also carries the connotation of "hook 'em horns," thanks to its association with the University of Texas. Edwards also noted that showing someone the sole of your foot can be quite offensive in some cultures.

Games such as this are given a gesture review, which involves examining all mo-cap and avatar moves and then making minor tweaks as necessary.

Murphy then took the stage to talk about the speech functionality found in Kinectimals. In particular, this title supports voice commands and the Name Your Animal feature. Voice commands, she said, are supported through a limited set of "grammars," which are sets of words or phrases that trigger an in-game response. Murphy pointed out that a given action can usually be triggered by several different phrases.
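To make the idea of a grammar more concrete, here is a minimal Python sketch. It assumes the speech recognizer has already turned the player's audio into text, and the locale codes, action names, and phrases are hypothetical examples rather than Kinectimals data. It illustrates two points from the talk: several phrases can map to one action, and each language needs its own grammar.

```python
from typing import Dict, Optional, Set

# Hypothetical per-locale grammars: for each locale, every in-game action
# maps to the set of phrases that can trigger it. These actions and phrases
# are illustrative only and are not taken from Kinectimals.
GRAMMARS: Dict[str, Dict[str, Set[str]]] = {
    "en-US": {
        "sit": {"sit", "sit down"},
        "jump": {"jump", "jump up"},
    },
    "es-MX": {
        "sit": {"siéntate"},
        "jump": {"salta", "brinca"},
    },
}


def normalize(utterance: str) -> str:
    """Lowercase and collapse whitespace so small variations still match."""
    return " ".join(utterance.lower().split())


def match_command(locale: str, utterance: str) -> Optional[str]:
    """Return the action whose grammar contains the utterance, else None."""
    phrase = normalize(utterance)
    for action, phrases in GRAMMARS.get(locale, {}).items():
        if phrase in phrases:
            return action
    return None


print(match_command("en-US", "Sit  Down"))  # sit
print(match_command("es-MX", "brinca"))     # jump
print(match_command("en-US", "roll over"))  # None
```

Because each grammar is a closed list, anything outside it (such as "roll over" above, or an unsupported locale) simply returns no action. That closed-list design is also why a free-form feature like pet naming cannot be handled by grammars alone.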

With the Name Your Animals feature, players could choose any name at all, so there were no grammar files to rely on. Instead, names were broken down into phonemes, which were then stored in memory.

Not every language was supported by speech recognition and the Name Your Animals feature, she said. The team focused on US and UK English, Japanese, and Mexican Spanish.

Translating for fluent speakers was of paramount importance, she said, so it was essential that translators had all the relevant context while working. For example, translators needed to know how each action is used in the game.

Data collection with native speakers was also important. Microsoft needed a wide range of age groups in the testing pool to ensure speech recognition worked for everyone, though the game's target age group was given top priority.

Moving to the Name Your Animals feature, Murphy said they wanted it to be available in as many locations as possible. However, they ended up having to scale it back to 11 languages for 15 countries. They compiled the top 100 pet names in each of these regions and then asked native speakers from these areas to test them. Data collection took about four weeks, and test accuracy ranged from 80 to 93 percent.

She went on to note that scheduling was a major challenge for Kinectimals, as the team had only about four and a half months to complete speech recognition localization. Because all of this happened prelaunch, security and confidentiality concerns ruled out external vendors. The team therefore had to do all of its testing internally, using Microsoft employees and their family members.

Because Kinectimals is targeted at young kids, it was important to collect voice data from many children. However, strict child labor laws complicated this process, she said.

Quote: "It's all very context-sensitive."--Kate Edwards

Takeaway: Between multilanguage voice support and gesture-based gaming, Microsoft's localization team needed to clear a variety of hurdles to bring its latest gaming device to market. These problems were compounded by the intense pressure to keep the device out of the public eye.
