Archives for Human Language Technology

Differences (Q3)

Machine translation (MT), according to Wikipedia, is a sub-field of computational linguistics that investigates the use of computer software to translate text or speech from one natural language to another. At its most basic level, MT performs simple substitution of words in one natural language for words in another. Using corpus techniques, it attempts more complex translations, allowing for better handling of differences in linguistic typology, phrase recognition, and translation of idioms, as well as the isolation of anomalies.
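
The limits of pure word substitution are easy to see in a small sketch. The glossary below is a made-up, four-entry illustration (not taken from any real MT system), and substitute is a hypothetical helper name. It replaces each English word independently, so it cannot reorder words or add the preposition that Spanish requires.

```python
# Toy word-for-word substitution. GLOSSARY is an illustrative assumption,
# not data from a real machine translation system.
GLOSSARY = {
    "a": "una",
    "novel": "novela",
    "for": "para",
    "adults": "adultos",
}


def substitute(sentence: str) -> str:
    """Replace each word independently; unknown words pass through unchanged."""
    return " ".join(GLOSSARY.get(word.lower(), word) for word in sentence.split())


if __name__ == "__main__":
    # English puts the year before the noun; Spanish needs "una novela de 1902".
    print(substitute("a 1902 novel for adults"))
```

The result keeps the English word order ("una 1902 novela para adultos"), which is the kind of error that corpus-based techniques try to reduce.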

In Machine-Aided Translation, translation proper is performed by a computer, even if a human helps by pre-editing, post-editing, or answering questions to disambiguate the source text. In Computer-Aided Translation, or more precisely Machine-Aided Human Translation (MAHT), by contrast, translation is performed by a human, and the computer offers supporting tools.

A multilingual content management system has seven key issues to consider:

Translation: Translation is essential to the running of a multilingual website and requires qualified personnel or the use of an external translation service. Proofreading of translated copy is also often required.

Localization: A multilingual website is usually a mixture of global and local content. Local content presents no particular content management issues; global content – which has to be translated across all language locales – does.

Culture: Differences in language are only part of what distinguishes different locales. Graphical conventions, matters of taste, sense of humour, socially acceptable forms of address and issues of privacy all vary from place to place.

Feedback: Responses to any website feedback will need to be addressed in the language of the initial communication. User feedback should not be solicited in a language if it cannot be routed to a suitably qualified person who can answer in the appropriate language.

Design: Perhaps the most common, and most easily overlooked, difficulty encountered in developing multilingual websites is the maintenance of a consistent design across different language versions of a site, and in particular the layout of navigation: text or graphic labels that fit the design constraints in one language may not work well in translation.

Workflow: Simple workflow mechanisms usually offer some kind of notification when some action is performed on a page or when the page moves from one state to another.

Non-Latin character sets: There are some interesting challenges associated with the creation and rendering of non-Latin alphabets, although modern browsers have better support for them than in the past.
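
One such challenge can be sketched in a few lines. Assuming the site stores text as UTF-8 (the article does not say which encoding is used), non-Latin characters take more than one byte each, so a field sized in bytes for English text may be too small for its translation. The sample words and the helper name utf8_length are illustrative assumptions.

```python
# Sample strings are illustrative; any text in these scripts would do.
SAMPLES = {
    "English": "translation",
    "Japanese": "翻訳",
}


def utf8_length(text: str) -> int:
    """Return the number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


if __name__ == "__main__":
    for language, word in SAMPLES.items():
        # Character count and byte count differ for non-Latin scripts.
        print(language, len(word), "characters,", utf8_length(word), "bytes")
```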

Translation Technology: The Translation Technology pages at foreignword.com contain links to software tools for automatic or computer-assisted translation, as well as articles and background information on these and related technologies. The site has invited, and will continue to invite, prominent actors in the field and users of the different systems to contribute their insights into this fascinating field.

Sources:

Machine Translation Retrieved 13.02, 26 May, 2008 

http://en.wikipedia.org/w/index.php?title=Machine_translation&oldid=203927830

Christian Boitet, 8.4 Machine-aided Human Translation Retrieved 13.04, 26 May, 2008

http://cslu.cse.ogi.edu/HLTsurvey/ch8node6.html

Multilingual content management, written by Danny Sofer Retrieved 11.37, 28 May, 2008

http://www.kitsite.com/articles/multilingual-content-management.html

Translation technology Retrieved 12.00, 28 May, 2008

http://www.foreignword.com/Technology/technology.htm

Translation from French to Spanish (Q3)

Synopsis

Chaque mercredi Pierre Brochant, célèbre éditeur parisien, organise avec des amis un « dîner de cons » : chaque organisateur amène avec lui un « con » qu’il a déniché au hasard. Ensuite, les organisateurs se moquent des « cons » toute la soirée sans que ces derniers ne s’en rendent compte. À l’issue du repas, on choisit le champion. Un ami lui en a trouvé un fabuleux : François Pignon qui se passionne pour les constructions en allumettes. Mais rien ne va se dérouler comme prévu…

 

Sinopsis

Cada miércoles Piedra Que Encuaderna, célebre editor parisino, organiza con amigos una «cena de cons»: cada organizador lleva con él un «con» que ha desanidado por casualidad. A continuación, los organizadores se burlan de los «cons» toda la noche sin que estos últimos no se den de ello cuenta. Al Final de la comida, se elige al campeón. Un amigo él ha encontrado de él uno fabuloso: François Pignon que se apasiona para las construcciones en cerillas. Pero nada se va a desarrollar como previsto …

Piedra Que Encuaderna: this small error in translating the name (Pierre Brochant) could even be comical.

cons: the translator does not know the meaning of con, which means gilipollas (idiot).

ha desanidado: the correct verb would be encontrado.

estos: the accent is missing; under the traditional spelling rule it should be éstos.

no: French uses the particle “ne” here, which is not needed in Spanish.

de ello: the word order is wrong; de ello should come later in the sentence (se den cuenta de ello), since it is a complement.

él: this should be le, but in French the same form (lui) serves as both the stressed pronoun and the indirect object pronoun.

de él: adding this makes no sense in the sentence.

para: the correct preposition would be por.

 

The difference is very noticeable when correcting: in this second text, which is translated from one Romance language into another, there are fewer mistakes, and those that do appear are simple errors that are easy to fix.

Sources:

Retrieved 13.06, 19 May, 2008

http://fr.wikipedia.org/wiki/Le_D%C3%AEner_de_cons

http://translendium.com/

Translation from English to Spanish (Q3)

 PETER PAN

History

Peter Pan first appeared in a section of The Little White Bird, a 1902 novel for adults. Following the highly successful debut of the play about Peter Pan in 1904, Barrie’s publishers, Hodder and Stoughton, extracted chapters 13-18 of The Little White Bird and republished them in 1906 under the title Peter Pan in Kensington Gardens, with the addition of illustrations by Arthur Rackham.[1]

The character’s best-known adventure debuted on 27 December 1904, in the stage play Peter Pan, or The Boy Who Wouldn’t Grow Up. This story was adapted and expanded somewhat as a novel, published in 1911 as Peter and Wendy, and later as Peter Pan and Wendy.

 

Historia

Peter Pan primero aparecía en una sección del Pequeño Pájaro Blanco, una 1902 novela para adultos. Siguiendo el inicio altamente satisfactorio del juego|obra de teatro por Peter Pan en 1904, los editores de Barrie, Hodder y Stoughton, siendo extraído capítulos 13-18 del Pequeño Pájaro Blanco y siéndolos republicado en 1906 bajo el título Peter Pan en Kensington Trabaja en el jardín, con la adición de las ilustraciones por Arthur Rackham.[1]

La aventura más conocida del carácter debutada el 27 de diciembre de 1904, en el juego|obra de teatro de escenario|fase Peter Pan, o El Chico Quien No Crecería. Esta historia se adaptaba y se expandía como novela, publicada en 1911 como Peter y Wendy, y más tarde como Peter Pan y Wendy.

 

“aparecía” (…) “adaptaba y se expandía”: the correct verb form would be apareció, a completed action in the past. The error is repeated in the following paragraph: instead of using the preterite (pretérito perfecto simple), the translator uses the imperfect (pretérito imperfecto).

“una 1902 novela”: in Spanish, unlike in English, a date cannot be used as if it were an adjective. It should be placed as a complement of the noun: “una novela de 1902”.

“siendo (…) siéndolos”: the verb haber would have to be used for this to read properly, although that is not a very common way of phrasing the sentence.

“en Kensington Trabaja en el jardín”: the correct translation would be “Peter Pan en los jardines de Kensington”, but the translator has decided to insert the word “trabaja”, which seems to come from nowhere (presumably it read “Gardens” as a verb).

“carácter”: this should be personaje (a character in a story).

“juego|obra de teatro” (…) “escenario|fase”: Translendium cannot tell which of the two translations the reader needs, so it offers both options and leaves the choice to the reader.

Sources:

Retrieved 12.16, May 14, 2008

http://es.wikipedia.org/wiki/Peter_Pan

http://www.translendium.com/

Characteristics of translation (Q3)

As the FEMTI web page says, the characteristics of the translation task refer to the information flow intended for the output, from the point of view of the agent (human or otherwise) who receives the translation.

According to the FEMTI report, the characteristics of the translation task are the following:

Assimilation: The ultimate purpose of the assimilation task (of which translation forms part) is to monitor a (relatively) large volume of texts produced by people outside the organization, in (usually) several languages.

Dissemination: The ultimate purpose of dissemination is to deliver to others a translation of documents produced inside the organization. 

Communication: The ultimate purpose of the communication task is to support multi-turn dialogues between people who speak different languages. The translation quality must be high enough for painless conversation, despite possible syntactically ill-formed input and idiosyncratic word and format usage.

Sources:

Retrieved 12.54, 12 May, 2008

http://www.issco.unige.ch:8080/cocoon/femti/st-home.html

Explanation of three research topics (Q2)

Collaborating using diagrams

The goal of this project is to discover how people collaborate to perform a task that requires them to plan a route on a map.

The LTG’s role in this project is to provide the infrastructure to annotate and analyse multimodal dialogues, particularly for the division of dialogues into task phases, transcription, the identification of spoken references to locations and routes on a map, gesture coding, and multimodal links between spoken references and gestures.

SQUAD

The SQUAD project deals with the preparation of qualitative social science data for archiving. The project will focus on providing tools to semi-automate the process of archiving data.

The LTG’s contribution to the SQUAD project will be to build tools to carry out tasks such as:

  • Named entity recognition: people, places, locations, dates, times, occupations, etc.
  • Anonymisation of entities to protect identities
  • Geographical grid references
  • Keyword extraction
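
As a rough illustration of the anonymisation task listed above, the following sketch replaces names and dates with placeholders. It is not the SQUAD project's method: the fixed name list, the date pattern, and the function name anonymise are all assumptions made for this example, and a real tool would locate entities with a trained named-entity recogniser.

```python
import re

# Hypothetical list of names to protect. A real tool would find these with a
# trained named-entity recogniser instead of a fixed list.
KNOWN_PEOPLE = ["Mary Smith", "John Brown"]

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)
# Matches simple dates written as "14 May 2008".
DATE_PATTERN = re.compile(rf"\b\d{{1,2}} (?:{MONTHS}) \d{{4}}\b")


def anonymise(text: str) -> str:
    """Replace known names and simple 'day Month year' dates with placeholders."""
    for index, name in enumerate(KNOWN_PEOPLE, start=1):
        text = text.replace(name, f"[PERSON_{index}]")
    return DATE_PATTERN.sub("[DATE]", text)


if __name__ == "__main__":
    print(anonymise("Mary Smith met John Brown on 14 May 2008."))
```

Numbering the placeholders keeps different people distinguishable in the archived text without revealing who they are.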

JAST

According to the project web page, “JAST is an EU-funded integrated project that aims to develop jointly acting autonomous systems that communicate and work intelligently on mutual tasks in dynamic unstructured environments.

Edinburgh is most involved in two strands of the work. The first is the human-human studies of joint action involving language. These studies involve collecting data using two eyetrackers that are wired together so that two subjects can perform a joint task. The Language Technology Group’s role in the work is to make this experimental setup work, to set up a data route for transcription and discourse annotation of the captured dialogues using the NITE XML Toolkit, and to integrate data from the eyetracker and the discourse annotation into a coherent database for analysis.

The second is the development of human-robot dialogue systems that demonstrate principles of joint action found in the human studies. For this strand, the Language Technology Group is contributing to the dialogue systems effort.”

Sources:

http://www.ltg.ed.ac.uk/projects/Diagrams

http://www.ltg.ed.ac.uk/projects/JAST

http://www.ltg.ed.ac.uk/projects/SQUAD

Recent Research Topics (Q2)

Here we will talk about the most recent research topics related to Human Language Technologies.

According to the website of the German Research Center for Artificial Intelligence (DFKI), the following themes are being pursued in its research:

  • Exploiting – and automatically extending – ontologies for content processing.
  • Tighter integration of shallow and deep techniques in processing.
  • Enriching deep processing with statistical methods.
  • Combining language checking with structuring tools in document authoring.
  • Document indexing for German and English.
  • Automatically associating recognized information with related information and thus building up collective knowledge.
  • Automatically structuring and visualizing extracted information.
  • Processing information encoded in multiple languages, among them Chinese and Japanese.

These are the projects of the Edinburgh Language Technology Group:

 

  • EASIE: Combining Shallow Semantics and Domain Knowledge.
  • TXM: Text Mining for Biomedical Content Curation.
  • CROSSMARC: Cross-lingual Multi-agent Retail Comparison.
  • SQUAD: Smart Qualitative Data: Methods and Community Tools for Data Mark-up.
  • SEER: Machine Learning for Named Entity Recognition.
  • BOPCRIS: Named entity tagging of historical parliamentary proceedings.
  • Synthesis: Integrated Models and Tools for Fine-Grained Prosody in Discourse.
  • JAST: Joint Action Science and Technology.
  • AMI and AMIDA: consortium projects that are developing technologies for meeting browsing and for assisting people who participate in meetings from a remote location.
  • Collaborating using diagrams: Study of how pairs collaborate when planning a route on a map.

The IXA group for natural language processing (Lengoaia Naturalaren Prozesamendurako IXA taldea) has several projects, which I list here with their funding bodies:

  • STREP (European Community), KYOTO: Knowledge Yielding Ontologies for Transition-based Organization
  • ANR (Agence Nationale de la Recherche), TSABL: Towards a Syntactic Atlas of the Basque Language
  • Eusko Jaurlaritza (Basque Government): Linguistic tools for children in Cuba’s schools
  • M.E.C, OpenMT: Open Source Machine Translation using hybrid methods: RBMT-EBMT methods
  • M.E.C, AVIVAVOZ: Technologies for speech translation: recognition, corpus-based statistical translation, and synthesis
  • Ministerio de Industria, EurOpenTrad: Enhanced Machine Translation in open source for the European integration of the languages in Spain
  • University-industry (Unibertsitate enpresa): Lexical, morphological and syntactic correction applied to Basque

Sources:

European research centres for human language technology (Q1)

Europe has several research centres for human language technology; here we describe some of them and what they do:

The Language Technology Lab (DFKI) is in Germany. Its research focuses on “tighter integration of shallow and deep techniques in processing, enriching deep processing with statistical methods, combining language checking with structuring tools in document authoring” and several other themes, of which we mention only a few. It has three main areas: Information and Knowledge Management, Natural Communication, and Document Production.

Another important centre is the Edinburgh Language Technology Group. It started 20 years ago, and its main objective is to create easy solutions to real problems in text processing. As its website says: “We have worked in all areas of large-volume text handling, from text annotation through markup architectures and from information extraction to automatic or computer-assisted generation of text.”

The last one that I mention here is in Ireland: the National Centre for Language Technology. It carries out research into the processing of human language by computers, in areas such as speech recognition and synthesis, machine translation, human-computer interfaces, information retrieval and extraction, the teaching and learning of languages using computers, and software localisation and globalisation.

http://www.dfki.de/lt/projects.php

http://www.ltg.ed.ac.uk/

http://www.computing.dcu.ie/research/nclt/

Hans Uszkoreit (Q1)

He works at Saarland University and serves as a Scientific Director at the German Research Center for Artificial Intelligence (DFKI). He is also involved in two other enterprises, as co-founder of one and as advisor to the other.

He studied Linguistics and Computer Science (in Berlin and Austin). From 1982 to 1986 he worked in Menlo Park, California, as a computer scientist at the Artificial Intelligence Center.

His detailed CV lists all his publications and what he has done in the years since he graduated.

He has written many works on the subject and collaborated with other specialists on further publications. The most recent, from 2007, is called Methods and Application for Relation Detection, but there are many more; the links below list them all.

His current research interests, as his website says, are:

  • “models of human language processing that take into account realistic resource limitations in human cognition
  • models of linguistic knowledge that are shaped by the optimization of processing
  • language technology applications that demonstrate how even very limited language capabilities can add considerable value to software for handling electronic information, for processing texts and for communicating with machines
  • grammar of human language: theory, formalisms and engineering
  • structuring of distributed digital knowledge”

http://hans.uszkoreit.net/

http://www.coli.uni-saarland.de/~hansu/bio.html

Definition of Human Language Technology (Q1)

According to Wikipedia, Natural Language Processing is a subfield of artificial intelligence and an engineering-oriented branch of linguistics. It tries to make humans and machines communicate by turning the computer’s information into human language, the kind that can be heard in the street in everyday life. HLT does not deal with communication through natural languages in an abstract way; rather, it designs mechanisms for communicating effectively through programs that execute or simulate the communication.

Hans Uszkoreit gives a long but simple introduction to the definition of HLT:

“Language technology comprises computational methods, computer programs and electronic devices that are specialized for analyzing, producing or modifying texts and speech. These systems must be based on some knowledge of human language.”

“The goal is to create software products that have some knowledge of human language. Such products are going to change our lives. They are urgently needed for improving human-machine interaction since the main obstacle in the interaction between human and computer is a communication problem. It enables the user to communicate with the computer in French, English, German, or another human language.”

“Much older than communication problems between human beings and machines are those between people with different mother tongues. One of the original aims of computational linguistics has always been fully automatic translation between human languages. From bitter experience scientists have realized that they are still far away from achieving the ambitious goal of translating unrestricted texts. Nevertheless, they have been able to create software systems that simplify the work of human translators and clearly improve their productivity. The increasing multilinguality of the web constitutes an additional challenge for language technology.”

He uses simple language to explain a complicated scientific field.

Sources:

http://en.wikipedia.org/w/index.php?title=Natural_language_processing&oldid=110067975

http://www.dfki.de/lt/lt-general.php
