Using LLMs as Machine Translators
Introduction
Travel disruptions can be among the most frustrating moments for Trainline customers.
Miscommunication, especially when traveling internationally, amplifies these challenges.
At Trainline, we identified a critical gap: disruption messages aren’t translated, so customers travelling abroad often receive them in a language they don’t speak.
This is a pain point not just for usability but also for customer retention.
Addressing this is an opportunity to set ourselves apart from competitors like Omio and Eurostar while building reusable capabilities for the future.
We asked ourselves a simple question - could LLMs help us with this problem?
The Hypothesis
The approach I laid out for the AI Lab at Trainline was to go after small but meaningful projects, building up a library of useful, reusable services that could later be integrated into a wider system.
(As with all plans in overweight companies, this starts out well, but as you gain success you inevitably end up hiring more career manager types who sidetrack things a bit.)
We hypothesized that solving real-time translation issues would:
- Differentiate our product in the travel market from competitors like Uber and Omio.
- Improve customer retention by reducing frustration during disruptions.
- Provide a foundation for future global products powered by large language models. If we are a global company, we need to be able to translate well.
By tackling this issue, we aimed to achieve two objectives with one solution: improving the current customer experience and building scalable capabilities for future Gen AI-based projects.
But was there any evidence that LLMs could do this? It turns out there was. The seminal paper in this space is From LLM to NMT: Advancing Low-Resource Machine Translation with Claude.
This paper posits, and proves out, that LLMs - in particular Anthropic's Claude models - are effective machine translators, even in the absence of huge amounts of reference data.
With this paper's results in mind, it was apparent that it isn't just a guess that LLMs can translate text from one language to another - they are provably good at it!
The Approach
Our project focused on integrating real-time translation using an LLM.
Here’s how we broke it down:
- Prompt Design:
  • Injected “Trainline-specific vocabulary” to improve contextual accuracy.
  • Ensured translations dynamically adjusted to the source and target languages.
- Technical Implementation:
  • Embedded real-time translations directly into disruption notifications.
  • Designed the system to function seamlessly across languages, ensuring accessibility for international travellers.
- Evaluation Metrics:
  • Offline: assessed translation quality using BLEU scores to benchmark performance.
  • Online: measured the impact on user satisfaction and retention metrics through A/B testing.
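To make the prompt-design step concrete, here is a minimal sketch of injecting domain vocabulary into a translation prompt. The glossary entries, function names, and prompt wording are all illustrative assumptions, not Trainline's actual implementation:

```python
# Hypothetical sketch: inject Trainline-style domain vocabulary into a
# translation prompt. Terms and translations here are examples only.
GLOSSARY = {
    # Rail-specific phrases that generic machine translation often mangles
    "rail replacement bus": {"fr": "bus de remplacement", "de": "Schienenersatzverkehr"},
    "off-peak": {"fr": "heures creuses", "de": "Nebenverkehrszeit"},
}

def build_translation_prompt(message: str, source_lang: str, target_lang: str) -> str:
    """Build an LLM prompt that pins down domain terms for one message."""
    # Include only glossary entries that actually occur in the message,
    # keeping the prompt short and relevant.
    relevant = {
        term: langs[target_lang]
        for term, langs in GLOSSARY.items()
        if term in message.lower() and target_lang in langs
    }
    glossary_lines = "\n".join(f'- "{en}" -> "{tr}"' for en, tr in relevant.items())
    return (
        f"Translate the following rail disruption message from {source_lang} "
        f"to {target_lang}. Use these fixed translations for domain terms:\n"
        f"{glossary_lines}\n\n"
        f"Message:\n{message}\n\n"
        "Return only the translated message."
    )

prompt = build_translation_prompt(
    "Your 09:15 service is cancelled; a rail replacement bus is provided.",
    source_lang="en",
    target_lang="de",
)
```

Filtering the glossary per message, rather than sending the whole database every time, keeps token usage down while still steering the model towards consistent terminology.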
What we built
• A C# library called Trainline.TranslationService.
• A database of Trainline-specific terms to inject into the prompt.
• A Python backend service connecting to AWS Bedrock (specifically Claude 3.5).
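As a sketch of what the Python backend does, the snippet below builds a Bedrock Messages-API request for a Claude model and sends it with boto3. The model ID, token limit, and function names are assumptions for illustration, not the production configuration:

```python
import json

# Example Bedrock model ID for Claude 3.5 Sonnet; an assumption, not
# necessarily the model version used in production.
MODEL_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"

def build_request_body(prompt: str, max_tokens: int = 512) -> str:
    """Serialise a Bedrock Messages-API request body for a Claude model."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    })

def translate(prompt: str) -> str:
    """Send the prompt to Bedrock and return the model's text reply.

    Requires AWS credentials and boto3; the import is kept inside the
    function so the request builder above is testable offline.
    """
    import boto3
    client = boto3.client("bedrock-runtime")
    response = client.invoke_model(modelId=MODEL_ID, body=build_request_body(prompt))
    payload = json.loads(response["body"].read())
    return payload["content"][0]["text"]

body = build_request_body("Translate to French: The 10:02 to Paris is delayed.")
```

Separating payload construction from the network call makes the service easy to unit-test without touching AWS.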
Outcomes
The integration showed promising results:
• Significant reduction in customer complaints about unreadable disruption messages.
• An increase in positive feedback related to ease of use and reliability during disruptions.
• The system we built is now reusable as a C# library that can be slotted into new projects.
Learnings
The project underscored the importance of designing products that solve real customer pain points. While real-time translation posed challenges in accuracy and latency, refining prompt engineering and leveraging LLM capabilities proved transformative.
Additionally, this project highlighted the potential for expanding LLM-powered features to other areas of the product.
Recap
By addressing the frustration of untranslated disruption messages, we not only improved the immediate customer experience but also established a robust framework for future innovations.
Real-time translation is more than a feature: it's a step towards building a global travel assistant.