Personal Project - Mandarin Navigator
A website to help you read Mandarin e-books.
Try it out!
The C++ version is not currently under development, but you can view a compiled page from 三体问题 (Three Body Problem) and try it out!
About the project
Over the past few years, I've been learning Chinese. As many language learners will tell you, learning a language is all about input: the more you immerse yourself in the language, the quicker you learn. With Chinese specifically, it can be really hard to read text that contains words you don't know. In German, if you don't know that "liebe" means "love", you at least know how to type it and pronounce it. In Chinese, if you don't know that "爱" means "love", you have no obvious way to type or say the word. How will you look it up in a dictionary?
Mandarin Navigator aims to solve that problem for e-books. It transforms a standard e-book file into a webpage that lets you hover over a word to see its possible translations, as well as how to say and type it.
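To make the hover idea concrete, here is a minimal sketch in Python of how segmented text could be turned into HTML. It is not the project's actual markup: the `render` function, the `word` class name, and the shape of the input (pairs of text and an optional pinyin/gloss tuple) are all assumptions made for illustration. It relies on the standard HTML `title` attribute, which browsers show as a tooltip on hover.

```python
from html import escape

def render(segments):
    """Turn (text, info) pairs into an HTML fragment.

    info is None for text with no dictionary match, otherwise a
    (pinyin, glosses) tuple. This input shape is hypothetical.
    """
    parts = []
    for text, info in segments:
        if info is None:
            # Unmatched text (punctuation, Latin letters) is passed through escaped.
            parts.append(escape(text))
        else:
            pinyin, glosses = info
            tip = f"{pinyin}: {'; '.join(glosses)}"
            # The title attribute produces the browser's native hover tooltip.
            parts.append(f'<span class="word" title="{escape(tip)}">{escape(text)}</span>')
    return "".join(parts)

fragment = render([("爱", ("ai4", ["to love"])), ("。", None)])
print(fragment)
```

A real page would likely replace the plain tooltip with a styled popup driven by CSS and JavaScript, but the structure (one element per word, carrying its pronunciation and glosses) would stay the same.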
Technologies
C
C++
Webserver Technology
JavaScript
CSS
HTML
Translation Methodology
The current translation methodology is designed for speed and ease of implementation, at the cost of context sensitivity and accuracy. The system uses a dictionary of 121,366 common Chinese words in both traditional and simplified character sets, provided by CC-CEDICT.
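For readers unfamiliar with the dictionary, CC-CEDICT is a plain-text file with one entry per line in the form `Traditional Simplified [pin1 yin1] /gloss 1/gloss 2/`, with comment lines starting with `#`. The sketch below, written in Python for brevity, shows one way to load such lines into a lookup table keyed by both character sets. The `Entry` type and `parse_cedict` function are hypothetical names, and the sample lines are hand-written in the CC-CEDICT layout rather than copied from the real file.

```python
import re
from typing import NamedTuple

class Entry(NamedTuple):
    traditional: str
    simplified: str
    pinyin: str
    glosses: list

# Line shape: Traditional Simplified [pin1 yin1] /gloss 1/gloss 2/
_LINE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/\s*$")

def parse_cedict(lines):
    """Build a dict mapping each traditional and simplified form to its entries."""
    table = {}
    for line in lines:
        if line.startswith("#"):
            continue  # comment or header line
        match = _LINE.match(line)
        if match is None:
            continue  # blank or malformed line
        trad, simp, pinyin, defs = match.groups()
        entry = Entry(trad, simp, pinyin, defs.split("/"))
        table.setdefault(trad, []).append(entry)
        if simp != trad:
            table.setdefault(simp, []).append(entry)
    return table

# Illustrative sample lines, not taken verbatim from CC-CEDICT.
sample = [
    "# sample header",
    "愛 爱 [ai4] /to love/to like/",
    "三體 三体 [san1 ti3] /three-body/",
]
table = parse_cedict(sample)
print(table["爱"][0].pinyin, table["爱"][0].glosses)
```

Keying the table by both forms means the same lookup works whether the e-book uses traditional or simplified characters.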
The algorithm works by breaking down each paragraph of the e-book into individual words. It starts by finding the longest stretch of text that matches an entry in the dictionary. Then, it repeats this process on the remaining text on either side of that match until the entire paragraph has been translated or no further breakdown is possible.
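The procedure described above can be sketched as a short recursive function. This is a Python illustration under stated assumptions, not the project's C++ code: the `segment` function name, the use of a plain set of words as the dictionary, and the choice of the leftmost match when two candidates have the same length are all assumptions, since the description does not say how ties are broken.

```python
def segment(text, dictionary, max_len=None):
    """Split text into (chunk, is_word) pairs, longest dictionary match first."""
    if not text:
        return []
    if max_len is None:
        max_len = max((len(word) for word in dictionary), default=0)
    # Try candidate lengths from longest to shortest; scan left to right at each length.
    for size in range(min(max_len, len(text)), 0, -1):
        for start in range(len(text) - size + 1):
            candidate = text[start:start + size]
            if candidate in dictionary:
                # Recurse on the text to the left and right of the match.
                left = segment(text[:start], dictionary, max_len)
                right = segment(text[start + size:], dictionary, max_len)
                return left + [(candidate, True)] + right
    # Nothing in this span is in the dictionary (punctuation, Latin text, and so on).
    return [(text, False)]

# A tiny hand-made word list standing in for the real dictionary.
words = {"研究", "研究生", "生命", "命", "起源"}
print(segment("研究生命起源。", words))
# [('研究生', True), ('命', True), ('起源', True), ('。', False)]
```

The example also shows the cost of preferring longer words. The sentence is meant to read 研究 / 生命 / 起源 ("research the origin of life"), but because 研究生 ("graduate student") is the longest match, it is picked first and the rest of the sentence is split around it.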
This methodology offers several benefits: the dictionary is easy to update, and non-Mandarin words or characters, such as punctuation, are simply ignored. Additionally, by swapping the dictionary file, this methodology could theoretically be used to translate text in other logographic languages.
The primary assumption of this methodology is that longer words are more likely to be the intended meaning than shorter words.
Note: While this methodology is relatively simple, it may not always provide the most accurate or context-sensitive translations. However, it is a good starting point for further development and improvement.