I have a pen. → 私が、ペンであります。
I went to school yesterday. → 昨日は学校へ行った。
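Forty years of MT research produced results that fall far short of real translation but are extremely good as linguistic analysis. I can't help wondering where all of that went.
It seems that subjects such as "we" and "I" get dropped on the Japanese side.
Of course, for a language with few speakers like Tetum (though it still has close to a million), there is no such "accumulation" to draw on.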
It seems likely that the future of online translation will be about using statistics to recognise patterns in bilingual text (SMT?) rather than rule-based legacy (MT) software.
The problem with less widely used languages is that the large amount of bilingual text required for SMT to work simply does not exist (I guess this is your point?).
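To make "using statistics to recognise patterns in bilingual text" a bit more concrete, here is a rough sketch in Python. It is not how any real SMT system works: it only counts which words co-occur across sentence pairs and scores them with the Dice coefficient, a much simpler stand-in. The four sentence pairs are a made-up toy corpus with pre-tokenised Japanese, and `dice_table` and `best_match` are hypothetical names used only for this illustration.

```python
from collections import Counter
from itertools import product

# Hypothetical toy corpus: (English, pre-tokenised Japanese) sentence pairs.
PARALLEL = [
    ("i have a pen", "ペン が あります"),
    ("i have a book", "本 が あります"),
    ("i went to school", "学校 へ 行った"),
    ("i went to school yesterday", "昨日 学校 へ 行った"),
]

def dice_table(pairs):
    """Score every co-occurring (source, target) word pair with the Dice coefficient."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_count.update(src_words)   # sentences containing each source word
        tgt_count.update(tgt_words)   # sentences containing each target word
        joint.update(product(src_words, tgt_words))  # sentence pairs containing both
    return {
        (s, t): 2 * n / (src_count[s] + tgt_count[t])
        for (s, t), n in joint.items()
    }

def best_match(table, word):
    """Return the highest-scoring target word for `word`, or None if it was never seen."""
    candidates = {t: score for (s, t), score in table.items() if s == word}
    return max(candidates, key=candidates.get) if candidates else None

table = dice_table(PARALLEL)
print(best_match(table, "pen"))        # ペン
print(best_match(table, "yesterday"))  # 昨日
print(best_match(table, "mushrooms"))  # None: the word never appears in the corpus
```

Even in this toy, "school" cannot be separated from へ or 行った because they always appear together, and a word that never occurs in the corpus gets no candidate at all. Both are small-scale versions of the data problem for less widely used languages.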
By the way, if you omit the full-stop in the "enter text" field, the translation is a little better.
e.g.
I have a pen -> ペンがあります。
I went to school yesterday -> きのう学校へ行きました
mushrooms for everyone -> すべてのきのこ (oops)
The result of mushrooms for everyone is so funny:-)
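I was told off because comments made up only of half-width alphanumeric characters are not accepted, so I'll add a line that was originally in Japanese as well.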
Firstly, I noticed that Google's translation system has the same issue that you mention with dropping the subject.
http://translate.google.com/translate_t#
I went to school yesterday -> 学校に行った昨日
Compared to this, the Babelfish and Systran sites keep the subject (私は学校に昨日行った). The difference between sites like Babelfish and Google is that Google's translation is powered by SMT.
You are wondering what happened to all the good results from the 40 years of research, but what seems to be happening is a transition from MT to SMT. Quality does seem poor at the moment, yet with big companies investing heavily, hopefully SMT will improve quickly.
This is the point I wanted to make, but maybe I’m still lost. My apologies in advance if I’m confusing the topic again.