Re: [NTLK] Newton Spanish text recognition.

From: Roman Pixell (roman_at_pixell.net)
Date: Tue Sep 21 2004 - 16:22:47 PDT


On Sep 22, 2004, at 12:55 AM, Larry Yaeger wrote:

> The English and alternate
> language hypotheses will compete for space in the recognizer's search
> engine, so that could conceivably affect accuracy, but the maximum
> amount of memory used for that purpose is fixed, so it probably will
> not affect heap usage.

Yes, I recognise this: sometimes English interpretations are suggested
when I write in Swedish.

>> what would the optimal size be?
>
> Sorry, I really don't know. More words -> greater coverage ->
> greater overall accuracy. But if you're running out of heap space,
> you clearly want to limit what you add to the system.

I will try starting with a word list of 30-40k words (including some
25k that I intend to steal from somewhere) and see how it goes when I
have the time. As you say, the best approach would be to analyse text
from online newspapers etc., in order to cover words that are not in a
regular dictionary.
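For the newspaper-analysis idea, here is a minimal Python sketch of how a frequency-ranked word list could be built from plain text. Assumptions: the articles are already saved as plain-text strings, `build_word_list` and its parameters (`limit`, `min_count`, `known_words`) are hypothetical names made up for illustration, and converting the resulting list into the Newton's own dictionary format is not covered here.

```python
import re
from collections import Counter

# Letters only (Unicode-aware), so Swedish å, ä, ö are kept;
# digits and underscores are excluded.
WORD_RE = re.compile(r"[^\W\d_]+")


def build_word_list(texts, known_words=(), limit=30000, min_count=2):
    """Return up to `limit` words ranked by frequency (hypothetical helper).

    texts: iterable of plain-text strings, e.g. saved newspaper articles.
    known_words: words already in the base dictionary; these are skipped
        so the result lists only new additions.
    min_count: ignore words seen fewer times, to filter typos and noise.
    """
    known = {w.lower() for w in known_words}
    counts = Counter()
    for text in texts:
        counts.update(WORD_RE.findall(text.lower()))
    # most_common() sorts by count; ties keep first-seen order.
    ranked = [w for w, n in counts.most_common()
              if n >= min_count and w not in known]
    return ranked[:limit]


if __name__ == "__main__":
    sample = [
        "Regeringen presenterade budgeten i dag.",
        "Budgeten kritiserades av oppositionen i dag.",
    ]
    print(build_word_list(sample, known_words={"i"}))
```

Ranking by frequency means that if heap space runs short, the list can simply be cut off at a smaller `limit` while keeping the most common words.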

Thanks for all the answers!

/ ®

-- 
This is the NewtonTalk list - http://www.newtontalk.net/ for all inquiries
Official Newton FAQ: http://www.chuma.org/newton/faq/
WikiWikiNewt for all kinds of articles: http://tools.unna.org/wikiwikinewt/


This archive was generated by hypermail 2.1.5 : Tue Sep 21 2004 - 17:00:04 PDT