Re: [NTLK] Newton Spanish text recognition.

From: Paul Guyot (pguyot_at_kallisys.net)
Date: Wed Sep 22 2004 - 08:58:59 PDT


On Wed, 22 Sep 2004, Eckhart Köppen wrote:

> An alternative approach (and I guess this is what Paul is doing?) could
> be to feed a word list programmatically into a clean user dictionary
> and dump the data once finished. Adding the words should take care of
> creating the correct compressed dictionary format. The dumped data
> should be usable the same way, i.e. stick it in a resource file and
> build a Newton package. But there is probably an upper limit for the
> number of words you can process like this.

It's what I did, indeed, except that I don't extract the data with
SaveDictionaryData. The SaveDictionaryData method transfers the data from
the system heap to the NS heap (and you cannot use a VBO), so the limit in
that case is very low: you cannot have many words in a user dictionary
saved this way. (The equivalent NS loading method can work with larger
dictionaries, but it loads a copy of the word list into the system heap,
unlike the dict part technique used in all the dictionaries on my website.)

However, using the information documented in the bowels article, one can
use Hammer to dump the zone directly, the way you very probably did with
the ROM dictionaries.

This works for up to (at least) 62K words on an MP2100 with packages
disabled.

It takes a long time to get the words in (with the AddToDictionary
NewtonScript method, usually via NTK over Ethernet) and to get the words
out (with Hammer's dump memory data command).

This process can easily take one or two afternoons, and you need to start
from scratch if something fails, since everything is in RAM.
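
Since a failed run means starting over, it may help to clean the word list
on the desktop before feeding it in. What follows is only a rough sketch in
Python, not part of the procedure above: the function names, the batch size
and the reading of "62K" as 62,000 are all assumptions, and how each batch
is actually sent to the Newton is deliberately left out.

```python
from typing import Iterable, Iterator, List

# Largest word count reported to work above (MP2100, packages disabled).
# Assumption: "62K" is read as 62,000 and used as a soft ceiling only.
KNOWN_GOOD_WORDS = 62_000


def prepare_words(raw: Iterable[str]) -> List[str]:
    """Strip whitespace, drop blanks and duplicates, keep first-seen order."""
    seen = set()
    words = []
    for line in raw:
        word = line.strip()
        if word and word not in seen:
            seen.add(word)
            words.append(word)
    return words


def batches(words: List[str], size: int = 500) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` words (size is arbitrary)."""
    for start in range(0, len(words), size):
        yield words[start:start + size]


def within_known_limit(words: List[str]) -> bool:
    """True if the list is no larger than the count reported to work."""
    return len(words) <= KNOWN_GOOD_WORDS
```

Checking the count up front avoids discovering halfway through an afternoon
that the list is larger than anything known to fit, and fixed-size batches
make it easier to see how far a run has progressed.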

Paul

-- 
This is the NewtonTalk list - http://www.newtontalk.net/ for all inquiries
Official Newton FAQ: http://www.chuma.org/newton/faq/
WikiWikiNewt for all kinds of articles: http://tools.unna.org/wikiwikinewt/

