Re: [NTLK] PDF/CHM reader, Bluetooth and other questions

From: Hendrik Lipka (hendrik.lipka_at_gmx.de)
Date: Thu Oct 28 2004 - 07:08:40 PDT


Thursday, October 28, 2004, 3:37:25 PM, you wrote:

> Have you tried to use the Adobe accessibility web/email-interface for
> converting PDF to HTML as a method of text extraction?

No. I would need a stand-alone tool for inclusion in PDFConv, so it would
be of no use to me.

> Have you found other libraries or methods for PDF->text conversion to
> be better than the Adobe web interface?

I played a little bit with earlier versions of JPedal, and it had some
problems with German umlauts.
From the way PDF works, text extraction will not be reliable for all cases.
So I will play a little bit with PDFBox, MultiValent and JPedal, and the
best will be included (maybe I will include multiple engines, and combine
their output...)
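
A minimal sketch of what "multiple engines" could look like, written in
Python for brevity. Everything here is an assumption for illustration: the
engines are plain callables standing in for wrappers around PDFBox,
MultiValent or JPedal (none of their real APIs are used), and `score` and
`best_extraction` are hypothetical names. The idea is to run every engine,
skip the ones that fail, and keep the output with the fewest unreadable
characters.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical engine signature: raw PDF bytes in, extracted text out.
Extractor = Callable[[bytes], str]


def score(text: str) -> float:
    """Fraction of characters that look like real text.

    U+FFFD (the replacement character) and non-printable characters other
    than newline and tab count as damage. This is a crude heuristic, not a
    measure of extraction correctness.
    """
    if not text:
        return 0.0
    good = sum(
        1
        for ch in text
        if ch != "\ufffd" and (ch.isprintable() or ch in "\n\t")
    )
    return good / len(text)


def best_extraction(
    pdf_bytes: bytes,
    engines: Iterable[Tuple[str, Extractor]],
) -> Optional[Tuple[str, str]]:
    """Run every engine and return (engine name, text) of the best result.

    Engines that raise are skipped. On a tie the earlier engine wins.
    Returns None if every engine failed.
    """
    best: Optional[Tuple[str, str]] = None
    best_score = -1.0
    for name, engine in engines:
        try:
            text = engine(pdf_bytes)
        except Exception:
            continue  # one broken engine should not sink the conversion
        current = score(text)
        if current > best_score:
            best, best_score = (name, text), current
    return best
```

Picking one winner per document is the simplest form of "combining"; the
same scoring could be applied per page if the engines return page-wise text.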

hli
-- 
Møøse trained to mix concrete and          Hendrik Lipka
sign complicated insurance forms           hendrik.lipka_at_gmx.de
                                           www.hendriklipka.de

-- 
This is the NewtonTalk list - http://www.newtontalk.net/ for all inquiries
Official Newton FAQ: http://www.chuma.org/newton/faq/
WikiWikiNewt for all kinds of articles: http://tools.unna.org/wikiwikinewt/

