From: Hendrik Lipka (hendrik.lipka_at_gmx.de)
Date: Thu Oct 28 2004 - 07:08:40 PDT
Thursday, October 28, 2004, 3:37:25 PM, you wrote:
> Have you tried to use the Adobe accessibility web/email-interface for
> converting PDF to HTML as a method of text extraction?
No. I need a stand-alone tool that I can include in PDFConv, so a web
interface would be of no use to me.
> Have you found other libraries or methods for PDF->text conversion to
> be better than the Adobe web interface?
I played a little with earlier versions of JPedal, and it had some
problems with German umlauts.
Because of the way PDF works, text extraction will not be reliable in all
cases.
So I will play a little bit with PDFBox, MultiValent and JPedal, and the
best will be included (maybe I will include multiple engines, and combine
their output...)
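A rough sketch of how the multi-engine idea could look. Everything here is an assumption for illustration: each engine is wrapped as a plain function from a file path to text (hypothetical stand-ins, not the actual PDFBox, MultiValent or JPedal APIs), "combining" is reduced to picking the single best result, and the quality score is a crude heuristic that penalises replacement and control characters, which is roughly how broken umlauts tend to show up. Python is used only to keep the sketch short.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical wrapper type: takes a PDF path, returns the extracted text.
Extractor = Callable[[str], str]


def score(text: str) -> float:
    """Crude quality heuristic (an assumption, not a standard metric):
    the share of characters that are neither U+FFFD replacement
    characters nor control characters other than newline, CR and tab."""
    if not text:
        return 0.0
    bad = sum(
        1 for ch in text
        if ch == "\ufffd" or (ord(ch) < 32 and ch not in "\n\r\t")
    )
    return 1.0 - bad / len(text)


def best_extraction(
    path: str, engines: Dict[str, Extractor]
) -> Tuple[Optional[str], str]:
    """Run every engine on the file and keep the highest-scoring output.
    Returns (engine name, text), or (None, "") if every engine failed."""
    best_name: Optional[str] = None
    best_text = ""
    best_score = -1.0
    for name, extract in engines.items():
        try:
            text = extract(path)
        except Exception:
            continue  # an engine that fails on this file is skipped
        s = score(text)
        if s > best_score:  # on a tie, the earlier engine wins
            best_name, best_text, best_score = name, text, s
    return best_name, best_text
```

A real combination step would probably need something smarter, for example comparing the outputs page by page, but the shape would be similar: run all engines, score each result, keep the best.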
hli
-- 
Møøse trained to mix concrete and        Hendrik Lipka
sign complicated insurance forms         hendrik.lipka_at_gmx.de
                                         www.hendriklipka.de
-- 
This is the NewtonTalk list - http://www.newtontalk.net/ for all inquiries
Official Newton FAQ: http://www.chuma.org/newton/faq/
WikiWikiNewt for all kinds of articles: http://tools.unna.org/wikiwikinewt/