Re: [NTLK] PDF/CHM reader, Bluetooth and other questions

From: Hendrik Lipka (hendrik.lipka_at_gmx.de)
Date: Thu Oct 28 2004 - 07:08:40 PDT

Next message: Sonny Hung: "Re: [NTLK] ATA Support Error"
Previous message: Sonny Hung: "Re: [NTLK] [OT] How often does a '72 Convertible Cutlass cross the radar on NTLK?"
In reply to: Gavin McKenzie: "Re: [NTLK] PDF/CHM reader, Bluetooth and other questions"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]
Mail actions: [ respond to this message ] [ mail a new topic ]

Thursday, October 28, 2004, 3:37:25 PM, you wrote:

> Have you tried to use the Adobe accessibility web/email-interface for
> converting PDF to HTML as a method of text extraction?

No. I would need a stand-alone tool for inclusion in PDFConv, so it would
be of no use for me.

> Have you found other libraries or methods for PDF->text conversion to
> be better than the Adobe web interface?

I played a little with earlier versions of JPedal, and it had some
problems with German umlauts.
Because of the way PDF works, text extraction will not be reliable in all
cases.
So I will play a little bit with PDFBox, MultiValent and JPedal, and the
best will be included (maybe I will include multiple engines, and combine
their output...)
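
A rough sketch of what "multiple engines, pick the best" could look like.
Everything here is an assumption for illustration: the engines are plain
callables that take a path and return text (this is not the PDFBox,
Multivalent or JPedal API), and the names score and extract_best are made up.
The scoring is a crude heuristic that counts characters looking like ordinary
text, so output with broken umlauts (replacement characters) ranks lower.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical: each engine is a callable taking a PDF path and returning text.
Extractor = Callable[[str], str]

# Punctuation that counts as "ordinary text" for the heuristic below.
PUNCTUATION = ".,;:!?-'\"()"


def score(text: str) -> float:
    """Fraction of characters that look like ordinary text (crude heuristic)."""
    if not text:
        return 0.0
    good = sum(
        1 for ch in text
        if ch.isalnum() or ch.isspace() or ch in PUNCTUATION
    )
    return good / len(text)


def extract_best(path: str, engines: Dict[str, Extractor]) -> Tuple[Optional[str], str]:
    """Run every engine; return (engine name, text) of the highest-scoring output."""
    best_name: Optional[str] = None
    best_text = ""
    best_score = -1.0
    for name, engine in engines.items():
        try:
            text = engine(path)
        except Exception:
            continue  # an engine that fails on this file is simply skipped
        current = score(text)
        if current > best_score:
            best_name, best_text, best_score = name, text, current
    return best_name, best_text


if __name__ == "__main__":
    # Stand-in engines: one mangles umlauts, one does not.
    engines = {
        "mangling": lambda path: "Gr\ufffd\ufffde",
        "clean": lambda path: "Größe",
    }
    print(extract_best("example.pdf", engines))
```

Picking one winner per file is the simplest variant; actually merging the
output of several engines would need some kind of alignment between the
texts, which this sketch does not attempt.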

hli
-- 
Møøse trained to mix concrete and        Hendrik Lipka
sign complicated insurance forms         hendrik.lipka_at_gmx.de
                                         www.hendriklipka.de

-- 
This is the NewtonTalk list - http://www.newtontalk.net/ for all inquiries
Official Newton FAQ: http://www.chuma.org/newton/faq/
WikiWikiNewt for all kinds of articles: http://tools.unna.org/wikiwikinewt/

This archive was generated by hypermail 2.1.5 : Thu Oct 28 2004 - 07:30:02 PDT