[NTLK] Scraping the waybackmachine for Newton material - seeking advice

Clu doctorclu at sbcglobal.net
Tue Feb 4 17:45:47 EST 2020


There were a few archives of websites that were passed around for a while.  I might have my discs somewhere around too.



> On Feb 4, 2020, at 3:54 PM, Alex Santos via NewtonTalk <newtontalk at newtontalk.net> wrote:
> 
> Hello
> 
> I am using a batch script and the wayback_machine_downloader to download old sites that might have once carried the following types of files:
> pdf, hqx, sit, dd, pkg, abs, bin, sea, cpt, dmg, txt
> 
> A lot of material comes down. The reason behind this project is to expose files that may be obscured by the navigation required to view a site’s history. Trying to find files manually would be an impossible challenge. Eventually my findings will go online for the public to consume. Ironically, I might very well return them to the Internet Archive as consumable downloads, one per site, and I may target the Macintosh Garden and/or approach UNNA to see whether they want to review the material and put it online. If the Wayback Machine captured it, it’s downloadable.
> 
> At the moment I have roughly 1000 URLs to process. Some of those will surely be duplicate top level domains (TLD) with unique subdomains, so I have a lot to process, but it’s easy for me to set up a list, batch process these, and just let it run for days capturing files.
> 
> The question I have to ask before I go through this is whether there are other filetypes that I should capture. Were Newton packages distributed primarily as pkg files (though these would cross over to Mac OS X as well)? Simply put, should I capture any other filetypes beyond what I noted at the top?
> 
> Also, does anyone want me to upload to their FTP server? I do have an FTPES (encrypted with a cert) server running, so if you are with UNNA or otherwise and would like access to these, I could create an account on my FTP server so that you can download them.
> 
> Ah, before I forget, are there any old URLs or companies from back in the day that I should prioritize or be sure to include? I already downloaded the mo site (Motorola), with captures up until the year 2003, in the hopes of finding a PDF of the Motorola Marco user manual, but that wasn’t to be found. So if you know of any material that once existed on some site, I can certainly try to see if it is on the Wayback Machine and download it to boot.
> 
> Hope this interests folks. The main purpose of this is to expose any and all files that are thought to be lost but which might otherwise already exist.
> 
> Cheers!
> —Alex
> ----------------------------------------------------------------------
> 
> http://newtontalk.net/
> http://twitter.com/newtontalk
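
The batch approach described in the quoted message (a long URL list with duplicate hosts, filtered by file extension) could be prepared along these lines. This is only a rough sketch in Python, not Alex's actual script: the helper names `wanted` and `unique_hosts` are made up for illustration, and how the resulting list is handed to wayback_machine_downloader is left out entirely. The extension list is the one from the message.

```python
import re
from urllib.parse import urlsplit

# File extensions listed in the quoted message.
EXTENSIONS = ["pdf", "hqx", "sit", "dd", "pkg", "abs", "bin", "sea", "cpt", "dmg", "txt"]

# Case-insensitive match on an extension at the very end of a URL path.
EXT_PATTERN = re.compile(r"\.(" + "|".join(EXTENSIONS) + r")$", re.IGNORECASE)


def wanted(url):
    """Hypothetical helper: True if the URL path ends in a listed extension."""
    return EXT_PATTERN.search(urlsplit(url).path) is not None


def unique_hosts(urls):
    """Hypothetical helper: each distinct host once, lowercased, first-seen order."""
    seen = {}
    for url in urls:
        url = url.strip()
        # Accept bare hostnames as well as full URLs.
        if "://" not in url:
            url = "http://" + url
        host = urlsplit(url).hostname
        if host:
            seen.setdefault(host, None)
    return list(seen)
```

Subdomains are deliberately kept as separate entries here, since the message treats them as unique sites; only exact repeats of the same host are collapsed.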



