github.com/bcampbell/scrapeomat@v0.0.0-20220820232205-23e64141c89e/TODO

archive: use gzipped .warc files to save space
better setup instructions (particularly db creation)
artform should allow underscored slugs (e.g. in http://www.lancashiretelegraph.co.uk)
show a better error message for cascadia parse errors in config files (i.e. include file and line number)
slurpserver: show the proper client IP address in the log when behind a proxy server
slurpserver: simple API token access (and show the token in the log)
scrapeomat: when scraping from a list (-i), show how many articles are already in the database
summary API+tool: add an option to batch by week
WARC archive read/write: should now use github.com/bcampbell/warc