# linkgrabber

Noddy little tool to download pages and output all the links on them.

By default it grabs all links, but you can give a CSS selector to narrow them
down.

For example, grabbing the links in the nytimes main navigation bar:
```
$ linkgrabber -s '[data-testid="mini-nav"] a' https://nytimes.com | sort | uniq
https://nytimes.com/
https://www.nytimes.com/section/arts
https://www.nytimes.com/section/books
https://www.nytimes.com/section/business
https://www.nytimes.com/section/food
https://www.nytimes.com/section/health
https://www.nytimes.com/section/magazine
https://www.nytimes.com/section/nyregion
https://www.nytimes.com/section/opinion
https://www.nytimes.com/section/politics
https://www.nytimes.com/section/realestate
https://www.nytimes.com/section/science
https://www.nytimes.com/section/sports
https://www.nytimes.com/section/style
https://www.nytimes.com/section/technology
https://www.nytimes.com/section/t-magazine
https://www.nytimes.com/section/travel
https://www.nytimes.com/section/us
https://www.nytimes.com/section/world
https://www.nytimes.com/video
```