# cascadia
<!-- ALL-CONTRIBUTORS-BADGE:START - Do not remove or modify this section -->
[![All Contributors](https://img.shields.io/badge/all_contributors-6-orange.svg?style=flat-square)](#contributors-)
<!-- ALL-CONTRIBUTORS-BADGE:END -->

[![MIT License](http://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
[![GoDoc](https://godoc.org/github.com/suntong/cascadia?status.svg)](http://godoc.org/github.com/suntong/cascadia)
[![Go Report Card](https://goreportcard.com/badge/github.com/suntong/cascadia)](https://goreportcard.com/report/github.com/suntong/cascadia)
[![Build Status](https://github.com/suntong/cascadia/actions/workflows/go-release-build.yml/badge.svg?branch=master)](https://github.com/suntong/cascadia/actions/workflows/go-release-build.yml)
[![PoweredBy WireFrame](https://github.com/go-easygen/wireframe/blob/master/PoweredBy-WireFrame-B.svg)](http://godoc.org/github.com/go-easygen/wireframe)

## TOC
- [cascadia - CSS selector CLI tool](#cascadia---css-selector-cli-tool)
- [Usage](#usage)
  - [$ cascadia](#-cascadia)
  - [Examples](#examples)
- [Install Debian/Ubuntu package](#install-debianubuntu-package)
- [Download/install binaries](#downloadinstall-binaries)
  - [The binary executables](#the-binary-executables)
  - [Distro package](#distro-package)
  - [Debian package](#debian-package)
- [Install Source](#install-source)
- [Author](#author)
- [Contributors](#contributors-)

## cascadia - CSS selector CLI tool

The [Go Cascadia package](https://github.com/andybalholm/cascadia) implements CSS selectors for HTML. This is its command line tool. It started as a thin wrapper around that package, but has grown into a better tool for testing CSS selectors without writing Go code.

## Usage

### $ cascadia
```sh
cascadia wrapper
Version 1.3.0 built on 2023-06-30
Copyright (C) 2016-2023, Tong Sun

Command line interface to go cascadia CSS selectors package

Usage:
  cascadia -i in -c css -o [Options...]

Options:

  -h, --help        display help information
  -i, --in         *The html/xml file to read from (or stdin)
  -o, --out        *The output file (or stdout)
  -c, --css        *CSS selectors (can provide more if not using --piece)
  -t, --text        Text output for none-block selection mode
  -R, --Raw         Raw text output, no trimming of leading and trailing white space
  -p, --piece       sub CSS selectors within -css to split that block up into pieces
			format: PieceName=[PieceStyle:]selector_string
			 PieceStyle:
			  RAW : will return the selected as-is
			  ATTR : will return the value of attribute selector_string
			 Else the text will be returned
  -d, --delimiter   delimiter for pieces csv output [=	]
  -w, --wrap-html   wrap up the output with html tags
  -y, --style       style component within the wrapped html head
  -b, --base        base href tag used in the wrapped up html
  -q, --quiet       be quiet
```

Its output has two modes, _none-block selection mode_ and _block selection mode_, depending on whether the `--piece` parameter is given on the command line.

For details about the concept of blocks and pieces, check out [andrew-d/goscrape](https://github.com/andrew-d/goscrape) (in fact, `cascadia` was initially developed just for it, so that I don't need to tweak, build, and run Go code just to test out the block and piece selectors). Here is an excerpt:

- Inside each page, there's 1 or more *blocks* - some logical method of splitting up a page into subcomponents.
- Inside each block, you define some number of *pieces* of data that you wish
  to extract.  Each piece consists of a name, a selector, and what data to
  extract from the current block.
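
The three parts of a piece (name, selector, and what to extract) line up with the `PieceName=[PieceStyle:]selector_string` format from the help text. As a rough illustration of how such a spec breaks down, here is a minimal shell sketch. `parse_piece` is a hypothetical helper written for this README, not cascadia's own parser, and the `TEXT` label for the default style is made up as well:

```sh
# Hypothetical helper, NOT part of cascadia: split a --piece spec of the form
#   PieceName=[PieceStyle:]selector_string
# into name, style and selector, printed one per line.
parse_piece() {
  spec=$1
  name=${spec%%=*}   # everything before the first '='
  rest=${spec#*=}    # everything after the first '='
  case $rest in
    RAW:*)  style=RAW;  selector=${rest#RAW:} ;;
    ATTR:*) style=ATTR; selector=${rest#ATTR:} ;;
    *)      style=TEXT; selector=$rest ;;  # no prefix: text is returned ("TEXT" is our own label)
  esac
  printf '%s\n%s\n%s\n' "$name" "$style" "$selector"
}

parse_piece 'Title=h2 > a'          # Title, TEXT, h2 > a
parse_piece 'Link=ATTR:href'        # Link, ATTR, href
parse_piece 'Body=RAW:div.content'  # Body, RAW, div.content
```

Only a leading `RAW:` or `ATTR:` is treated as a style here, so selectors that contain `=` or `:` themselves (such as `input[name=Sex]` or `li:first-child`) pass through unchanged.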

This all sounds rather complicated, but in practice it's quite simple. See the next section for details.

In summary,

- The none-block selection mode outputs the selection as HTML source by default
  * but if the `-t` or `--text` cli option is provided, the none-block selection mode will [output as text](https://github.com/suntong/cascadia/issues/6#issuecomment-980757881) instead.
    - By default, such text output has its leading and trailing white space trimmed.
    - However, if the `-R` or `--Raw` cli option is provided, no trimming is done.
- The block selection mode outputs the selection as text in a `tsv`/`csv` table form by default
  * if a `--piece` selection is prefixed with `RAW:`, then that specific piece selection is output as HTML instead. See the following for details.
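
The modes summarized above could be tried side by side with something like the following sketch. It only uses flags listed in the help text, but the sample file `/tmp/cascadia-demo.html`, its markup, and the piece names are made up, and it assumes that `-p` may be given once per piece. No output is shown because it has not been captured from a real run:

```sh
# Made-up sample input, for illustration only.
cat > /tmp/cascadia-demo.html <<'EOF'
<ul>
  <li class="item"><a href="/a">  First  </a></li>
  <li class="item"><a href="/b">  Second  </a></li>
</ul>
EOF

if command -v cascadia >/dev/null 2>&1; then
  # None-block mode: HTML source of each match
  cascadia -i /tmp/cascadia-demo.html -o -c 'li.item > a'
  # None-block mode with text output (white space trimmed)
  cascadia -i /tmp/cascadia-demo.html -o -c 'li.item > a' -t
  # Same, but keep leading and trailing white space
  cascadia -i /tmp/cascadia-demo.html -o -c 'li.item > a' -t -R
  # Block mode: each li.item is a block, each -p names one piece of it
  # (assumption: -p can be repeated)
  cascadia -i /tmp/cascadia-demo.html -o -c 'li.item' -p 'Text=a' -p 'Html=RAW:a'
  # Block mode with a comma instead of the default tab delimiter
  cascadia -i /tmp/cascadia-demo.html -o -c 'li.item' -p 'Text=a' -d ','
else
  echo "cascadia not found in PATH; skipping the demo" >&2
fi
```

The `command -v` guard is only there so the snippet can be pasted on a machine where the tool is not installed yet.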

### Examples

All three `-i -o -c` options are required. By default it reads from `stdin` and outputs to `stdout`:

```sh
$ echo '<input type="radio" name="Sex" value="F" />' | tee /tmp/cascadia.xml | cascadia -i -o -c 'input[name=Sex][value=F]'
1 elements for 'input[name=Sex][value=F]':
<input type="radio" name="Sex" value="F"/>
```

Either the input or the output can be followed by a file name:

```sh
$ cascadia -i /tmp/cascadia.xml -o -c 'input[name=Sex][value=F]'
1 elements for 'input[name=Sex][value=F]':
<input type="radio" name="Sex" value="F"/>
```

```sh
$ cascadia -i /tmp/cascadia.xml -c 'input[name=Sex][value=F]' -o /tmp/out.html
1 elements for 'input[name=Sex][value=F]':

$ cat /tmp/out.html
<input type="radio" name="Sex" value="F"/>
```

Other options can be applied too:

```sh
# using --wrap-html
$ cascadia -i /tmp/cascadia.xml -c 'input[name=Sex][value=F]' -o /tmp/out.html -w
1 elements for 'input[name=Sex][value=F]':

$ cat /tmp/out.html
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<base href="">

</head>
<body>
<input type="radio" name="Sex" value="F"/>
</body>

# using --wrap-html with --style
$ cascadia -i /tmp/cascadia.xml -c 'input[name=Sex][value=F]' -o /tmp/out.html -w -y '<link rel="stylesheet" href="styles.css">'
1 elements for 'input[name=Sex][value=F]':

$ cat /tmp/out.html
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<base href="">
<link rel="stylesheet" href="styles.css">
</head>
<body>
<input type="radio" name="Sex" value="F"/>
</body>
```

- For more on using the `--style` option, check out ["adding styles"](https://github.com/suntong/cascadia/wiki/Adding-styles).
- For more examples, check out the [wiki](https://github.com/suntong/cascadia/wiki/), which includes, but is not limited to:

  * [None-block selection mode](https://github.com/suntong/cascadia/wiki#none-block-selection-mode)
    * [Multi-selection](https://github.com/suntong/cascadia/wiki#multi-selection)
  * [Block selection mode](https://github.com/suntong/cascadia/wiki#block-selection-mode)
    * [Block selection mode HTML output](https://github.com/suntong/cascadia/wiki#block-selection-mode-html-output)
    * [Block selection mode table output](https://github.com/suntong/cascadia/wiki#block-selection-mode-table-output)
    * [Attribute selection](https://github.com/suntong/cascadia/wiki#attribute-selection)
    * [Twitter Search](https://github.com/suntong/cascadia/wiki#twitter-search)
  * [Reconstruct the separated pages](https://github.com/suntong/cascadia/wiki#reconstruct-the-separated-pages)
  * [More On CSS Selector](https://github.com/suntong/cascadia/wiki#more-on-css-selector)

## Install Debian/Ubuntu package

    sudo apt install -y cascadia

## Download/install binaries

- The latest binary executables are available as the result of the Continuous-Integration (CI) process.
- I.e., they are built automatically right from the source code at every git release by [GitHub Actions](https://docs.github.com/en/actions).
- There are two ways to get/install such binary executables
  * Using the **binary executables** directly, or
  * Using **packages** for your distro

### The binary executables

- The latest binary executables are directly available under https://github.com/suntong/cascadia/releases/latest
- Pick & choose the one that suits your OS and its architecture. E.g., for Linux, it would be the `cascadia_verxx_linux_amd64.tar.gz` file.
- Available OS for binary executables are
  * Linux
  * Mac OS (darwin)
  * Windows
- If your OS and its architecture are not available in the download list, please let me know and I'll add them.
- Manual installation is just a matter of unpacking the archive and moving/copying the binary executable to somewhere in `PATH`. For example,

```sh
tar -xvf cascadia_*_linux_amd64.tar.gz
sudo mv -v cascadia_*_linux_amd64/cascadia /usr/local/bin/
rmdir -v cascadia_*_linux_amd64
```
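
To work out which release file suits the current machine, the `uname` output can be mapped to the `os_arch` part of the file name. The sketch below is only a guess at that mapping: `release_suffix` is a hypothetical helper, only the `linux_amd64` name is confirmed above, and an `arm64` build may or may not exist for a given OS.

```sh
# Hypothetical helper: map `uname -s` / `uname -m` values to an os_arch
# suffix such as linux_amd64. Only linux_amd64 is confirmed by this README.
release_suffix() {
  os=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case $os in
    linux|darwin) ;;
    mingw*|msys*|cygwin*) os=windows ;;
    *) echo "unsupported OS: $1" >&2; return 1 ;;
  esac
  case $2 in
    x86_64|amd64)  arch=amd64 ;;
    aarch64|arm64) arch=arm64 ;;  # assumption: such a build may not be published
    *) echo "unsupported architecture: $2" >&2; return 1 ;;
  esac
  printf '%s_%s\n' "$os" "$arch"
}

# Report which suffix to look for on the releases page for this machine.
if suffix=$(release_suffix "$(uname -s)" "$(uname -m)"); then
  echo "look for a release file whose name contains: $suffix"
fi
```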

### Distro package

- [Packages available for Linux distros](https://cloudsmith.io/~suntong/repos/repo/packages/) are
  * [Alpine Linux](https://cloudsmith.io/~suntong/repos/repo/setup/#formats-alpine)
  * [Debian](https://cloudsmith.io/~suntong/repos/repo/setup/#formats-deb)
  * [RedHat](https://cloudsmith.io/~suntong/repos/repo/setup/#formats-rpm)

The repo setup instruction URL for each distro is given above.
For example, for [Debian](https://cloudsmith.io/~suntong/repos/repo/setup/#formats-deb):

### Debian package

```sh
curl -1sLf \
  'https://dl.cloudsmith.io/public/suntong/repo/setup.deb.sh' \
  | sudo -E bash

# That's it. You then can do your normal operations, like

sudo apt update
apt-cache policy cascadia

sudo apt install -y cascadia
```

## Install Source

To build and install from the source code instead:

```sh
go install github.com/suntong/cascadia@latest
```

## Author

Tong SUN  
![suntong from cpan.org](https://img.shields.io/badge/suntong-%40cpan.org-lightgrey.svg "suntong from cpan.org")

_Powered by_ [**WireFrame**](https://github.com/go-easygen/wireframe)  
[![PoweredBy WireFrame](https://github.com/go-easygen/wireframe/blob/master/PoweredBy-WireFrame-Y.svg)](http://godoc.org/github.com/go-easygen/wireframe)  
the _one-stop wire-framing solution_ for Go cli based projects, from _init_ to _deploy_.

## Contributors ✨

Thanks go to these wonderful people ([emoji key](https://allcontributors.org/docs/en/emoji-key)):

<!-- ALL-CONTRIBUTORS-LIST:START - Do not remove or modify this section -->
<!-- prettier-ignore-start -->
<!-- markdownlint-disable -->
<table>
  <tbody>
    <tr>
      <td align="center" valign="top" width="14.28%"><a href="https://github.com/suntong"><img src="https://avatars.githubusercontent.com/u/422244?v=4?s=100" width="100px;" alt="suntong"/><br /><sub><b>suntong</b></sub></a><br /><a href="https://github.com/suntong/cascadia/commits?author=suntong" title="Code">💻</a> <a href="#ideas-suntong" title="Ideas, Planning, & Feedback">🤔</a> <a href="#design-suntong" title="Design">🎨</a> <a href="#data-suntong" title="Data">🔣</a> <a href="https://github.com/suntong/cascadia/commits?author=suntong" title="Tests">⚠️</a> <a href="https://github.com/suntong/cascadia/issues?q=author%3Asuntong" title="Bug reports">🐛</a> <a href="https://github.com/suntong/cascadia/commits?author=suntong" title="Documentation">📖</a> <a href="#blog-suntong" title="Blogposts">📝</a> <a href="#example-suntong" title="Examples">💡</a> <a href="#tutorial-suntong" title="Tutorials">✅</a> <a href="#tool-suntong" title="Tools">🔧</a> <a href="#platform-suntong" title="Packaging/porting to new platform">📦</a> <a href="https://github.com/suntong/cascadia/pulls?q=is%3Apr+reviewed-by%3Asuntong" title="Reviewed Pull Requests">👀</a> <a href="#question-suntong" title="Answering Questions">💬</a> <a href="#maintenance-suntong" title="Maintenance">🚧</a> <a href="#infra-suntong" title="Infrastructure (Hosting, Build-Tools, etc)">🚇</a></td>
      <td align="center" valign="top" width="14.28%"><a href="https://github.com/hoshsadiq"><img src="https://avatars.githubusercontent.com/u/600045?v=4?s=100" width="100px;" alt="Hosh"/><br /><sub><b>Hosh</b></sub></a><br /><a href="https://github.com/suntong/cascadia/commits?author=hoshsadiq" title="Code">💻</a> <a href="https://github.com/suntong/cascadia/issues?q=author%3Ahoshsadiq" title="Bug reports">🐛</a> <a href="#userTesting-hoshsadiq" title="User Testing">📓</a></td>
      <td align="center" valign="top" width="14.28%"><a href="https://github.com/mh-cbon"><img src="https://avatars.githubusercontent.com/u/17096799?v=4?s=100" width="100px;" alt="mh-cbon"/><br /><sub><b>mh-cbon</b></sub></a><br /><a href="https://github.com/suntong/cascadia/issues?q=author%3Amh-cbon" title="Bug reports">🐛</a> <a href="#ideas-mh-cbon" title="Ideas, Planning, & Feedback">🤔</a> <a href="#userTesting-mh-cbon" title="User Testing">📓</a></td>
      <td align="center" valign="top" width="14.28%"><a href="https://www.digglife.net"><img src="https://avatars.githubusercontent.com/u/1468378?v=4?s=100" width="100px;" alt="朱聖黎 (Zhu Sheng Li)"/><br /><sub><b>朱聖黎 (Zhu Sheng Li)</b></sub></a><br /><a href="https://github.com/suntong/cascadia/issues?q=author%3Adigglife" title="Bug reports">🐛</a> <a href="#userTesting-digglife" title="User Testing">📓</a></td>
      <td align="center" valign="top" width="14.28%"><a href="https://github.com/himcc"><img src="https://avatars.githubusercontent.com/u/3031794?v=4?s=100" width="100px;" alt="himcc"/><br /><sub><b>himcc</b></sub></a><br /><a href="https://github.com/suntong/cascadia/commits?author=himcc" title="Code">💻</a> <a href="https://github.com/suntong/cascadia/issues?q=author%3Ahimcc" title="Bug reports">🐛</a> <a href="#userTesting-himcc" title="User Testing">📓</a></td>
      <td align="center" valign="top" width="14.28%"><a href="http://www.devalias.net/"><img src="https://avatars.githubusercontent.com/u/753891?v=4?s=100" width="100px;" alt="Glenn 'devalias' Grant"/><br /><sub><b>Glenn 'devalias' Grant</b></sub></a><br /><a href="https://github.com/suntong/cascadia/commits?author=0xdevalias" title="Code">💻</a> <a href="https://github.com/suntong/cascadia/issues?q=author%3A0xdevalias" title="Bug reports">🐛</a> <a href="#userTesting-0xdevalias" title="User Testing">📓</a></td>
    </tr>
  </tbody>
</table>

<!-- markdownlint-restore -->
<!-- prettier-ignore-end -->

<!-- ALL-CONTRIBUTORS-LIST:END -->

This project follows the [all-contributors](https://github.com/all-contributors/all-contributors) specification. Contributions of any kind welcome!