# cascadia
<!-- ALL-CONTRIBUTORS-BADGE:START - Do not remove or modify this section -->
[![All Contributors](https://img.shields.io/badge/all_contributors-6-orange.svg?style=flat-square)](#contributors-)
<!-- ALL-CONTRIBUTORS-BADGE:END -->

[![MIT License](http://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
[![GoDoc](https://godoc.org/github.com/suntong/cascadia?status.svg)](http://godoc.org/github.com/suntong/cascadia)
[![Go Report Card](https://goreportcard.com/badge/github.com/suntong/cascadia)](https://goreportcard.com/report/github.com/suntong/cascadia)
[![Build Status](https://github.com/suntong/cascadia/actions/workflows/go-release-build.yml/badge.svg?branch=master)](https://github.com/suntong/cascadia/actions/workflows/go-release-build.yml)
[![PoweredBy WireFrame](https://github.com/go-easygen/wireframe/blob/master/PoweredBy-WireFrame-B.svg)](http://godoc.org/github.com/go-easygen/wireframe)

## TOC
- [cascadia - CSS selector CLI tool](#cascadia---css-selector-cli-tool)
- [Usage](#usage)
  - [$ cascadia](#-cascadia)
  - [Examples](#examples)
- [Install Debian/Ubuntu package](#install-debianubuntu-package)
- [Download/install binaries](#downloadinstall-binaries)
  - [The binary executables](#the-binary-executables)
  - [Distro package](#distro-package)
  - [Debian package](#debian-package)
- [Install Source](#install-source)
- [Author](#author)
- [Contributors](#contributors-)

## cascadia - CSS selector CLI tool

The [Go Cascadia package](https://github.com/andybalholm/cascadia) implements CSS selectors for HTML.
This is the command-line tool built on it. It started as a thin wrapper around that package but is growing into a better tool for testing CSS selectors without writing Go code.

## Usage

### $ cascadia
```sh
cascadia wrapper
Version 1.3.0 built on 2023-06-30
Copyright (C) 2016-2023, Tong Sun

  Command line interface to go cascadia CSS selectors package

Usage:
  cascadia -i in -c css -o [Options...]

Options:

  -h, --help        display help information
  -i, --in         *The html/xml file to read from (or stdin)
  -o, --out        *The output file (or stdout)
  -c, --css        *CSS selectors (can provide more if not using --piece)
  -t, --text        Text output for none-block selection mode
  -R, --Raw         Raw text output, no trimming of leading and trailing white space
  -p, --piece       sub CSS selectors within -css to split that block up into pieces
                    format: PieceName=[PieceStyle:]selector_string
                    PieceStyle:
                      RAW : will return the selected as-is
                      ATTR : will return the value of attribute selector_string
                    Else the text will be returned
  -d, --delimiter   delimiter for pieces csv output [= ]
  -w, --wrap-html   wrap up the output with html tags
  -y, --style       style component within the wrapped html head
  -b, --base        base href tag used in the wrapped up html
  -q, --quiet       be quiet
```

Its output has two modes, _none-block selection mode_ and _block selection mode_, depending on whether the `--piece` parameter is given on the command line.

For details about the concept of blocks and pieces, check out [andrew-d/goscrape](https://github.com/andrew-d/goscrape). In fact, `cascadia` was initially developed just for it, so that I don't need to tweak Go code, then build and run it, just to test out the block and piece selectors. Here is an excerpt:

- Inside each page, there's 1 or more *blocks* - some logical method of splitting up a page into subcomponents.
- Inside each block, you define some number of *pieces* of data that you wish
  to extract. Each piece consists of a name, a selector, and what data to
  extract from the current block.

This all sounds rather complicated, but in practice it's quite simple. See the next section for details.

In summary,

- The none-block selection mode outputs the selection as HTML source by default
  * but if the `-t` or `--text` CLI option is provided, the none-block selection mode will [output as text](https://github.com/suntong/cascadia/issues/6#issuecomment-980757881) instead.
    - By default, such text output has its leading and trailing white space trimmed.
    - However, if the `-R` or `--Raw` CLI option is provided, no trimming is done.
- The block selection mode outputs the selection as text in a `tsv`/`csv` table form by default
  * if a `--piece` selection is prefixed with `RAW:`, then that specific selection is output as HTML instead. See the following for details.

### Examples

All three options, `-i`, `-o`, and `-c`, are required.
By default, when `-i` and `-o` are given without a file name, it reads from `stdin` and writes to `stdout`:

```sh
$ echo '<input type="radio" name="Sex" value="F" />' | tee /tmp/cascadia.xml | cascadia -i -o -c 'input[name=Sex][value=F]'
1 elements for 'input[name=Sex][value=F]':
<input type="radio" name="Sex" value="F"/>
```

Either the input or the output can be followed by a file name:


```sh
$ cascadia -i /tmp/cascadia.xml -o -c 'input[name=Sex][value=F]'
1 elements for 'input[name=Sex][value=F]':
<input type="radio" name="Sex" value="F"/>
```


```sh
$ cascadia -i /tmp/cascadia.xml -c 'input[name=Sex][value=F]' -o /tmp/out.html
1 elements for 'input[name=Sex][value=F]':

$ cat /tmp/out.html
<input type="radio" name="Sex" value="F"/>
```

Other options can be applied too:

```sh
# using --wrap-html
$ cascadia -i /tmp/cascadia.xml -c 'input[name=Sex][value=F]' -o /tmp/out.html -w
1 elements for 'input[name=Sex][value=F]':

$ cat /tmp/out.html
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<base href="">

</head>
<body>
<input type="radio" name="Sex" value="F"/>
</body>

# using --wrap-html with --style
$ cascadia -i /tmp/cascadia.xml -c 'input[name=Sex][value=F]' -o /tmp/out.html -w -y '<link rel="stylesheet" href="styles.css">'
1 elements for 'input[name=Sex][value=F]':

$ cat /tmp/out.html
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<base href="">
<link rel="stylesheet" href="styles.css">
</head>
<body>
<input type="radio" name="Sex" value="F"/>
</body>
```

- For more on using the `--style` option, check out ["adding styles"](https://github.com/suntong/cascadia/wiki/Adding-styles).
- For more examples, check out the [wiki](https://github.com/suntong/cascadia/wiki/), which includes, but is not limited to:

  * [None-block selection mode](https://github.com/suntong/cascadia/wiki#none-block-selection-mode)
  * [Multi-selection](https://github.com/suntong/cascadia/wiki#multi-selection)
  * [Block selection mode](https://github.com/suntong/cascadia/wiki#block-selection-mode)
  * [Block selection mode HTML output](https://github.com/suntong/cascadia/wiki#block-selection-mode-html-output)
  * [Block selection mode table output](https://github.com/suntong/cascadia/wiki#block-selection-mode-table-output)
  * [Attribute selection](https://github.com/suntong/cascadia/wiki#attribute-selection)
  * [Twitter Search](https://github.com/suntong/cascadia/wiki#twitter-search)
  * [Reconstruct the separated pages](https://github.com/suntong/cascadia/wiki#reconstruct-the-separated-pages)
  * [More On CSS Selector](https://github.com/suntong/cascadia/wiki#more-on-css-selector)

## Install Debian/Ubuntu package

    sudo apt install -y cascadia

## Download/install binaries

- The latest binary executables are available
as the result of the Continuous-Integration (CI) process.
- That is, they are built automatically from the source code at every git release by [GitHub Actions](https://docs.github.com/en/actions).
- There are two ways to get/install such binary executables:
  * Using the **binary executables** directly, or
  * Using **packages** for your distro

### The binary executables

- The latest binary executables are directly available under
https://github.com/suntong/cascadia/releases/latest
- Pick the one that suits your OS and its architecture. E.g., for Linux, it would be the `cascadia_verxx_linux_amd64.tar.gz` file.
- Binary executables are available for the following OSes:
  * Linux
  * Mac OS (darwin)
  * Windows
- If your OS and its architecture are not available in the download list, please let me know and I'll add it.
- Manual installation is just a matter of unpacking the archive and moving/copying the binary executable to somewhere in `PATH`. For example,

``` sh
tar -xvf cascadia_*_linux_amd64.tar.gz
sudo mv -v cascadia_*_linux_amd64/cascadia /usr/local/bin/
rmdir -v cascadia_*_linux_amd64
```


### Distro package

- [Packages available for Linux distros](https://cloudsmith.io/~suntong/repos/repo/packages/) are
  * [Alpine Linux](https://cloudsmith.io/~suntong/repos/repo/setup/#formats-alpine)
  * [Debian](https://cloudsmith.io/~suntong/repos/repo/setup/#formats-deb)
  * [RedHat](https://cloudsmith.io/~suntong/repos/repo/setup/#formats-rpm)

The repo setup instruction URLs are given above.
For example, for [Debian](https://cloudsmith.io/~suntong/repos/repo/setup/#formats-deb):

### Debian package


```sh
curl -1sLf \
  'https://dl.cloudsmith.io/public/suntong/repo/setup.deb.sh' \
  | sudo -E bash

# That's it. You then can do your normal operations, like

sudo apt update
apt-cache policy cascadia

sudo apt install -y cascadia
```

## Install Source

To install from source instead:

```
go install github.com/suntong/cascadia@latest
```

## Author

Tong SUN
![suntong from cpan.org](https://img.shields.io/badge/suntong-%40cpan.org-lightgrey.svg "suntong from cpan.org")

_Powered by_ [**WireFrame**](https://github.com/go-easygen/wireframe)
[![PoweredBy WireFrame](https://github.com/go-easygen/wireframe/blob/master/PoweredBy-WireFrame-Y.svg)](http://godoc.org/github.com/go-easygen/wireframe)
the _one-stop wire-framing solution_ for Go cli based projects, from _init_ to _deploy_.
## Contributors ✨

Thanks goes to these wonderful people ([emoji key](https://allcontributors.org/docs/en/emoji-key)):

<!-- ALL-CONTRIBUTORS-LIST:START - Do not remove or modify this section -->
<!-- prettier-ignore-start -->
<!-- markdownlint-disable -->
<table>
  <tbody>
    <tr>
      <td align="center" valign="top" width="14.28%"><a href="https://github.com/suntong"><img src="https://avatars.githubusercontent.com/u/422244?v=4?s=100" width="100px;" alt="suntong"/><br /><sub><b>suntong</b></sub></a><br /><a href="https://github.com/suntong/cascadia/commits?author=suntong" title="Code">💻</a> <a href="#ideas-suntong" title="Ideas, Planning, & Feedback">🤔</a> <a href="#design-suntong" title="Design">🎨</a> <a href="#data-suntong" title="Data">🔣</a> <a href="https://github.com/suntong/cascadia/commits?author=suntong" title="Tests">⚠️</a> <a href="https://github.com/suntong/cascadia/issues?q=author%3Asuntong" title="Bug reports">🐛</a> <a href="https://github.com/suntong/cascadia/commits?author=suntong" title="Documentation">📖</a> <a href="#blog-suntong" title="Blogposts">📝</a> <a href="#example-suntong" title="Examples">💡</a> <a href="#tutorial-suntong" title="Tutorials">✅</a> <a href="#tool-suntong" title="Tools">🔧</a> <a href="#platform-suntong" title="Packaging/porting to new platform">📦</a> <a href="https://github.com/suntong/cascadia/pulls?q=is%3Apr+reviewed-by%3Asuntong" title="Reviewed Pull Requests">👀</a> <a href="#question-suntong" title="Answering Questions">💬</a> <a href="#maintenance-suntong" title="Maintenance">🚧</a> <a href="#infra-suntong" title="Infrastructure (Hosting, Build-Tools, etc)">🚇</a></td>
      <td align="center" valign="top" width="14.28%"><a href="https://github.com/hoshsadiq"><img src="https://avatars.githubusercontent.com/u/600045?v=4?s=100" width="100px;" alt="Hosh"/><br /><sub><b>Hosh</b></sub></a><br /><a href="https://github.com/suntong/cascadia/commits?author=hoshsadiq" title="Code">💻</a> <a href="https://github.com/suntong/cascadia/issues?q=author%3Ahoshsadiq" title="Bug reports">🐛</a> <a href="#userTesting-hoshsadiq" title="User Testing">📓</a></td>
      <td align="center" valign="top" width="14.28%"><a href="https://github.com/mh-cbon"><img src="https://avatars.githubusercontent.com/u/17096799?v=4?s=100" width="100px;" alt="mh-cbon"/><br /><sub><b>mh-cbon</b></sub></a><br /><a href="https://github.com/suntong/cascadia/issues?q=author%3Amh-cbon" title="Bug reports">🐛</a> <a href="#ideas-mh-cbon" title="Ideas, Planning, & Feedback">🤔</a> <a href="#userTesting-mh-cbon" title="User Testing">📓</a></td>
      <td align="center" valign="top" width="14.28%"><a href="https://www.digglife.net"><img src="https://avatars.githubusercontent.com/u/1468378?v=4?s=100" width="100px;" alt="朱聖黎 (Zhu Sheng Li)"/><br /><sub><b>朱聖黎 (Zhu Sheng Li)</b></sub></a><br /><a href="https://github.com/suntong/cascadia/issues?q=author%3Adigglife" title="Bug reports">🐛</a> <a href="#userTesting-digglife" title="User Testing">📓</a></td>
      <td align="center" valign="top" width="14.28%"><a href="https://github.com/himcc"><img src="https://avatars.githubusercontent.com/u/3031794?v=4?s=100" width="100px;" alt="himcc"/><br /><sub><b>himcc</b></sub></a><br /><a href="https://github.com/suntong/cascadia/commits?author=himcc" title="Code">💻</a> <a href="https://github.com/suntong/cascadia/issues?q=author%3Ahimcc" title="Bug reports">🐛</a> <a href="#userTesting-himcc" title="User Testing">📓</a></td>
      <td align="center" valign="top" width="14.28%"><a href="http://www.devalias.net/"><img src="https://avatars.githubusercontent.com/u/753891?v=4?s=100" width="100px;" alt="Glenn 'devalias' Grant"/><br /><sub><b>Glenn 'devalias' Grant</b></sub></a><br /><a href="https://github.com/suntong/cascadia/commits?author=0xdevalias" title="Code">💻</a> <a href="https://github.com/suntong/cascadia/issues?q=author%3A0xdevalias" title="Bug reports">🐛</a> <a href="#userTesting-0xdevalias" title="User Testing">📓</a></td>
    </tr>
  </tbody>
</table>

<!-- markdownlint-restore -->
<!-- prettier-ignore-end -->

<!-- ALL-CONTRIBUTORS-LIST:END -->

This project follows the [all-contributors](https://github.com/all-contributors/all-contributors) specification. Contributions of any kind welcome!