# The Influx Query Language Specification

## Introduction

This is a reference for the Influx Query Language ("InfluxQL").

InfluxQL is a SQL-like query language for interacting with InfluxDB. It has
been lovingly crafted to feel familiar to those coming from other SQL or
SQL-like environments while providing features specific to storing and analyzing
time series data.

## Notation

The syntax is specified using Extended Backus-Naur Form ("EBNF"). EBNF is the
same notation used in the [Go](http://golang.org) programming language
specification, which can be found [here](https://golang.org/ref/spec). Not so
coincidentally, InfluxDB is written in Go.

```
Production  = production_name "=" [ Expression ] "." .
Expression  = Alternative { "|" Alternative } .
Alternative = Term { Term } .
Term        = production_name | token [ "…" token ] | Group | Option | Repetition .
Group       = "(" Expression ")" .
Option      = "[" Expression "]" .
Repetition  = "{" Expression "}" .
```

Notation operators in order of increasing precedence:

```
|   alternation
()  grouping
[]  option (0 or 1 times)
{}  repetition (0 to n times)
```

## Comments

Both single and multiline comments are supported. A comment is treated
the same as whitespace by the parser.

```
-- single line comment
/*
    multiline comment
*/
```

Single line comments will skip all text until the scanner hits a
newline. Multiline comments will skip all text until the end comment
marker is hit. Nested multiline comments are not supported so the
following does not work:

```
/* /* this does not work */ */
```

## Query representation

### Characters

InfluxQL is Unicode text encoded in [UTF-8](http://en.wikipedia.org/wiki/UTF-8).

```
newline             = /* the Unicode code point U+000A */ .
unicode_char        = /* an arbitrary Unicode code point except newline */ .
```

## Letters and digits

Letters are the set of ASCII letters; additionally, the underscore character `_`
(U+005F) is considered a letter.

Only decimal digits are supported.

```
letter              = ascii_letter | "_" .
ascii_letter        = "A" … "Z" | "a" … "z" .
digit               = "0" … "9" .
```

## Identifiers

Identifiers are tokens which refer to database names, retention policy names,
user names, measurement names, tag keys, and field keys.

The rules:

- double quoted identifiers can contain any unicode character other than a new line
- double quoted identifiers can contain escaped `"` characters (i.e., `\"`)
- double quoted identifiers can contain InfluxQL keywords
- unquoted identifiers must start with an upper or lowercase ASCII character or "_"
- unquoted identifiers may contain only ASCII letters, decimal digits, and "_"

```
identifier          = unquoted_identifier | quoted_identifier .
unquoted_identifier = ( letter ) { letter | digit } .
quoted_identifier   = `"` unicode_char { unicode_char } `"` .
```

#### Examples:

```
cpu
_cpu_stats
"1h"
"anything really"
"1_Crazy-1337.identifier>NAME👍"
```

## Keywords

```
ALL           ALTER         ANALYZE       ANY           AS            ASC
BEGIN         BY            CREATE        CONTINUOUS    DATABASE      DATABASES
DEFAULT       DELETE        DESC          DESTINATIONS  DIAGNOSTICS   DISTINCT
DROP          DURATION      END           EVERY         EXPLAIN       FIELD
FOR           FROM          GRANT         GRANTS        GROUP         GROUPS
IN            INF           INSERT        INTO          KEY           KEYS
KILL          LIMIT         SHOW          MEASUREMENT   MEASUREMENTS  NAME
OFFSET        ON            ORDER         PASSWORD      POLICY        POLICIES
PRIVILEGES    QUERIES       QUERY         READ          REPLICATION   RESAMPLE
RETENTION     REVOKE        SELECT        SERIES        SET           SHARD
SHARDS        SLIMIT        SOFFSET       STATS         SUBSCRIPTION  SUBSCRIPTIONS
TAG           TO            USER          USERS         VALUES        WHERE
WITH          WRITE
```

## Literals

### Integers

InfluxQL supports decimal integer literals.
Hexadecimal and octal literals are not currently supported.

```
int_lit             = [ "+" | "-" ] ( "1" … "9" ) { digit } .
```

### Floats

InfluxQL supports floating-point literals. Exponents are not currently supported.

```
float_lit           = [ "+" | "-" ] ( "." digit { digit } | digit { digit } "." { digit } ) .
```

### Strings

String literals must be surrounded by single quotes. Strings may contain `'`
characters as long as they are escaped (i.e., `\'`).

```
string_lit          = `'` { unicode_char } `'` .
```

### Durations

Duration literals specify a length of time. An integer literal followed
immediately (with no spaces) by a duration unit listed below is interpreted as
a duration literal.

### Duration units

| Units  | Meaning                                 |
|--------|-----------------------------------------|
| u or µ | microseconds (1 millionth of a second)  |
| ms     | milliseconds (1 thousandth of a second) |
| s      | second                                  |
| m      | minute                                  |
| h      | hour                                    |
| d      | day                                     |
| w      | week                                    |

```
duration_lit        = int_lit duration_unit .
duration_unit       = "u" | "µ" | "ms" | "s" | "m" | "h" | "d" | "w" .
```

### Dates & Times

The date and time literal format is not specified in EBNF like the rest of this
document. It is specified using Go's date / time parsing format, which is a
reference date written in the format required by InfluxQL. The reference date
time is:

InfluxQL reference date time: January 2nd, 2006 at 3:04:05 PM

```
time_lit            = "2006-01-02 15:04:05.999999" | "2006-01-02" .
```

### Booleans

```
bool_lit            = TRUE | FALSE .
```

### Regular Expressions

```
regex_lit           = "/" { unicode_char } "/" .
```

**Comparators:**
`=~` matches against
`!~` doesn't match against

> **Note:** Use regular expressions to match measurements and tags.
You cannot use regular expressions to match databases, retention policies, or fields.

## Queries

A query is composed of one or more statements separated by a semicolon.

```
query               = statement { ";" statement } .

statement           = alter_retention_policy_stmt |
                      create_continuous_query_stmt |
                      create_database_stmt |
                      create_retention_policy_stmt |
                      create_subscription_stmt |
                      create_user_stmt |
                      delete_stmt |
                      drop_continuous_query_stmt |
                      drop_database_stmt |
                      drop_measurement_stmt |
                      drop_retention_policy_stmt |
                      drop_series_stmt |
                      drop_shard_stmt |
                      drop_subscription_stmt |
                      drop_user_stmt |
                      explain_stmt |
                      grant_stmt |
                      kill_query_statement |
                      show_continuous_queries_stmt |
                      show_databases_stmt |
                      show_field_keys_stmt |
                      show_grants_stmt |
                      show_measurements_stmt |
                      show_queries_stmt |
                      show_retention_policies |
                      show_series_stmt |
                      show_shard_groups_stmt |
                      show_shards_stmt |
                      show_subscriptions_stmt |
                      show_tag_keys_stmt |
                      show_tag_values_stmt |
                      show_users_stmt |
                      revoke_stmt |
                      select_stmt .
```

## Statements

### ALTER RETENTION POLICY

```
alter_retention_policy_stmt = "ALTER RETENTION POLICY" policy_name on_clause
                              retention_policy_option
                              [ retention_policy_option ]
                              [ retention_policy_option ]
                              [ retention_policy_option ] .
```

> Replication factors do not serve a purpose with single node instances.

#### Examples:

```sql
-- Set default retention policy for mydb to 1h.cpu.
ALTER RETENTION POLICY "1h.cpu" ON "mydb" DEFAULT

-- Change duration and replication factor.
ALTER RETENTION POLICY "policy1" ON "somedb" DURATION 1h REPLICATION 4
```

### CREATE CONTINUOUS QUERY

```
create_continuous_query_stmt = "CREATE CONTINUOUS QUERY" query_name on_clause
                               [ "RESAMPLE" resample_opts ]
                               "BEGIN" select_stmt "END" .
query_name    = identifier .

resample_opts = ( every_stmt for_stmt | every_stmt | for_stmt ) .
every_stmt    = "EVERY" duration_lit .
for_stmt      = "FOR" duration_lit .
```

#### Examples:

```sql
-- selects from DEFAULT retention policy and writes into 6_months retention policy
CREATE CONTINUOUS QUERY "10m_event_count"
ON "db_name"
BEGIN
  SELECT count("value")
  INTO "6_months"."events"
  FROM "events"
  GROUP BY time(10m)
END;

-- this selects from the output of one continuous query in one retention policy and outputs to another series in another retention policy
CREATE CONTINUOUS QUERY "1h_event_count"
ON "db_name"
BEGIN
  SELECT sum("count") as "count"
  INTO "2_years"."events"
  FROM "6_months"."events"
  GROUP BY time(1h)
END;

-- this customizes the resample interval so the interval is queried every 10s and intervals are resampled until 2m after their start time
-- when resample is used, at least one of "EVERY" or "FOR" must be used
CREATE CONTINUOUS QUERY "cpu_mean"
ON "db_name"
RESAMPLE EVERY 10s FOR 2m
BEGIN
  SELECT mean("value")
  INTO "cpu_mean"
  FROM "cpu"
  GROUP BY time(1m)
END;
```

### CREATE DATABASE

```
create_database_stmt = "CREATE DATABASE" db_name
                       [ WITH
                           [ retention_policy_duration ]
                           [ retention_policy_replication ]
                           [ retention_policy_shard_group_duration ]
                           [ retention_policy_name ]
                       ] .
```

> Replication factors do not serve a purpose with single node instances.
#### Examples:

```sql
-- Create a database called foo
CREATE DATABASE "foo"

-- Create a database called bar with a new DEFAULT retention policy and specify the duration, replication, shard group duration, and name of that retention policy
CREATE DATABASE "bar" WITH DURATION 1d REPLICATION 1 SHARD DURATION 30m NAME "myrp"

-- Create a database called mydb with a new DEFAULT retention policy and specify the name of that retention policy
CREATE DATABASE "mydb" WITH NAME "myrp"
```

### CREATE RETENTION POLICY

```
create_retention_policy_stmt = "CREATE RETENTION POLICY" policy_name on_clause
                               retention_policy_duration
                               retention_policy_replication
                               [ retention_policy_shard_group_duration ]
                               [ "DEFAULT" ] .
```

> Replication factors do not serve a purpose with single node instances.

#### Examples

```sql
-- Create a retention policy.
CREATE RETENTION POLICY "10m.events" ON "somedb" DURATION 60m REPLICATION 2

-- Create a retention policy and set it as the DEFAULT.
CREATE RETENTION POLICY "10m.events" ON "somedb" DURATION 60m REPLICATION 2 DEFAULT

-- Create a retention policy and specify the shard group duration.
CREATE RETENTION POLICY "10m.events" ON "somedb" DURATION 60m REPLICATION 2 SHARD DURATION 30m
```

### CREATE SUBSCRIPTION

Subscriptions tell InfluxDB to send all the data it receives to Kapacitor or other third parties.

```
create_subscription_stmt = "CREATE SUBSCRIPTION" subscription_name "ON" db_name "." retention_policy "DESTINATIONS" ( "ANY" | "ALL" ) host { "," host } .
```

#### Examples:

```sql
-- Create a SUBSCRIPTION on database 'mydb' and retention policy 'autogen' that sends data to 'example.com:9090' via UDP.
CREATE SUBSCRIPTION "sub0" ON "mydb"."autogen" DESTINATIONS ALL 'udp://example.com:9090'

-- Create a SUBSCRIPTION on database 'mydb' and retention policy 'autogen' that round robins the data to 'h1.example.com:9090' and 'h2.example.com:9090'.
CREATE SUBSCRIPTION "sub0" ON "mydb"."autogen" DESTINATIONS ANY 'udp://h1.example.com:9090', 'udp://h2.example.com:9090'
```

### CREATE USER

```
create_user_stmt = "CREATE USER" user_name "WITH PASSWORD" password
                   [ "WITH ALL PRIVILEGES" ] .
```

#### Examples:

```sql
-- Create a normal database user.
CREATE USER "jdoe" WITH PASSWORD '1337password'

-- Create an admin user.
-- Note: Unlike the GRANT statement, the "PRIVILEGES" keyword is required here.
CREATE USER "jdoe" WITH PASSWORD '1337password' WITH ALL PRIVILEGES
```

> **Note:** The password string must be wrapped in single quotes.

### DELETE

```
delete_stmt = "DELETE" ( from_clause | where_clause | from_clause where_clause ) .
```

#### Examples:

```sql
DELETE FROM "cpu"
DELETE FROM "cpu" WHERE time < '2000-01-01T00:00:00Z'
DELETE WHERE time < '2000-01-01T00:00:00Z'
```

### DROP CONTINUOUS QUERY

```
drop_continuous_query_stmt = "DROP CONTINUOUS QUERY" query_name on_clause .
```

#### Example:

```sql
DROP CONTINUOUS QUERY "myquery" ON "mydb"
```

### DROP DATABASE

```
drop_database_stmt = "DROP DATABASE" db_name .
```

#### Example:

```sql
DROP DATABASE "mydb"
```

### DROP MEASUREMENT

```
drop_measurement_stmt = "DROP MEASUREMENT" measurement .
```

#### Examples:

```sql
-- drop the cpu measurement
DROP MEASUREMENT "cpu"
```

### DROP RETENTION POLICY

```
drop_retention_policy_stmt = "DROP RETENTION POLICY" policy_name on_clause .
```

#### Example:

```sql
-- drop the retention policy named 1h.cpu from mydb
DROP RETENTION POLICY "1h.cpu" ON "mydb"
```

### DROP SERIES

```
drop_series_stmt = "DROP SERIES" ( from_clause | where_clause | from_clause where_clause ) .
```

#### Example:

```sql
DROP SERIES FROM "telegraf"."autogen"."cpu" WHERE cpu = 'cpu8'
```

### DROP SHARD

```
drop_shard_stmt = "DROP SHARD" ( shard_id ) .
```

#### Example:

```sql
DROP SHARD 1
```

### DROP SUBSCRIPTION

```
drop_subscription_stmt = "DROP SUBSCRIPTION" subscription_name "ON" db_name "." retention_policy .
```

#### Example:

```sql
DROP SUBSCRIPTION "sub0" ON "mydb"."autogen"
```

### DROP USER

```
drop_user_stmt = "DROP USER" user_name .
```

#### Example:

```sql
DROP USER "jdoe"
```

### EXPLAIN

> **NOTE:** This functionality is unimplemented.

```
explain_stmt = "EXPLAIN" [ "ANALYZE" ] select_stmt .
```

### GRANT

> **NOTE:** Users can be granted privileges on databases that do not exist.

```
grant_stmt = "GRANT" privilege [ on_clause ] to_clause .
```

#### Examples:

```sql
-- grant admin privileges
GRANT ALL TO "jdoe"

-- grant read access to a database
GRANT READ ON "mydb" TO "jdoe"
```

### KILL QUERY

```
kill_query_statement = "KILL QUERY" query_id .
```

#### Examples:

```sql
-- kill a query with the query_id 36
KILL QUERY 36
```

> **NOTE:** Identify the `query_id` from the `SHOW QUERIES` output.

### SHOW CONTINUOUS QUERIES

```
show_continuous_queries_stmt = "SHOW CONTINUOUS QUERIES" .
```

#### Example:

```sql
-- show all continuous queries
SHOW CONTINUOUS QUERIES
```

### SHOW DATABASES

```
show_databases_stmt = "SHOW DATABASES" .
```

#### Example:

```sql
-- show all databases
SHOW DATABASES
```

### SHOW FIELD KEYS

```
show_field_keys_stmt = "SHOW FIELD KEYS" [ from_clause ] .
```

#### Examples:

```sql
-- show field keys and field value data types from all measurements
SHOW FIELD KEYS

-- show field keys and field value data types from specified measurement
SHOW FIELD KEYS FROM "cpu"
```

### SHOW GRANTS

```
show_grants_stmt = "SHOW GRANTS FOR" user_name .
```

#### Example:

```sql
-- show grants for jdoe
SHOW GRANTS FOR "jdoe"
```

### SHOW MEASUREMENTS

```
show_measurements_stmt = "SHOW MEASUREMENTS" [ with_measurement_clause ] [ where_clause ] [ limit_clause ] [ offset_clause ] .
```

#### Examples:

```sql
-- show all measurements
SHOW MEASUREMENTS

-- show measurements where region tag = 'uswest' AND host tag = 'serverA'
SHOW MEASUREMENTS WHERE "region" = 'uswest' AND "host" = 'serverA'

-- show measurements that start with 'h2o'
SHOW MEASUREMENTS WITH MEASUREMENT =~ /h2o.*/
```

### SHOW QUERIES

```
show_queries_stmt = "SHOW QUERIES" .
```

#### Example:

```sql
-- show all currently-running queries
SHOW QUERIES
```

### SHOW RETENTION POLICIES

```
show_retention_policies = "SHOW RETENTION POLICIES" on_clause .
```

#### Example:

```sql
-- show all retention policies on a database
SHOW RETENTION POLICIES ON "mydb"
```

### SHOW SERIES

```
show_series_stmt = "SHOW SERIES" [ from_clause ] [ where_clause ] [ limit_clause ] [ offset_clause ] .
```

#### Example:

```sql
SHOW SERIES FROM "telegraf"."autogen"."cpu" WHERE cpu = 'cpu8'
```

### SHOW SHARD GROUPS

```
show_shard_groups_stmt = "SHOW SHARD GROUPS" .
```

#### Example:

```sql
SHOW SHARD GROUPS
```

### SHOW SHARDS

```
show_shards_stmt = "SHOW SHARDS" .
```

#### Example:

```sql
SHOW SHARDS
```

### SHOW SUBSCRIPTIONS

```
show_subscriptions_stmt = "SHOW SUBSCRIPTIONS" .
```

#### Example:

```sql
SHOW SUBSCRIPTIONS
```

### SHOW TAG KEYS

```
show_tag_keys_stmt = "SHOW TAG KEYS" [ from_clause ] [ where_clause ] [ group_by_clause ]
                     [ limit_clause ] [ offset_clause ] .
```

#### Examples:

```sql
-- show all tag keys
SHOW TAG KEYS

-- show all tag keys from the cpu measurement
SHOW TAG KEYS FROM "cpu"

-- show all tag keys from the cpu measurement where the region key = 'uswest'
SHOW TAG KEYS FROM "cpu" WHERE "region" = 'uswest'

-- show all tag keys where the host key = 'serverA'
SHOW TAG KEYS WHERE "host" = 'serverA'
```

### SHOW TAG VALUES

```
show_tag_values_stmt = "SHOW TAG VALUES" [ from_clause ] with_tag_clause [ where_clause ]
                       [ group_by_clause ] [ limit_clause ] [ offset_clause ] .
```

#### Examples:

```sql
-- show all tag values across all measurements for the region tag
SHOW TAG VALUES WITH KEY = "region"

-- show tag values from the cpu measurement for the region tag
SHOW TAG VALUES FROM "cpu" WITH KEY = "region"

-- show tag values across all measurements for all tag keys that do not include the letter c
SHOW TAG VALUES WITH KEY !~ /.*c.*/

-- show tag values from the cpu measurement for region & host tag keys where service = 'redis'
SHOW TAG VALUES FROM "cpu" WITH KEY IN ("region", "host") WHERE "service" = 'redis'
```

### SHOW USERS

```
show_users_stmt = "SHOW USERS" .
```

#### Example:

```sql
-- show all users
SHOW USERS
```

### REVOKE

```
revoke_stmt = "REVOKE" privilege [ on_clause ] "FROM" user_name .
```

#### Examples:

```sql
-- revoke admin privileges from jdoe
REVOKE ALL PRIVILEGES FROM "jdoe"

-- revoke read privileges from jdoe on mydb
REVOKE READ ON "mydb" FROM "jdoe"
```

### SELECT

```
select_stmt = "SELECT" fields from_clause [ into_clause ] [ where_clause ]
              [ group_by_clause ] [ order_by_clause ] [ limit_clause ]
              [ offset_clause ] [ slimit_clause ] [ soffset_clause ]
              [ timezone_clause ] .
```

#### Examples:

```sql
-- select mean value from the cpu measurement where region = 'uswest' grouped by 10 minute intervals
SELECT mean("value") FROM "cpu" WHERE "region" = 'uswest' GROUP BY time(10m) fill(0)

-- select from all measurements beginning with cpu into the same measurement name in the cpu_1h retention policy
SELECT mean("value") INTO "cpu_1h".:MEASUREMENT FROM /cpu.*/

-- select from measurements grouped by the day with a timezone
SELECT mean("value") FROM "cpu" GROUP BY region, time(1d) fill(0) tz('America/Chicago')
```

## Clauses

```
from_clause             = "FROM" measurements .

group_by_clause         = "GROUP BY" dimensions fill(fill_option) .

into_clause             = "INTO" ( measurement | back_ref ) .

limit_clause            = "LIMIT" int_lit .

offset_clause           = "OFFSET" int_lit .

slimit_clause           = "SLIMIT" int_lit .

soffset_clause          = "SOFFSET" int_lit .

timezone_clause         = tz(string_lit) .

on_clause               = "ON" db_name .

order_by_clause         = "ORDER BY" sort_fields .

to_clause               = "TO" user_name .

where_clause            = "WHERE" expr .

with_measurement_clause = "WITH MEASUREMENT" ( "=" measurement | "=~" regex_lit ) .

with_tag_clause         = "WITH KEY" ( "=" tag_key | "!=" tag_key | "=~" regex_lit | "IN (" tag_keys ")" ) .
```

## Expressions

```
binary_op  = "+" | "-" | "*" | "/" | "%" | "&" | "|" | "^" | "AND" |
             "OR" | "=" | "!=" | "<>" | "<" | "<=" | ">" | ">=" .

expr       = unary_expr { binary_op unary_expr } .

unary_expr = "(" expr ")" | var_ref | time_lit | string_lit | int_lit |
             float_lit | bool_lit | duration_lit | regex_lit .
```

## Other

```
alias     = "AS" identifier .

back_ref  = ( policy_name ".:MEASUREMENT" ) |
            ( db_name "." [ policy_name ] ".:MEASUREMENT" ) .

db_name   = identifier .

dimension = expr .
dimensions = dimension { "," dimension } .

field_key = identifier .

field = expr [ alias ] .

fields = field { "," field } .

fill_option = "null" | "none" | "previous" | "linear" | int_lit | float_lit .

host = string_lit .

measurement = measurement_name |
              ( policy_name "." measurement_name ) |
              ( db_name "." [ policy_name ] "." measurement_name ) .

measurements = measurement { "," measurement } .

measurement_name = identifier | regex_lit .

password = string_lit .

policy_name = identifier .

privilege = "ALL" [ "PRIVILEGES" ] | "READ" | "WRITE" .

query_id = int_lit .

query_name = identifier .

retention_policy = identifier .

retention_policy_option = retention_policy_duration |
                          retention_policy_replication |
                          retention_policy_shard_group_duration |
                          "DEFAULT" .

retention_policy_duration = "DURATION" duration_lit .

retention_policy_replication = "REPLICATION" int_lit .

retention_policy_shard_group_duration = "SHARD DURATION" duration_lit .

retention_policy_name = "NAME" identifier .

series_id = int_lit .

shard_id = int_lit .

sort_field = field_key [ ASC | DESC ] .

sort_fields = sort_field { "," sort_field } .

subscription_name = identifier .

tag_key = identifier .

tag_keys = tag_key { "," tag_key } .

user_name = identifier .

var_ref = measurement .
```

## Query Engine Internals

Once you understand the language itself, it's important to know how these
language constructs are implemented in the query engine. This gives you an
intuitive sense for how results will be processed and how to create efficient
queries.

The life cycle of a query looks like this:

1. InfluxQL query string is tokenized and then parsed into an abstract syntax
   tree (AST).
   This is the code representation of the query itself.

2. The AST is passed to the `QueryExecutor` which directs queries to the
   appropriate handlers. For example, queries related to meta data are executed
   by the meta service and `SELECT` statements are executed by the shards
   themselves.

3. The query engine then determines the shards that match the `SELECT`
   statement's time range. From these shards, iterators are created for each
   field in the statement.

4. Iterators are passed to the emitter which drains them and joins the resulting
   points. The emitter's job is to convert simple time/value points into the
   more complex result objects that are returned to the client.

### Understanding Iterators

Iterators are at the heart of the query engine. They provide a simple interface
for looping over a set of points. For example, this is an iterator over Float
points:

```go
type FloatIterator interface {
    Next() (*FloatPoint, error)
}
```

These iterators are created through the `IteratorCreator` interface:

```go
type IteratorCreator interface {
    CreateIterator(m *Measurement, opt IteratorOptions) (Iterator, error)
}
```

The `IteratorOptions` provide arguments about field selection, time ranges,
and dimensions that the iterator creator can use when planning an iterator.
The `IteratorCreator` interface is used at many levels such as the `Shards`,
`Shard`, and `Engine`. This allows optimizations to be performed when applicable
such as returning a precomputed `COUNT()`.

Iterators aren't just for reading raw data from storage though. Iterators can be
composed so that they provide additional functionality around an input
iterator. For example, a `DistinctIterator` can compute the distinct values for
each time window for an input iterator.
Or a `FillIterator` can generate
additional points that are missing from an input iterator.

This composition also lends itself well to aggregation. Take a statement
such as this:

```sql
SELECT MEAN(value) FROM cpu GROUP BY time(10m)
```

Here, `MEAN(value)` is a `MeanIterator` wrapping an iterator from the
underlying shards. However, we can add an additional iterator to determine
the derivative of the mean:

```sql
SELECT DERIVATIVE(MEAN(value), 20m) FROM cpu GROUP BY time(10m)
```

### Understanding Auxiliary Fields

Because InfluxQL allows users to use selector functions such as `FIRST()`,
`LAST()`, `MIN()`, and `MAX()`, the engine must provide a way to return related
data alongside the selected point.

For example, in this query:

```sql
SELECT FIRST(value), host FROM cpu GROUP BY time(1h)
```

We are selecting the first `value` that occurs every hour but we also want to
retrieve the `host` associated with that point. Since the `Point` types only
specify a single typed `Value` for efficiency, we push the `host` into the
auxiliary fields of the point. These auxiliary fields are attached to the point
until it is passed to the emitter where the fields get split off to their own
iterator.

### Built-in Iterators

There are many helper iterators that let us build queries:

* Merge Iterator - This iterator combines one or more iterators into a single
  new iterator of the same type. This iterator guarantees that all points
  within a window will be output before starting the next window but does not
  provide ordering guarantees within the window. This allows for fast access
  for aggregate queries which do not need stronger sorting guarantees.
* Sorted Merge Iterator - This iterator also combines one or more iterators
  into a new iterator of the same type. However, this iterator guarantees
  time ordering of every point. This makes it slower than the `MergeIterator`
  but this ordering guarantee is required for non-aggregate queries which
  return the raw data points.

* Limit Iterator - This iterator limits the number of points per name/tag
  group. This is the implementation of the `LIMIT` & `OFFSET` syntax.

* Fill Iterator - This iterator injects extra points if they are missing from
  the input iterator. It can provide `null` points, points with the previous
  value, or points with a specific value.

* Buffered Iterator - This iterator provides the ability to "unread" a point
  back onto a buffer so it can be read again next time. This is used extensively
  to provide lookahead for windowing.

* Reduce Iterator - This iterator calls a reduction function for each point in
  a window. When the window is complete then all points for that window are
  output. This is used for simple aggregate functions such as `COUNT()`.

* Reduce Slice Iterator - This iterator collects all points for a window first
  and then passes them all to a reduction function at once. The results are
  returned from the iterator. This is used for aggregate functions such as
  `DERIVATIVE()`.

* Transform Iterator - This iterator calls a transform function for each point
  from an input iterator. This is used for executing binary expressions.

* Dedupe Iterator - This iterator only outputs unique points. It is resource
  intensive so it is only used for small queries such as meta query statements.

### Call Iterators

Function calls in InfluxQL are implemented at two levels. Some calls can be
wrapped at multiple layers to improve efficiency.
For example, a `COUNT()` can
be performed at the shard level and then multiple `CountIterator`s can be
wrapped with another `CountIterator` to compute the count of all shards. These
iterators can be created using `NewCallIterator()`.

Some iterators are more complex or need to be implemented at a higher level.
For example, `DERIVATIVE()` needs to retrieve all points for a window first
before performing the calculation. This iterator is created by the engine itself
and is never requested to be created by the lower levels.

### Subqueries

Subqueries are built on top of iterators. Most of the work involved in
supporting subqueries is in organizing how data is streamed to the
iterators that will process the data.

The final ordering of the stream has to output all points from one
series before moving to the next series and it also needs to ensure
those points are output in order. So there are two separate concepts we
need to consider when creating an iterator: ordering and grouping.

When an inner query has a different grouping than the outermost query,
we still need to group together related points into buckets, but we do
not have to ensure that all points from one bucket are output before
the points in another bucket. In fact, if we do that, we will be unable
to perform the grouping for the outer query correctly. Instead, we group
all points by the outermost query for an interval and then, within that
interval, we group the points for the inner query.
For example, here are
series keys and times in seconds (fields are omitted since they don't
matter in this example):

    cpu,host=server01 0
    cpu,host=server01 10
    cpu,host=server01 20
    cpu,host=server01 30
    cpu,host=server02 0
    cpu,host=server02 10
    cpu,host=server02 20
    cpu,host=server02 30

With the following query:

    SELECT mean(max) FROM (SELECT max(value) FROM cpu GROUP BY host, time(20s)) GROUP BY time(20s)

The final grouping keeps all of the points together which means we need
to group `server01` with `server02`. That means we output the points
from the underlying engine like this:

    cpu,host=server01 0
    cpu,host=server01 10
    cpu,host=server02 0
    cpu,host=server02 10
    cpu,host=server01 20
    cpu,host=server01 30
    cpu,host=server02 20
    cpu,host=server02 30

Within each one of those time buckets, we calculate the `max()` value
for each unique host so the output stream gets transformed to look like
this:

    cpu,host=server01 0
    cpu,host=server02 0
    cpu,host=server01 20
    cpu,host=server02 20

Then we can process the `mean()` on this stream of data instead and it
will be output in the correct order. This is true of any order of
grouping since grouping can only go from more specific to less specific.

When it comes to ordering, unordered data is faster to process, but we
always need to produce ordered data. When processing a raw query with no
aggregates, we need to ensure data coming from the engine is ordered so
the output is ordered. When we have an aggregate, we know one point is
being emitted for each interval and will always produce ordered output.
So for aggregates, we can take unordered data as the input and get
ordered output.
Any ordered data as input will always result in ordered
data so we just need to look at how an iterator processes unordered
data.

|                 | raw query        | selector (without group by time) | selector (with group by time) | aggregator     |
|-----------------|------------------|----------------------------------|-------------------------------|----------------|
| ordered input   | ordered output   | ordered output                   | ordered output                | ordered output |
| unordered input | unordered output | unordered output                 | ordered output                | ordered output |

Since we always need ordered output, we just need to work backwards and
determine which pattern of input gives us ordered output. If both
ordered and unordered input produce ordered output, we prefer unordered
input since it is faster.

There are also certain aggregates that require ordered input like
`median()` and `percentile()`. These functions will explicitly request
ordered input. It is also important to realize that selectors that are
grouped by time are the equivalent of an aggregator. It is only
selectors without a group by time that are different.
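
As a rough illustration of the iterator concepts described above (composition, the buffered "unread" lookahead, and windowed aggregation), here is a minimal, self-contained sketch. It is not the engine's actual implementation: `sliceIterator` and `meanIterator` are invented names for this example, and only `FloatPoint`/`FloatIterator` loosely mirror the interfaces shown earlier.

```go
package main

import "fmt"

// FloatPoint is a simplified stand-in for the engine's float point type.
type FloatPoint struct {
	Time  int64 // seconds
	Value float64
}

// FloatIterator mirrors the interface shown above; a nil point means exhausted.
type FloatIterator interface {
	Next() (*FloatPoint, error)
}

// sliceIterator yields points from a slice, playing the role of a shard iterator.
type sliceIterator struct {
	points []FloatPoint
	i      int
}

func (it *sliceIterator) Next() (*FloatPoint, error) {
	if it.i >= len(it.points) {
		return nil, nil
	}
	p := &it.points[it.i]
	it.i++
	return p, nil
}

// meanIterator wraps an ordered input iterator and emits one mean point per
// fixed time window, loosely analogous to MEAN(value) ... GROUP BY time(w).
type meanIterator struct {
	input  FloatIterator
	window int64
	buf    *FloatPoint // one-point lookahead, like the Buffered Iterator above
}

func (it *meanIterator) Next() (*FloatPoint, error) {
	// Start from the buffered point if we "unread" one last time.
	p := it.buf
	it.buf = nil
	if p == nil {
		var err error
		if p, err = it.input.Next(); err != nil || p == nil {
			return nil, err
		}
	}
	start := p.Time - p.Time%it.window // window start for this point
	sum, n := p.Value, 1.0
	for {
		next, err := it.input.Next()
		if err != nil {
			return nil, err
		}
		if next == nil {
			break
		}
		if next.Time-next.Time%it.window != start {
			it.buf = next // unread the point that begins the next window
			break
		}
		sum += next.Value
		n++
	}
	return &FloatPoint{Time: start, Value: sum / n}, nil
}

func main() {
	src := &sliceIterator{points: []FloatPoint{
		{0, 2}, {10, 4}, {20, 6}, {30, 10},
	}}
	it := &meanIterator{input: src, window: 20}
	for {
		p, err := it.Next()
		if err != nil || p == nil {
			break
		}
		fmt.Printf("t=%d mean=%g\n", p.Time, p.Value)
	}
}
```

Because `meanIterator` only wraps a `FloatIterator`, it composes the same way the engine's iterators do: the source could just as well be another wrapping iterator instead of a raw slice.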