README.md (1966B)
# HTML [![API reference](https://img.shields.io/badge/godoc-reference-5272B4)](https://pkg.go.dev/github.com/tdewolff/parse/v2/html?tab=doc)

This package is an HTML5 lexer written in [Go][1]. It follows the specification at [The HTML syntax](http://www.w3.org/TR/html5/syntax.html). The lexer reads from an io.Reader and converts the input into tokens until EOF.

## Installation
Run the following command

	go get -u github.com/tdewolff/parse/v2/html

or add the following import and run `go get` in your project

	import "github.com/tdewolff/parse/v2/html"

## Lexer
### Usage
The following initializes a new Lexer with io.Reader `r` (`parse` is the `github.com/tdewolff/parse/v2` package):
``` go
l := html.NewLexer(parse.NewInput(r))
```

To tokenize until EOF or an error, use:
``` go
for {
	tt, data := l.Next()
	switch tt {
	case html.ErrorToken:
		// error or EOF set in l.Err()
		return
	case html.StartTagToken:
		// ...
		for {
			ttAttr, dataAttr := l.Next()
			if ttAttr != html.AttributeToken {
				break
			}
			// ...
		}
		// ...
	}
}
```

All tokens:
``` go
ErrorToken TokenType = iota // extra token when errors occur
CommentToken
DoctypeToken
StartTagToken
StartTagCloseToken
StartTagVoidToken
EndTagToken
AttributeToken
TextToken
```

### Examples
``` go
package main

import (
	"fmt"
	"io"
	"os"

	"github.com/tdewolff/parse/v2"
	"github.com/tdewolff/parse/v2/html"
)

// Tokenize HTML from stdin.
func main() {
	l := html.NewLexer(parse.NewInput(os.Stdin))
	for {
		tt, data := l.Next()
		switch tt {
		case html.ErrorToken:
			// io.EOF marks the normal end of input; report anything else.
			if l.Err() != io.EOF {
				fmt.Println("Error:", l.Err())
			}
			return
		case html.StartTagToken:
			fmt.Println("Tag", string(data))
			// Attribute tokens follow the start tag until the tag is closed.
			for {
				ttAttr, dataAttr := l.Next()
				if ttAttr != html.AttributeToken {
					break
				}

				key := dataAttr
				val := l.AttrVal()
				fmt.Println("Attribute", string(key), "=", string(val))
			}
			// ...
		}
	}
}
```

## License
Released under the [MIT license](https://github.com/tdewolff/parse/blob/master/LICENSE.md).

[1]: http://golang.org/ "Go Language"