README.md (9458B)
# Go XML Formatter

[![MIT License](http://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
[![Go Doc](https://img.shields.io/badge/godoc-reference-4b68a3.svg)](https://godoc.org/github.com/go-xmlfmt/xmlfmt)
[![Go Report Card](https://goreportcard.com/badge/github.com/go-xmlfmt/xmlfmt)](https://goreportcard.com/report/github.com/go-xmlfmt/xmlfmt)
[![Codeship Status](https://codeship.com/projects/c49f02b0-a384-0134-fb20-2e0351080565/status?branch=master)](https://codeship.com/projects/190297)

## Synopsis

The Go XML Formatter, xmlfmt, will format the XML string in a readable way.

```go
package main

import "github.com/go-xmlfmt/xmlfmt"

func main() {
	xmlfmt.NL = "\n"
	xml1 := `<root><this><is>a</is><test /><message><!-- with comment --><org><cn>Some org-or-other</cn><ph>Wouldnt you like to know</ph></org><contact><fn>Pat</fn><ln>Califia</ln></contact></message></this></root>`
	x := xmlfmt.FormatXML(xml1, "\t", " ")
	print(x)

	// If the XML Comments have nested tags in them
	xml1 = `<book> <author>Fred</author>
<!--
<price>20</price><currency>USD</currency>
-->
<isbn>23456</isbn> </book>`
	x = xmlfmt.FormatXML(xml1, "", " ", true)
	print(x)
}
```

Output:

```xml
	<root>
	 <this>
	  <is>a
	  </is>
	  <test />
	  <message>
	   <!-- with comment -->
	   <org>
	    <cn>Some org-or-other
	    </cn>
	    <ph>Wouldnt you like to know
	    </ph>
	   </org>
	   <contact>
	    <fn>Pat
	    </fn>
	    <ln>Califia
	    </ln>
	   </contact>
	  </message>
	 </this>
	</root>


<book>
 <author>Fred
 </author>
 <!-- <price>20</price><currency>USD</currency> -->
 <isbn>23456
 </isbn>
</book>
```

There is no XML decoding and encoding involved, only pure regular expression matching and replacing. So it is much faster than going through decoding and encoding procedures. Moreover, the exact XML source string is preserved, instead of being changed by the encoder.
This is why this package exists in the first place.

Note that

- XML is mainly used in Windows environments, so the default line ending is Windows' `CRLF`. To change it, set `xmlfmt.NL` as in the first line of `main` in the sample code above.
- the case of XML comments nested within XML comments is ***not*** supported. Please avoid them, or use another tool to correct them before using this package.
- don't turn on the `nestedTagsInComments` parameter blindly, as the code has become 10+ times more complicated because of it.

## Command

To use it on the command line, check out [xmlfmt](https://github.com/AntonioSun/xmlfmt):

```
$ xmlfmt
XML Formatter
Version 1.1.0 built on 2021-12-06
Copyright (C) 2021, Antonio Sun

The xmlfmt will format the XML string without rewriting the document

Options:

  -h, --help        display help information
  -f, --file       *The xml file to read from (or stdin)
  -p, --prefix      each element begins on a new line and this prefix
  -i, --indent[= ]  indent string for nested elements
  -n, --nested      nested tags in comments

$ xmlfmt -f https://pastebin.com/raw/z3euQ5PR

<root>
 <this>
  <is>a
  </is>
  <test />
  <message>
   <!-- with comment -->
   <org>
    <cn>Some org-or-other
    </cn>
    <ph>Wouldnt you like to know
    </ph>
   </org>
   <contact>
    <fn>Pat
    </fn>
    <ln>Califia
    </ln>
   </contact>
  </message>
 </this>
</root>

$ xmlfmt -f https://pastebin.com/raw/Zs0qy0qz -n

<book>
 <author>Fred
 </author>
 <!-- <price>20</price><currency>USD</currency> -->
 <isbn>23456
 </isbn>
</book>
```

## Justification

### The format

The Go XML Formatter is not called XML Beautifier because the result is not *exactly* what people would expect -- some, but not all, closing tags stay on the same line, just as shown above.
Having looked at the result and thought it over, I now think it is actually a better way to present it: in my opinion, the closing tags that stay on the same line are better left that way. That is, when it comes to very big XML strings, which is what I deal with every day, saving space by not letting those closing tags take extra lines is a plus rather than a minus to me.

### The alternative

Formatting it “properly”, i.e., the way people would normally expect to see it, is very hard using pure regular expressions. In fact, according to Sam Whited from the go-nuts mailing list,

> Regular expression is, well, regular. This means that they can parse regular grammars, but can't parse context free grammars (like XML). It is actually impossible to use a regex to do this task; it will always be fragile, unfortunately.

So if the output format is that important to you, then unfortunately you have to go through the decoding and encoding procedures. Besides being slow, that method has other drawbacks as well, as put by James McGill in http://stackoverflow.com/questions/21117161:

> I like this solution, but am still in search of a Golang XML formatter/prettyprinter that doesn't rewrite the document (other than formatting whitespace). Marshalling or using the Encoder will change namespace declarations.
>
> For example an element like "< ns1:Element />" will be translated to something like '< Element xmlns="http://bla...bla/ns1" >< /Element >' which seems harmless enough except when the intent is to not alter the xml other than formatting.
> -- James McGill Nov 12 '15

Using Sam's code as an example,

https://play.golang.org/p/JUqQY3WpW5

The above code formats the following XML

```xml
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
 xmlns:ns="http://example.com/ns">
  <soapenv:Header/>
  <soapenv:Body>
    <ns:request>
      <ns:customer>
        <ns:id>123</ns:id>
        <ns:name type="NCHZ">John Brown</ns:name>
      </ns:customer>
    </ns:request>
  </soapenv:Body>
</soapenv:Envelope>
```

into this:

```xml
<Envelope xmlns="http://schemas.xmlsoap.org/soap/envelope/" xmlns:_xmlns="xmlns" _xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" _xmlns:ns="http://example.com/ns">
  <Header xmlns="http://schemas.xmlsoap.org/soap/envelope/"></Header>
  <Body xmlns="http://schemas.xmlsoap.org/soap/envelope/">
    <request xmlns="http://example.com/ns">
      <customer xmlns="http://example.com/ns">
        <id xmlns="http://example.com/ns">123</id>
        <name xmlns="http://example.com/ns" type="NCHZ">John Brown</name>
      </customer>
    </request>
  </Body>
</Envelope>
```

I know they are semantically the same; the problem, however, is that they *look* totally different.

That's why there is this package, an XML Beautifier that doesn't rewrite the document.

## Credit

The credit goes to **diotalevi** for his post at http://www.perlmonks.org/?node_id=261292.

However, it does not work for all cases.
For example,

```sh
$ echo '<Envelope xmlns=http://schemas.xmlsoap.org/soap/envelope/ xmlns:_xmlns=xmlns _xmlns:soapenv=http://schemas.xmlsoap.org/soap/envelope/ _xmlns:ns=http://example.com/ns><Header xmlns=http://schemas.xmlsoap.org/soap/envelope/></Header><Body xmlns=http://schemas.xmlsoap.org/soap/envelope/><request xmlns=http://example.com/ns><customer xmlns=http://example.com/ns><id xmlns=http://example.com/ns>123</id><name xmlns=http://example.com/ns type=NCHZ>John Brown</name></customer></request></Body></Envelope>' | perl -pe 's/(?<=>)\s+(?=<)//g; s(<(/?)([^/>]+)(/?)>\s*(?=(</?))?)($indent+=$3?0:$1?-1:1;"<$1$2$3>".($1&&($4 eq"</")?"\n".(" "x$indent):$4?"\n".(" "x$indent):""))ge'
```
```xml
<Envelope xmlns=http://schemas.xmlsoap.org/soap/envelope/ xmlns:_xmlns=xmlns _xmlns:soapenv=http://schemas.xmlsoap.org/soap/envelope/ _xmlns:ns=http://example.com/ns><Header xmlns=http://schemas.xmlsoap.org/soap/envelope/></Header>
<Body xmlns=http://schemas.xmlsoap.org/soap/envelope/><request xmlns=http://example.com/ns><customer xmlns=http://example.com/ns><id xmlns=http://example.com/ns>123</id>
<name xmlns=http://example.com/ns type=NCHZ>John Brown</name>
</customer>
</request>
</Body>
</Envelope>
```

I simplified the algorithm, and now it should work for all cases:

```sh
echo '<Envelope xmlns="http://schemas.xmlsoap.org/soap/envelope/" xmlns:_xmlns="xmlns" _xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" _xmlns:ns="http://example.com/ns"><Header xmlns="http://schemas.xmlsoap.org/soap/envelope/"></Header><Body xmlns="http://schemas.xmlsoap.org/soap/envelope/"><request xmlns="http://example.com/ns"><customer xmlns="http://example.com/ns"><id xmlns="http://example.com/ns">123</id><name xmlns="http://example.com/ns" type="NCHZ">John Brown</name></customer></request></Body></Envelope>' | perl -pe 's/(?<=>)\s+(?=<)//g; s(<(/?)([^>]+)(/?)>)($indent+=$3?0:$1?-1:1;"<$1$2$3>"."\n".(" "x$indent))ge'
```
```xml
<Envelope xmlns="http://schemas.xmlsoap.org/soap/envelope/" xmlns:_xmlns="xmlns" _xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" _xmlns:ns="http://example.com/ns">
 <Header xmlns="http://schemas.xmlsoap.org/soap/envelope/">
  </Header>
 <Body xmlns="http://schemas.xmlsoap.org/soap/envelope/">
  <request xmlns="http://example.com/ns">
   <customer xmlns="http://example.com/ns">
    <id xmlns="http://example.com/ns">
     123</id>
    <name xmlns="http://example.com/ns" type="NCHZ">
     John Brown</name>
    </customer>
   </request>
  </Body>
 </Envelope>
```

This package is a direct translation of the above Perl code into Go,
further enhanced by @ruandao.