# Go XML Formatter

[![MIT License](http://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
[![Go Doc](https://img.shields.io/badge/godoc-reference-4b68a3.svg)](https://godoc.org/github.com/go-xmlfmt/xmlfmt)
[![Go Report Card](https://goreportcard.com/badge/github.com/go-xmlfmt/xmlfmt)](https://goreportcard.com/report/github.com/go-xmlfmt/xmlfmt)
[![Codeship Status](https://codeship.com/projects/c49f02b0-a384-0134-fb20-2e0351080565/status?branch=master)](https://codeship.com/projects/190297)

## Synopsis

The Go XML Formatter, xmlfmt, formats an XML string in a readable way.

```go
package main

import "github.com/go-xmlfmt/xmlfmt"

func main() {
	// Use LF line endings instead of the default CRLF.
	xmlfmt.NL = "\n"
	xml1 := `<root><this><is>a</is><test /><message><!-- with comment --><org><cn>Some org-or-other</cn><ph>Wouldnt you like to know</ph></org><contact><fn>Pat</fn><ln>Califia</ln></contact></message></this></root>`
	// Arguments: the XML string, the prefix for each line, the indent string.
	x := xmlfmt.FormatXML(xml1, "\t", "  ")
	print(x)

	// If the XML comments have nested tags in them,
	// pass true as the optional nestedTagsInComments argument.
	xml1 = `<book> <author>Fred</author>
<!--
<price>20</price><currency>USD</currency>
-->
 <isbn>23456</isbn> </book>`
	x = xmlfmt.FormatXML(xml1, "", "  ", true)
	print(x)
}
```

Output:

```xml
	<root>
	  <this>
	    <is>a
	    </is>
	    <test />
	    <message>
	      <!-- with comment -->
	      <org>
	        <cn>Some org-or-other
	        </cn>
	        <ph>Wouldnt you like to know
	        </ph>
	      </org>
	      <contact>
	        <fn>Pat
	        </fn>
	        <ln>Califia
	        </ln>
	      </contact>
	    </message>
	  </this>
	</root>


<book>
  <author>Fred
  </author>
  <!-- <price>20</price><currency>USD</currency> -->
  <isbn>23456
  </isbn>
</book>
```

No XML decoding or encoding is involved, only regular expression matching and replacing, so it is much faster than a decode/encode round trip. Moreover, the exact XML source string is preserved instead of being changed by the encoder. This is why this package exists in the first place.

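One way to check the "source string is preserved" property on any formatter's output is to compare input and output with all whitespace removed. The sketch below is only an illustration using the standard library; `stripSpace` and `onlyWhitespaceDiffers` are hypothetical helper names, not part of xmlfmt, and the check is deliberately crude (it also ignores whitespace inside text and attribute values).

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stripSpace removes every Unicode whitespace rune from s.
func stripSpace(s string) string {
	return strings.Map(func(r rune) rune {
		if unicode.IsSpace(r) {
			return -1 // a negative rune drops the character
		}
		return r
	}, s)
}

// onlyWhitespaceDiffers reports whether a and b are identical
// once all whitespace has been removed from both.
func onlyWhitespaceDiffers(a, b string) bool {
	return stripSpace(a) == stripSpace(b)
}

func main() {
	src := `<a><b>x</b></a>`
	formatted := "<a>\r\n  <b>x\r\n  </b>\r\n</a>"
	rewritten := `<a><b xmlns="u">x</b></a>`

	fmt.Println(onlyWhitespaceDiffers(src, formatted)) // whitespace-only change
	fmt.Println(onlyWhitespaceDiffers(src, rewritten)) // document was rewritten
}
```
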
Note that

- XML is mainly used in Windows environments, thus the default line ending is the Windows `CRLF` format. To change it, set `xmlfmt.NL` as done at the start of `main` in the sample code above.
- XML comments nested within XML comments are ***not*** supported. Please avoid them, or use another tool to correct them before using this package.
- don't turn on the `nestedTagsInComments` parameter blindly, as the code has become 10+ times more complicated because of it.

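Since nested comments are not supported, it can help to detect them before formatting. The following is a minimal sketch of such a pre-check, assuming plain string scanning is good enough for the input at hand; `hasNestedComment` is a hypothetical helper, not part of xmlfmt, and it does not treat CDATA sections specially.

```go
package main

import (
	"fmt"
	"strings"
)

// hasNestedComment reports whether a "<!--" opener appears
// inside a comment that has not been closed yet.
func hasNestedComment(src string) bool {
	for {
		start := strings.Index(src, "<!--")
		if start < 0 {
			return false // no more comments
		}
		rest := src[start+len("<!--"):]
		end := strings.Index(rest, "-->")
		if end < 0 {
			return false // unterminated comment; out of scope for this sketch
		}
		if strings.Contains(rest[:end], "<!--") {
			return true
		}
		// Continue scanning after the end of this comment.
		src = rest[end+len("-->"):]
	}
}

func main() {
	fmt.Println(hasNestedComment(`<a><!-- x --><!-- y --></a>`))
	fmt.Println(hasNestedComment(`<a><!-- x <!-- y --> z --></a>`))
}
```
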
## Command

To use it on the command line, check out [xmlfmt](https://github.com/AntonioSun/xmlfmt):

```
$ xmlfmt
XML Formatter
Version 1.1.0 built on 2021-12-06
Copyright (C) 2021, Antonio Sun

The xmlfmt will format the XML string without rewriting the document

Options:

  -h, --help          display help information
  -f, --file         *The xml file to read from (or stdin)
  -p, --prefix        each element begins on a new line and this prefix
  -i, --indent[=  ]   indent string for nested elements
  -n, --nested        nested tags in comments

$ xmlfmt -f https://pastebin.com/raw/z3euQ5PR

<root>
  <this>
    <is>a
    </is>
    <test />
    <message>
      <!-- with comment -->
      <org>
        <cn>Some org-or-other
        </cn>
        <ph>Wouldnt you like to know
        </ph>
      </org>
      <contact>
        <fn>Pat
        </fn>
        <ln>Califia
        </ln>
      </contact>
    </message>
  </this>
</root>

$ xmlfmt -f https://pastebin.com/raw/Zs0qy0qz -n

<book>
  <author>Fred
  </author>
  <!-- <price>20</price><currency>USD</currency> -->
  <isbn>23456
  </isbn>
</book>
```

## Justification

### The format

The Go XML Formatter is not called XML Beautifier because the result is not *exactly* what people would expect: some, but not all, closing tags stay on the same line, as shown above. Having looked at the result and thought it over, I now think this is actually a better way to present it, as those closing tags are better left on the same line in my opinion.

When it comes to very big XML strings, which is what I deal with every day, saving space by not letting those closing tags take extra lines is a plus rather than a minus to me.

### The alternative

To format it “properly”, i.e., as people would normally expect to see it, is very hard using pure regular expressions. In fact, according to Sam Whited from the go-nuts mailing list,

> Regular expression is, well, regular. This means that they can parse regular grammars, but can't parse context free grammars (like XML). It is actually impossible to use a regex to do this task; it will always be fragile, unfortunately.

So if the output format is that important to you, then unfortunately you have to go through decoding and encoding. But besides being slow, that approach has drawbacks of its own, as put by James McGill in http://stackoverflow.com/questions/21117161:

> I like this solution, but am still in search of a Golang XML formatter/prettyprinter that doesn't rewrite the document (other than formatting whitespace). Marshalling or using the Encoder will change namespace declarations.
> 
> For example an element like "< ns1:Element />" will be translated to something like '< Element xmlns="http://bla...bla/ns1" >< /Element >' which seems harmless enough except when the intent is to not alter the xml other than formatting. -- James McGill Nov 12 '15

Using Sam's code as an example,

https://play.golang.org/p/JUqQY3WpW5

The above code formats the following XML

```xml
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
  xmlns:ns="http://example.com/ns">
   <soapenv:Header/>
   <soapenv:Body>
     <ns:request>
      <ns:customer>
       <ns:id>123</ns:id>
       <ns:name type="NCHZ">John Brown</ns:name>
      </ns:customer>
     </ns:request>
   </soapenv:Body>
</soapenv:Envelope>
```

into this:

```xml
<Envelope xmlns="http://schemas.xmlsoap.org/soap/envelope/" xmlns:_xmlns="xmlns" _xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" _xmlns:ns="http://example.com/ns">
 <Header xmlns="http://schemas.xmlsoap.org/soap/envelope/"></Header>
 <Body xmlns="http://schemas.xmlsoap.org/soap/envelope/">
  <request xmlns="http://example.com/ns">
   <customer xmlns="http://example.com/ns">
    <id xmlns="http://example.com/ns">123</id>
    <name xmlns="http://example.com/ns" type="NCHZ">John Brown</name>
   </customer>
  </request>
 </Body>
</Envelope>
```

I know the two documents are semantically equivalent; the problem is that they *look* totally different.

That's why there is this package, an XML Beautifier that doesn't rewrite the document.

## Credit

The credit goes to **diotalevi** for his post at http://www.perlmonks.org/?node_id=261292.

However, it does not work for all cases. For example,

```sh
$ echo '<Envelope xmlns=http://schemas.xmlsoap.org/soap/envelope/ xmlns:_xmlns=xmlns _xmlns:soapenv=http://schemas.xmlsoap.org/soap/envelope/ _xmlns:ns=http://example.com/ns><Header xmlns=http://schemas.xmlsoap.org/soap/envelope/></Header><Body xmlns=http://schemas.xmlsoap.org/soap/envelope/><request xmlns=http://example.com/ns><customer xmlns=http://example.com/ns><id xmlns=http://example.com/ns>123</id><name xmlns=http://example.com/ns type=NCHZ>John Brown</name></customer></request></Body></Envelope>' | perl -pe 's/(?<=>)\s+(?=<)//g; s(<(/?)([^/>]+)(/?)>\s*(?=(</?))?)($indent+=$3?0:$1?-1:1;"<$1$2$3>".($1&&($4 eq"</")?"\n".("  "x$indent):$4?"\n".("  "x$indent):""))ge'
```
```xml
<Envelope xmlns=http://schemas.xmlsoap.org/soap/envelope/ xmlns:_xmlns=xmlns _xmlns:soapenv=http://schemas.xmlsoap.org/soap/envelope/ _xmlns:ns=http://example.com/ns><Header xmlns=http://schemas.xmlsoap.org/soap/envelope/></Header>
<Body xmlns=http://schemas.xmlsoap.org/soap/envelope/><request xmlns=http://example.com/ns><customer xmlns=http://example.com/ns><id xmlns=http://example.com/ns>123</id>
<name xmlns=http://example.com/ns type=NCHZ>John Brown</name>
</customer>
</request>
</Body>
</Envelope>
```

I simplified the algorithm, and now it should work for all cases:

```sh
echo '<Envelope xmlns="http://schemas.xmlsoap.org/soap/envelope/" xmlns:_xmlns="xmlns" _xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" _xmlns:ns="http://example.com/ns"><Header xmlns="http://schemas.xmlsoap.org/soap/envelope/"></Header><Body xmlns="http://schemas.xmlsoap.org/soap/envelope/"><request xmlns="http://example.com/ns"><customer xmlns="http://example.com/ns"><id xmlns="http://example.com/ns">123</id><name xmlns="http://example.com/ns" type="NCHZ">John Brown</name></customer></request></Body></Envelope>' | perl -pe 's/(?<=>)\s+(?=<)//g; s(<(/?)([^>]+)(/?)>)($indent+=$3?0:$1?-1:1;"<$1$2$3>"."\n".("  "x$indent))ge'
```
```xml
<Envelope xmlns="http://schemas.xmlsoap.org/soap/envelope/" xmlns:_xmlns="xmlns" _xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" _xmlns:ns="http://example.com/ns">
  <Header xmlns="http://schemas.xmlsoap.org/soap/envelope/">
    </Header>
  <Body xmlns="http://schemas.xmlsoap.org/soap/envelope/">
    <request xmlns="http://example.com/ns">
      <customer xmlns="http://example.com/ns">
        <id xmlns="http://example.com/ns">
          123</id>
        <name xmlns="http://example.com/ns" type="NCHZ">
          John Brown</name>
        </customer>
      </request>
    </Body>
  </Envelope>
```

This package is a direct translation of the above Perl code into Go,
then further enhanced by @ruandao.
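
To make the idea concrete, here is a rough Go sketch of the simplified Perl substitution above. It is not the package's actual source: `indentTags` is a hypothetical name, Go's `regexp` has no lookaround so the whitespace step matches `>\s+<` instead, and self-closing tags are detected by their `/>` suffix. Comments, CDATA, and processing instructions are not handled.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// betweenTags matches whitespace that sits between two tags.
	betweenTags = regexp.MustCompile(`>\s+<`)
	// anyTag matches a single opening, closing, or self-closing tag.
	anyTag = regexp.MustCompile(`<[^>]+>`)
)

// indentTags drops inter-tag whitespace, then appends a newline and
// the indentation for the current depth after every tag.
func indentTags(src, indent string) string {
	src = betweenTags.ReplaceAllString(src, "><")
	depth := 0
	return anyTag.ReplaceAllStringFunc(src, func(t string) string {
		switch {
		case strings.HasPrefix(t, "</"):
			depth-- // closing tag
		case strings.HasSuffix(t, "/>"):
			// self-closing tag: depth unchanged
		default:
			depth++ // opening tag
		}
		if depth < 0 {
			depth = 0 // guard against unbalanced input
		}
		return t + "\n" + strings.Repeat(indent, depth)
	})
}

func main() {
	fmt.Print(indentTags(`<a> <b>x</b> <c/> </a>`, "  "))
}
```

As in the Perl output above, text content starts on its own indented line and each closing tag ends up indented one level deeper than its opening tag.
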