unicode-hax

A library to assist in security testing of Unicode-enabled applications. The original intent behind it was threefold:

To provide a reduced set of useful Unicode input to a software fuzzer
To document historically problematic Unicode characters and sequences that might negatively affect protocols and Web applications
To look up mappings for ASCII-equivalent characters

For example, the best-fit and normalization mappings are useful when testing Web applications for cross-site scripting (XSS) or SQL injection (SQLi) vulnerabilities: they provide alternative characters that map back, or transform, to the intended ASCII input, such as "<" or "'".
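To make the idea concrete, here is a minimal, hypothetical sketch of a filter that checks for "<" before the application applies NFKC normalization. The class and method names are illustrative only, and the assumption that a real application normalizes after filtering is exactly the design flaw this kind of testing looks for.

```csharp
using System;
using System.Text;

// Hypothetical illustration (not part of UniHax): a naive filter that runs
// before the application normalizes its input with NFKC.
public static class NormalizationBypassDemo
{
    // Naive check: rejects input only if it contains the ASCII "<" (U+003C).
    public static bool NaiveFilterAllows(string input) => !input.Contains("<");

    // Assumed downstream step: the application applies NFKC normalization.
    public static string Downstream(string input) =>
        input.Normalize(NormalizationForm.FormKC);
}
```

An input such as "\uFF1Cscript\uFF1E" (fullwidth "<" and ">") passes the naive check, yet comes out of the NFKC step as "<script>".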

Additionally, many problematic characters are predefined as a small set, reducing the number of iterations a fuzzer needs to perform.
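As a rough illustration of why a small predefined set keeps the iteration count down, this hypothetical generator places each sample at the start, middle, and end of a seed input, producing exactly three cases per sample. FuzzCaseGenerator and the choice of three insertion positions are assumptions for illustration, not part of UniHax.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: expands a small sample set into fuzz cases.
public static class FuzzCaseGenerator
{
    // For each sample, emit three cases: prefixed, inserted mid-string, suffixed.
    // The total number of cases is therefore samples.Count * 3.
    public static List<string> Generate(string seed, IReadOnlyList<string> samples)
    {
        var cases = new List<string>();
        int mid = seed.Length / 2;
        foreach (string s in samples)
        {
            cases.Add(s + seed);                                        // start
            cases.Add(seed.Substring(0, mid) + s + seed.Substring(mid)); // middle
            cases.Add(seed + s);                                        // end
        }
        return cases;
    }
}
```

With ten samples, for instance, each seed input yields 30 cases, which is small enough to run exhaustively.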

Major features:

best-fit mappings
Unicode normalization mappings
hard-coded Unicode characters useful in fuzzing

For fuzzing applications it includes:

ill-formed byte sequences
non-characters
private use area (PUA)
unassigned code points
code points with special meaning, such as the byte order mark (BOM, U+FEFF) and the right-to-left override (RLO, U+202E)
unpaired (half) surrogate values
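A minimal sketch with one sample per category listed above. Apart from U+0FED and U+DEAD, which appear in Fuzzer.cs, the code points are illustrative choices, and FuzzSamples is a hypothetical name rather than the library's actual structure. Ill-formed byte sequences are kept as raw bytes, because a .NET string cannot hold invalid UTF-8.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sample set: one illustrative entry per fuzzing category.
public static class FuzzSamples
{
    public static readonly Dictionary<string, string> Strings = new Dictionary<string, string>
    {
        ["non-character"] = "\uFFFE",   // U+FFFE is a permanent non-character
        ["private use"] = "\uE000",     // first code point of the BMP Private Use Area
        ["unassigned"] = "\u0FED",      // the unassigned code point used in Fuzzer.cs
        ["BOM"] = "\uFEFF",             // byte order mark (ZERO WIDTH NO-BREAK SPACE)
        ["RLO"] = "\u202E",             // RIGHT-TO-LEFT OVERRIDE
        ["half surrogate"] = "\uDEAD",  // unpaired low surrogate
    };

    // Ill-formed UTF-8 cannot live in a .NET string, so store raw bytes.
    public static readonly Dictionary<string, byte[]> IllFormedUtf8 = new Dictionary<string, byte[]>
    {
        ["overlong '<'"] = new byte[] { 0xC0, 0xBC },  // 2-byte overlong form of U+003C
        ["lone continuation"] = new byte[] { 0x80 },   // continuation byte with no lead byte
        ["truncated 2-byte"] = new byte[] { 0xC3 },    // lead byte missing its continuation
    };
}
```

A conforming UTF-8 decoder should reject the overlong sequence rather than decode it to "<".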

/TestUniHax

This Windows Forms application loads the UniHax library, mainly to test the best-fit and normalization mappings.
Enter a single ASCII character and all of its equivalent characters are displayed.

For example, if you're testing a Web application and want equivalents for the "<" character (U+003C), enter it as input and select either "best-fit mapping", which is tied to a specific charset encoding, or "normalization" equivalents. For this character, the best-fit mappings include:

U+003B in the APL-ISO-IR-68 encoding
U+0014 in the CP424 encoding
etc...
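One rough way to approximate a best-fit table for a single encoding is to encode every BMP character and keep those that produce the same bytes as the target. This sketch assumes the encoder's fallback performs best-fit substitution: on .NET Framework, Windows code-page encodings from Encoding.GetEncoding generally do, but other runtimes may use plain replacement (typically "?"), in which case only the target itself matches. BestFitScan is a hypothetical name, it is not how UniHax builds its /data files, and a target of "?" will match every unmappable character.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical scanner: finds BMP characters that encode to the same bytes
// as the target character in the given encoding (including the target itself).
public static class BestFitScan
{
    public static List<char> FindEquivalents(Encoding encoding, char target)
    {
        byte[] targetBytes = encoding.GetBytes(new[] { target });
        var results = new List<char>();
        for (int cp = 0; cp <= 0xFFFF; cp++)
        {
            char c = (char)cp;
            if (char.IsSurrogate(c)) continue; // lone surrogates are not characters
            byte[] bytes = encoding.GetBytes(new[] { c });
            if (bytes.SequenceEqual(targetBytes)) results.Add(c);
        }
        return results;
    }
}
```

With Encoding.ASCII, which replaces unmappable characters with "?", the scan for "<" returns only "<" itself; a best-fit encoder would add further characters to the list.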

Also, the following characters have compatibility decomposition mappings to "<":

U+FE64 SMALL LESS-THAN SIGN
U+FF1C FULLWIDTH LESS-THAN SIGN
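The normalization equivalents above can be reproduced with the runtime's own normalizer. This minimal sketch, which is not the library's Mapping class, scans the Basic Multilingual Plane for characters whose normalized form equals a target string; NormalizationScan is a hypothetical name, and results depend on the Unicode version the runtime ships.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical scanner built on String.Normalize (not UniHax's Mapping class).
public static class NormalizationScan
{
    // Returns BMP characters whose normalized form equals `target`.
    public static List<char> FindEquivalents(string target, NormalizationForm form)
    {
        var results = new List<char>();
        for (int cp = 0; cp <= 0xFFFF; cp++)
        {
            char c = (char)cp;
            if (char.IsSurrogate(c)) continue; // a lone surrogate is not a valid string
            string normalized;
            try
            {
                normalized = c.ToString().Normalize(form);
            }
            catch (ArgumentException)
            {
                continue; // the runtime rejected this code point as invalid
            }
            if (normalized == target) results.Add(c);
        }
        return results;
    }
}
```

Calling FindEquivalents("<", NormalizationForm.FormKC) should include U+FE64 and U+FF1C alongside "<" itself. Supplementary-plane characters are skipped, since a single char cannot hold them.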

/UniHax

This library defines a small set of problematic Unicode characters in Fuzzer.cs, such as the following:

        /// <summary>
        /// An unassigned code point U+0FED
        /// </summary>
        public static readonly string uUnassigned = "\u0FED";
        /// <summary>
        ///  An illegal low half-surrogate U+DEAD
        /// </summary>
        public static readonly string uDEAD = "\uDEAD";

It also provides the following method, which returns those characters as a byte array in any supported encoding:

public byte[] GetCharacterBytes(string encoding, string character)
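A minimal sketch of what such a method might do, assuming it resolves the encoding name with Encoding.GetEncoding and calls GetBytes. CharacterBytesSketch is a hypothetical class, and the real UniHax implementation may differ. On .NET Core and later, legacy code pages such as CP424 generally require registering CodePagesEncodingProvider before GetEncoding can find them.

```csharp
using System;
using System.Text;

// Hypothetical stand-in for the UniHax method; the real one may differ.
public class CharacterBytesSketch
{
    // Assumption: `encoding` is a name accepted by Encoding.GetEncoding.
    public byte[] GetCharacterBytes(string encoding, string character)
    {
        Encoding enc = Encoding.GetEncoding(encoding);
        // Unencodable input (e.g. a lone surrogate) goes through the
        // encoding's fallback; for UTF-8 that is U+FFFD (EF BF BD).
        return enc.GetBytes(character);
    }
}
```

Note that with the default fallback, GetCharacterBytes("utf-8", "\uDEAD") yields EF BF BD rather than the raw surrogate, so a fuzzer wanting the ill-formed bytes must construct them separately.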

The following method returns any Unicode character as a malformed byte sequence by trimming the last byte of its encoded form:

public byte[] GetCharacterBytesMalformed(string encoding, string character)
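A minimal sketch of the trimming approach, assuming the same Encoding.GetEncoding lookup as above. MalformedBytesSketch is a hypothetical class. Trimming only produces a truncated (malformed) sequence for characters that encode to two or more bytes; a single-byte character such as "<" becomes an empty array.

```csharp
using System;
using System.Text;

// Hypothetical stand-in for the UniHax method; the real one may differ.
public class MalformedBytesSketch
{
    public byte[] GetCharacterBytesMalformed(string encoding, string character)
    {
        byte[] bytes = Encoding.GetEncoding(encoding).GetBytes(character);
        if (bytes.Length == 0) return bytes;

        // Drop the final byte so a multi-byte sequence is left incomplete.
        byte[] trimmed = new byte[bytes.Length - 1];
        Array.Copy(bytes, trimmed, trimmed.Length);
        return trimmed;
    }
}
```

For example, "\u00E9" (C3 A9 in UTF-8) becomes the lone lead byte C3, and "\u20AC" (E2 82 AC) becomes E2 82.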

This project also contains pre-generated data files in the /data folder, and a Mapping class (Mapping.cs) that can look up equivalent mappings for the following:

ASCII-equivalent best-fit mappings across legacy character encodings
ASCII-equivalent mappings for each Unicode normalization form. For example, Web browsers commonly apply a form of normalization to keep URL content and host names compatible.
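The Mapping class's API isn't shown in this README, so the following is a hypothetical per-form lookup built on String.Normalize rather than on the pre-generated /data files. It reports which of the four normalization forms (NFC, NFD, NFKC, NFKD) turn an input into a given ASCII string; NormalizationForms and FormsMappingTo are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: which normalization forms map `input` to `ascii`?
public static class NormalizationForms
{
    private static readonly NormalizationForm[] AllForms =
    {
        NormalizationForm.FormC, NormalizationForm.FormD,
        NormalizationForm.FormKC, NormalizationForm.FormKD,
    };

    public static List<NormalizationForm> FormsMappingTo(string input, string ascii)
    {
        var forms = new List<NormalizationForm>();
        foreach (NormalizationForm form in AllForms)
        {
            if (input.Normalize(form) == ascii) forms.Add(form);
        }
        return forms;
    }
}
```

For U+FF1C only FormKC and FormKD match, because fullwidth forms have compatibility rather than canonical decompositions. By contrast, U+212A KELVIN SIGN has a canonical singleton decomposition to "K", so it matches under all four forms.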

For more on Unicode normalization, see TR15: http://www.unicode.org/reports/tr15/

License

Unicode-Hax by Chris Weber is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Based on a work at https://github.com/cweb/unicode-hax.
