SafeStrings: Representing Strings as Structured Data
Strings are ubiquitous in code. Not all strings are created equal: some contain structure that makes them incompatible with other strings. CSS units are an obvious example. Worse, type checkers cannot see this structure: this is the latent structure problem. We introduce SafeStrings to solve this problem and expose latent structure in strings. Once the structure is visible, operations can leverage it to manipulate the string efficiently; further, SafeStrings permit the establishment of closure properties. SafeStrings harness the subtyping and inheritance mechanics of their host language to create a natural hierarchy of string subtypes. SafeStrings define an elegant programming model over strings: the front-end use of a SafeString is clear and uncluttered, with complexity confined to the definition of a particular SafeString. They are lightweight, language-agnostic, and deployable, as we demonstrate by implementing SafeStrings in TypeScript. SafeStrings reduce the surface area for cross-site scripting and argument selection defects, and they can facilitate fuzzing and analysis.
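
To make the CSS-unit example concrete, the following is a minimal TypeScript sketch of what a string with exposed structure might look like. It is an illustration under assumptions, not the paper's implementation: the shape of the `SafeString` base class and the names `CssLength`, `CssUnit`, `parse`, and `add` are all hypothetical. The sketch parses a raw string into its parts once, lets a function demand a `CssLength` instead of a bare `string`, and shows one closure property (adding two lengths with the same unit yields another length).

```typescript
// Hypothetical sketch: names and class shapes are illustrative assumptions,
// not the API described in the paper.

// Base of the hierarchy: a string whose latent structure has been parsed.
abstract class SafeString {
  // Subclasses render their structured parts back into a plain string.
  abstract toString(): string;
}

type CssUnit = "px" | "em" | "rem" | "%";

// A CSS length such as "12px"; its latent structure is (value, unit).
class CssLength extends SafeString {
  private static readonly PATTERN = /^(-?\d+(?:\.\d+)?)(px|em|rem|%)$/;

  // Private constructor: instances can only be obtained through parse() or add().
  private constructor(readonly value: number, readonly unit: CssUnit) {
    super();
  }

  // Validate a raw string and expose its parts; reject anything malformed.
  static parse(raw: string): CssLength {
    const m = CssLength.PATTERN.exec(raw.trim());
    if (m === null) {
      throw new TypeError(`not a CSS length: ${JSON.stringify(raw)}`);
    }
    return new CssLength(Number(m[1]), m[2] as CssUnit);
  }

  // Closure property: same-unit lengths add to a length; mixed units are rejected.
  add(other: CssLength): CssLength {
    if (other.unit !== this.unit) {
      throw new TypeError(`unit mismatch: ${this.unit} vs ${other.unit}`);
    }
    return new CssLength(this.value + other.value, this.unit);
  }

  toString(): string {
    return `${this.value}${this.unit}`;
  }
}

// The call site stays uncluttered: the parameter type documents and enforces
// that a CSS length is expected, so a bare string will not type-check here.
function widthDeclaration(width: CssLength): string {
  return `width: ${width.toString()};`;
}
```

At the call site, `widthDeclaration(CssLength.parse("12px").add(CssLength.parse("4px")))` type-checks, whereas `widthDeclaration("12px")` is rejected by the compiler. The validation and parsing logic lives entirely inside the class definition, which mirrors the abstract's point that complexity is confined to the definition of a particular SafeString.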