From BlackCompanyCoding

Use of strings

Strings are, in general, an order of magnitude more expensive to process than integral values (enums, IDs, etc.). As such, they should never be used in high-performance code. Values which are only ever passed around inside the code should always be enums rather than strings.

That being said, when processing data from assets and/or scripting languages, it is often advantageous to use strings as identifiers. A script command that refers to a menu by name is far more readable than one that refers to it by number, where you have to go off and look up a table or spreadsheet somewhere else to find out exactly which is menu number 4.

As a rough rule of thumb, there should be no more than a few hundred string operations taking place each frame, and they should be limited to assignment and comparison (i.e. no building of strings from component strings on the fly). In asset loading, as much string processing as is feasible should be taken care of at the pre-compilation stage (i.e. resolve string references into direct references as soon as possible).
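As an illustration of resolving string references as early as possible, the following is a minimal C++17 sketch of a loader that turns a menu name from a script into an enum value once, at load time. The names MenuId, ResolveMenuName and OpenMenuCommand, and the menu names themselves, are hypothetical and chosen only for this example.

```cpp
#include <optional>
#include <string_view>
#include <unordered_map>

// Hypothetical menu identifiers; the names are illustrative only.
enum class MenuId { Main, Options, Inventory, Pause };

// Resolve a name read from an asset or script into an enum, once, at load time.
// Returns std::nullopt for unknown names so the loader can report the error.
inline std::optional<MenuId> ResolveMenuName(std::string_view name) {
    // The keys are string literals, so the views stay valid for the program's lifetime.
    static const std::unordered_map<std::string_view, MenuId> table = {
        {"main", MenuId::Main},
        {"options", MenuId::Options},
        {"inventory", MenuId::Inventory},
        {"pause", MenuId::Pause},
    };
    const auto it = table.find(name);
    if (it == table.end()) {
        return std::nullopt;
    }
    return it->second;
}

// A loaded command stores the resolved enum, so per-frame code never
// touches the original string.
struct OpenMenuCommand {
    MenuId menu;
};
```

The string lookup happens once per command when the asset is loaded (or, better, when it is pre-compiled); everything that runs each frame sees only the enum.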


Hashed Strings

If the only operations performed on strings are assignment and comparison, they can be made as cheap as the equivalent integer operations by using a hashed string. This takes the original string and uses a hashing algorithm to turn it into a 32-bit integer. The probability of two different strings producing the same hash is small enough that two hashes can be compared and the strings treated as equal without storing or checking the entire string. For debugging purposes, a portion of the original string can be stored with the hash, so the original can be identified by eye without looking the hash up in a table.
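A hashed string along these lines might be sketched as follows in C++17. The choice of 32-bit FNV-1a is an assumption made for the example (this page does not say which hashing algorithm is used), and the class name HashedString, its members and the 15-character debug prefix are likewise hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// 32-bit FNV-1a, used here only as an example hash function.
constexpr std::uint32_t Fnv1a32(std::string_view text) {
    std::uint32_t hash = 2166136261u;          // FNV offset basis
    for (const char c : text) {
        hash ^= static_cast<std::uint8_t>(c);  // mix in one byte
        hash *= 16777619u;                     // FNV prime; wraps modulo 2^32
    }
    return hash;
}

class HashedString {
public:
    explicit HashedString(std::string_view text) : hash_(Fnv1a32(text)) {
        // Keep a truncated copy of the text purely for display in a debugger.
        std::size_t n = 0;
        for (; n < kDebugChars && n < text.size(); ++n) {
            debug_[n] = text[n];
        }
        debug_[n] = '\0';
    }

    std::uint32_t hash() const { return hash_; }
    const char* debug_prefix() const { return debug_; }

    // Comparison looks only at the 32-bit hash, never at the text.
    friend bool operator==(const HashedString& a, const HashedString& b) {
        return a.hash_ == b.hash_;
    }
    friend bool operator!=(const HashedString& a, const HashedString& b) {
        return a.hash_ != b.hash_;
    }

private:
    static constexpr std::size_t kDebugChars = 15;  // arbitrary prefix length
    std::uint32_t hash_;
    char debug_[kDebugChars + 1];
};
```

Construction costs one pass over the string, so it belongs in loading or pre-compilation code; after that, copying and comparing a HashedString involves no string processing. A shipping build could drop the debug prefix entirely to shrink the object to four bytes.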

The major limitation of a hashed string is that given only the hashed version, it is impossible to reconstruct the original string.
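One possible way to live with this limitation in tools and debug builds is to record every string as it is hashed, which also lets the asset pipeline detect the rare case of two different strings sharing a hash. The sketch below assumes C++17 and the same illustrative FNV-1a hash; HashName and HashRegistry are hypothetical names, not part of any existing codebase described here.

```cpp
#include <cstdint>
#include <string>
#include <string_view>
#include <unordered_map>

// Illustrative 32-bit FNV-1a hash; the actual algorithm is an assumption.
inline std::uint32_t HashName(std::string_view text) {
    std::uint32_t hash = 2166136261u;
    for (const char c : text) {
        hash ^= static_cast<std::uint8_t>(c);
        hash *= 16777619u;
    }
    return hash;
}

// Hypothetical tool-side registry that remembers every string it has hashed.
class HashRegistry {
public:
    // Records the string. Returns false if a different string was already
    // registered under the same hash, i.e. a collision was found.
    bool Register(std::string_view text) {
        const std::uint32_t hash = HashName(text);
        const auto [it, inserted] = names_.try_emplace(hash, text);
        return inserted || std::string_view(it->second) == text;
    }

    // Returns the original string, or an empty view if the hash is unknown.
    std::string_view Lookup(std::uint32_t hash) const {
        const auto it = names_.find(hash);
        if (it == names_.end()) {
            return std::string_view{};
        }
        return std::string_view(it->second);
    }

private:
    std::unordered_map<std::uint32_t, std::string> names_;
};
```

A registry like this would normally live only in the asset pipeline or a debug build, since keeping every original string around defeats the purpose of hashing in the shipped game.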
