These notes form the basis for an as-yet-unwritten "String Datatypes" UG section.
    H5Pset_char_encoding (objects)
    H5Tset_cset (datatypes)
    H5Tvlen_create creates something quite different
    
Creating variable-length string datatypes
    
A heavily revised version of this section 
(see _topic/create_vlen_strings.htm) 
is included via PHP on the H5T RM page.
    As the term implies, 
    variable-length strings are strings of varying lengths.
    Real variable-length strings can be arbitrarily long,
    anywhere from 1 character to thousands of characters.
    These are what HDF5 calls variable-length strings
    and, for the sake of discussion, we'll call them
    unconstrained variable-length strings in this article.
    
But there is also a subclass of variable-length strings whose lengths vary within a well-defined range. For example, a set of strings might be known to always be between 5 and 20 characters long. In this article, we will call this subclass constrained variable-length strings. From HDF5’s point of view, these are actually just fixed-length strings that may happen to be shorter than the size of the assigned datatype. Think of them as faux variable-length strings; we'll discuss them in more detail shortly.
Before we start creating strings, let’s look at string and character datatypes for a minute. HDF5 provides the following predefined datatypes that are relevant to this discussion, one string datatype and three character datatypes:
    H5T_C_S1
    H5T_NATIVE_CHAR
    H5T_NATIVE_SCHAR
    H5T_NATIVE_UCHAR
    
    The character datatypes, 
        H5T_NATIVE_CHAR,
        H5T_NATIVE_SCHAR, and 
        H5T_NATIVE_UCHAR, 
    are single-character datatypes; 
    a data element of one of these datatypes always contains one character. 
    They are unsuitable for creating a string datatype.
    
    The string datatype, 
        H5T_C_S1 for C and 
        H5T_FORTRAN_S1 for Fortran,
    defaults to one character in size, but a copy of it can be resized to any length. 
    These types are therefore the base types for any fixed-length 
    or variable-length string datatype.
    
    
    Creating unconstrained 
    (or real) variable-length string datatypes:
    
    The following HDF5 calls create a variable-length string datatype, 
    vls_type_id:
    
    vls_type_id = H5Tcopy(H5T_C_S1)
    status      = H5Tset_size(vls_type_id, H5T_VARIABLE)            (call 1)
    
    Strings of type vls_type_id can be of arbitrary length.
    In a C environment, these strings will always be NULL-terminated, so the buffer to hold such a string in memory must be one byte larger than the string itself to accommodate the NULL terminator.
Under the covers, variable-length strings are stored in a heap, which can present challenges for efficient storage and read/write access.
The next section discusses a different approach which may be useful in situations where it is known that the string length in a dataset will vary within known bounds.
    
    Creating datatypes for constrained 
    (or faux) variable-length strings:
    
    To avoid the storage and I/O overhead associated with heaps,
    it will sometimes be useful to take a different approach when 
    it is known that the string length in a dataset 
    will always fall within known bounds.
    
Consider the example of a dataset containing one million strings that you know will range from 5 to 20 bytes in length. The following HDF5 calls create a string datatype for strings up to 20 bytes.
    to20B_type_id = H5Tcopy(H5T_C_S1)
    status        = H5Tset_size(to20B_type_id, 20)                  (call 2)
    
    If a particular data element is just a 5-byte string, 
    simply write it to the dataset as a 5-byte string plus a 
    NULL terminator (6 bytes total).
    When the data is read back in a C environment, 
    HDF5 will interpret the 
    NULL-terminated string transparently and correctly.
    Note that strings stored in this manner must always be NULL-terminated unless they exactly fill the full datatype space (exactly 20 bytes in this case). Failure to include the NULL terminator will result in either misinterpreted data or undefined values.
Strings in this dataset can be of any length up to 20 bytes, giving you essentially a constrained variable-length string. But since everything is handled within a fixed-length datatype, you receive all the benefits of HDF5’s highly efficient sequential I/O without the overhead of extracting data from a heap.
If this datatype were defined as in call 1 and the million-element dataset were fully populated, reading the entire dataset would require HDF5, under the covers, to issue up to 2 million seeks and reads to pluck the data elements one by one from the heap. Using this faux variable-length datatype, HDF5 can read the entire dataset with a couple of seeks and reads.
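Note that when this dataset is chunked, the chunks contain the string data itself, so filters such as compression operate on the strings. In a dataset of unconstrained variable-length strings, the strings reside in a heap; chunking and filters apply only to the heap references stored in the dataset, not to the string data.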
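The NULL-termination rule for constrained variable-length strings can be sketched in plain C, independent of the HDF5 library. The names SLOT_SIZE, pack_string, and slot_strlen are hypothetical and exist only for illustration. Zero-filling the unused tail of each element is a choice made in this sketch; the text above requires only a single NULL terminator after a short string.

```c
#include <stddef.h>
#include <string.h>

#define SLOT_SIZE 20  /* matches the 20-byte datatype of call 2 */

/* Hypothetical helper: copy src into one SLOT_SIZE-byte element.
 * Returns 0 on success, -1 if src is longer than SLOT_SIZE bytes. */
int pack_string(char slot[SLOT_SIZE], const char *src)
{
    size_t len = strlen(src);
    if (len > SLOT_SIZE)
        return -1;
    memset(slot, 0, SLOT_SIZE);  /* short strings end up NULL-terminated */
    memcpy(slot, src, len);      /* a full 20-byte string has no terminator */
    return 0;
}

/* Hypothetical helper: length of the string held in one element.
 * Stops at the first NULL byte, or at SLOT_SIZE if there is none. */
size_t slot_strlen(const char slot[SLOT_SIZE])
{
    const char *end = memchr(slot, '\0', SLOT_SIZE);
    return end ? (size_t)(end - slot) : SLOT_SIZE;
}
```

A write buffer for the million-string example would then be a contiguous array of one million such 20-byte elements, each filled with pack_string.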
    
    Creating fixed-length string datatypes:
    
    Relative to any form of variable-length string datatype,
    fixed-length string datatypes are straightforward.
    The following HDF5 calls create a fixed-length, 30-byte 
    string datatype: 
    
    fixed30B_type_id = H5Tcopy(H5T_C_S1)
    status           = H5Tset_size(fixed30B_type_id, 30)
    
    This datatype can be used for 30-character ASCII strings 
    without any need for NULL terminators
    or any other special handling. 
    [ Consider a note regarding the accommodations necessary to handle fixed-length UTF-8 strings. ]
    H5Tvlen_create 
    does not create variable-length strings:
    
    Despite its name, H5Tvlen_create does not create a variable-length string datatype;
    that function actually creates a fundamentally different datatype object.
    
    H5Tvlen_create creates a datatype that is a variable-length, 
    one-dimensional array (a sequence) whose elements are of the base datatype.
    Consider the following examples: 
    
    vl_char_type_id       = H5Tvlen_create(H5T_NATIVE_CHAR) 
    This call creates a datatype that holds a variable-size, 
    one-dimensional array of data elements; 
    each element is of the H5T_NATIVE_CHAR base datatype.
    
    str12B_type_id        = H5Tcopy(H5T_C_S1)
    status                = H5Tset_size(str12B_type_id, 12)
    vl_str12B_type_id     = H5Tvlen_create(str12B_type_id) 
    This set of calls creates a datatype that holds a variable-size, 
    one-dimensional array of 12-byte strings.
    
    vl_f32be_type_id      = H5Tvlen_create(H5T_IEEE_F32BE) 
    The above call creates a datatype that holds a variable-size, 
    one-dimensional array of IEEE big-endian 32-bit floats.