Kv Format 1.0 Specification

kvformat.org
Version: 1.0 RC1
Published:
Status: Under Review
Planned Stable: 2026-06-15
Editor: Bojan Đuričković
Email: [email protected]

1 Introduction

The driving idea of Kv (pronounced "cave") Format is radical simplicity: it uses a regular grammar, which is simpler than the context-free grammars of JSON and TOML. It builds upon .env file conventions while imposing the constraints necessary for regular language properties. Kv Format 1.0 is thus a standardized lowest common denominator of the .env files used in practice, intended to provide a foundation for extensions in 1.x versions.

The goal of Kv Format is to provide:

  1. a regular grammar, parsable by a deterministic finite automaton that examines each character exactly once, with:
    • a minimal syntax with no escape mechanisms, no multiline values, and no interpolation,
    • a compact syntax with no structural whitespace and no significant quotes or brackets,
  2. a streaming-first design, with an entry stream enabling CSP/channel-based processing,
  3. application flexibility, supporting different duplicate key handling strategies and optional comment preservation.

This design gives up expressiveness (escaping, nesting, interpolation, multiline values) in exchange for single-pass parsing, deterministic processing, and straightforward implementation.

To avoid confusion with incompatible .env dialects, Kv Format files should use the .kv extension.

1.1 Scope

This specification defines:

This specification does not define:

In Kv Format terminology, the parser processes Kv text to emit entries. The parser's responsibility ends with entry emission; application logic that consumes entries is outside the scope of this specification. Thus, a parser in Kv Format terminology has a narrower meaning than usual: it is a token producer that yields entries for subsequent processing, without constructing concrete or abstract syntax trees.1

The entry stream abstraction separates parsing (syntax recognition) from processing (semantic interpretation). This enables flexible integration into different application architectures while maintaining a clear boundary between the parser's responsibilities and the application's domain logic.

1.2 Specification Structure

This specification is structured for stable evolution across 1.x versions:

The Technicalia provide the authoritative interpretation of the Normative Requirements. Implementations that conform to the Technicalia necessarily satisfy the Normative Requirements, and in case of ambiguity, the Technicalia take precedence.

2 Terminology

The all-caps key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [rfc2119].

Unicode code points in the U+XXXX form are defined in the Unicode Standard [unicode].

2.1 Containers

2.2 Atomic Building Blocks

2.3 Text Elements

A Kv text is composed of different types of lines:

Data lines and comment lines are collectively termed content lines.

2.4 Entries

The parser emits parsed entries, which come in several varieties:

2.5 Whitespace

In this specification, whitespace refers exclusively to:

In Kv Format syntax, whitespace characters are generally treated like any other characters, except that they are tolerated as indentation and as optional spacing between keys and assignment operators.

2.6 Line and EOL

EOL (End of Line) is a line terminator, either LF (POSIX-style line feed) or CRLF (Windows-style line ending). Kv Format tolerates intermixing of LF and CRLF within the same Kv text.

In this specification, a line refers to the sequence of characters from the start of the line up to and including the line terminator (EOL).

2.7 Parser

A parser is a process that reads Kv text and emits an entry stream according to this specification.

3 Syntax

This section gives a descriptive definition of the Kv Format syntax. The formal grammar presented in Appendix A is the normative definition and takes precedence in case of any ambiguity.

3.1 Data Line

A key-value data line has the basic form (BNF):

key OP value EOL

where:

For example:

host=kvformat.org

Keys and assignment operators may be indented for presentation clarity. For example:

database =db1
port     =5432
user     =admin

Note that whitespace characters after the assignment operator = are not treated as optional spacing; they are part of the value.

The general ABNF form of a data line is therefore

*WSP key *WSP OP value EOL

where *WSP represents zero or more whitespace characters.
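Because the data-line form is regular, it can be checked with an ordinary regular expression. The Python sketch below is illustrative only: the pattern name `DATA_LINE` and the helper `match_data_line` are hypothetical, WSP is assumed to mean SP or HTAB as in RFC 5234, and the input is assumed to be a single line with its EOL already removed.

```python
import re

# Hypothetical pattern for: *WSP key *WSP "=" value   (EOL already removed).
# WSP is assumed to be SP or HTAB; the value may hold anything except NUL, CR, LF.
DATA_LINE = re.compile(r"[ \t]*([A-Za-z_][A-Za-z0-9_]*)[ \t]*=([^\x00\r\n]*)")


def match_data_line(line):
    """Return (key, value) if `line` is a well-formed data line, else None."""
    m = DATA_LINE.fullmatch(line)
    return (m.group(1), m.group(2)) if m else None


print(match_data_line("port     =5432"))  # ('port', '5432')
print(match_data_line("key = abc"))       # ('key', ' abc')
print(match_data_line("KEY-NAME=value"))  # None
```

The value group keeps the space after `=` in `key = abc`, while the spaces before `=` are absorbed by the second `[ \t]*`.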

3.2 Comment Line

A comment line has the basic form (BNF):

"#" comment-text EOL

where # is the NUMBER SIGN character (%x23).

For example:

# This is a comment

As with data lines, comments may be indented. The following lines are all syntactically equivalent:

# comment
 # comment
      # comment

This does not hold for whitespace characters after the # character, which are considered part of the comment text. Thus, the following lines are all different:

#comment
# comment
#    comment

The general ABNF form of a comment line is:

*WSP "#" comment-text EOL
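A comment line can be recognized the same way. In this hypothetical sketch, `comment_text` strips only the indentation before `#` and returns everything after the first `#` unchanged, so the three differently spaced comments above yield three different texts.

```python
import re

# Hypothetical pattern for: *WSP "#" comment-text   (EOL already removed).
COMMENT_LINE = re.compile(r"[ \t]*#([^\x00\r\n]*)")


def comment_text(line):
    """Return the text after the first '#', or None if `line` is not a comment line."""
    m = COMMENT_LINE.fullmatch(line)
    return m.group(1) if m else None


print(repr(comment_text("      # comment")))    # ' comment'
print(repr(comment_text("#    comment")))       # '    comment'
print(repr(comment_text("host=kvformat.org")))  # None
```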

3.3 Blank Line

A blank line is an empty line or a line containing only whitespace characters:

*WSP EOL

Blank lines are used for visual separation without semantic meaning.

3.4 EOL

EOLs are line terminators.

EOL = LF / CRLF

where:

Every line in a Kv text MUST be terminated with an EOL (either LF or CRLF), including the final line. An EOL is considered part of the line (see Section 2.6).
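The EOL rule translates into a simple splitting loop. The generator below is a hypothetical sketch, not a required API: it yields each line's number, content, and terminator, treats a CR immediately before LF as part of a CRLF, and raises `ValueError` (standing in for MISSING_FINAL_EOL_ERROR, Appendix B.7) when the final line is unterminated. A CR anywhere else is left in the line content for later error detection (Appendix B.3).

```python
def split_lines(text):
    """Yield (line_number, line_without_eol, eol) for each line of a Kv text."""
    i, line_num = 0, 0
    while i < len(text):
        j = text.find("\n", i)
        if j == -1:
            # Final line lacks an EOL terminator.
            raise ValueError("MISSING_FINAL_EOL_ERROR")
        line_num += 1
        if j > i and text[j - 1] == "\r":
            yield line_num, text[i:j - 1], "\r\n"  # CRLF
        else:
            yield line_num, text[i:j], "\n"        # LF
        i = j + 1
```

LF and CRLF may be mixed freely: `split_lines("a=1\r\nb=2\n")` yields `(1, "a=1", "\r\n")` and then `(2, "b=2", "\n")`.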

3.5 Shebang Line (File-Level)

Shebang lines [execveman] are file-level directives, not part of Kv text syntax. Shebangs MUST NOT be treated as comments. See Section 6.4 for details.

4 Parsing

A parser processes a Kv text line by line, emitting a stream of entries.

4.1 Line Numbers

Parsers MUST keep track of line numbers.

4.2 Entry Structure

Each line is parsed into an entry that contains:

Entry payloads contain the line's semantic content; the EOL terminator is consumed during parsing and is not part of any entry payload.

The method of representing entries (tuples, structs, objects, etc.) is implementation-defined.

Keys in KV entries MUST NOT contain any whitespace characters. Any leading or trailing whitespace in a key MUST be removed by the parser, while any whitespace inside the key represents an INVALID_KEY_ERROR (Appendix B.6).
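Key trimming and validation can be combined into one step. In the hypothetical helper below, `raw` is the text before the first `=`; surrounding SP/HTAB is removed, and the result is checked against the key rule of Appendix A. The error labels are carried in `ValueError` messages purely for illustration, with EMPTY_KEY_ERROR (Appendix B.4) checked before INVALID_KEY_ERROR (Appendix B.6).

```python
import re

KEY = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")  # key rule from Appendix A


def normalize_key(raw):
    """Trim SP/HTAB around `raw` (the text before '=') and validate it as a key."""
    key = raw.strip(" \t")
    if key == "":
        raise ValueError("EMPTY_KEY_ERROR")
    if not KEY.fullmatch(key):
        # Covers internal whitespace as well as dashes, periods, colons, etc.
        raise ValueError("INVALID_KEY_ERROR")
    return key
```

For example, `normalize_key("  port   ")` returns `"port"`, while `normalize_key("foo bar")` raises `ValueError("INVALID_KEY_ERROR")`.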

4.3 Entry Stream

Parsers emit an entry stream — a unidirectional flow of entries derived from lines in the Kv text.

Shebang handling: When parsing a Kv file that begins with a shebang line (see Section 6.4), the parser MAY emit a shebang entry for the shebang line as line number zero (Section 4.1). Shebang entries MUST preserve all characters of the shebang line verbatim except the EOL terminator, including the #! prefix. Shebang entries MUST NOT be emitted for any line beyond the first.
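A parser that chooses to emit shebang entries might split the first line off before parsing the Kv text, as in this hypothetical sketch. The entry shape `("SHEBANG_ENTRY", text, 0)` is an assumption (entry representation is implementation-defined, Section 4.2); the text keeps the `#!` prefix and drops only the EOL. If the shebang has no EOL, the sketch treats the whole input as the shebang line.

```python
def split_shebang(data):
    """Return (shebang_entry or None, remaining Kv text) for a decoded Kv file."""
    if not data.startswith("#!"):
        return None, data
    j = data.find("\n")
    if j == -1:  # unterminated shebang: the whole input is the shebang line
        return ("SHEBANG_ENTRY", data, 0), ""
    end = j - 1 if data[j - 1] == "\r" else j  # drop LF or CRLF, keep "#!"
    return ("SHEBANG_ENTRY", data[:end], 0), data[j + 1:]
```

Only the very first line is inspected, so a `#!` line later in the file is parsed as an ordinary comment line.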

4.4 General Parsing Requirements

  1. Parsers MUST parse texts matching the grammar defined in Appendix A.
  2. Parsers MUST detect all error conditions in Appendix B.
  3. Parsers MUST NOT handle duplicate key detection or processing. Duplicate key handling is an application-level decision.
  4. The emitted entries MUST conform to the entry structure (Section 4.2) and entry stream requirements (Section 4.3).
  5. Parsers MAY emit entries in any order. Implementations MUST document their emission order.

4.5 Whitespace Handling

  1. Parsers MUST remove leading whitespace (any whitespace between the start of the line and the first non-whitespace character) in all lines.
  2. Parsers MUST remove whitespace between keys and assignment operators.
  3. Parsers MUST NOT remove any whitespace characters after the assignment operator.

4.6 Value Preservation

Parsers MUST treat all Kv text values as strings, and MUST preserve values verbatim. Parsers MUST NOT interpret any characters (including but not limited to quotes, backslashes, dollar signs, percent signs, etc.) as having special meaning or initiating escape sequences. Specifically:

Values MUST NOT contain EOL characters. All other characters permitted by the grammar (Appendix A), including any leading or trailing whitespace characters (see Section 4.5), MUST be preserved verbatim.

Empty value parsing: When an assignment operator is immediately followed by EOL, parsers MUST emit the value as an empty string.

4.7 UTF-8 Processing

Parsers MUST ensure valid UTF-8 encoding. Validation may occur:

BOM handling: Kv Format does not support the UTF-8 Byte Order Mark (BOM). If a Kv text begins with the bytes EF BB BF, parsers MUST raise a BOM_ERROR (Appendix B.1).
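One possible approach, validating the whole input up front, is sketched below in Python. It is illustrative only: `decode_kv_bytes` is a hypothetical name, it stops at the first problem rather than continuing, and it relies on Python's strict `utf-8` codec, which rejects malformed sequences, overlong forms, and encoded surrogates.

```python
def decode_kv_bytes(data):
    """Reject a leading BOM, then decode strictly as UTF-8."""
    if data.startswith(b"\xef\xbb\xbf"):
        raise ValueError("BOM_ERROR")
    try:
        return data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError as exc:
        raise ValueError(f"INVALID_UTF8_ERROR at byte {exc.start}") from exc
```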

4.8 Error Handling

Parsers implement one of two models:

  1. eager parsing: validate entire text before emitting entries,
  2. streaming parsing: emit entries as lines are parsed.

Parsers MUST document which model they implement and their behavior when an error occurs:

Lines causing errors SHOULD NOT emit entries.

A comprehensive list of parsing errors is given in Appendix B.
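The two models differ mainly in when output becomes visible. The Python sketch below contrasts them using a deliberately simplified, hypothetical line parser (`parse_lines`, which skips key validation and most error checks) and a hypothetical result shape in which errors travel in the stream as `("ERROR", label, line_num)` tuples. The streaming variant yields results one line at a time; the eager variant consumes the whole input and emits nothing if any line failed.

```python
def parse_lines(lines):
    """Simplified stand-in parser over lines whose EOLs were already removed."""
    for n, line in enumerate(lines, start=1):
        s = line.lstrip(" \t")
        if s == "":
            continue  # blank line: no entry
        if s.startswith("#"):
            yield ("COMMENT_ENTRY", s[1:], n)
        elif "=" in s:
            key, value = s.split("=", 1)
            yield ("KV_ENTRY", key.rstrip(" \t"), value, n)
        else:
            yield ("ERROR", "MISSING_OPERATOR_ERROR", n)


def parse_streaming(lines):
    """Streaming model: results are yielded as soon as each line is parsed."""
    yield from parse_lines(lines)


def parse_eager(lines):
    """Eager model: parse everything first; fail without emitting entries on any error."""
    results = list(parse_lines(lines))
    errors = [r for r in results if r[0] == "ERROR"]
    if errors:
        raise ValueError(errors)
    return results
```

With the input `["A=1", "oops", "B=2"]`, the streaming variant still delivers the entries for lines 1 and 3, while the eager variant raises before delivering any entry.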

4.9 Implementation Limits

This specification does not mandate maximum limits for:

Implementations MAY impose reasonable limits based on available resources, but SHOULD document these limits and provide clear error messages when limits are exceeded.

5 Semantics

By design, Kv Format relegates most semantic choices to applications consuming the entry stream. This includes decisions such as value whitespace trimming, duplicate key handling, and comment preservation. This section is therefore limited to reiterating some basic principles.

5.1 Values are Verbatim Literals

Kv Format values preserve all characters exactly as written (see Section 4.6). No characters have special meaning: quotes, backslashes, dollar signs, percent signs, etc., are all literal characters.

This design gives applications flexibility to interpret values according to their needs. For example, a configuration loader might trim trailing whitespace, while a round-trip serializer would preserve it exactly.

5.2 Duplicate Key Semantics

Consistent with .env file precedent, Kv Format texts may contain duplicate keys. This follows from the regular grammar: parsers emit entries without maintaining key state.

The entry stream thus contains all instances of repeated keys. Applications consuming the entry stream determine how to handle duplicate keys according to their needs. Common strategies include:
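For illustration, the hypothetical Python functions below show how an application, not the parser, might apply three possible strategies to KV entries. The `(key, value, line_num)` tuple shape is an assumption, and the strategy names are descriptive labels chosen here, not terms defined by this specification.

```python
def last_wins(entries):
    """Later assignments override earlier ones."""
    result = {}
    for key, value, _line in entries:
        result[key] = value
    return result


def first_wins(entries):
    """The first assignment of each key is kept; later ones are ignored."""
    result = {}
    for key, value, _line in entries:
        result.setdefault(key, value)
    return result


def reject_duplicates(entries):
    """Any repeated key is treated as an application-level error."""
    result, first_line = {}, {}
    for key, value, line in entries:
        if key in first_line:
            raise ValueError(f"duplicate key {key!r} on lines {first_line[key]} and {line}")
        first_line[key] = line
        result[key] = value
    return result


entries = [("KEY", "1", 1), ("KEY", "2", 2)]
print(last_wins(entries))   # {'KEY': '2'}
print(first_wins(entries))  # {'KEY': '1'}
```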

6 File Representation

6.1 File Extension

Kv Format files SHOULD use the .kv file extension.

6.2 Media Type (MIME Type)

When transmitting Kv Format content over protocols that use media types [rfc2046] (such as HTTP [rfc9110] or email [rfc5322]), implementations SHOULD use text/org.kvformat as the Content-Type, with the following optional parameters (following RFC 2045 syntax [rfc2045]):

Example valid HTTP Content-Type headers:

Content-Type: text/org.kvformat
Content-Type: text/org.kvformat; version=1.0
Content-Type: text/org.kvformat; charset=utf-8
Content-Type: text/org.kvformat; version="1.0"; charset="utf-8"
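Receivers can read these parameters with a standard MIME header parser. The sketch below uses Python's `email.message.Message` class; `parse_content_type` is a hypothetical helper, and only the `version` and `charset` parameters that appear in the examples above are extracted. Missing parameters come back as `None`, and quoted and unquoted parameter values are treated alike.

```python
from email.message import Message


def parse_content_type(header_value):
    """Return (media_type, version, charset) from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = header_value
    return (msg.get_content_type(),     # lower-cased "type/subtype"
            msg.get_param("version"),   # quotes are removed by default
            msg.get_param("charset"))


print(parse_content_type('text/org.kvformat; version="1.0"; charset="utf-8"'))
# ('text/org.kvformat', '1.0', 'utf-8')
```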

If Kv Format gains sufficient adoption, text/kv may be registered with IANA in the future.

6.3 File Encoding

Kv files MUST be encoded as UTF-8 without a Byte Order Mark (BOM).

6.4 Shebang Lines

A shebang line [execveman] is an optional first line in a Kv file that specifies how to execute the file as a script. Shebang lines typically have the form:

#!/path/to/interpreter [optional argument]

7 Examples

7.1 Valid Examples

  1. Simple examples:

    APP_NAME=My Application
    API_KEY=sk_live_1234567890abcdef
    DEBUG=true
    PATH=/usr/local/bin:/usr/bin
  2. Empty value (value is an empty string):

    EMPTY=
  3. Leading and trailing spaces in values are preserved (values are «foo·» and «·bar»)3

    trailing=foo·
    leading=·bar
  4. No inline comments (value is «1 # default value»)

    NOT_AS_INTENDED=1 # default value
  5. Quotes are characters like any other (value is «"hello"»)

    quoted="hello"
  6. Backslashes are literals, never used as escapes (value is «\n» — two characters: a backslash and the letter "n")

    BACKSLASH=\n
  7. No interpolation (values are «/usr/bin» and «$PATH:/usr/var/bin»)

    PATH=/usr/bin
    PATH=$PATH:/usr/var/bin
  8. Empty comment (comment text is an empty string)

    #
  9. Comment consisting of a space (comment text is «·»)

    #·
  10. Comment with leading and trailing spaces (comment text is «··comment···»)

    #··comment···
  11. Duplicate keys are valid in the parser context (see Section 5.2); this results in two emitted entries:

    KEY=1
    KEY=2

7.2 Invalid Examples

  1. Invalid key: dashes not allowed (INVALID_KEY_ERROR)

    KEY-NAME=value
  2. Invalid key: cannot start with a digit (INVALID_KEY_ERROR)

    123KEY=value
  3. Invalid key: cannot contain period (INVALID_KEY_ERROR)

    KEY.NAME=value
  4. Missing assignment operator = (MISSING_OPERATOR_ERROR)

    KEY value
  5. Empty key (EMPTY_KEY_ERROR)

    =value

7.3 Edge Cases

  1. An empty text (0 bytes) is a valid Kv text and yields no entries.
  2. Comments, not data lines

    #=foo
    #key=value
  3. Valid data line with value «=foo».

    key==foo
  4. Valid data line with an empty string for a value

    key=
  5. Valid data line with a leading space in the value: «·abc»

    key = abc
  6. Error — space is not an operator (MISSING_OPERATOR_ERROR)

    key value
  7. Error — colon is not an operator (MISSING_OPERATOR_ERROR)

    key:value
  8. Error — colon is not a valid key character (INVALID_KEY_ERROR)

    key:=value
  9. Error — space is not a valid key character (INVALID_KEY_ERROR)

    foo bar=value

8 Versioning

Kv Format 1.x versions are not sequential updates in the traditional sense. All 1.x format definitions were designed and specified concurrently, then organized into a layered specification collection with progressive feature inclusion. This approach reflects the fundamental trade-off between feature richness and implementation size.

8.1 Layered Feature Progression

The Kv Format 1.x versions provide increasing functionality:

Each version includes all features of previous versions. The complete 1.4 specification is not an "update" to 1.0 but a superset specification that adds new syntactic constructs while maintaining backward compatibility.

Because Kv Format is a regular language, its implementations are expected to outperform context-free alternatives (JSON, TOML, YAML) while having significantly smaller parser binary (compiled implementation) footprints. The layered version approach directly addresses resource-constrained systems by enabling implementers to choose the minimal feature set their application requires:

Each successive version adds parsing complexity and increases the minimum implementation size. The version numbers thus serve as size/complexity indicators rather than "latest and greatest" markers.

8.2 Implementation Versioning

To help users select optimal implementations for their constraints, Kv Format mandates an unorthodox implementation versioning scheme:

Examples:

This scheme provides immediate visual identification of format support: implementation major versions 10 to 14 map directly to the supported feature set, while semantic versioning [semver] is preserved for the implementation itself.

This versioning scheme applies to the Kv Format 1.x specification series, which at the time of Kv Format 1.0 publication includes versions 1.0 to 1.4. Any future version mapping will be addressed in the specifications of those versions.
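Under one reading of this scheme, in which implementation major version 1N signals support for Kv Format 1.N, a tool could derive the supported format version as sketched below. Both the mapping rule and the helper name `supported_format` are assumptions for illustration.

```python
def supported_format(impl_version):
    """Map an implementation version such as "12.3.1" to a Kv Format 1.x version.

    Assumes major version 1N means Kv Format 1.N (N = 0..4).
    """
    major = int(impl_version.split(".", 1)[0])
    if not 10 <= major <= 14:
        raise ValueError(f"major version {major} is outside the Kv Format 1.x range 10-14")
    return f"1.{major - 10}"
```

Under this assumption, `supported_format("12.3.1")` returns `"1.2"`, and minor and patch components remain free for the implementation's own semantic versioning.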

9 References

[Aho06]
Aho, Alfred V., Lam, Monica S., Sethi, Ravi, and Ullman, Jeffrey D. Compilers: Principles, Techniques, and Tools, 2nd edition. Addison-Wesley, 2006. ISBN 978-0-321-48681-3.
[rfc2045]
Freed, Ned and Borenstein, Nathaniel S. Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies. RFC 2045, RFC Editor, November 1996. https://www.rfc-editor.org/rfc/rfc2045.html
[rfc2046]
Freed, Ned and Borenstein, Nathaniel S. Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types. RFC 2046, RFC Editor, November 1996. https://www.rfc-editor.org/rfc/rfc2046.html
[rfc2119]
Bradner, Scott. Key words for use in RFCs to Indicate Requirement Levels. RFC 2119, RFC Editor, March 1997. https://www.rfc-editor.org/rfc/rfc2119.html
[rfc5234]
Crocker, Dave and Overell, Paul. Augmented BNF for Syntax Specifications: ABNF. RFC 5234, RFC Editor, January 2008. https://www.rfc-editor.org/rfc/rfc5234.html
[rfc5322]
Resnick, Pete. Internet Message Format. RFC 5322, RFC Editor, October 2008. https://www.rfc-editor.org/rfc/rfc5322.html
[rfc9110]
Fielding, Roy, Nottingham, Mark, and Reschke, Julian. HTTP Semantics. RFC 9110, RFC Editor, June 2022. https://www.rfc-editor.org/rfc/rfc9110.html
[unicode]
The Unicode Consortium. The Unicode Standard, Version 15.0.0. 2022. https://www.unicode.org/versions/Unicode15.0.0/
[execveman]
Linux/BSD man-pages project. execve(2) — execute a file. 2023. https://man7.org/linux/man-pages/man2/execve.2.html
[semver]
Preston-Werner, Tom. Semantic Versioning 2.0.0. 2013. https://semver.org/

Appendix A: Formal Grammar

The following grammar is the normative definition of Kv Format syntax in ABNF form [rfc5234]. In case of any ambiguities in the descriptive main text of this specification, this grammar has precedence.

kv-text = *kv-line
kv-line = data-line / comment-line / blank-line
data-line = *WSP key *WSP "=" value EOL
comment-line = *WSP "#" comment-text EOL
blank-line = *WSP EOL
key = ( ALPHA / "_" ) *( ALPHA / DIGIT / "_" )
value = *valid-char
comment-text = *valid-char
; Terminals
EOL = LF / CRLF
valid-char = %x01-09 / %x0B-0C / %x0E-10FFFF
    

The rules from RFC 5234 [rfc5234] are used:

Note on valid-char: The valid-char rule excludes NUL (U+0000), LF (U+000A), and CR (U+000D) for parsing reasons. Implementations MUST perform UTF-8 validation separately (Section 4.7). Because well-formed UTF-8 cannot encode the UTF-16 surrogate range U+D800 to U+DFFF, these code points, although inside the numeric range of valid-char, are implicitly excluded by that validation.

Note on EOF: This grammar requires every line to be terminated by EOL. Input where the final line lacks an EOL terminator does not match this grammar; parsers MUST raise a MISSING_FINAL_EOL_ERROR (Appendix B.7).

Appendix B: Error Conditions

This appendix classifies the cases of parse failure. When parsing fails, parsers SHOULD, where feasible in the implementation environment, return one of the errors below. Where possible, error reports SHOULD include line numbers (Section 4.1).

The errors below are listed in order of priority. If multiple errors occur on the same line, parsers MUST report the highest-priority error and MAY report further errors from that line.
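One way to apply this ordering is to rank the errors numerically by their position in this appendix and report the minimum among those detected on a line, as in the hypothetical Python sketch below (`KvError` and `prioritize` are illustrative names).

```python
from enum import IntEnum


class KvError(IntEnum):
    """Appendix B errors in priority order: a lower value means a higher priority."""
    BOM_ERROR = 1                 # B.1
    INVALID_UTF8_ERROR = 2        # B.2
    INVALID_CHARACTER_ERROR = 3   # B.3
    EMPTY_KEY_ERROR = 4           # B.4
    MISSING_OPERATOR_ERROR = 5    # B.5
    INVALID_KEY_ERROR = 6         # B.6
    MISSING_FINAL_EOL_ERROR = 7   # B.7


def prioritize(errors):
    """Return the distinct errors found on one line, highest priority first."""
    return sorted(set(errors))


# A line such as "key:=a<NUL>" has both an invalid key and a NUL character.
found = [KvError.INVALID_KEY_ERROR, KvError.INVALID_CHARACTER_ERROR]
print(prioritize(found)[0].name)  # INVALID_CHARACTER_ERROR
```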

B.1 BOM_ERROR

A Kv text MUST NOT begin with a UTF-8 BOM sequence (EF BB BF) [unicode]. Parsers MUST detect texts starting with the BOM sequence and raise a BOM_ERROR.

Whether parsing proceeds after this error detection follows the implementation's error handling strategy.

B.2 INVALID_UTF8_ERROR

Parsers MUST process input as UTF-8. Upon encountering invalid UTF-8 sequences, parsers MUST raise an INVALID_UTF8_ERROR.

B.3 INVALID_CHARACTER_ERROR

Kv text MUST NOT contain standalone CR characters (U+000D) or NUL characters (U+0000). Parsers MUST raise an INVALID_CHARACTER_ERROR when encountering:

B.4 EMPTY_KEY_ERROR

A key MUST NOT be an empty string. If a data line starts with the assignment operator (after leading whitespace is removed; see Section 4.5), parsers MUST raise an EMPTY_KEY_ERROR.

B.5 MISSING_OPERATOR_ERROR

If a data line (i.e., a line that is neither blank nor a comment line) does not contain an assignment operator, parsers MUST raise a MISSING_OPERATOR_ERROR.

B.6 INVALID_KEY_ERROR

All keys in a Kv text MUST:

  1. start with an ASCII letter or an underscore,
  2. contain only ASCII letters, ASCII digits, and underscores.

Otherwise, parsers MUST raise an INVALID_KEY_ERROR.

B.7 MISSING_FINAL_EOL_ERROR

If the final line of a Kv text is missing an EOL terminator, parsers MUST consider the line invalid, and MUST raise a MISSING_FINAL_EOL_ERROR. This error MAY be reported without a line number.

Appendix C: Example Parsing Algorithm

This section provides informative guidance for implementing a parser using a simple scanning approach. The normative requirements are specified in the preceding sections and appendices. Implementations MAY use different parsing strategies as long as they conform to the grammar specified in Appendix A, detect all errors defined in Appendix B, and produce equivalent results.

A. initialize pos = 0, error_count = 0
B. File-level artifacts
    1. if file starts with UTF-8 BOM (EF BB BF):
        - echo "BOM_ERROR at start of file" > stderr
        - error_count += 1
        - pos += 3  # skip BOM
    2. if file[pos..] starts with "#!":
        - shebang_line, eol = extract first line starting at pos
        - if shebang_line contains invalid UTF-8:
            * echo "INVALID_UTF8_ERROR in line 0 (shebang)" > stderr
            * error_count += 1
        - pos += length(shebang_line) + length(eol)  # skip shebang line
C. Extract Kv text
    - text = file[pos..]
    - initialize i = 0 (byte-level index), line_num = 0
D. while i < length(text):
    1. (string eol, index i_eol) = find next LF or CRLF starting from i
    2. if not found:
        - echo "MISSING_FINAL_EOL_ERROR at end of file" > stderr
        - error_count += 1
        - -> break  # exit loop and go to step E
    3. line = text[i..i_eol]
    4. line_num += 1
    5. parse line:
        a. if line starts with U+0020 or U+0009: # trim leading whitespace:
          - i_nonws = find first non-whitespace character index
          - if not found (all whitespaces): line = ""
          - else: line = line[i_nonws..]
        b. if line == "":  # ignore blank line
          - -> continue to step 6
        c. if line contains invalid UTF-8:
          - echo "INVALID_UTF8_ERROR in $line_num" > stderr
          - error_count += 1
          - -> continue to step 6
        d. if line contains NUL or CR:
          - echo "INVALID_CHARACTER_ERROR in $line_num" > stderr
          - error_count += 1
          - -> continue to step 6
        e. if line starts with '#':
          - yield ("COMMENT_ENTRY", line[1..], line_num)
          - -> continue to step 6
        f. else: parse as data line:
          - i_op = index of first '=' in line
          - if '=' is not found:
            * echo "MISSING_OPERATOR_ERROR in $line_num" > stderr
            * error_count += 1
            * -> continue to step 6
          - key = trim_trailing_whitespace(line[0..i_op])
          - if key == "":
            * echo "EMPTY_KEY_ERROR in $line_num" > stderr
            * error_count += 1
            * -> continue to step 6
          - if key does not match /^[_a-zA-Z][_a-zA-Z0-9]*$/:
            * echo "INVALID_KEY_ERROR in $line_num" > stderr
            * error_count += 1
            * -> continue to step 6
          - value = line[i_op+1..]
          - yield ("KV_ENTRY", key, value, line_num)
    6. i = i_eol + length(eol); # skip LF or CRLF
E. return (error_count > 0 ? 1 : 0)
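For readers who prefer running code, the following Python sketch mirrors the steps above. It is an informative illustration, not a reference implementation: the function names `parse_kv` and `run`, the tuple shapes of entries, and the practice of collecting errors as `(name, line_num)` pairs in a caller-supplied list are all assumptions. It works on raw bytes so that invalid UTF-8 can be reported per line, and `None` stands for "no line number".

```python
import re
import sys

KEY_RE = re.compile(r"[_a-zA-Z][_a-zA-Z0-9]*")


def parse_kv(data, errors):
    """Yield entries from raw Kv file bytes; append (error_name, line_num) to `errors`."""
    pos = 0
    if data.startswith(b"\xef\xbb\xbf"):               # B.1: report BOM, then skip it
        errors.append(("BOM_ERROR", None))
        pos = 3
    if data.startswith(b"#!", pos):                    # B.2: skip a shebang line
        lf = data.find(b"\n", pos)
        end = len(data) if lf == -1 else lf
        shebang = data[pos:end]
        if shebang.endswith(b"\r"):
            shebang = shebang[:-1]
        try:
            shebang.decode("utf-8")
        except UnicodeDecodeError:
            errors.append(("INVALID_UTF8_ERROR", 0))
        pos = end if lf == -1 else lf + 1
    text = data[pos:]                                  # C: the Kv text proper
    i, line_num = 0, 0
    while i < len(text):                               # D
        lf = text.find(b"\n", i)                       # D.1: LF, possibly preceded by CR
        if lf == -1:                                   # D.2
            errors.append(("MISSING_FINAL_EOL_ERROR", None))
            break
        i_eol = lf - 1 if lf > i and text[lf - 1] == 0x0D else lf
        raw = text[i:i_eol]                            # D.3: line without its EOL
        line_num += 1                                  # D.4
        i = lf + 1                                     # D.6, done early so `continue` works
        raw = raw.lstrip(b" \t")                       # D.5.a
        if not raw:                                    # D.5.b: blank line
            continue
        try:
            line = raw.decode("utf-8")                 # D.5.c
        except UnicodeDecodeError:
            errors.append(("INVALID_UTF8_ERROR", line_num))
            continue
        if "\x00" in line or "\r" in line:             # D.5.d: NUL or standalone CR
            errors.append(("INVALID_CHARACTER_ERROR", line_num))
            continue
        if line.startswith("#"):                       # D.5.e
            yield ("COMMENT_ENTRY", line[1:], line_num)
            continue
        i_op = line.find("=")                          # D.5.f
        if i_op == -1:
            errors.append(("MISSING_OPERATOR_ERROR", line_num))
            continue
        key = line[:i_op].rstrip(" \t")
        if key == "":
            errors.append(("EMPTY_KEY_ERROR", line_num))
            continue
        if not KEY_RE.fullmatch(key):
            errors.append(("INVALID_KEY_ERROR", line_num))
            continue
        yield ("KV_ENTRY", key, line[i_op + 1:], line_num)


def run(data):
    """Print entries to stdout and errors to stderr; return the exit status of step E."""
    errors = []
    for entry in parse_kv(data, errors):
        print(entry)
    for name, line_num in errors:
        where = "" if line_num is None else f" in line {line_num}"
        print(name + where, file=sys.stderr)
    return 1 if errors else 0
```

For example, `run(b"A=1\n")` prints `('KV_ENTRY', 'A', '1', 1)` and returns 0. Because `parse_kv` is a generator, errors are appended to the list only as the caller consumes entries.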

Notes:



License

This specification is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). A human-readable summary of the license terms is provided below, but the full legal code is the authoritative version.

1 Summary

You are free to:

Under the following terms:

Notices:

2 Attribution Practice

When attributing this specification, please include:

Example attribution:

Kv Format 1.0 Specification RC1. © 2026 Bojan Đuričković.
https://kvformat.org/spec/1.0-rc1.html (CC BY 4.0)
        

3 Disclaimer

This specification is provided "AS IS" without warranty of any kind. The authors and copyright holders make no representations or warranties, express or implied, regarding the accuracy, completeness, or suitability of this specification for any particular purpose.

In no event shall the authors or copyright holders be liable for any claim, damages, or other liability, whether in an action of contract, tort, or otherwise, arising from, out of, or in connection with this specification or the use or other dealings in this specification.

4 Scope

This license applies to the specification document only. Reference implementations, example code, and other associated materials are licensed separately as indicated in their respective repositories.