Chip Data Files

The Chip Data Files define everything we need to know about a chip type. Their purpose is to keep libhei agnostic to specific chip information. Required information includes:

  • All hardware addresses needed for error isolation.
  • A definition of how errors propagate from register to register.
  • A list of top level registers to use as a starting point for isolation.
  • A list of additional registers to capture for each register bit (for debug).
  • Rules defining how to clear and mask register bits (only applicable if __HEI_ENABLE_HW_WRITE is defined).

File extensions are not required, but it is recommended to use the extension .cdb (chip data binary).

Requirements

  • These files must be consumable by different host architectures, so all multi-byte data fields within the files are stored in big-endian format (use endian.h to convert).

File Format

1) File metadata

The following data will be defined at the very beginning of each Chip Data File:

| Bytes | Desc | Value/Example |
|---|---|---|
| 8 | chip data keyword | 0x4348495044415441 (ascii for "CHIPDATA") |
| 4 | chip model/level | Unique ID defined by data file owner |
| 1 | file version | Version 1 => 0x01, Version 2 => 0x02, etc. |

The user application will use the chip model/level ID to determine which Chip Data File(s) should be used for the chip(s) that exist in the user application's system.
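
As a rough illustration of the 13-byte layout above, the following sketch reads the file metadata with Python's standard `struct` module. The function name `parse_file_metadata` and the example model/level value are assumptions made for this example; they are not part of libhei.

```python
import struct

# ">" selects big-endian with no padding, matching the file's byte order.
# 8s = keyword, I = 4-byte chip model/level, B = 1-byte file version.
FILE_HEADER = struct.Struct(">8sIB")


def parse_file_metadata(data):
    """Return (chip_model_level, file_version) read from the start of a file."""
    if len(data) < FILE_HEADER.size:
        raise ValueError("file too short for Chip Data File metadata")
    keyword, model_level, version = FILE_HEADER.unpack_from(data, 0)
    if keyword != b"CHIPDATA":
        raise ValueError("missing CHIPDATA keyword")
    return model_level, version


# Hypothetical header: chip model/level 0x00120020, file version 2.
example = b"CHIPDATA" + bytes.fromhex("00120020") + b"\x02"
model_level, version = parse_file_metadata(example)
print(hex(model_level), version)
```

A user application could compare the returned model/level against the chips in its system before parsing the rest of the file.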

2) Registers

Immediately following the metadata in section 1, there will be a list of all registers referenced by the isolation nodes for this chip starting with:

| Bytes | Desc | Value/Example |
|---|---|---|
| 4 | register keyword | 0x52454753 (ascii for "REGS") |
| 3 | # of registers | 0 is invalid |

Then, each register will start with:

| Bytes | Desc | Value/Example |
|---|---|---|
| 3 | register ID | Unique ID defined by data file owner |
| 1 | register type | See appendix for supported register types |
| 1 | attribute flags | For each bit 0:disabled and 1:enabled |
| 1 | # of address instances | 0 is invalid |

The register ID must be unique for all registers within the Chip Data File.

Supported attribute flags (bits ordered 0-7, left to right):

  • 0: When enabled, the register is readable.
  • 1: When enabled, the register is writable.
  • 2-7: Reserved (default disabled)
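
Because the flag bits are numbered left to right, bit 0 is the most significant bit of the byte. The sketch below decodes the flags under that reading; the constant and function names are assumptions chosen for this example.

```python
# Bit 0 is the leftmost (most significant) bit of the flags byte.
FLAG_READABLE = 0x80  # bit 0
FLAG_WRITABLE = 0x40  # bit 1
RESERVED_MASK = 0x3F  # bits 2-7


def decode_attribute_flags(flags):
    """Decode the 1-byte attribute flags field of a register entry."""
    return {
        "readable": bool(flags & FLAG_READABLE),
        "writable": bool(flags & FLAG_WRITABLE),
        "reserved": flags & RESERVED_MASK,  # expected to be 0
    }


print(decode_attribute_flags(0x80))  # a read-only register
print(decode_attribute_flags(0xC0))  # a readable and writable register
```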

2.1) Register Instances

A register could have multiple instances within a chip, each with a different address. A common example is that the same register could exist for each core on a processor chip.

So, immediately following a register's metadata there will be a list of all instance addresses for that register. Each instance will have the following:

| Bytes | Desc | Value/Example |
|---|---|---|
| 1 | instance # | Unique value within the register |
| * | address | The address size is defined by the register type |
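
To show how the register list and its variable-size instance addresses fit together, here is a minimal parsing sketch for the whole register section. It assumes the address sizes listed in the appendix (4 bytes for type 0x01, 8 bytes for type 0x02); the function name, the returned dictionary layout, and the example addresses are hypothetical.

```python
# Address size in bytes per register type, taken from the appendix.
ADDRESS_SIZE = {0x01: 4, 0x02: 8}


def parse_registers(data, offset=0):
    """Parse the REGS section; return (registers, offset past the section)."""
    if data[offset:offset + 4] != b"REGS":
        raise ValueError("missing REGS keyword")
    count = int.from_bytes(data[offset + 4:offset + 7], "big")
    offset += 7
    if count == 0:
        raise ValueError("a register count of 0 is invalid")

    registers = {}
    for _ in range(count):
        reg_id = int.from_bytes(data[offset:offset + 3], "big")
        reg_type, flags, num_instances = data[offset + 3:offset + 6]
        offset += 6
        if num_instances == 0:
            raise ValueError("an instance count of 0 is invalid")
        addr_size = ADDRESS_SIZE[reg_type]

        instances = {}
        for _ in range(num_instances):
            inst = data[offset]
            addr = int.from_bytes(data[offset + 1:offset + 1 + addr_size], "big")
            instances[inst] = addr
            offset += 1 + addr_size
        registers[reg_id] = {"type": reg_type, "flags": flags, "instances": instances}
    return registers, offset


# Hypothetical section: one readable SCOM register with two instances.
section = (
    b"REGS" + (1).to_bytes(3, "big")
    + (1).to_bytes(3, "big") + bytes([0x01, 0x80, 2])
    + bytes([0]) + (0x20010A40).to_bytes(4, "big")
    + bytes([1]) + (0x20010A80).to_bytes(4, "big")
)
registers, end = parse_registers(section)
print(registers, end)
```

The returned offset is where the isolation node section would begin.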

3) Isolation Nodes

Hardware errors will be reported via registers organized in a tree-like structure. Isolation of these errors will traverse the tree from a root node down to the actual bit that raised the attention.

Immediately following all of the register metadata described in section 2, the list of isolation nodes starts with:

| Bytes | Desc | Value/Example |
|---|---|---|
| 4 | isolation node keyword | 0x4e4f4445 (ascii for "NODE") |
| 2 | # of isolation nodes | 0 is invalid |

Then, each node will start with the following data:

| Bytes | Desc | Value/Example |
|---|---|---|
| 2 | node ID | Unique ID defined by data file owner |
| 1 | register type | See appendix for supported types |
| 1 | # of node instances | 0 is invalid |

The node ID must be unique for all nodes within a Chip Data File.

IMPORTANT: All registers referenced in a node's isolation rules must be of the same register type expressed in this field. This will ensure there is no ambiguity when resolving the bitwise expressions in the isolation rules.

3.1) Isolation Node Instances

Much like a register, a node can have multiple instances. Each instance will have the following:

| Bytes | Desc | Value/Example |
|---|---|---|
| 1 | instance # | Unique value within the node |
| 1 | # of capture registers | |
| 1 | # of isolation rules | 0 is invalid |
| 1 | # of child nodes | |

3.1.1) Capture Registers

This list specifies which registers to capture and store for debugging purposes. Note that any register referenced in the isolation rules is captured automatically and does not need to be duplicated in this list.

Each capture register, if any exist, will have the following metadata, which immediately follows the metadata for the node instance.

| Bytes | Desc | Value/Example |
|---|---|---|
| 3 | register ID | See section 2 for details |
| 1 | register instance | See section 2 for details |

Version 2 and newer:

The user application can now specify registers to be captured when isolating to a specific bit in an isolation node as opposed to any bit in the isolation node. This can reduce the amount of default data captured if a particular bit requires capturing registers that are uninteresting to the other bits.

Beginning with version 2, the following will be appended to the above capture register metadata:

| Bytes | Desc | Value/Example |
|---|---|---|
| 1 | bit position within the isolation node | See notes below |

Notes:

  • The bit position will not exceed the number of bits defined by the register type.
  • The order of the bit position is dependent on the register type.
  • A value of 255 indicates the register will be captured for all bit positions within the isolation node.
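
The version-dependent size of a capture register entry can be handled roughly as follows. This is only a sketch: the function name is hypothetical, and reporting version 1 entries with the "all bits" value 255 is an assumption made here so that callers see one shape for both versions.

```python
CAPTURE_ALL_BITS = 255


def parse_capture_register(data, offset, file_version):
    """Return ((register ID, register instance, bit position), next offset)."""
    reg_id = int.from_bytes(data[offset:offset + 3], "big")
    reg_inst = data[offset + 3]
    offset += 4
    if file_version >= 2:
        # Version 2 and newer append a 1-byte bit position.
        bit = data[offset]
        offset += 1
    else:
        # Assumption: a version 1 entry applies to every bit in the node.
        bit = CAPTURE_ALL_BITS
    return (reg_id, reg_inst, bit), offset


# Hypothetical entries for register 0x00002A, instance 1.
v1_entry = bytes([0x00, 0x00, 0x2A, 0x01])
v2_entry = v1_entry + bytes([5])
print(parse_capture_register(v1_entry, 0, 1))
print(parse_capture_register(v2_entry, 0, 2))
```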

3.1.2) Isolation Rules

Each node instance will represent a register, or set of registers. The register(s) can be configured to represent one or more attention types. Therefore, the isolation rules are used to define how attentions are reported. Expressions (see appendix) will be used to perform bitwise operations on register values. Any bits set after the expressions have been resolved will indicate active attentions.

Each rule will have the following metadata, which immediately follows the capture register metadata for a node instance, if any exists.

| Bytes | Desc | Value/Example |
|---|---|---|
| 1 | attention type | See appendix for supported attention types |
| * | rule expression | See expression definition in appendix |

Note that the size of the expression field is variable. See the appendix for details.

3.1.3) Child Nodes

Any bits set after an isolation rule resolution represent active attentions. A child node should exist for any bit in the resolution that indicates the attention originated from another node.

The following metadata will exist for each child node associated with a node instance:

| Bytes | Desc | Value/Example |
|---|---|---|
| 1 | bit position within the resolution | See notes below |
| 2 | child node ID | See node metadata |
| 1 | child node instance | See node metadata |

Notes:

  • The size of the isolation rule resolution is defined by the register type represented by this node.
  • The bit position will not exceed the register size.
  • The order of the bit position is dependent on the register type.
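
One way to picture how child nodes drive the tree traversal is sketched below. It assumes an 8-byte register with ascending bit order (bit 0 leftmost), as listed for both register types in the appendix. The function name and the `children` mapping are hypothetical.

```python
def active_bits(resolution, children, register_size=8):
    """List (bit position, child) pairs for every bit set in the resolution.

    `children` maps a bit position to (child node ID, child node instance).
    A bit with no child entry is reported with None, meaning isolation
    stops at that bit.
    """
    width = register_size * 8
    found = []
    for bit in range(width):
        # Bit 0 is the most significant bit of the resolution.
        if (resolution >> (width - 1 - bit)) & 1:
            found.append((bit, children.get(bit)))
    return found


# Hypothetical node instance with child nodes on bits 0 and 5.
children = {0: (0x0010, 0), 5: (0x0020, 1)}
resolution = 0x8000000000000001  # bits 0 and 63 are active
print(active_bits(resolution, children))
```

Here bit 0 would continue isolation in child node 0x0010, while bit 63 has no child and is itself the bit that raised the attention.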

4) Root Nodes

After all of the Isolation Nodes have been specified, there will be a short list of all the root nodes to each of the isolation trees. The metadata for this section starts with:

| Bytes | Desc | Value/Example |
|---|---|---|
| 4 | root isolation node keyword | 0x524f4f54 (ascii for "ROOT") |
| 1 | # of root nodes | 0 is invalid |

Each isolation tree will report attentions for a single attention type and there can only be one tree per attention type. Immediately following the above metadata will be the following for each root node:

| Bytes | Desc | Value/Example |
|---|---|---|
| 1 | attention type | See appendix for supported attention types |
| 2 | root node ID | See node ID description in section 3 |
| 1 | root node instance | See node instance description in section 3 |
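
The root node list is small enough to parse in a few lines. The sketch below also enforces the one-tree-per-attention-type rule stated above; the function name and the example node IDs are assumptions.

```python
def parse_root_nodes(data, offset=0):
    """Parse the ROOT section; return ({attention type: (node ID, instance)}, next offset)."""
    if data[offset:offset + 4] != b"ROOT":
        raise ValueError("missing ROOT keyword")
    count = data[offset + 4]
    offset += 5
    if count == 0:
        raise ValueError("a root node count of 0 is invalid")

    roots = {}
    for _ in range(count):
        attn_type = data[offset]
        node_id = int.from_bytes(data[offset + 1:offset + 3], "big")
        node_inst = data[offset + 3]
        offset += 4
        if attn_type in roots:
            raise ValueError("only one isolation tree is allowed per attention type")
        roots[attn_type] = (node_id, node_inst)
    return roots, offset


# Hypothetical section: roots for attention types 1 and 3.
section = b"ROOT" + bytes([2]) + bytes([1, 0x00, 0x10, 0]) + bytes([3, 0x00, 0x11, 0])
roots, end = parse_root_nodes(section)
print(roots, end)
```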

Appendix

1) Supported Register Types

  • Power Systems SCOM register

    • Type value: 0x01
    • Address size: 4 bytes
    • Register size: 8 bytes
    • Bit order: ascending (0-63, left to right)
  • Power Systems Indirect SCOM register

    • Type value: 0x02
    • Address size: 8 bytes
    • Register size: 8 bytes
    • Bit order: ascending (0-63, left to right)
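
Ascending bit order means bit 0 is the most significant bit, which is the reverse of the usual shift-based numbering. The following sketch captures the two register types in a lookup table and tests a bit by position; the table layout and function name are hypothetical.

```python
# Sizes are in bytes, as listed above.
REGISTER_TYPES = {
    0x01: {"name": "SCOM", "address_size": 4, "register_size": 8},
    0x02: {"name": "Indirect SCOM", "address_size": 8, "register_size": 8},
}


def is_bit_set(value, bit, reg_type=0x01):
    """Test a bit using ascending order, where bit 0 is the leftmost bit."""
    width = REGISTER_TYPES[reg_type]["register_size"] * 8
    if not 0 <= bit < width:
        raise ValueError("bit position exceeds the register size")
    return bool((value >> (width - 1 - bit)) & 1)


print(is_bit_set(0x8000000000000000, 0))  # leftmost bit
print(is_bit_set(0x0000000000000001, 63))  # rightmost bit
```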

2) Supported Attention Types

| Value | Description |
|---|---|
| 1 | System checkstop hardware attention |
| 2 | Unit checkstop hardware attention |
| 3 | Recoverable hardware attention |
| 4 | SW or HW event requiring action by the service processor FW |
| 5 | SW or HW event requiring action by the host FW |

3) Expressions

Expressions are used in various locations within a Chip Data File. They can be used to characterize operations carried out against registers and/or integer constants. For example, <some_register> & (0xffff << 16).

The first byte of every expression is the expression type. The data immediately following this field is dependent on the expression type.
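
As a worked example, the expression `<some_register> & (0xffff << 16)` could be serialized as shown below, using the expression type values defined in the subsections that follow. The helper names, the register ID 0x000001, and the 8-byte constant size (the register size of both supported register types) are assumptions made for this sketch.

```python
def reg(reg_id, instance):
    """Register reference expression (type 0x01)."""
    return bytes([0x01]) + reg_id.to_bytes(3, "big") + bytes([instance])


def const(value, size=8):
    """Integer constant expression (type 0x02), sized by the register type."""
    return bytes([0x02]) + value.to_bytes(size, "big")


def and_(*exprs):
    """AND expression (type 0x10) over already-encoded sub-expressions."""
    return bytes([0x10, len(exprs)]) + b"".join(exprs)


def lshift(shift, expr):
    """Left shift expression (type 0x13)."""
    return bytes([0x13, shift]) + expr


# <some_register> & (0xffff << 16), with a hypothetical register 0x000001.
encoded = and_(reg(0x000001, 0), lshift(16, const(0xFFFF)))
print(encoded.hex())
```

Each helper emits the type byte first, so a reader can always tell how to interpret the bytes that follow.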

3.1) Non-recursive expressions

3.1.1) Register reference expression

This is a special expression that indicates the value of the target register should be used in this expression. Generally, this means reading the register value from hardware.

The following is the complete byte definition of the expression:

| Bytes | Description/Value |
|---|---|
| 1 | expression type = 0x01 |
| 3 | register ID, see section 2 for register details |
| 1 | register instance, see section 2 for register details |

As you can see, the register ID and instance can be used to find this register's metadata (e.g. the address) from the register lists (see section 2).

3.1.2) Integer constant expression

This simply contains an unsigned integer constant.

The following is the complete byte definition of the expression:

| Bytes | Description/Value |
|---|---|
| 1 | expression type = 0x02 |
| * | An unsigned integer constant (see note below) |

IMPORTANT: The size of the constant is determined by the register type specified by the containing node. See section 3 for node details.

3.2) Recursive expressions

These expressions will contain other expressions. Each sub-expression will always be resolved before handling the containing expression. For example, say we need to do something like:

REG_1 & ~REG_2 | CONST_1

Where REG_* are register reference expressions and CONST_* are integer constant expressions. Following standard C++ order of operations, that would evaluate to an expression like:

OR( AND( REG_1, NOT(REG_2) ), CONST_1 )

Where the NOT will be evaluated first, then the AND, then finally the OR.
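
The inside-out evaluation order falls out naturally from a recursive evaluator. The sketch below is one possible reading of the byte layouts in this appendix, not the libhei implementation. The `read_register` callback, the truncation of results to the register size, and the treatment of an AND/OR with zero sub-expressions as the operation's identity value are all assumptions.

```python
def evaluate(data, offset, read_register, reg_size=8):
    """Evaluate the expression at data[offset]; return (value, next offset)."""
    mask = (1 << (reg_size * 8)) - 1
    expr_type = data[offset]
    offset += 1

    if expr_type == 0x01:  # register reference
        reg_id = int.from_bytes(data[offset:offset + 3], "big")
        return read_register(reg_id, data[offset + 3]) & mask, offset + 4
    if expr_type == 0x02:  # integer constant, sized by the register type
        value = int.from_bytes(data[offset:offset + reg_size], "big")
        return value, offset + reg_size
    if expr_type in (0x10, 0x11):  # AND / OR
        count = data[offset]
        offset += 1
        # Start from the identity value: all ones for AND, zero for OR.
        value = mask if expr_type == 0x10 else 0
        for _ in range(count):
            operand, offset = evaluate(data, offset, read_register, reg_size)
            value = value & operand if expr_type == 0x10 else value | operand
        return value, offset
    if expr_type == 0x12:  # NOT
        value, offset = evaluate(data, offset, read_register, reg_size)
        return ~value & mask, offset
    if expr_type in (0x13, 0x14):  # left shift / right shift
        shift = data[offset]
        value, offset = evaluate(data, offset + 1, read_register, reg_size)
        value = value << shift if expr_type == 0x13 else value >> shift
        return value & mask, offset
    raise ValueError("unsupported expression type")


# OR( AND( REG_1, NOT(REG_2) ), CONST_1 ) with hypothetical IDs and values.
REG_1 = bytes([0x01, 0, 0, 1, 0])
REG_2 = bytes([0x01, 0, 0, 2, 0])
CONST_1 = bytes([0x02]) + (0x1).to_bytes(8, "big")
expr = bytes([0x11, 2]) + bytes([0x10, 2]) + REG_1 + bytes([0x12]) + REG_2 + CONST_1

fake_hw = {(1, 0): 0xFF00, (2, 0): 0x0F00}
result, end = evaluate(expr, 0, lambda reg_id, inst: fake_hw[(reg_id, inst)])
print(hex(result), end)
```

Because each call returns the offset just past what it consumed, the caller never needs to know the encoded size of a sub-expression in advance.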

3.2.1) AND expression

A bitwise AND operation (i.e. EXPR_1 & EXPR_2).

| Bytes | Description/Value |
|---|---|
| 1 | expression type = 0x10 |
| 1 | # of sub-expressions |
| * | all sub-expressions |

3.2.2) OR expression

A bitwise OR operation (i.e. EXPR_1 | EXPR_2).

| Bytes | Description/Value |
|---|---|
| 1 | expression type = 0x11 |
| 1 | # of sub-expressions |
| * | all sub-expressions |

3.2.3) NOT expression

A bitwise NOT operation (i.e. ~EXPR).

| Bytes | Description/Value |
|---|---|
| 1 | expression type = 0x12 |
| * | sub-expression |

3.2.4) Left shift expression

A left shift operation (i.e. EXPR << shift_value).

| Bytes | Description/Value |
|---|---|
| 1 | expression type = 0x13 |
| 1 | shift value |
| * | sub-expression |

3.2.5) Right shift expression

A right shift operation (i.e. EXPR >> shift_value).

| Bytes | Description/Value |
|---|---|
| 1 | expression type = 0x14 |
| 1 | shift value |
| * | sub-expression |