Messiah Art Resource Check Process

Introduction

This document introduces asset checking in ArtEase, with a focus on art asset checking for Messiah. The current version is 1.x. 'Version 2.0' is a new Messiah asset checker scheduled to launch in the near future. It brings major upgrades to checking performance and usability and supports more functions. Version 2.0 is compatible with most Version 1.x configuration, with a few exceptions. All functions and configuration marked 'Version 2.0' in this document are available only in that version.

Rule Definition

A checking rule contains two parts. The first is the description, which is displayed when errors are reported. For example, 'name' holds the name of a rule and 'Severity' holds its importance. These fields are informational and do not take part in asset checking.
The second part is the definition, whose values are passed to the related checking process. The definition fields are described below.

'rpath': Its value is a regular expression or a predefined file collection.

When the field starts with 'artfunc_', it means the file list is a predefined file collection. Available predefined file collections will be described later in the document.

If the regular expression matches the full path of a file, the file will be checked by the rule.

'not_rpath': Its value is a regular expression.

If the regular expression matches the full path of a file, the file will not be checked by the rule.
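
As a rough illustration of how 'rpath' and 'not_rpath' narrow down the files for a rule, the sketch below applies both to a list of paths. The helper name 'select_files' and the sample paths are hypothetical, and the use of 're.search' is an assumption; the checker may anchor its matches differently.

```python
import re

def select_files(paths, rpath, not_rpath=None):
    """Keep paths matched by rpath and not matched by not_rpath."""
    include = re.compile(rpath)
    exclude = re.compile(not_rpath) if not_rpath else None
    selected = []
    for path in paths:
        if not include.search(path):
            continue  # rpath does not match: the rule skips this file
        if exclude and exclude.search(path):
            continue  # not_rpath matches: the file is excluded
        selected.append(path)
    return selected

# Hypothetical sample paths.
paths = [
    "Package/Repository/Test/resource.repository",
    "Package/Repository/Backup/resource.repository",
    "Package/Worlds/demo.iworld",
]
print(select_files(paths, r".*Repository.*resource\.repository", r"/Backup/"))
```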

'xpath'/'subxpath': Its value is a string. The format depends on the file type and is described later in the document. This field retrieves designated values from the file so that they can take part in the later checking steps.
'filter'/'subfilter': Its value is a regular expression or a function name.
This field filters or converts the values returned from the previous step.

When the field starts with 'artfunc_', it means it will execute a function. It can be a built-in function or a customized function.

When the field's value is a regular expression, checking will continue only if the expression matches the values returned from the previous step.

'not_filter': Its value is a regular expression.

Checking will stop if the expression matches the values returned from the previous step.
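
The regular expression forms of 'filter' and 'not_filter' can be pictured as two complementary list filters. This is only a sketch: the helper names are hypothetical, and it assumes the expression is applied to each value separately with 're.search'.

```python
import re

def apply_filter(values, pattern):
    """Keep only the values that the regular expression matches."""
    regex = re.compile(pattern)
    return [v for v in values if regex.search(v)]

def apply_not_filter(values, pattern):
    """Drop the values that the regular expression matches."""
    regex = re.compile(pattern)
    return [v for v in values if not regex.search(v)]

# Hypothetical values returned by an xpath step.
names = ["Mesh_Wall", "Hole_mat", "Mesh_Door_tmp"]
kept = apply_not_filter(apply_filter(names, r"^Mesh_"), r"_tmp$")
print(kept)  # ['Mesh_Wall']
```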

'condition': Its value is a list.

The first element is the checking method. If the method starts with 'func_', a built-in or customized function is executed, and the remaining elements serve as the parameters of the checking method.

'Version 2.0' also supports methods starting with 'messiah_condfunc.', which execute the checking functions in the messiah_condfunc module. Methods in this module are usually exclusive to a certain rule.

Checking methods available in the field and their parameters are described below. The result of this step determines the result of a file check by this rule.

'func_str': The code of a customized function.
Once defined, the function can be referenced from 'filter', 'subfilter' and 'condition' by using the naming format each field expects, so that it is called in the later checking process.
For example, if the checking function is named 'custom_func', the value 'artfunc_custom_func' in 'filter'/'subfilter', or 'func_custom_func' as the first element of the 'condition' list, calls the customized function.

The arguments received by the customized function and the format of its return value are described later in the document.
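
The naming convention can be sketched as a simple prefix lookup. Everything here is an assumption made for illustration: the 'resolve' helper, the use of 'exec' to load 'func_str', and the body of 'custom_func' are not taken from the checker itself.

```python
FUNC_STR = '''
def custom_func(**pdict):
    # Hypothetical filter body: upper-case every value.
    return [str(v).upper() for v in pdict["data"]]
'''

def resolve(reference, namespace):
    """Map 'artfunc_<name>' or 'func_<name>' to a function in namespace."""
    for prefix in ("artfunc_", "func_"):
        if reference.startswith(prefix):
            return namespace[reference[len(prefix):]]
    raise KeyError(reference)

namespace = {}
exec(FUNC_STR, namespace)  # assumption: func_str is loaded roughly like this
filter_func = resolve("artfunc_custom_func", namespace)
print(filter_func(data=["a", "b"], filepath="demo.xml"))  # ['A', 'B']
```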

Checking Process

The asset checking flowchart is as follows: [Figure: flow]
For a checking rule, 'rpath', 'xpath' and 'condition' are required and other fields are optional.

The return values of each step are passed into the next step as arguments. The return value of 'condition' will be the checking result of a file.

rpath: Returns a list of the paths of all files in the checking directory that match the regular expression of rpath.
not_rpath: Receives a file path list and returns a list of the paths that do not match the regular expression of not_rpath.
xpath: Receives a file path list and returns a list of data retrieved from these files. Each element in the returned list is the data retrieved from one file, usually in the format of a list.
filter: Receives a set of data and returns the data after the regular expression filter or function conversion has been applied.
not_filter: Receives a set of data and returns the data that do not match the regular expression.
subxpath: The input and output formats are the same as those of xpath. subxpath and subfilter are usually executed when the filter is 'res_filter.guidToRealPath' or 'res_filter.guidToResourcePath' (Version 2.0).
subfilter: Receives the output of subxpath. The input and output formats are the same as those of filter.
condition: Receives a set of data and returns the data that fail the check as the exception data of the asset.
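
The way each step feeds the next can be sketched as a small pipeline. The function 'run_rule' and its callable parameters are hypothetical stand-ins for the real stages, and the subxpath/subfilter stages are left out for brevity.

```python
def run_rule(file_paths, extract, keep, check):
    """Chain the stages: each stage receives the previous stage's output."""
    failures = {}
    for path in file_paths:                          # output of rpath / not_rpath
        values = extract(path)                       # xpath: data from the file
        values = [v for v in values if keep(v)]      # filter / not_filter
        bad = [v for v in values if not check(v)]    # condition
        if bad:
            failures[path] = bad                     # exception data for this file
    return failures

# Hypothetical file contents standing in for real xml files.
fake_files = {"a.repository": ["Mesh_A", "mesh_b", "Tex_C"],
              "b.repository": ["Mesh_D"]}
result = run_rule(
    fake_files,
    extract=lambda path: fake_files[path],
    keep=lambda v: v.lower().startswith("mesh_"),
    check=lambda v: v.startswith("Mesh_"),
)
print(result)  # {'a.repository': ['mesh_b']}
```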

xpath Format

xpath retrieves data from designated files. The xpath format depends on the file format. Because Messiah asset info files (including resource.repository, .iworld and .ilevel files) are xml files, this document elaborates on the xpath format for xml files.

xpath of an xml format file

Notes: Some xpath formats in the current version are not supported in Version 2.0, and Version 2.0 supports additional formats. Please pay attention to the noted version.

Notes: In Version 1.x, xml files are parsed with the Python module 'xml.etree.ElementTree', while Version 2.0 uses the module 'lxml.etree'.

Extracting data from an xml file takes three steps:
1. Find the designated nodes ('Element').
2. Decide whether data should be extracted from a node, based on the node's value, its child node values and its attributes.
3. Retrieve the needed values according to the definition of the expression.

These three steps correspond to the parts of an xpath expression for xml files. The expression format is introduced below with a simple example.

This is a resource.repository file in Messiah asset.

<?xml version="1.0" encoding="utf-8"?>
<Repository>
    <Items>
        <Item>
            <Type>Material</Type>
            <Flags>0</Flags>
            <GUID>914fa3a0-62cd-4b5a-83df-92fdf494534a</GUID>
            <Package>Test/Test_Demo</Package>
            <Class>Material</Class>
            <Deps>9dc97d9a-64b5-49fb-acfa-c1704ebc2ef2</Deps>
            <Deps>12345678-9876-1234-abcd-1234567890ab</Deps>
            <Name>Hole_mat</Name>
            <Annotation>
                <SourcePath />
                <CreationTime>0</CreationTime>
            </Annotation>
        </Item>
        <Item>
            <Type>Material</Type>
            <Flags>0</Flags>
            <GUID>e081bf41-a561-4750-83af-78c80468bc6c</GUID>
            <Package>Test/Test_Demo</Package>
            <Class>Material</Class>
            <Name>Trim_mat</Name>
            <Annotation>
                <SourcePath />
                <CreationTime>0</CreationTime>
            </Annotation>
        </Item>
        <Item>
            <Type>Mesh</Type>
            <Flags>0</Flags>
            <GUID>e5ce0940-f689-404d-968f-775317804069</GUID>
            <Package>Test/Test_Demo</Package>
            <Class>Mesh</Class>
            <Name>Wall</Name>
            <Annotation>
                <SourcePath />
                <CreationTime>0</CreationTime>
                <Anno>
                  <Key>CollisionShape</Key>
                  <Value>e4ea79e4-3505-4e96-bbc0-bc567a1204e5</Value>
                </Anno>
            </Annotation>
        </Item>
    </Items>
</Repository>

To find the GUIDs of all Material type resources, we need to:

1. Find the specific nodes. The target nodes are the Item nodes. A node path is written as the hierarchy of tags from the root, separated by '/'. In this example, the Item node path is 'Repository/Items/Item'. The root node can be replaced by '*' to get '*/Items/Item'.
2. Check whether data should be extracted from a node. Test whether the Type child node of an Item has the value Material with the expression 'Type==Material'.
3. Define the required data. This is the value of the GUID child node, so this part of the expression is the child node tag 'GUID'.

The expressions of step 1 and step 2 are separated by a period, '.', and the expressions of step 2 and step 3 are separated by a comma, ','. The complete xpath expression is: '*/Items/Item.Type==Material,GUID'.

Taking the above file as an example, the return result of the xpath step is '['914fa3a0-62cd-4b5a-83df-92fdf494534a', 'e081bf41-a561-4750-83af-78c80468bc6c']'.
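
For comparison, the three steps can be reproduced by hand with Python's 'xml.etree.ElementTree'. This is a sketch of equivalent logic on a shortened copy of the sample file, not the checker's own implementation.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample resource.repository file.
REPOSITORY_XML = """<Repository>
  <Items>
    <Item><Type>Material</Type><GUID>914fa3a0-62cd-4b5a-83df-92fdf494534a</GUID></Item>
    <Item><Type>Material</Type><GUID>e081bf41-a561-4750-83af-78c80468bc6c</GUID></Item>
    <Item><Type>Mesh</Type><GUID>e5ce0940-f689-404d-968f-775317804069</GUID></Item>
  </Items>
</Repository>"""

root = ET.fromstring(REPOSITORY_XML)
guids = []
for item in root.findall("Items/Item"):          # step 1: node path */Items/Item
    if item.findtext("Type") == "Material":      # step 2: Type==Material
        guids.extend(g.text for g in item.findall("GUID"))  # step 3: GUID
print(guids)
```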

The structure of a complete xpath expression is: 'Node path (. the node expression to be checked)(,name of the nodes to be extracted)(: name of the attributes to be extracted)'. The structure of a complete xpath expression in Version 2.0 is: 'Node path ([@node attribute expression to be checked])(. the node expression to be checked)(,name of the nodes to be extracted)(: name of the attributes to be extracted)'. Content enclosed in parentheses is optional.

Here are the formats of each part of an xml xpath expression.

Node Paths

A node path identifies the nodes on which attribute checking, child node checking and data extraction are performed; the other parts of the xpath expression are applied to these nodes. The path lists the tags in order from the root tag to the current node tag according to the xml hierarchy. Tags are separated by '/'. The root node tag can be replaced by '*'.

Node tags support fuzzy matching: a tag can start or end with '*', which represents an arbitrary string. For example, 'Item*' matches the nodes in the required hierarchy whose tags start with Item.

A node tag can be replaced by '*' to represent all nodes in the same hierarchy. For example, '*/Items/*' means all child nodes of the Items nodes under the root node.
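
The wildcard behavior of node paths can be approximated with shell-style matching on each tag. The helper 'find_by_path' is hypothetical, and 'fnmatchcase' is used here only as a convenient stand-in (it also accepts '?' and '[...]', which the checker may not).

```python
from fnmatch import fnmatchcase
import xml.etree.ElementTree as ET

def find_by_path(root, node_path):
    """Return nodes whose tag chain matches a '/'-separated pattern."""
    first, *rest = node_path.split("/")
    current = [root] if fnmatchcase(root.tag, first) else []
    for pattern in rest:
        # Descend one level and keep the children whose tag matches.
        current = [child for node in current for child in node
                   if fnmatchcase(child.tag, pattern)]
    return current

root = ET.fromstring(
    "<Repository><Items><Item/><ItemGroup/><Other/></Items></Repository>")
print([n.tag for n in find_by_path(root, "*/Items/Item*")])  # ['Item', 'ItemGroup']
print([n.tag for n in find_by_path(root, "*/Items/*")])      # ['Item', 'ItemGroup', 'Other']
```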

Child Node Value Checking Expression

The structure of a value checking expression is: node name, then '==' or '!=', then the value that the node should satisfy.

The node name is the tag name of the child node whose value is checked, and the checking value is a string or a regular expression. The test is whether the values are equal ('==') or not equal ('!='). If the checking value starts and ends with a backtick (`), the content between the backticks is treated as a regular expression. Otherwise, the content is treated as a string (usually containing letters, numbers and underscores, and excluding special characters such as '.', ':', '&' and '|'). When the checking value is a string, the node value is compared with it directly. When the checking value is a regular expression, '==' means that the regular expression matches the node value and '!=' means that it does not.

Expressions can be joined by logical connectives. When two expressions are joined by '&' (the AND operator), both must hold. When they are joined by '|' (the OR operator), at least one must hold. Note that no precedence is defined between the two operators, so avoid using both in the same expression.

Examples:

Type==Material means the value of the Type node must be Material.

Name==`^Mesh_` means the value of the Name node must start with Mesh_ (that is, match the regular expression ^Mesh_).

Type==Material&Name==`^Mesh_` means that both of the above conditions should be met.
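
A minimal evaluator for these expressions might look like the sketch below. It handles only '==', '!=' and '&', the helper names are hypothetical, and the use of 're.search' for backtick values is an assumption about how the checker matches.

```python
import re
import xml.etree.ElementTree as ET

TERM = re.compile(r"^([^=!]+)(==|!=)(.*)$")

def check_term(node, term):
    """Evaluate one 'Tag==value' or 'Tag!=value' term against a node."""
    tag, op, expected = TERM.match(term).groups()
    actual = node.findtext(tag) or ""
    if len(expected) >= 2 and expected[0] == expected[-1] == "`":
        matched = re.search(expected[1:-1], actual) is not None  # regex value
    else:
        matched = actual == expected                             # plain string
    return matched if op == "==" else not matched

def check_all(node, expression):
    """Terms joined by '&' must all hold."""
    return all(check_term(node, t) for t in expression.split("&"))

item = ET.fromstring("<Item><Type>Material</Type><Name>Mesh_wall</Name></Item>")
print(check_all(item, "Type==Material&Name==`^Mesh_`"))  # True
print(check_all(item, "Type!=Material"))                 # False
```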

'Version 2.0' expands the checking expression. New features include:
* Value checks on child nodes at any level of the hierarchy. Levels are separated by '.'. For example, to check the value of Key, which is a child node of Anno under the child node Annotation, the expression is 'Annotation.Anno.Key'.
* The checking value can be a string enclosed in double quotation marks, which can contain any characters except double quotation marks and backticks.

Node Attribute Checking Expression ('Version 2.0' only)

The format of the attribute checking expression is the same as that of the node value checking expression.

Names of the Nodes Whose Values Are Extracted

This part contains the names of the nodes whose values need to be extracted. To return the values of multiple child nodes, separate the child node tags with a comma, ','. To return the value of the Name node, this part of the expression is Name. To return the values of the Package and Name nodes, this part of the expression is Package,Name.

If there is only one node name, the return value is a list of node values. If there is more than one node name, the return value has the following format:

[
  {
    "Node name 1": ["Node value 1"],
    "Node name 2": ["Node value 2"],
  },
  {
    "Node name 1": ["Node value 3"],
    "Node name 2": ["Node value 4", "Node value 5"],
  }
]
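
The two return shapes can be sketched as follows. The 'extract' helper is hypothetical, and returning an empty list for a missing child node is an assumption rather than documented behavior.

```python
import xml.etree.ElementTree as ET

def extract(items, tags):
    """One tag: flat list of values. Several tags: one dict per node."""
    if len(tags) == 1:
        return [n.text for item in items for n in item.findall(tags[0])]
    return [{tag: [n.text for n in item.findall(tag)] for tag in tags}
            for item in items]

root = ET.fromstring(
    "<Items>"
    "<Item><Name>Hole_mat</Name><Deps>a</Deps><Deps>b</Deps></Item>"
    "<Item><Name>Trim_mat</Name></Item>"
    "</Items>")
items = root.findall("Item")
print(extract(items, ["Name"]))          # ['Hole_mat', 'Trim_mat']
print(extract(items, ["Name", "Deps"]))
```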

One special value in this part is '{dom}', which returns the xml.etree.ElementTree.Element object of the node.

'Version 2.0' can extract values from child nodes at any level of the hierarchy. For example, to extract the value of Value, which is a child node of Anno under the child node Annotation, the expression is 'Annotation.Anno.Value'.

Example: To find the GUID of the CollisionShape on which a Mesh resource depends, the expression in Version 2.0 can be: '*/Items/Item.Type==Mesh&Annotation.Anno.Key==CollisionShape,Annotation.Anno.Value'
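
The dotted child path can be imitated with the standard library by turning each '.' into one level of descent. This sketch uses 'xml.etree.ElementTree' to stay self-contained (Version 2.0 itself uses 'lxml.etree'), and the 'nested_values' helper is hypothetical.

```python
import xml.etree.ElementTree as ET

ITEM_XML = """<Item>
  <Type>Mesh</Type>
  <Annotation>
    <Anno>
      <Key>CollisionShape</Key>
      <Value>e4ea79e4-3505-4e96-bbc0-bc567a1204e5</Value>
    </Anno>
  </Annotation>
</Item>"""

def nested_values(node, dotted):
    """Resolve 'Annotation.Anno.Value' by walking one level per segment."""
    return [n.text for n in node.findall(dotted.replace(".", "/"))]

item = ET.fromstring(ITEM_XML)
collision_guids = []
# Note: this simple version does not pair Key and Value within the same Anno.
if (item.findtext("Type") == "Mesh"
        and "CollisionShape" in nested_values(item, "Annotation.Anno.Key")):
    collision_guids = nested_values(item, "Annotation.Anno.Value")
print(collision_guids)
```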

Attribute Names

The format is the same as that of the node value extraction expression.

Other Rules
Node value extraction and node attribute extraction cannot both be included in one xpath expression.
When there is neither a node value checking expression nor a node attribute checking expression, data extraction is performed on all nodes that satisfy the path.
(Not recommended) When the xpath expression contains only the node path, the values of all nodes that satisfy the path are returned.
Examples of xpath Expressions (target file: resource.repository)
- '*/Items/Item.Type==Texture,GUID': Obtains the GUIDs of all Texture type resources.
- '*/Items/Item.Type==Model&Name==`^char_`,Deps': Obtains the GUIDs of all resources on which Model type resources whose names start with char_ (matching the regular expression ^char_) depend.
- '*/Items/Item/GUID': (Not recommended) Obtains all resource GUIDs. The recommended expression in Version 2.0 is '*/Items/Item.GUID'.

Checking Methods Available for Condition

The condition statement determines the checking method of a rule and checks the data returned from the previous step. Whether a file passes depends on the result of this statement.

The condition field is a list whose elements are all strings. The first element is the checking method and the rest are its parameters. The checking function receives a list of data and returns the result together with the data that fail the check.

Available checking methods:

Exist: The data to be checked exist (not null, not False, not None; the value tests true in Python).

Format: '["Exist"]'.

Not Exist: The data to be checked do not exist (null, False or None; the value tests false in Python).

Format: '["Not Exist"]'.

Exist in: The data to be checked exist in a collection.

Format: '["Exist in", "Collection where the value should exist"]'. The collection can be a built-in predefined collection (described later in the document). In Version 2.0, it can also be a given list. Other collection forms supported by existing rules are not recommended.

A collection starting with 'artfunc_' is recognized as a built-in predefined collection. When it is a dict, the value is checked against the keys of the dictionary. In 'Version 2.0', a collection string that starts with '[' and ends with ']' is recognized as a list. In this case, the string must follow the Python list format, with elements separated by commas.

For example, the expression to check whether a GUID has resource files on disk can be: '["Exist in", "artfunc_res_filter.ALL_RES_GUID"]'
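
The two collection forms can be sketched as follows. The 'exists_in' helper and the 'predefined' lookup table are hypothetical, and parsing the list form with 'ast.literal_eval' is an assumption about how a Python-style list string could be read.

```python
import ast

def exists_in(values, collection_spec, predefined):
    """Return the values that are missing from the target collection."""
    if collection_spec.startswith("artfunc_"):
        collection = predefined[collection_spec[len("artfunc_"):]]
    elif collection_spec.startswith("[") and collection_spec.endswith("]"):
        collection = ast.literal_eval(collection_spec)  # Python list literal
    else:
        raise ValueError("unsupported collection: " + collection_spec)
    # 'in' on a dict tests its keys, as described above.
    return [v for v in values if v not in collection]

# Hypothetical predefined collection keyed by GUID.
predefined = {"res_filter.ALL_RES_GUID": {"guid-1": [], "guid-2": []}}
print(exists_in(["guid-1", "guid-3"], "artfunc_res_filter.ALL_RES_GUID", predefined))
print(exists_in(["Mesh", "Sound"], '["Mesh", "Material"]', predefined))
```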

Not exist in: The data to be checked do not exist in a certain collection.

Format: '["Not exist in", "Target collection"]'. The format of the collection is the same as that in "Exist in".

Null collection: The target is a null collection (length: 0).

Format: '["Null collection"]'.

Not null collection: The target is not a null collection (length > 0).

Format: '["Not null collection"]'.

Unique: The data to be checked are grouped by origin, and each piece of data from the same origin must be unique. The elements must be hashable.

Format: '["Unique"]'.
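
A uniqueness check grouped by origin could be sketched like this. The function name and the sample data are hypothetical; 'Counter' is used because it relies on the same hashability requirement.

```python
from collections import Counter

def duplicates_per_origin(data_by_file):
    """For each origin file, list the values that occur more than once."""
    failures = {}
    for filepath, values in data_by_file.items():
        repeated = [v for v, n in Counter(values).items() if n > 1]
        if repeated:
            failures[filepath] = repeated
    return failures

# 'g1' appears in both files, but only the first file repeats it internally.
data = {"a/resource.repository": ["g1", "g2", "g1"],
        "b/resource.repository": ["g1", "g3"]}
print(duplicates_per_origin(data))  # {'a/resource.repository': ['g1']}
```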

Comparison Operation: Check if the target satisfies a simple mathematical operation.

Format: '["Operator", "Value"]'. The operator must be one of '>', '<', '>=', '<=', '==' and '!='. If the value in the condition can be converted to a number, both the checked value and the condition value are converted to numbers and compared numerically. Otherwise, they are compared as strings.
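
The numeric-or-string fallback can be sketched as below. The 'compare' helper is hypothetical, and it falls back to string comparison whenever either side fails to parse as a number, which is a simplification of the rule described above.

```python
import operator

OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge,
       "<=": operator.le, "==": operator.eq, "!=": operator.ne}

def compare(value, op, target):
    """Compare numerically when both sides parse as numbers, else as strings."""
    try:
        left, right = float(value), float(target)
    except (TypeError, ValueError):
        left, right = str(value), str(target)
    return OPS[op](left, right)

print(compare("4200", "<=", "5000"))   # True  (numeric comparison)
print(compare("12000", "<=", "5000"))  # False (numeric comparison)
print(compare("abc", "==", "abc"))     # True  (string comparison)
```

Without the numeric conversion, the string "12000" would sort before "5000", which is why the conversion step matters for rules such as face count limits.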

Satisfy the Expression

Format: '["Satisfy the expression", "expression"]'. The expression must be a valid Python expression that returns a bool. The variable 'd' represents the value being checked, and built-in predefined collections can be used in the expression.

For example, the expression to check if there are duplicate GUIDs in resource.repository is: '["Satisfy the expression", "d in res_filter.ALL_RES and len(res_filter.ALL_RES[d]) == 1"]'
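
To see what such an expression does to each value, the sketch below evaluates it with 'eval' against a stand-in 'res_filter' object. The 'failing_values' helper and the sample data are hypothetical, and how the checker actually evaluates expressions is not specified here.

```python
from types import SimpleNamespace

def failing_values(values, expression, namespace):
    """Return the values for which the expression evaluates to False."""
    code = compile(expression, "<condition>", "eval")
    return [d for d in values if not eval(code, dict(namespace), {"d": d})]

# Stand-in for the real res_filter module, using the 1.x ALL_RES layout.
res_filter = SimpleNamespace(ALL_RES={
    "guid-1": [("Test/Demo", "Wall", "Mesh", [], "path1")],
    "guid-2": [("Test/Demo", "A", "Material", [], "path2"),
               ("Test/Demo", "B", "Material", [], "path3")],
})
expr = "d in res_filter.ALL_RES and len(res_filter.ALL_RES[d]) == 1"
print(failing_values(["guid-1", "guid-2", "guid-9"], expr,
                     {"res_filter": res_filter}))  # ['guid-2', 'guid-9']
```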

Customized Checking Functions

When a checking method starts with 'func_', the corresponding checking function is executed and its return value is taken as the execution result of the rule. See the Customized Functions section below for details.

Built-In Predefined Collections and Methods

Predefined Collections

Before any rules are executed in Messiah art asset checking, asset information in the Package directory is collected and sorted by origin and content into two groups:
* Resource attributes such as GUID, Type, Name, Package and Deps. These are obtained from the resource.repository files in the asset warehouse.
* The relations between resource GUIDs and the paths to their attribute files. These are obtained by traversing the files in the asset warehouse. The attribute files of most resources are the 'resource.xml' or 'resource' files in their respective directories. For texture resources that use 'texture.xml', additional information such as the file path of 'resource.data' is included.

This information is used to obtain and check a resource's basic attributes (the information in resource.repository) and its detailed information, to check for duplicate resource GUIDs or virtual paths, and to check for missing dependencies.

Storage Variables:
* Resource information obtained from resource.repository: 'res_filter.ALL_RES'. Its data format is 'Dict[str, List[Tuple[str, str, str, List[str], str]]]'. Resource GUIDs are the dictionary keys and the values are lists (because a duplicated GUID corresponds to multiple resources). Each 5-tuple element holds the basic information of a resource: '(Package, Name, Type, Deps, resource path)'.
* Resource attribute file paths obtained from hard disk files: 'res_filter.ALL_RES_GUID'. Its data format is 'Dict[str, List[Dict[str, str]]]'. Resource GUIDs are the dictionary keys and the values are lists (because there might be duplicate GUID directories). Each dictionary element holds the file path information of a resource under the following keys:
Resource: The paths of 'resource.xml', 'resource' or 'texture.xml' files depending on resource types.
Data: The paths of 'resource.data' or 'texture.data' depending on resource types.
HDR: '.hdr' file path.
Source: The paths of files whose names start with 'source'.
* All ilevel file paths obtained from iworld files: 'res_filter.ALL_USED_ILEVEL'. Its data format is 'Set[str]', the set of full paths of all ilevel files referenced in iworld files.

These predefined collections may be used in rpath, condition and customized functions. In rpath and condition, use 'artfunc_' + name, for example 'artfunc_res_filter.ALL_RES'. In customized code, import res_filter first and then access the predefined collection:

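def custom_check_func(**pdict):
    import res_filter  # import the module before using its collections
    all_res_guid = res_filter.ALL_RES_GUID  # predefined collection keyed by GUID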
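
To make the two 1.x layouts concrete, the sketch below builds small sample dictionaries in the documented shapes and uses them for a missing-dependency check. The GUIDs, paths and the 'missing_dependencies' helper are all hypothetical.

```python
# Hypothetical sample data in the 1.x layout described above.
ALL_RES = {
    "guid-mat": [("Test/Test_Demo", "Hole_mat", "Material",
                  ["guid-tex", "guid-missing"], "Repo/guid-mat")],
    "guid-tex": [("Test/Test_Demo", "Hole_tex", "Texture", [], "Repo/guid-tex")],
}
ALL_RES_GUID = {
    "guid-mat": [{"Resource": "Repo/guid-mat/resource.xml"}],
    "guid-tex": [{"Resource": "Repo/guid-tex/texture.xml",
                  "Data": "Repo/guid-tex/texture.data"}],
}

def missing_dependencies(all_res):
    """Map each GUID to the dependencies that are not registered."""
    missing = {}
    for guid, entries in all_res.items():
        for package, name, res_type, deps, res_path in entries:
            absent = [dep for dep in deps if dep not in all_res]
            if absent:
                missing[guid] = absent
    return missing

print(missing_dependencies(ALL_RES))        # {'guid-mat': ['guid-missing']}
print(ALL_RES_GUID["guid-tex"][0]["Data"])  # Repo/guid-tex/texture.data
```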

'Version 2.0' makes major adjustments to the definition format and storage location of predefined collections.

Resource information obtained from resource.repository: 'res_store.res_in_repo'. The format is 'Dict[str, Tuple[MessiahResource]]'. Resource GUIDs are the dictionary keys, and each value is a tuple of MessiahResource instances.

The 'MessiahResource' type includes:
* GUID: Resource GUID.
* Package: The package where the resource exists.
* Name: Resource name.
* Type: Resource type.
* Deps: 'Tuple[str]' type, the GUIDs of the resources on which this resource depends.
* Repository: The name of the resource's warehouse.
* VirtualPath: Virtual path of the resource. The format is Package/Name.
* RepositoryFullPath: Full path of the warehouse.

Resource attribute file paths obtained from hard disk files: 'res_store.res_in_disk'. The format is 'Dict[str, Tuple[ResourceFile]]'. Resource GUIDs are the dictionary keys, and each value is a tuple of ResourceFile instances.

The 'ResourceFile' type includes:
* GUID: Resource GUID.
* Repository: The name of the resource's warehouse.
* Type: Resource type.
* Resource: Path of the resource attribute file. It can be the path of a 'resource.xml', 'resource' or 'texture.xml' file, depending on the resource type.
* Data: Path of the 'resource.data' or 'texture.data' file. It is 'None' if neither file exists.
* HDR: Path of the '.hdr' file. It is 'None' if the file does not exist.
* Source: Path of the files whose names start with 'source'. It is 'None' if no such file exists.

Mapping dictionary of virtual paths and resources' basic attributes: 'res_store.res_virt_info'. The format is 'Dict[str, Tuple[MessiahResource]]'. The dictionary keys are the virtual paths.

All ilevel file paths in iworld files: 'res_store.ilevels'.

Version 2.0 still recognizes the variable names 'res_filter.ALL_RES', 'res_filter.ALL_RES_GUID' and 'res_filter.ALL_USED_ILEVEL', but the 'res_store' module is recommended instead for better readability.
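
The readability gain comes from named fields replacing positional tuples. The class below is only a stand-in built from the field list above; the real 'MessiahResource' class, the sample values and the warehouse path are assumptions.

```python
from typing import Dict, NamedTuple, Tuple

class MessiahResource(NamedTuple):
    """Stand-in with the fields listed above; the real class may differ."""
    GUID: str
    Package: str
    Name: str
    Type: str
    Deps: Tuple[str, ...]
    Repository: str
    VirtualPath: str
    RepositoryFullPath: str

# Hypothetical resource entry.
wall = MessiahResource(
    GUID="e5ce0940-f689-404d-968f-775317804069", Package="Test/Test_Demo",
    Name="Wall", Type="Mesh", Deps=(), Repository="Demo",
    VirtualPath="Test/Test_Demo/Wall", RepositoryFullPath="/assets/Demo")

res_in_repo: Dict[str, Tuple[MessiahResource, ...]] = {wall.GUID: (wall,)}
res_virt_info = {wall.VirtualPath: (wall,)}

# Attribute access replaces the positional 5-tuple indexing of Version 1.x.
print(res_in_repo[wall.GUID][0].Name)                # Wall
print(res_virt_info["Test/Test_Demo/Wall"][0].Type)  # Mesh
```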

Predefined Methods

Predefined methods in the checking framework can be used in filter and condition. Most of these methods are written for specific checking requirements and rules, but there are some general methods for filter.

Commonly used methods in filter:
* 'res_filter.guidToRealPath': Obtains the path of the resource information file for a GUID. It is the most commonly used predefined method in checking rules. When the filter uses this method, subxpath continues to extract the values to check from the resource attribute files. In 'Version 2.0', the function has a more meaningful alias: 'res_filter.guidToResourcePath'.
* 'res_filter.getResourcesInILevel': Obtains the list of GUIDs of all resources referenced in ilevel files.
* 'res_filter.validGUID' (Version 2.0): Returns the strings in the received list that conform to the GUID format.

Methods used in condition usually serve specific checking rules:
* 'condfunc.compressed_src_tex': Checks whether the original maps are compressed and returns uncompressed ones as errors.
* 'condfunc.foliage_shader': Checks whether the materials of an object are in the allow-list.
* 'condfunc.rigidbody_deps': Checks whether a resource on which a physical resource depends is missing.
* 'condfunc.effect_collision_non_local': Checks that collision effects are not local.
* 'condfunc.effect_emitter_count_check': Checks whether the number of effect emitters corresponds to the number of material layers on which the special effect depends.
* 'g83_runner.srcTexConflict': Checks whether the size of a map in the attribute file is the same as the size of the tga file.

Customized Functions

When the checking logic is too complicated to implement with the standard checking process and built-in functions, you can write the logic as a customized function and call it in filter or condition.

Notes: In Version 1.x, the customized functions of all rules are global (they live in the namespace returned by globals()) and are executed in that global namespace even when their rules are not activated. As a result, functions with the same name in different rules override each other.

Parameters and the Format of Return Data of Customized Functions

Parameters will be passed into customized functions as named parameters. Thus, the general format of a customized function is:

def custom_func(**pdict):
    ret_value = []  # placeholder: replace with the checking logic
    return ret_value  # the required format differs between filter and condition

* Functions in filter/subfilter

The parameters passed in are:
* data: The data, usually a list.
* filepath: The path of the file from which the data originate.

The return value is a 'List' holding the processed or converted data.
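
A filter function following this contract might look like the sketch below. The function body is hypothetical; it simply keeps strings of GUID length (36 characters) and lower-cases them.

```python
def custom_filter_func(**pdict):
    """Hypothetical filter: keep GUID-length strings and lower-case them."""
    data = pdict.get("data") or []
    filepath = pdict.get("filepath", "")  # origin file, e.g. for log messages
    return [value.lower() for value in data
            if isinstance(value, str) and len(value) == 36]

print(custom_filter_func(
    data=["914FA3A0-62CD-4B5A-83DF-92FDF494534A", "", None],
    filepath="Repository/resource.repository"))
# ['914fa3a0-62cd-4b5a-83df-92fdf494534a']
```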

* Functions in condition

The parameters passed in are:
* data: The data to be checked, usually a list.
* filepath: The path of the file from which the data originate.
* condition: The value of the rule's condition field.

The return value is a 'Tuple[bool, List]'. The first element is the checking result, with True representing success and False representing failure; the second element is a list of exception data.

In Version 2.0, customized functions in condition return only the 'List' of exception data. An empty list means success; otherwise the check fails. The Version 1.x return format is still accepted, but the value of the first element is ignored.
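
A condition function in the Version 1.x return format could be sketched as follows. The limit check and the idea of reading the limit from the second condition element are illustrative assumptions.

```python
def custom_check_func(**pdict):
    """Hypothetical check: every value must be at most condition[1]."""
    data = pdict.get("data") or []
    condition = pdict.get("condition") or ["func_custom_check_func", "5000"]
    limit = float(condition[1])
    bad = [value for value in data if float(value) > limit]
    return (not bad, bad)  # Version 1.x format: (passed, exception data)

passed, errors = custom_check_func(
    data=["4200", "12000"], filepath="Repo/guid/resource.xml",
    condition=["func_custom_check_func", "5000"])
print(passed, errors)  # False ['12000']
```

Under the Version 2.0 convention described above, the same function would return only the list of exception data.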

Using customized functions as a condition is not recommended. Instead, extract the values through a customized function in filter and check them with the general methods. A customized condition function should be used only when the check cannot be completed with the general checking methods (for example, when values must be extracted and checked across many associated files) and a customized description of the exception data is therefore needed.

Calling Methods in Checking Rules

Calling in filter. Set the filter field to 'artfunc_' + the customized function name. For example, to call a customized function named 'custom_filter_func', set filter to 'artfunc_custom_filter_func'.
Calling in condition. Set the first element of the condition field to 'func_' + the customized function name. For example, to call a customized function named 'custom_check_func', set condition to '["func_custom_check_func"]'.

Examples

Please see the public rule base for more checking rules. Here are the checking methods for two rules:

Resource GUIDs in the warehouse must be unique. Rule definition:
```
rpath: .*Repository.*resource.repository
xpath: */Items/Item/GUID
condition: ['Satisfy the expression', 'd in res_filter.ALL_RES and len(res_filter.ALL_RES[d]) == 1']
```
Because the rule needs all resource GUIDs in resource.repository, rpath is a regular expression that finds resource.repository files, and xpath is '*/Items/Item/GUID', which extracts all resource GUIDs. According to the description of the dictionary 'res_filter.ALL_RES', if multiple resources share a GUID, all of them are stored in the value for that GUID. As a result, a value longer than 1 indicates a duplicate GUID. Thus, condition is '['Satisfy the expression','d in res_filter.ALL_RES and len(res_filter.ALL_RES[d]) == 1']'.
Model face count check. Rule definition:
```
rpath: .*Repository.*resource.repository
xpath: */Items/Item.Type==Model,GUID
filter: artfunc_res_filter.guidToRealPath
subxpath: */ModelInfo/Root/Entity/NumFaces
condition: ["<=", "5000"]
```
Checking process: First, obtain the GUIDs of all Model resources, then read the number of model faces from each Model resource attribute file (resource.xml) and compare it. rpath and xpath obtain the Model resource GUIDs; filter calls the function 'res_filter.guidToRealPath' to obtain the attribute file paths from the resource GUIDs; subxpath reads the number of model faces, and condition checks that the number is <= 5000.