YAML specification
PyASTrX can also validate your YAML files. To do so, add a specification whose language is set to yaml in the pyastrx.yaml file.
DBT examples
DBT is a tool that helps data engineers and data scientists build data pipelines. It is a great tool, but bugs and bad practices in its YAML files can be hard to spot. PyASTrX can help you find them.
Checking types of attributes
The following example shows how to enforce that the persist_docs attribute of a model should be a dictionary.
after_context: 3
before_context: 3
specifications:
  my_dbt_specification:
    language: yaml
    folder: .
    rules:
      persist_docs_should_be_a_dict:
        xpath:
          |
          //KeyNode[@name="persist_docs"]
          /*[
            not(self::MappingNode)
          ]
        description: "persist_docs should be a dict"
In dbt, the database option under quoting is a boolean attribute. The following rule flags any value under quoting that is not a boolean:
my_dbt_specification:
  language: yaml
  folder: .
  rules:
    quoting_database_should_be_a_boolean:
      xpath:
        |
        //KeyNode[@name="quoting"]
        /MappingNode
        /KeyNode
        /*[
          not(self::BoolNode)
        ]
      severity: error
      description: "Database quoting should be a boolean"
    persist_docs_should_be_a_dict:
      xpath:
        |
        //KeyNode[@name="persist_docs"]
        /*[
          not(self::MappingNode)
        ]
      description: "persist_docs should be a dict"
Enforcing taxonomy and source restrictions
It’s common to have a taxonomy in database projects, and it’s also common for a project to be denied access to some sources. To see how to enforce this, suppose we want every source to have a name that starts with the svc_ prefix:
my_dbt_specification:
  language: yaml
  folder: .
  rules:
    sources-should-be-svc:
      xpath:
        |
        //KeyNode[@name="sources"]
        //StrNode[not(pyastrx:match('svc_*',text()))]
      description: "Sources should be prefixed with svc_"
      severity: error
    quoting_database_should_be_a_boolean:
      xpath:
        |
        //KeyNode[@name="quoting"]
        /MappingNode
        /KeyNode
        /*[
          not(self::BoolNode)
        ]
      severity: error
      description: "Database quoting should be a boolean"
    persist_docs_should_be_a_dict:
      xpath:
        |
        //KeyNode[@name="persist_docs"]
        /*[
          not(self::MappingNode)
        ]
      description: "persist_docs should be a dict"
      severity: error
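The pattern given to pyastrx:match in the sources rule deserves a second look. Assuming it is interpreted as a Python regular expression (an assumption; this page does not say which matching semantics PyASTrX applies), svc_* means "svc followed by zero or more underscores" rather than "anything starting with svc_". The sketch below uses only the standard re module with made-up source names; ANCHORED_PREFIX is a hypothetical stricter alternative, not something taken from the PyASTrX documentation.

```python
import re

# Pattern copied from the rule, plus a stricter hypothetical alternative.
PATTERN_FROM_RULE = "svc_*"
ANCHORED_PREFIX = r"^svc_"


def matches(pattern, names):
    """Return the names accepted by re.match for the given pattern."""
    return [name for name in names if re.match(pattern, name)]


# Made-up source names, for illustration only.
names = ["svc_orders", "svcorders", "raw_orders"]

rule_matches = matches(PATTERN_FROM_RULE, names)
anchored_matches = matches(ANCHORED_PREFIX, names)

print(rule_matches)
print(anchored_matches)
```

If PyASTrX does use regular-expression semantics, a name such as svcorders would pass the rule as written, so it may be worth testing the pattern against a few real source names before relying on it.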
Python specification
Default arguments
Mutable default arguments
mutable-defaults:
  xpath: "//defaults/*[self::Dict or self::List or self::Set or self::Call]"
  description: "Can create bugs that are hard to find"
  severity: "error"
  why: "bad practice"
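To see why this rule is worth having, here is a minimal sketch of the bug it targets, followed by a rough stdlib-only approximation of the same check. The approximation uses Python's ast module directly and is not how PyASTrX evaluates the XPath; the function names and sample source are made up for illustration.

```python
import ast


def append_item(item, bucket=[]):  # the default list is created once, at definition time
    bucket.append(item)
    return bucket


first = append_item(1)
second = append_item(2)  # reuses the list returned by the first call
print(second)            # [1, 2]

# Sample source to scan (hypothetical).
SOURCE = '''
def append_item(item, bucket=[]):
    return bucket

def safe_append(item, bucket=None):
    return [] if bucket is None else bucket
'''

# Same node types as the XPath: Dict, List, Set, Call.
MUTABLE_NODES = (ast.Dict, ast.List, ast.Set, ast.Call)


def find_mutable_defaults(source):
    """Return (function name, line number) for each flagged positional default."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            for default in node.args.defaults:
                if isinstance(default, MUTABLE_NODES):
                    hits.append((node.name, default.lineno))
    return hits


mutable_default_hits = find_mutable_defaults(SOURCE)
print(mutable_default_hits)
```

Like the XPath, the sketch treats every Call as suspicious, so a default such as tuple() would also be reported.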
Global variables
Global definition
global-keyword:
  xpath: "//FunctionDef/body/Global"
  description: "This can create annoying side effects"
  severity: "info"
  use_in_linter: false
  why: ""
Unnecessary global keyword in function
unnecessary-global:
  xpath: "//FunctionDef/body/Global/names[not(item=../../Assign/targets/Name/@id)]"
  description: "An unnecessary global keyword is being used"
  severity: "info"
  why: "bad practice"
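The idea behind this rule can be approximated with the standard ast module: a global statement is reported when none of its names is the target of a plain assignment directly in the function body. This is only a sketch of the logic as read from the XPath, not PyASTrX's own implementation, and the sample functions are hypothetical.

```python
import ast

SOURCE = '''
counter = 0

def read_counter():
    global counter          # only read, so the keyword is unnecessary
    return counter

def bump_counter():
    global counter          # rebound below, so the keyword is needed
    counter = counter + 1
'''


def unnecessary_globals(source):
    """Return (function name, global names) where no listed name is assigned."""
    hits = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, ast.FunctionDef):
            continue
        # Names bound by plain assignments directly in the function body.
        assigned = {
            target.id
            for stmt in func.body
            if isinstance(stmt, ast.Assign)
            for target in stmt.targets
            if isinstance(target, ast.Name)
        }
        for stmt in func.body:
            if isinstance(stmt, ast.Global) and not assigned.intersection(stmt.names):
                hits.append((func.name, stmt.names))
    return hits


unnecessary = unnecessary_globals(SOURCE)
print(unnecessary)
```

Because only Assign nodes are considered, an augmented assignment such as counter += 1 (an AugAssign node in Python's ast) or an assignment nested inside a loop would not count as a binding in this sketch, and the XPath appears to share that limitation.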
Function definitions
Recursion
recursion:
  xpath: "//FunctionDef[@name=body//Call/func/Name/@id and not(parent::node()/parent::ClassDef)]"
  description: "Recursion pattern detected in this file"
  severity: "info"
  why: "should be refactored"
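As a rough illustration of what the recursion rule selects, the following sketch reproduces its two conditions with the standard ast module: the function calls a plain name equal to its own name, and it is not defined directly in a class body. It is an approximation written for this page, with hypothetical sample code, not the way PyASTrX runs the query.

```python
import ast

SOURCE = '''
def factorial(n):
    return 1 if n <= 1 else n * factorial(n - 1)

def factorial_iter(n):
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result

class Walker:
    def walk(self, n):
        return walk(n - 1)
'''


def calls_itself(func):
    """True if any call inside the body targets a plain name equal to func.name."""
    return any(
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == func.name
        for stmt in func.body
        for node in ast.walk(stmt)
    )


def recursive_functions(source):
    tree = ast.parse(source)
    # Functions defined directly in a class body are excluded, as in the rule.
    methods = {
        id(stmt)
        for cls in ast.walk(tree)
        if isinstance(cls, ast.ClassDef)
        for stmt in cls.body
    }
    return [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef)
        and id(node) not in methods
        and calls_itself(node)
    ]


recursive = recursive_functions(SOURCE)
print(recursive)
```

Only direct self-calls by name are found; mutual recursion between two functions is outside what this check (and the XPath it mirrors) can see.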
Recursion in a class method
This example also shows that we can use multiple lines to define a complex xpath expression.
recursion-class-method:
  xpath:
    |
    //ClassDef
    /body
    /FunctionDef[
      @name=body
        //Call
        /func
        /Attribute[
          value/Name[@id='self']
        ]
        /@attr
    ]
  description: ""
  severity: "info"
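The multi-line expression above reads more easily next to a concrete case. The sketch below approximates it with the standard ast module: for each method defined directly in a class body, it looks for a call of the form self.<same name>(...). The Tree class is a made-up example, and the code is an assumption-laden approximation rather than PyASTrX's implementation.

```python
import ast

SOURCE = '''
class Tree:
    def depth(self, node):
        if node is None:
            return 0
        return 1 + max(self.depth(node.left), self.depth(node.right))

    def size(self, node):
        return len(node.items)
'''


def calls_self_method(method):
    """True if the body contains a call to self.<method.name>(...)."""
    return any(
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and isinstance(node.func.value, ast.Name)
        and node.func.value.id == "self"
        and node.func.attr == method.name
        for stmt in method.body
        for node in ast.walk(stmt)
    )


def recursive_methods(source):
    """Return (class name, method name) for each method that calls itself via self."""
    hits = []
    for cls in ast.walk(ast.parse(source)):
        if not isinstance(cls, ast.ClassDef):
            continue
        for method in cls.body:
            if isinstance(method, ast.FunctionDef) and calls_self_method(method):
                hits.append((cls.name, method.name))
    return hits


recursive_method_hits = recursive_methods(SOURCE)
print(recursive_method_hits)
```

The check depends on the receiver being literally named self, exactly as the XPath does, so methods that use another name for their first parameter would be missed.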
New variable with the same name as the current function
redefinition-of-function-var:
  xpath: "//FunctionDef[@name=body/Assign/targets/Name/@id]"
  description: "Please, avoid defining a new variable with the same name as the current function"
  severity: "error"
  why: "bad practice"
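A short example may make the target of this rule clearer. In the hypothetical total function below, the local variable total hides the function's own name inside its body. The accompanying check is a stdlib ast approximation of the XPath (plain assignments directly in the function body), not PyASTrX's own code.

```python
import ast

SOURCE = '''
def total(values):
    total = 0               # shadows the function name inside the body
    for value in values:
        total += value
    return total

def average(values):
    mean = sum(values) / len(values)
    return mean
'''


def shadows_own_name(func):
    """True if a top-level assignment in the body binds the function's own name."""
    return any(
        isinstance(stmt, ast.Assign)
        and any(
            isinstance(target, ast.Name) and target.id == func.name
            for target in stmt.targets
        )
        for stmt in func.body
    )


def shadowing_functions(source):
    return [
        node.name
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.FunctionDef) and shadows_own_name(node)
    ]


shadowing = shadowing_functions(SOURCE)
print(shadowing)
```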
Allow and deny lists
It is possible to define allow and deny lists for use in the expressions. To do so, add a match_params entry to the pyastrx.yaml file, like this:
match_params:
  allow_dict:
    list_name_1:
      - allowed_name_1
      - allowed_name_2
      - etc
  deny_dict:
    list_name_2:
      - denied_name_1
      - denied_name_2
      - etc
To use these lists in xpath expressions, call the pyastrx:allow-list or pyastrx:deny-list functions. Let’s see some examples:
Arguments replacing built-in functions
Confusing behavior and hard-to-find bugs can appear when an argument has the same name as a built-in function. For example,
def foo(dict, list):
    for key in dict:
        list.append(key)
    print(list)
To detect this, create an entry in the deny_dict inside your pyastrx.yaml file:
match_params:
  deny_dict:
    built-in:
      - dict
      - list
      - ...
Now, you can use the following rule to detect this behavior:
built-in-function-as-argument:
  xpath:
    |
    //FunctionDef
    /args
    /arguments
    /args
    /Name[pyastrx:deny-list('built-in', @id)]
  description: "This function uses a built-in function as argument"
  severity: "error"
  why: "bad practice"
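The deny-list idea can be sketched with the standard library alone. The example below builds a candidate deny list from Python's builtins module and checks positional parameter names against it. This is an illustration, not PyASTrX's implementation: BUILTIN_NAMES and builtin_named_arguments are hypothetical names, and the sketch relies on the Python 3 standard ast module, where parameters are ast.arg nodes whose name is stored in the arg attribute.

```python
import ast
import builtins

# Candidate deny-list entries: every public name in the builtins module.
BUILTIN_NAMES = {name for name in dir(builtins) if not name.startswith("_")}

SOURCE = '''
def foo(dict, list):
    for key in dict:
        list.append(key)
    print(list)

def bar(mapping, items):
    for key in mapping:
        items.append(key)
'''


def builtin_named_arguments(source, deny=BUILTIN_NAMES):
    """Return (function name, argument name) for arguments found in the deny list."""
    hits = []
    for func in ast.walk(ast.parse(source)):
        if isinstance(func, ast.FunctionDef):
            for arg in func.args.args:
                if arg.arg in deny:
                    hits.append((func.name, arg.arg))
    return hits


builtin_argument_hits = builtin_named_arguments(SOURCE)
print(builtin_argument_hits)
```

Printing sorted(BUILTIN_NAMES) is one way to get a starting point for the built-in entry of deny_dict; the list can then be trimmed to the names your project actually cares about.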
Allow list:
pyastrx:allow-list: [pyastrx:allow-list('list_name', @ATTR_TO_BE_CHECKED)]