Configurations

This page provides a comprehensive reference for the configuration models used across different text-mallet algorithms.

POS Filter Configuration ("pos-filter")

Configures token masking based on specific Part-of-Speech tags.

POSFilterConfig Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| filter_type | FilterType or List[FilterType] | FilterType.Retain | Action to take on the target tags (e.g., retain them or remove them). |
| pos_tags | List[POSTag] | [NOUN, PROPN] | A collection of Universal POS tags targeted by the filter. |
| replacement_mechanism | str or List[str] | "default" | Strategy used to replace targeted text (e.g., replacing with the tag name). |
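The retain/remove semantics can be illustrated with a minimal sketch. It does not call text-mallet: the helper `apply_pos_filter`, the string values "retain" and "remove", and the choice to replace a masked token with its tag name are all assumptions made for illustration, and the tokens are tagged by hand rather than by a real tagger.

```python
from typing import List, Tuple


def apply_pos_filter(
    tagged_tokens: List[Tuple[str, str]],
    pos_tags: List[str],
    filter_type: str = "retain",
) -> List[str]:
    """Hypothetical helper (not the text-mallet API).

    "retain" keeps tokens whose tag is in pos_tags and masks the rest;
    "remove" masks tokens whose tag is in pos_tags and keeps the rest.
    A masked token is replaced by its tag name (assumed replacement).
    """
    if filter_type not in ("retain", "remove"):
        raise ValueError(f"unknown filter_type: {filter_type!r}")
    targets = set(pos_tags)
    output = []
    for word, tag in tagged_tokens:
        targeted = tag in targets
        keep = targeted if filter_type == "retain" else not targeted
        output.append(word if keep else tag)
    return output


# Hand-tagged example sentence using Universal POS tags.
tagged = [("The", "DET"), ("cat", "NOUN"), ("chased", "VERB"), ("Tom", "PROPN")]

print(apply_pos_filter(tagged, ["NOUN", "PROPN"], "retain"))
# ['DET', 'cat', 'VERB', 'Tom']
print(apply_pos_filter(tagged, ["NOUN", "PROPN"], "remove"))
# ['The', 'NOUN', 'chased', 'PROPN']
```

With the default tags [NOUN, PROPN], retaining keeps only the nouns and proper nouns readable, while removing hides exactly those tokens.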

Shannon Filter Configuration ("shannon")

Configures text filtering using information-theoretic metrics calculated from an underlying language model context.

ShannonFilterConfig Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| threshold | float or List[float] | 10 | The information surprisal ceiling or floor value used to trigger a mask. |
| bound | str or List[str] | "upper" | Determines if values above ("upper") or below ("lower") the threshold are masked. |
| replacement_mechanism | str or List[str] | "default" | Strategy used to replace targeted text. |
| max_context_length | int | 8192 | Max token historical window parsed by the LM for context evaluation. |
| output_mi_values | bool | False | If true, outputs calculated mutual information/surprisal values along with the text. |
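How threshold and bound interact can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the helper names, the `[MASK]` replacement string, the use of bits as the unit, and the strict comparisons are assumed, and the token probabilities are invented rather than produced by a language model. `max_context_length` and `output_mi_values` are not modeled.

```python
import math
from typing import List


def surprisal_bits(prob: float) -> float:
    """Surprisal of an event with probability prob, in bits."""
    return -math.log2(prob)


def mask_by_surprisal(
    tokens: List[str],
    probs: List[float],
    threshold: float = 10.0,
    bound: str = "upper",
    mask: str = "[MASK]",
) -> List[str]:
    """Hypothetical helper (not the text-mallet API).

    bound="upper": mask tokens whose surprisal is above the threshold.
    bound="lower": mask tokens whose surprisal is below the threshold.
    """
    if bound not in ("upper", "lower"):
        raise ValueError(f"unknown bound: {bound!r}")
    output = []
    for token, prob in zip(tokens, probs):
        s = surprisal_bits(prob)
        hit = s > threshold if bound == "upper" else s < threshold
        output.append(mask if hit else token)
    return output


# Made-up probabilities: surprisals are 1, 2, and 12 bits respectively.
tokens = ["the", "cat", "quasar"]
probs = [0.5, 0.25, 2 ** -12]

print(mask_by_surprisal(tokens, probs, threshold=10, bound="upper"))
# ['the', 'cat', '[MASK]']
print(mask_by_surprisal(tokens, probs, threshold=10, bound="lower"))
# ['[MASK]', '[MASK]', 'quasar']
```

With "upper" the threshold acts as a ceiling that hides unexpected tokens; with "lower" it acts as a floor that hides predictable ones.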

Linear Scramble Configuration ("scramble-BoW")

Configures a lightweight, framework-free Bag-of-Words shuffling operation.

LinearScrambleConfig Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| level | str or List[str] | "document" | The text layer boundaries applied to shuffles. Allowed values are typically "sentence" or "document". |
| seed | int | DEFAULT_SEED | Pseudorandom generator state seed to guarantee deterministic text layouts. |
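The difference between the two levels, and the role of the seed, can be sketched with the standard library alone. The function `scramble_bow`, the placeholder value of `DEFAULT_SEED`, whitespace tokenization, and splitting sentences on periods are assumptions for illustration; the library's actual tokenization and seed value are not specified here.

```python
import random

DEFAULT_SEED = 0  # placeholder; the library's real DEFAULT_SEED value is not given


def scramble_bow(text: str, level: str = "document", seed: int = DEFAULT_SEED) -> str:
    """Hypothetical bag-of-words shuffle (not the text-mallet API)."""
    rng = random.Random(seed)  # private generator: same seed, same layout
    if level == "document":
        # Shuffle every word in the text as one bag.
        words = text.split()
        rng.shuffle(words)
        return " ".join(words)
    if level == "sentence":
        # Shuffle words only within each sentence; sentence order is kept.
        sentences = [s.strip() for s in text.split(".") if s.strip()]
        shuffled = []
        for sentence in sentences:
            words = sentence.split()
            rng.shuffle(words)
            shuffled.append(" ".join(words) + ".")
        return " ".join(shuffled)
    raise ValueError(f"unknown level: {level!r}")


text = "the cat sat. a dog ran away."
print(scramble_bow(text, level="document", seed=1))
print(scramble_bow(text, level="sentence", seed=1))
```

At the "sentence" level each word stays inside its original sentence, whereas the "document" level can move a word anywhere in the text. Reusing a seed reproduces the same output.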

Hierarchical Scramble Configuration ("scramble-hier")

Configures syntax-aware scrambling that rearranges text according to dependency trees generated by a spaCy backend.

HierarchicalScrambleConfig Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| strength | str or List[str] | "strong" | Rearrangement severity settings. Typical options are "weak" (sibling shuffling) or "strong" (sibling shuffling + directional flips). |
| seed | int | DEFAULT_SEED | Pseudorandom generator seed utilized when scrambling leaf node distributions. |
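One possible reading of the two strength settings is sketched below on a hand-built toy tree, without spaCy. Everything here is an assumption for illustration: the `Node` class, the `scramble_tree` function, and in particular the interpretation of "directional flips" as swapping the dependents on the left of a head with those on its right. The library may define these operations differently.

```python
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Toy dependency node: a head word with dependents on either side."""
    word: str
    left: List["Node"] = field(default_factory=list)   # dependents before the head
    right: List["Node"] = field(default_factory=list)  # dependents after the head


def _linearize(node: Node, rng: random.Random, strength: str) -> List[str]:
    left, right = list(node.left), list(node.right)
    # Sibling shuffling: reorder dependents that share a head and a side.
    rng.shuffle(left)
    rng.shuffle(right)
    # Assumed meaning of a directional flip: swap the two sides of the head.
    if strength == "strong" and rng.random() < 0.5:
        left, right = right, left
    words: List[str] = []
    for child in left:
        words.extend(_linearize(child, rng, strength))
    words.append(node.word)
    for child in right:
        words.extend(_linearize(child, rng, strength))
    return words


def scramble_tree(root: Node, strength: str = "strong", seed: int = 0) -> List[str]:
    """Hypothetical hierarchical scramble (not the text-mallet API)."""
    if strength not in ("weak", "strong"):
        raise ValueError(f"unknown strength: {strength!r}")
    return _linearize(root, random.Random(seed), strength)


# "the cat chased a small mouse", built by hand.
tree = Node(
    "chased",
    left=[Node("cat", left=[Node("the")])],
    right=[Node("mouse", left=[Node("a"), Node("small")])],
)
print(" ".join(scramble_tree(tree, "weak", seed=3)))
print(" ".join(scramble_tree(tree, "strong", seed=3)))
```

In this sketch each subtree stays contiguous in the output, so phrases such as "the cat" move as a unit. Under "weak" every dependent also stays on its original side of its head.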

Multi-Obfuscation Feature: For parameters that accept either a single value or a list (e.g., str | List[str]), passing a list of choices causes TMallet to compute multiple obfuscation passes at once. The result is then returned as a nested dictionary keyed by the options instead of a flat string. The single exception is the POS tags passed to the POS filter: that parameter must always be a list, so it does not by itself turn the result into a dictionary. To compute multiple obfuscation configurations at once, pass the alternatives as lists within a single config; this improves performance because the text is processed only once where possible.
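The shape of the returned value can be sketched with a small expansion helper. It is not the library's code: `expand`, `toy_obfuscate`, and the parameters `min_len` and `mask` are hypothetical, the nesting order (one level per list-valued parameter, in declaration order) is an assumption, and the sketch does not model the shared single-pass processing described above.

```python
from typing import Any, Callable, Dict, Optional


def toy_obfuscate(text: str, min_len: int, mask: str) -> str:
    """Stand-in obfuscator: replace words of at least min_len characters."""
    return " ".join(mask if len(w) >= min_len else w for w in text.split())


def expand(
    text: str,
    params: Dict[str, Any],
    obfuscate: Callable[..., str],
    fixed: Optional[Dict[str, Any]] = None,
) -> Any:
    """Return a flat string if no parameter is a list; otherwise return a
    dictionary keyed by the options, nested once per list-valued parameter."""
    fixed = dict(fixed or {})
    for name, value in params.items():
        if name in fixed:
            continue
        if isinstance(value, list):
            # Branch on this parameter: one entry per option.
            return {
                option: expand(text, params, obfuscate, {**fixed, name: option})
                for option in value
            }
        fixed[name] = value
    return obfuscate(text, **fixed)


text = "a big idea"
print(expand(text, {"min_len": 4, "mask": "_"}, toy_obfuscate))
# a big _
print(expand(text, {"min_len": [3, 4], "mask": "_"}, toy_obfuscate))
# {3: 'a _ _', 4: 'a big _'}
print(expand(text, {"min_len": [3, 4], "mask": ["_", "#"]}, toy_obfuscate))
# {3: {'_': 'a _ _', '#': 'a # #'}, 4: {'_': 'a big _', '#': 'a big #'}}
```

Callers that accept both forms need to check whether the result is a string or a dictionary before using it.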