Detection – Dymium

Dymium detects sensitive information using a combination of a proprietary machine learning model, regular expression patterns, and administrator-defined entity lists. Detection works across two contexts:

GhostDB — PII is detected in database columns and content during data source analysis. Detection results feed into policy templates, where each detected entity type can be assigned an action (allow, block, redact, or obfuscate).
GhostAI / GhostLLM — PII is detected in user prompts and uploaded documents before they are sent to an LLM. Detected entities are substituted with synthetic data and restored in the response.
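
The substitute-and-restore flow described for GhostAI can be pictured with a small Python sketch. This is an illustration only, not Dymium's implementation: the simplified email pattern, the `user1@example.com` style of synthetic value, and the function names `substitute` and `restore` are all assumptions made for the example.

```python
import re

# Hypothetical, simplified email pattern used only for this illustration.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def substitute(prompt):
    """Replace each detected email with a synthetic one.

    Returns the rewritten prompt and a mapping of synthetic -> original values.
    """
    mapping = {}

    def _swap(match):
        original = match.group(0)
        # Reuse the same synthetic value when an original value repeats.
        for fake, real in mapping.items():
            if real == original:
                return fake
        fake = f"user{len(mapping) + 1}@example.com"
        mapping[fake] = original
        return fake

    return EMAIL_RE.sub(_swap, prompt), mapping


def restore(response, mapping):
    """Put the original values back into the text returned by the LLM."""
    for fake, real in mapping.items():
        response = response.replace(fake, real)
    return response


prompt = "Email alice@corp.io and bob@corp.io, then alice@corp.io again"
safe_prompt, mapping = substitute(prompt)
```

In this sketch `safe_prompt` contains only synthetic addresses, and calling `restore` on any text that echoes those addresses brings back the originals.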

For GhostDB, detection is configured under Policy Templates → Detectors. For GhostAI, detection is configured under GhostAI → LLM → Detection, which has sub-tabs for Entities, Custom Entities, Regex Detectors, and Source Code.

Built-in detectors

Dymium includes a built-in PII detection model that recognizes the following entity types:

Entity	Description
Address	Street addresses, mailing addresses
Age	Age values
Date/Time	Dates, times, timestamps
Email	Email addresses
IP Address	IPv4 and IPv6 addresses
Location	Geographic locations, landmarks
Name	Person names
Organization	Company and organization names
Password	Password values
Phone	Phone numbers
SSN	US Social Security Numbers
URL	Web URLs
Username	Usernames and login identifiers

These built-in detectors cannot be removed, but their associated actions can be configured per policy template.

Regular expression detectors

Administrators can create custom detectors using regular expressions. There are two types:

Regexp for Table Columns — matches against database column names. For example, a pattern like (vin|VehicleIdentificationNumber) would flag any column whose name contains "vin" or "VehicleIdentificationNumber". This is useful for detecting domain-specific sensitive data that the built-in model may not recognize.
Regexp for Content — matches against the actual data content in database columns (based on a subsample). For example, a content pattern can match credit card number formats.
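
The difference between the two detector types can be sketched in Python. The column-name pattern is the one quoted above; the credit card pattern, the function names, and the use of Python's `re` semantics (case-sensitive, substring search) are assumptions for illustration, since the regex dialect Dymium uses is not stated here.

```python
import re

# Column-name detector: the pattern from the text, searched anywhere in the name.
COLUMN_RE = re.compile(r"(vin|VehicleIdentificationNumber)")

# Content detector: a hypothetical, deliberately loose credit card shape
# (13 to 16 digits, optionally separated by single spaces or hyphens).
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def flag_columns(names):
    """Return the column names that the column-name pattern matches."""
    return [name for name in names if COLUMN_RE.search(name)]


def content_matches(sample_values):
    """Return True if any sampled value looks like a card number."""
    return any(CARD_RE.search(value) for value in sample_values)


columns = ["vin", "customer_vin", "province", "VehicleIdentificationNumber", "VIN", "email"]
flagged = flag_columns(columns)
# flagged == ["vin", "customer_vin", "province", "VehicleIdentificationNumber"]
```

Under these assumptions `province` is flagged because it contains "vin", while `VIN` is not because the match is case-sensitive. A tighter pattern, such as one anchored on word boundaries or underscores, would reduce false positives of this kind.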

Dymium ships with a set of pre-built column-name patterns covering common sensitive categories including citizenship, criminal history, device IDs, ethnic background, medical information, religious affiliation, sexual orientation, vehicle identification numbers, and more. These can be edited or extended.

To add a custom regular expression detector:

Navigate to the Regular Expressions tab under Detectors.
Enter a Name for the detector.
Select the Type (column name or content).
Enter the Regular Expression.
Configure the action for each policy template.
Click Apply.

User-defined patterns

For cases where regular expressions are too broad, administrators can define exact-match entity lists. These are literal strings that should always be treated as sensitive — for example, specific internal project names, codenames, or proprietary terms.

Patterns can be added in two ways:

Manual entry — type or paste values into the text area, separated by commas or newlines.
File upload — upload a .txt file containing one pattern per line.

The system automatically detects and removes duplicates when new patterns are added. Existing patterns can be searched, individually deleted, or cleared entirely.
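
As a rough sketch of how comma- or newline-separated input and duplicate removal could behave, consider the Python below. The function names are hypothetical, and the choices to trim whitespace and compare values case-sensitively are assumptions; the product's exact matching rules are not described here.

```python
import re


def parse_patterns(text):
    """Split on commas or newlines, trim whitespace, and drop empty entries."""
    return [part.strip() for part in re.split(r"[,\n]", text) if part.strip()]


def add_patterns(existing, new_text):
    """Merge new patterns into an existing list, skipping duplicates.

    Returns the updated list (original order preserved) and the list of
    values that were skipped because they were already present.
    """
    seen = set(existing)
    updated = list(existing)
    skipped = []
    for pattern in parse_patterns(new_text):
        if pattern in seen:
            skipped.append(pattern)
        else:
            seen.add(pattern)
            updated.append(pattern)
    return updated, skipped


updated, skipped = add_patterns(
    ["Project Falcon"],
    "Project Falcon, Bluebird\nBluebird\n  Orion ",
)
# updated == ["Project Falcon", "Bluebird", "Orion"]
# skipped == ["Project Falcon", "Bluebird"]
```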

Source code blocking

GhostAI can detect and block source code in prompts to prevent proprietary code from being sent to public LLMs. This is configured under GhostAI → LLM → Detection → Source Code:

Prevent posting source code in a prompt — toggle to enable or disable source code detection.
Detection confidence threshold (1–99%) — lower values are more sensitive (more content is flagged as code); higher values are less sensitive. The default is 80%.
Detect computer languages — choose which programming languages to detect. Enable or disable individual languages, or use Select/Deselect All.

Note: source code blocking prevents uploading proprietary code to the LLM. Asking the LLM to generate code examples or discussing code concepts is still allowed.
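
The interaction of the three settings can be sketched as a single decision function. This is a hypothetical model, not the product's logic: the function name, the sample language set, and the choice to block when the confidence is greater than or equal to the threshold (rather than strictly greater) are assumptions.

```python
def should_block(confidence, language, threshold=80,
                 enabled_languages=("python", "java", "go"), enabled=True):
    """Decide whether a prompt is blocked as source code.

    confidence and threshold are percentages. The classifier's detected
    language must be one of the enabled languages for blocking to apply.
    """
    if not 1 <= threshold <= 99:
        raise ValueError("threshold must be between 1 and 99")
    if not enabled or language not in enabled_languages:
        return False
    # Assumption: a score equal to the threshold counts as a detection.
    return confidence >= threshold


# With the default 80% threshold, a 60% score passes; lowering the
# threshold to 50% makes detection more sensitive and blocks it.
default_result = should_block(60, "python")
sensitive_result = should_block(60, "python", threshold=50)
```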

Actions per policy template

Each detected entity type can be assigned one of four actions, configured per policy template:

Action	Effect
Allow	No transformation — data passes through unchanged
Block	Field is completely denied — it does not appear in the Ghost Database
Redact	Field value is replaced with a placeholder (e.g., `[REDACTED]`)
Obfuscate	Field value is replaced with synthetic, format-preserving data

Different policy templates can assign different actions to the same entity type, allowing the same data to be treated differently for different groups of users.
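
To make the per-template behavior concrete, here is a minimal Python sketch that applies two different templates to the same row. Everything in it is illustrative: the `Action` enum, the `apply_template` function, the template names, the default of allowing entity types a template does not list, and the toy digit-shifting `obfuscate` helper (a stand-in for real synthetic, format-preserving data) are assumptions, not Dymium's API.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    REDACT = "redact"
    OBFUSCATE = "obfuscate"


def obfuscate(value):
    """Toy stand-in: shift each digit by 7 and keep every other character."""
    return "".join(
        str((int(ch) + 7) % 10) if ch.isdigit() else ch for ch in value
    )


def apply_template(row, column_entities, template):
    """Return the row as seen through one policy template."""
    result = {}
    for column, value in row.items():
        entity = column_entities.get(column)
        # Assumption: entity types not listed in the template are allowed.
        action = template.get(entity, Action.ALLOW)
        if action is Action.BLOCK:
            continue  # the field does not appear at all
        if action is Action.REDACT:
            result[column] = "[REDACTED]"
        elif action is Action.OBFUSCATE:
            result[column] = obfuscate(value)
        else:
            result[column] = value
    return result


row = {"name": "Ada Lovelace", "ssn": "123-45-6789", "email": "ada@example.com"}
column_entities = {"name": "Name", "ssn": "SSN", "email": "Email"}

support = {"Name": Action.ALLOW, "SSN": Action.REDACT, "Email": Action.BLOCK}
analysts = {"Name": Action.ALLOW, "SSN": Action.OBFUSCATE, "Email": Action.REDACT}

support_view = apply_template(row, column_entities, support)
analyst_view = apply_template(row, column_entities, analysts)
```

Here `support_view` has no `email` field and a redacted `ssn`, while `analyst_view` keeps the `ssn` format with different digits and shows the email as `[REDACTED]`.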