Dymium detects sensitive information using a combination of a proprietary machine learning model, regular expression patterns, and administrator-defined entity lists. Detection works across two contexts:
- GhostDB — PII is detected in database columns and content during data source analysis. Detection results feed into policy templates, where each detected entity type can be assigned an action (allow, block, redact, or obfuscate).
- GhostAI / GhostLLM — PII is detected in user prompts and uploaded documents before they are sent to an LLM. Detected entities are substituted with synthetic data and restored in the response.
For GhostDB, detection is configured under Policy Templates → Detectors. For GhostAI, detection is configured under GhostAI → LLM → Detection, which has sub-tabs for Entities, Custom Entities, Regex Detectors, and Source Code.
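The substitute-and-restore flow described for GhostAI can be pictured with a small sketch. It is illustrative only: the single email regex, the `substitute` and `restore` helpers, and the `userN@example.com` stand-ins are all assumptions made for this example, not Dymium's detection model or API.

```python
import re

# Hypothetical detector: one email regex stands in for the real detection stack.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def substitute(prompt):
    """Replace each detected email with a synthetic stand-in.

    Returns the rewritten prompt and a mapping of synthetic -> original values.
    """
    mapping = {}

    def swap(match):
        original = match.group(0)
        # Reuse the same stand-in when the same value appears more than once.
        for fake, real in mapping.items():
            if real == original:
                return fake
        fake = f"user{len(mapping) + 1}@example.com"
        mapping[fake] = original
        return fake

    return EMAIL.sub(swap, prompt), mapping

def restore(response, mapping):
    """Put the original values back into the LLM response."""
    for fake, real in mapping.items():
        response = response.replace(fake, real)
    return response

safe_prompt, mapping = substitute(
    "Draft a reply to jane.doe@corp.example about the invoice."
)
# The LLM only ever sees safe_prompt; here its answer is simulated.
llm_response = f"Sure, here is a reply addressed to {next(iter(mapping))}."
print(safe_prompt)
print(restore(llm_response, mapping))
```

The mapping lives only for the duration of one request in this sketch, so the original value is never sent to the model.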
Built-in detectors
Dymium includes a built-in PII detection model that recognizes the following entity types:
| Entity | Description |
|---|---|
| Address | Street addresses, mailing addresses |
| Age | Age values |
| Date/Time | Dates, times, timestamps |
| Email | Email addresses |
| IP Address | IPv4 and IPv6 addresses |
| Location | Geographic locations, landmarks |
| Name | Person names |
| Organization | Company and organization names |
| Password | Password values |
| Phone | Phone numbers |
| SSN | US Social Security Numbers |
| URL | Web URLs |
| Username | Usernames and login identifiers |
These built-in detectors cannot be removed, but the action associated with each one can be configured per policy template.
Regular expression detectors
Administrators can create custom detectors using regular expressions. There are two types:
- Regexp for Table Columns — matches against database column names. For example, a pattern like `(vin|VehicleIdentificationNumber)` would flag any column whose name contains "vin" or "VehicleIdentificationNumber". This is useful for detecting domain-specific sensitive data that the built-in model may not recognize.
- Regexp for Content — matches against the actual data content in database columns (based on a subsample). For example, a pattern matching credit card number formats.
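The difference between the two detector types can be sketched with Python's `re` module. The column data, the credit-card pattern, and the helper names below are made up for illustration. Whether Dymium applies patterns as a substring search (as assumed here) or case-insensitively is not stated in this section.

```python
import re

# Column-name detector, using the pattern from the example above.
column_pattern = re.compile(r"(vin|VehicleIdentificationNumber)")

# Content detector: a deliberately loose, hypothetical credit-card shape
# (16 digits in groups of four, optionally separated by spaces or hyphens).
content_pattern = re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b")

# Made-up subsample: column name -> a few sampled values.
columns = {
    "vin": ["1HGCM82633A004352"],
    "payment_ref": ["4111 1111 1111 1111", "n/a"],
    "notes": ["call back Tuesday"],
}

def flagged_by_name(name):
    """True when the column name contains a match for the column pattern."""
    return column_pattern.search(name) is not None

def flagged_by_content(sample):
    """True when any sampled value contains a match for the content pattern."""
    return any(content_pattern.search(value) for value in sample)

for name, sample in columns.items():
    print(name, flagged_by_name(name), flagged_by_content(sample))
```

Under a substring search, an unanchored `vin` would also match a name such as `province`, so anchoring the pattern (for example `^vin$`) may be worth considering for short tokens.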
Dymium ships with a set of pre-built column-name patterns covering common sensitive categories including citizenship, criminal history, device IDs, ethnic background, medical information, religious affiliation, sexual orientation, vehicle identification numbers, and more. These can be edited or extended.
To add a custom regular expression detector:
1. Navigate to the Regular Expressions tab under Detectors.
2. Enter a Name for the detector.
3. Select the Type (column name or content).
4. Enter the Regular Expression.
5. Configure the action for each policy template.
6. Click Apply.
User-defined patterns
For cases where regular expressions are too broad, administrators can define exact-match entity lists. These are literal strings that should always be treated as sensitive — for example, specific internal project names, codenames, or proprietary terms.
Patterns can be added in two ways:
- Manual entry — type or paste values into the text area, separated by commas or newlines.
- File upload — upload a `.txt` file containing one pattern per line.
The system automatically detects and removes duplicates when new patterns are added. Existing patterns can be searched, individually deleted, or cleared entirely.
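As a rough picture of how comma- or newline-separated input and duplicate removal could behave, consider the sketch below. The function names are hypothetical, and the exact matching rules (for instance, whether duplicates are compared case-sensitively, as assumed here) are not specified in this section.

```python
import re

def parse_patterns(raw):
    """Split on commas or newlines, trim whitespace, drop empty entries."""
    return [p.strip() for p in re.split(r"[,\n]", raw) if p.strip()]

def add_patterns(existing, raw):
    """Append new patterns, skipping exact duplicates; order is preserved."""
    seen = set(existing)
    merged = list(existing)
    for pattern in parse_patterns(raw):
        if pattern not in seen:
            seen.add(pattern)
            merged.append(pattern)
    return merged

patterns = add_patterns(["Project Falcon"], "Project Falcon, Orion\nBluebird,Orion")
print(patterns)  # ['Project Falcon', 'Orion', 'Bluebird']
```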
Source code blocking
GhostAI can detect and block source code in prompts to prevent proprietary code from being sent to public LLMs. This is configured under GhostAI → LLM → Detection → Source Code:
- Prevent posting source code in a prompt — toggle to enable or disable source code detection.
- Detection confidence threshold (1–99%) — lower values are more sensitive (flag more content as code), higher values are less sensitive. The default is 80%.
- Detect computer languages — choose which programming languages to detect. Individual languages can be enabled or disabled, or use Select/Deselect All.
Note: source code blocking prevents uploading proprietary code to the LLM. Asking the LLM to generate code examples or discussing code concepts is still allowed.
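The effect of the confidence threshold can be shown with a few lines of Python. This is a sketch under assumptions: the `should_block` helper and the 0 to 100 score are hypothetical, and treating a score equal to the threshold as a detection is a guess rather than documented behavior.

```python
def should_block(confidence, threshold=80):
    """Return True when a prompt should be blocked as source code.

    confidence: hypothetical classifier score, 0-100, that the prompt is code.
    threshold:  the configured detection confidence threshold (1-99).
    Assumption: a score equal to the threshold counts as a detection.
    """
    if not 1 <= threshold <= 99:
        raise ValueError("threshold must be between 1 and 99")
    return confidence >= threshold

score = 65  # made-up score for a prompt that mixes prose and a code snippet
for threshold in (50, 80):
    print(threshold, should_block(score, threshold))
```

With the same score of 65, the prompt is blocked at a threshold of 50 and allowed at the default of 80, which is the sense in which lower values are more sensitive.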
Actions per policy template
Each detected entity type can be assigned one of four actions, configured per policy template:
| Action | Effect |
|---|---|
| Allow | No transformation — data passes through unchanged |
| Block | Field is completely denied — it does not appear in the Ghost Database |
| Redact | Field value is replaced with a placeholder (e.g., [REDACTED]) |
| Obfuscate | Field value is replaced with synthetic, format-preserving data |
Different policy templates can assign different actions to the same entity type, allowing the same data to be treated differently for different groups of users.
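To make the four actions concrete, the sketch below applies two hypothetical policy templates to the same row. Everything in it is illustrative: the `apply_template` and `obfuscate` helpers, the template dictionaries, and the character-by-character obfuscation are assumptions, not Dymium's actual implementation of synthetic data generation.

```python
import random
import string

def obfuscate(value, rng):
    """Format-preserving stand-in: digits and letters are randomized, the rest is kept."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(rng.choice(pool))
        else:
            out.append(ch)
    return "".join(out)

def apply_template(row, entity_of, template, rng):
    """row: column -> value; entity_of: column -> entity type; template: entity type -> action."""
    result = {}
    for column, value in row.items():
        # Columns with no detected entity, or no configured action, pass through.
        action = template.get(entity_of.get(column), "allow")
        if action == "block":
            continue  # the field does not appear at all
        if action == "redact":
            result[column] = "[REDACTED]"
        elif action == "obfuscate":
            result[column] = obfuscate(value, rng)
        else:
            result[column] = value
    return result

row = {"name": "Ann Lee", "ssn": "123-45-6789", "email": "ann@x.io", "city": "Oslo"}
entity_of = {"name": "Name", "ssn": "SSN", "email": "Email"}
analysts = {"Name": "redact", "SSN": "obfuscate", "Email": "block"}
support = {"Name": "allow", "SSN": "redact", "Email": "allow"}

rng = random.Random(0)
print(apply_template(row, entity_of, analysts, rng))
print(apply_template(row, entity_of, support, rng))
```

The two printed rows come from the same source data, which mirrors how one data source can be presented differently to different groups of users.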