How do I configure DLP Dictionaries and Engines?

If necessary, you can configure Zscaler's DLP dictionaries and engines to meet the needs of your organization. Complete this step before you add policy rules for Zscaler's DLP engines.

Configuring DLP Dictionaries

A DLP dictionary contains a set of patented algorithms that are designed to detect specific kinds of information in your users’ traffic. The service provides the following predefined dictionaries.

  • Adult Content
  • Credit Cards
  • Financial Statements
  • Gambling
  • Illegal Drugs
  • Medical Information
  • Names (US)
  • National Insurance Numbers (UK)
  • NRIC Numbers (Singapore)
  • Salesforce.com Data
  • Social Insurance Numbers (Canada)
  • Social Security Numbers (US)
  • Source Code
    NOTE: The Source Code dictionary will not trigger unless the data size of the detected content is at least 1 KB.
  • Weapons

These predefined dictionaries can be modified, and you can also create custom dictionaries for content not covered by the dictionaries above. The sections below give configuration instructions.

Any combination of these dictionaries can be added to a DLP engine, which is what you must reference when creating DLP policies. See About DLP Engines for more information.

Editing Predefined Dictionaries

Depending on the dictionary, you can modify one or both of the following settings:

  • Number of Violations Threshold: The dictionary triggers only if it finds more violations than the number specified here. For example, if you enter 7, the dictionary will only trigger upon finding 8 or more violations.
    • If you enter a value of less than five, the dictionary will not trigger unless it also finds relevant keywords. For example, for the Credit Cards dictionary, if you enter a value less than five, the dictionary will require keywords like "Visa" or "MasterCard" in order to trigger.
    • You may enter any value less than 10,000.
  • Confidence Score Threshold: You can select a value of Low, Medium, or High.
    • For dictionaries where you can also specify the Number of Violations Threshold, the confidence score threshold you select helps determine what counts as a violation. A lower confidence score threshold means the dictionary is more aggressive in counting an instance as a violation. Conversely, a higher confidence score threshold means the dictionary needs more exact format and verification requirements to be met before it counts an instance as a violation.
    • For dictionaries where you can specify only the Confidence Score Threshold, a lower confidence score threshold means the dictionary is more aggressive in identifying violations and requires less matching content to be found before it triggers. Conversely, a higher confidence score threshold means that the dictionary needs more exact format and verification requirements to be met for matching content, and that it also needs more instances of matching content to be found before it triggers.
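
The Number of Violations Threshold rules above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not Zscaler's implementation. The function name and its parameters are hypothetical, and the lower bound of zero is an assumption, since the text above states only the upper limit.

```python
def dictionary_triggers(violations: int, threshold: int, keyword_found: bool) -> bool:
    """Illustrate the threshold rules described above (hypothetical helper)."""
    # Assumption: the threshold is a non-negative integer below 10,000.
    if not 0 <= threshold < 10_000:
        raise ValueError("threshold must be between 0 and 9,999")
    # The dictionary needs MORE violations than the threshold, not an equal number.
    if violations <= threshold:
        return False
    # With a threshold below five, relevant keywords must also be present.
    if threshold < 5 and not keyword_found:
        return False
    return True


print(dictionary_triggers(violations=8, threshold=7, keyword_found=False))
print(dictionary_triggers(violations=5, threshold=4, keyword_found=False))
```

With a threshold of 7, eight violations trigger the dictionary without keywords. With a threshold of 4, five violations trigger it only if a relevant keyword is also found.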

The predefined dictionaries that Zscaler provides are listed below, each with instructions for editing it.

Adult Content

This dictionary detects adult content.

To edit this dictionary:

  1. Go to Administration > Resources > DLP Dictionaries & Engines.
  2. Point to the Adult Content dictionary and click the Edit icon.
  3. You can modify:
    • Confidence Score Threshold: Select a value of Low, Medium, or High. The confidence score sets how high the bar, or threshold, is for identifying violations and triggering. A lower threshold makes the dictionary more aggressive: it requires less matching content before it triggers. A higher threshold requires matching content to meet more exact format and verification requirements, and requires more instances of matching content before the dictionary triggers.
  4. Click Save and activate the change.

Credit Cards

This dictionary detects leakage of credit card numbers (CCNs).

To edit this dictionary:

  1. Go to Administration > Resources > DLP Dictionaries & Engines.
  2. Point to the Credit Cards dictionary and click the Edit icon.
  3. You can modify:
    • Number of Violations Threshold: The dictionary triggers only if it finds more violations than the number specified here. For example, if you enter 7, the dictionary will only trigger upon finding 8 or more violations.
      • If you enter a value of less than five, the dictionary will not trigger unless it also finds relevant keywords. For example, for this dictionary, if you enter a value less than five, the dictionary will require keywords like "Visa" or "MasterCard" in order to trigger.
      • You may enter any value less than 10,000.
    • Confidence Score Threshold: Select a value of Low, Medium, or High. For the Credit Cards dictionary, the confidence scores have the following implications:
      • Low: Dictionary counts an instance as a violation if a CCN can be validated with a Luhn check.
      • Medium: Dictionary counts an instance as a violation if:
        • The requirements of Low Confidence are met.
        • The CCN is in a popular format.
        • The length and starting range of the CCN match those used by credit card providers.
      • High: Dictionary counts an instance as a violation if:
        • The requirements of Medium Confidence are met.
        • The CCN is accompanied by keywords such as "American Express," "Amex," "master card," "visa," "CV code," "select card type," "discover," "diners club," "jcb," "pay with checking account," "pay by check or money order," "credit card number," "card holder name," and "expiration date."
  4. Click Save and activate the change.
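
The Low confidence level above relies on a Luhn check. The following is a minimal sketch of the standard Luhn algorithm, shown to clarify what that validation involves. It is not Zscaler's code, and the function name is hypothetical. The sketch assumes that spaces and hyphens are the only separators.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    # Assumption: spaces and hyphens are the only separators to strip.
    cleaned = number.replace(" ", "").replace("-", "")
    if not (cleaned.isascii() and cleaned.isdigit()):
        return False
    total = 0
    # Walk the digits from right to left, doubling every second digit.
    for position, char in enumerate(reversed(cleaned)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # a widely used test number
```

A number that passes this check is not necessarily a real card number, which is why the Medium and High levels add format, range, and keyword requirements.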

Financial Statements

This dictionary detects leakage of financial statements.

To edit this dictionary:

  1. Go to Administration > Resources > DLP Dictionaries & Engines.
  2. Point to the Financial Statements dictionary and click the Edit icon.
  3. You can modify:
    • Confidence Score Threshold: Select a value of Low, Medium, or High. The confidence score sets how high the bar, or threshold, is for identifying violations and triggering. A lower threshold makes the dictionary more aggressive: it requires less matching content before it triggers. A higher threshold requires matching content to meet more exact format and verification requirements, and requires more instances of matching content before the dictionary triggers.
  4. Click Save and activate the change.

Gambling

This dictionary detects content related to gambling.

To edit this dictionary:

  1. Go to Administration > Resources > DLP Dictionaries & Engines.
  2. Point to the Gambling dictionary and click the Edit icon.
  3. You can modify:
    • Confidence Score Threshold: Select a value of Low, Medium, or High. The confidence score sets how high the bar, or threshold, is for identifying violations and triggering. A lower threshold makes the dictionary more aggressive: it requires less matching content before it triggers. A higher threshold requires matching content to meet more exact format and verification requirements, and requires more instances of matching content before the dictionary triggers.
  4. Click Save and activate the change.

Illegal Drugs

This dictionary detects content related to illegal drugs.

To edit this dictionary:

  1. Go to Administration > Resources > DLP Dictionaries & Engines.
  2. Point to the Illegal Drugs dictionary and click the Edit icon.
  3. You can modify:
    • Confidence Score Threshold: Select a value of Low, Medium, or High. The confidence score sets how high the bar, or threshold, is for identifying violations and triggering. A lower threshold makes the dictionary more aggressive: it requires less matching content before it triggers. A higher threshold requires matching content to meet more exact format and verification requirements, and requires more instances of matching content before the dictionary triggers.
  4. Click Save and activate the change.

Medical Information

This dictionary detects leakage of medical information.

To edit this dictionary:

  1. Go to Administration > Resources > DLP Dictionaries & Engines.
  2. Point to the Medical Information dictionary and click the Edit icon.
  3. You can modify:
    • Confidence Score Threshold: Select a value of Low, Medium, or High. The confidence score sets how high the bar, or threshold, is for identifying violations and triggering. A lower threshold makes the dictionary more aggressive: it requires less matching content before it triggers. A higher threshold requires matching content to meet more exact format and verification requirements, and requires more instances of matching content before the dictionary triggers.
  4. Click Save and activate the change.

Names (US)

This dictionary detects leakage of names from the United States region.

To edit this dictionary:

  1. Go to Administration > Resources > DLP Dictionaries & Engines.
  2. Point to the Names (US) dictionary and click the Edit icon.
  3. You can modify:
    • Number of Violations Threshold: The dictionary triggers only if it finds more violations than the number specified here. For example, if you enter 7, the dictionary will only trigger upon finding 8 or more violations.
      • If you enter a value of less than five, the dictionary will not trigger unless it also finds keywords relevant to US names.
      • You may enter any value less than 10,000.
    • Confidence Score Threshold: Select a value of Low, Medium, or High. For the Names (US) dictionary, the confidence scores have the following implications:
      • Low: Dictionary counts an instance as a violation if the content contains either a first or last name that is in the dictionary.
      • Medium: Dictionary counts an instance as a violation if the content contains a first and last name that are both in the dictionary.
      • High: Dictionary counts an instance as a violation if the content contains either the word "name" or "address" as well as a full name that is in the dictionary, in one of the following formats:
        • Firstname Lastname
        • Lastname Firstname (This is the default.)
  4. Click Save and activate the change.

National Insurance Numbers (UK)

This dictionary detects leakage of UK National Insurance Numbers.

To edit this dictionary:

  1. Go to Administration > Resources > DLP Dictionaries & Engines.
  2. Point to the National Insurance Numbers (UK) dictionary and click the Edit icon.
  3. You can modify:
    • Number of Violations Threshold: The dictionary triggers only if it finds more violations than the number specified here. For example, if you enter 7, the dictionary will only trigger upon finding 8 or more violations.
      • If you enter a value of less than five, the dictionary will not trigger unless it also finds keywords relevant to UK National Insurance Numbers.
      • You may enter any value less than 10,000.
  4. Click Save and activate the change.

NRIC Numbers (Singapore)

This dictionary detects leakage of National Registration Identity Card Numbers (UIN and FIN) for Singapore.

To edit this dictionary:

  1. Go to Administration > Resources > DLP Dictionaries & Engines.
  2. Point to the NRIC Numbers (Singapore) dictionary and click the Edit icon.
  3. You can modify:
    • Number of Violations Threshold: The dictionary triggers only if it finds more violations than the number specified here. For example, if you enter 7, the dictionary will only trigger upon finding 8 or more violations.
      • If you enter a value of less than five, the dictionary will not trigger unless it also finds keywords relevant to NRIC Numbers.
      • You may enter any value less than 10,000.
  4. Click Save and activate the change.

Salesforce.com Data

This dictionary detects content related to Salesforce.com data.

To edit this dictionary:

  1. Go to Administration > Resources > DLP Dictionaries & Engines.
  2. Point to the Salesforce.com Data dictionary and click the Edit icon.
  3. You can modify:
    • Confidence Score Threshold: Select a value of Low, Medium, or High. The confidence score sets how high the bar, or threshold, is for identifying violations and triggering. A lower threshold makes the dictionary more aggressive: it requires less matching content before it triggers. A higher threshold requires matching content to meet more exact format and verification requirements, and requires more instances of matching content before the dictionary triggers.
  4. Click Save and activate the change.

Social Insurance Numbers (Canada)

This dictionary detects leakage of Canadian Social Insurance Numbers.

To edit this dictionary:

  1. Go to Administration > Resources > DLP Dictionaries & Engines.
  2. Point to the Social Insurance Numbers (Canada) dictionary and click the Edit icon.
  3. You can modify:
    • Number of Violations Threshold: The dictionary triggers only if it finds more violations than the number specified here. For example, if you enter 7, the dictionary will only trigger upon finding 8 or more violations.
      • If you enter a value of less than five, the dictionary will not trigger unless it also finds keywords relevant to Canadian Social Insurance Numbers.
      • You may enter any value less than 10,000.
  4. Click Save and activate the change.

Social Security Numbers (US)

This dictionary detects leakage of U.S. Social Security Numbers.

To edit this dictionary:

  1. Go to Administration > Resources > DLP Dictionaries & Engines.
  2. Point to the Social Security Numbers (US) dictionary and click the Edit icon.
  3. You can modify:
    • Number of Violations Threshold: The dictionary triggers only if it finds more violations than the number specified here. For example, if you enter 7, the dictionary will only trigger upon finding 8 or more violations.
      • If you enter a value of less than five, the dictionary will not trigger unless it also finds keywords relevant to US Social Security Numbers.
      • You may enter any value less than 10,000.
    • Confidence Score Threshold: Select a value of Low, Medium, or High. For the Social Security Numbers (US) dictionary, the confidence scores have the following implications:
      • Low: Dictionary counts an instance as a violation if it matches a valid range.
      • Medium: Dictionary counts an instance as a violation if:
        • The requirements of Low Confidence are met.
        • The SSN is in a popular format.
      • High: Dictionary counts an instance as a violation if:
        • The requirements of Medium Confidence are met.
        • The SSN is accompanied by keywords such as “date of birth,” “social security number,” “tax payer id,” “ssn,” and “password.”
  4. Click Save and activate the change.
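
The cumulative confidence levels above can be illustrated with a short Python sketch. It is a rough approximation, not the dictionary's actual logic. The "valid range" test uses the publicly documented SSN numbering rules (no area 000, 666, or 900 to 999; no group 00; no serial 0000). The "popular format" is assumed to be three, two, and four digits separated by hyphens or spaces. The keyword list is the one given above. All function and variable names are hypothetical.

```python
import re
from typing import Optional

# Keywords taken from the High confidence description above.
KEYWORDS = ("date of birth", "social security number", "tax payer id", "ssn", "password")
# Assumption: AAA-GG-SSSS or AAA GG SSSS counts as a "popular format".
FORMATTED = re.compile(r"^\d{3}([- ])\d{2}\1\d{4}$")
RANK = {"Low": 1, "Medium": 2, "High": 3}


def in_valid_range(digits: str) -> bool:
    """Apply the public SSN numbering rules to a 9-digit string."""
    area, group, serial = digits[:3], digits[3:5], digits[5:]
    return (area not in ("000", "666") and area[0] != "9"
            and group != "00" and serial != "0000")


def ssn_confidence(candidate: str, context: str = "") -> Optional[str]:
    """Return the highest level the candidate satisfies, or None."""
    digits = re.sub(r"[- ]", "", candidate)
    if not (len(digits) == 9 and digits.isascii() and digits.isdigit()):
        return None
    if not in_valid_range(digits):
        return None
    if not FORMATTED.match(candidate):
        return "Low"
    if any(keyword in context.lower() for keyword in KEYWORDS):
        return "High"
    return "Medium"


def counts_as_violation(candidate: str, context: str, threshold: str) -> bool:
    """An instance counts only if it reaches the selected threshold."""
    level = ssn_confidence(candidate, context)
    return level is not None and RANK[level] >= RANK[threshold]


print(ssn_confidence("123-45-6789", "SSN: 123-45-6789"))
```

Because each level includes the requirements of the one below it, selecting Low counts every instance that reaches any level, while selecting High counts only instances that are in a valid range, are formatted, and appear with a keyword.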

Source Code

This dictionary detects leakage of source code. Note that the Source Code dictionary will not trigger unless the data size of the detected content is at least 1 KB.

To edit this dictionary:

  1. Go to Administration > Resources > DLP Dictionaries & Engines.
  2. Point to the Source Code dictionary and click the Edit icon.
  3. You can modify:
    • Confidence Score Threshold: Select a value of Low, Medium, or High. The confidence score sets how high the bar, or threshold, is for identifying violations and triggering. A lower threshold makes the dictionary more aggressive: it requires less matching content before it triggers. A higher threshold requires matching content to meet more exact format and verification requirements, and requires more instances of matching content before the dictionary triggers.
  4. Click Save and activate the change.

Weapons

This dictionary detects content related to weapons.

To edit this dictionary:

  1. Go to Administration > Resources > DLP Dictionaries & Engines.
  2. Point to the Weapons dictionary and click the Edit icon.
  3. You can modify:
    • Confidence Score Threshold: Select a value of Low, Medium, or High. The confidence score sets how high the bar, or threshold, is for identifying violations and triggering. A lower threshold makes the dictionary more aggressive: it requires less matching content before it triggers. A higher threshold requires matching content to meet more exact format and verification requirements, and requires more instances of matching content before the dictionary triggers.
  4. Click Save and activate the change.

Adding Custom Dictionaries

You can create a maximum of 31 custom DLP dictionaries. For each dictionary, you can add custom phrases and alphanumeric patterns that represent the content you want to protect and that the dictionary should detect.

To add a custom DLP Dictionary:

  1. Go to Administration > Resources > DLP Dictionaries & Engines.
  2. From the DLP Dictionary tab, click Add.
  3. Complete the following fields:
    • Name
      Enter a name for the dictionary.
    • Number of Violations Threshold
      The dictionary triggers only if it finds more violations than the number specified here. For example, if you enter 7, the dictionary will only trigger upon finding 8 or more violations. You may enter any value less than 10,000. Note that for patterns, the dictionary counts only unique matches of patterns toward the number of violations, while for phrases, the dictionary counts all matches of phrases, including identical phrases.
    • Description
      Enter a description for the dictionary. (Optional)
    • Patterns
    • Phrases
  4. Click Save and activate the change.
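
The difference between how patterns and phrases count toward the Number of Violations Threshold can be sketched as follows. This is an illustration only. It assumes patterns behave like Python regular expressions and phrases like case-sensitive literal strings, which may not match the service's actual pattern syntax or matching rules. The function name and the sample data are hypothetical.

```python
import re


def count_violations(text: str, patterns: list[str], phrases: list[str]) -> int:
    """Count violations as described above (hypothetical helper)."""
    # Patterns: identical matches are counted once, so collect them in a set.
    unique_pattern_matches = set()
    for pattern in patterns:
        for match in re.finditer(pattern, text):
            unique_pattern_matches.add(match.group(0))
    # Phrases: every occurrence counts, including identical repeats.
    phrase_matches = sum(text.count(phrase) for phrase in phrases if phrase)
    return len(unique_pattern_matches) + phrase_matches


sample = "ID-1234 ID-1234 ID-5678 project falcon and again project falcon"
print(count_violations(sample, patterns=[r"ID-\d{4}"], phrases=["project falcon"]))
```

In the sample, the pattern matches three times but only two of the matches are unique, while the phrase appears twice and both occurrences count, giving four violations in total.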

Configuring DLP Engines

A DLP engine is a collection of one or more DLP dictionaries. When you define your DLP policy rules, you must reference DLP engines, rather than DLP dictionaries. By using a DLP engine, you can create rules to detect content that encompasses more than one dictionary. For example, if your organization wants to protect Social Security numbers and credit card numbers, you would create a rule using the PCI engine, which contains the Credit Cards and Social Security Numbers (US) dictionaries.

Note that when a DLP engine uses two or more dictionaries, the Zscaler service blocks content only if all of the dictionaries in the engine are triggered.

Zscaler DLP engines can scan files with a maximum size of 100 MB. For an archived file, each individual file can also be at most 100 MB when decompressed.
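
If you want to check in advance whether an archive contains files above the decompressed size limit, a sketch like the following may help. It is an illustration under several assumptions: it handles only ZIP archives, it treats 1 MB as 1,048,576 bytes, and it trusts the uncompressed sizes recorded in the archive's headers. The function name is hypothetical and the check is unrelated to how the service itself inspects archives.

```python
import io
import zipfile

# Assumption: 100 MB means 100 * 1,048,576 bytes.
MAX_BYTES = 100 * 1024 * 1024


def oversized_members(archive, max_bytes: int = MAX_BYTES) -> list[str]:
    """Return names of ZIP members whose decompressed size exceeds max_bytes."""
    with zipfile.ZipFile(archive) as zf:
        # ZipInfo.file_size is the uncompressed size recorded for each member.
        return [info.filename for info in zf.infolist() if info.file_size > max_bytes]


# Usage with an in-memory archive and a small limit for demonstration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("small.txt", "a" * 10)
    zf.writestr("big.txt", "b" * 1000)
buffer.seek(0)
print(oversized_members(buffer, max_bytes=100))
```

The `archive` argument can be a file path or a file-like object, as accepted by `zipfile.ZipFile`.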

The service provides four predefined engines:

  • HIPAA: This engine is designed to detect Health Insurance Portability and Accountability Act (HIPAA) violations, using the Social Security Numbers (US) and Medical Information dictionaries.
  • GLBA: This engine is designed to detect violations of the Gramm-Leach-Bliley Act (GLBA), using the Social Security Numbers (US) and Financial Statements dictionaries.
  • PCI: This engine is designed to detect Payment Card Industry (PCI) compliance violations, using the Credit Cards and Social Security Numbers (US) dictionaries.
  • Offensive Language: This engine is designed to detect offensive language, using the Adult Content dictionary.
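
The relationship between the predefined engines and their dictionaries, together with the rule that every dictionary in an engine must trigger, can be modeled as a simple lookup. The sketch below is only a model of the behavior described in this article. The data structure and function name are hypothetical and do not represent a Zscaler API.

```python
# Predefined engines and the dictionaries they use, as listed above.
PREDEFINED_ENGINES = {
    "HIPAA": {"Social Security Numbers (US)", "Medical Information"},
    "GLBA": {"Social Security Numbers (US)", "Financial Statements"},
    "PCI": {"Credit Cards", "Social Security Numbers (US)"},
    "Offensive Language": {"Adult Content"},
}


def engine_triggers(engine: str, triggered_dictionaries: set[str]) -> bool:
    """An engine triggers only if ALL of its dictionaries have triggered."""
    # Subset test: every dictionary in the engine must be in the triggered set.
    return PREDEFINED_ENGINES[engine] <= set(triggered_dictionaries)


print(engine_triggers("PCI", {"Credit Cards"}))
print(engine_triggers("PCI", {"Credit Cards", "Social Security Numbers (US)"}))
```

In this model, content that triggers only the Credit Cards dictionary does not trigger the PCI engine. Both of its dictionaries must trigger.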

You can add dictionaries to the predefined engines and also create custom DLP engines to detect content that is relevant to your organization. The sections below give configuration instructions.

Editing Predefined Engines

To add dictionaries to a predefined engine:

  1. Go to Administration > Resources > DLP Dictionaries & Engines.
  2. From the DLP Engines tab, point to an engine and click the Edit icon.
    • From the Dictionaries menu, choose the dictionaries that you want to include in the engine.
  3. Click Save and activate the change.

Adding Custom DLP Engines

To add a custom DLP engine:

  1. Go to Administration > Resources > DLP Dictionaries & Engines.
  2. From the DLP Engines tab, click Add and do the following:
    • Enter a Name for the custom DLP engine.
    • Optionally, enter a Description, such as additional notes or information. The description cannot exceed 255 characters.
    • From the Dictionaries menu, choose the dictionaries that you want to include in the engine. You can search for dictionaries or click the Add icon to add a new dictionary. All selected dictionaries must trigger for an engine to trigger.
  3. Click Save and activate the change.