Maintaining a safe and engaging environment is crucial when developing chat applications for children. Murnitur Shield offers robust tools for content moderation, including custom metrics to tailor the moderation process. This blog post will dive into a complex use case: preventing political discussions in a children's chat application using advanced custom metrics.

The Challenge: Detecting Political Content with Context

For a chat application designed for children, it's essential to avoid discussions about political topics. We'll use Murnitur Shield to implement a custom metric that identifies political keywords and leaves room for analyzing the context in which those terms are used, which is what makes content moderation more accurate and nuanced.

Step 1: Configuring Murnitur Shield

Set up Murnitur Shield using the Guard class directly, without instantiation:

from murnitur import Guard, GuardConfig
from murnitur.guard import Payload, RuleSet

# Create the configuration
config = GuardConfig()

Step 2: Develop an Advanced Custom Metric

We'll create a custom metric function that goes beyond a naive substring match. A fully featured version of this function would:

  1. Check for specific political keywords.
  2. Analyze the context around these keywords to determine if the content is politically charged.
  3. Use a scoring mechanism to evaluate the likelihood of political content.
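The three steps above can be sketched in plain Python, independent of Murnitur Shield. This is a minimal illustration, not part of the library: the names KEYWORD_WEIGHTS, CONTEXT_CUES, WINDOW, THRESHOLD, political_score and is_political are all hypothetical, and the weights and threshold are made-up values you would need to tune on real chat data.

```python
import re

# Hypothetical weights: how strongly each keyword suggests politics on its own.
KEYWORD_WEIGHTS = {
    "election": 0.6,
    "government": 0.5,
    "president": 0.5,
    "candidate": 0.4,
    "party": 0.2,   # ambiguous, e.g. "birthday party"
    "debate": 0.2,  # ambiguous, e.g. "school debate"
}

# Hypothetical cue words that raise the score when they appear near a keyword.
CONTEXT_CUES = {"vote", "voting", "campaign", "ballot", "law", "minister"}

WINDOW = 3        # tokens inspected on each side of a keyword
CUE_BOOST = 0.4   # added once per keyword that has a cue nearby
THRESHOLD = 0.5   # score at or above which a message counts as political


def political_score(message: str) -> float:
    """Return a score in [0, 1]; higher means more likely political."""
    tokens = re.findall(r"[a-z']+", message.lower())
    score = 0.0
    for i, token in enumerate(tokens):
        weight = KEYWORD_WEIGHTS.get(token)
        if weight is None:
            continue  # step 1: only keywords contribute
        # Step 2: look at the words surrounding the keyword.
        nearby = tokens[max(0, i - WINDOW):i] + tokens[i + 1:i + 1 + WINDOW]
        boost = CUE_BOOST if any(t in CONTEXT_CUES for t in nearby) else 0.0
        # Step 3: accumulate the score.
        score += weight + boost
    return min(score, 1.0)


def is_political(message: str) -> bool:
    return political_score(message) >= THRESHOLD
```

With these assumed values, "We had cake at my birthday party" stays below the threshold because "party" is weak on its own, while "Which party will you vote for" crosses it because "vote" appears within the window.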

Advanced Custom Metric Function

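Here's the custom metric function. It implements the whole-word keyword check; context analysis and scoring are left as an extension point, marked by a comment in the code: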

import re
from typing import Tuple, Optional
from murnitur.guard import Payload

political_terms = [
    "politics",
    "government",
    "election",
    "policy",
    "candidate",
    "party",
    "debate",
    "congress",
    "senate",
    "president",
    "parliament",
]


def advanced_political_content_metric(payload: Payload) -> Tuple[bool, Optional[str]]:
    # Retrieve the chat message from the payload
    chat_message = payload.get("output", "").lower()

    # Check for political terms in the chat message
    for term in political_terms:
        if re.search(r"\b" + re.escape(term) + r"\b", chat_message):
            return True, None

    # Optionally, apply more advanced checks here, such as analyzing message context
    # For example, checking for named entities related to politics

    # If no political terms are found
    return False, chat_message
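To see how the whole-word matching in this function behaves, here is a small self-contained check. It mirrors the regular expression used in the loop above but works on plain strings and a shortened term list, so it runs without Murnitur installed. The helper name contains_term and the sample messages are illustrative assumptions.

```python
import re

# Shortened, illustrative term list.
terms = ["election", "party", "president"]


def contains_term(message: str) -> bool:
    """Return True if any term appears as a whole word, ignoring case."""
    text = message.lower()
    return any(re.search(r"\b" + re.escape(t) + r"\b", text) for t in terms)


samples = {
    "Who won the election?": True,         # exact keyword, punctuation is a boundary
    "ELECTION day is coming": True,        # lowercasing handles case
    "I love my birthday party": True,      # false positive: innocent use of "party"
    "The elections are soon": False,       # miss: the plural is not a whole-word match
    "My dog is president of naps": True,   # playful use still matches
}

for message, expected in samples.items():
    assert contains_term(message) == expected
```

The false positive on "birthday party" and the miss on "elections" are the kinds of cases that context analysis, or matching on word stems, would be meant to address.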

Registering the Custom Metric

Register the custom metric with the Guard class:

# Register the advanced custom metric
Guard.register_custom_metric('advanced_political_content', advanced_political_content_metric)

Step 3: Implementing Murnitur Shield with Custom Metrics

Configure Murnitur Shield to use the custom metric. Define rules and use the shield method to check the payload:

rulesets = [
    {
        "rules": [{"metric": "custom", "value": "advanced_political_content"}],
        "action": {
            "type": "OVERRIDE",
            "fallback": "Sorry, political discussions are not allowed.",
        },
    }
]

payload = {
    "output": "The upcoming election is a major topic in the news.",
    "contexts": [
        "The election year is causing a lot of debates.",
        "Government policies are changing rapidly.",
    ],
}

# Check the payload using Murnitur Shield
response = Guard.shield(payload, rulesets, config)

print(response.text)

In this example:

  • Advanced Custom Metric: Flags a message when it contains any of the listed political keywords as a whole word. Context analysis and scoring can be added at the extension point marked in the function.
  • Ruleset: Uses the custom metric to detect political content and override messages accordingly.
  • Payload: Contains the chat output and context, which are evaluated for political content.

Conclusion

By implementing custom metrics with Murnitur Shield, you can tailor content moderation to applications for children. The custom metric shown here identifies messages that contain political keywords so they can be replaced with a fallback message, helping maintain a safe and enjoyable chat environment. Extending it with context analysis and scoring can reduce false positives on ambiguous words such as "party" and make the moderation more robust.