Are the URLs in Your Categories Set Correctly?

Friday, December 7th, 2012

If you are upgrading your CyBlock or Cyfin product, you will be using the Wavecrest URL List 7. List 7 supports wildcard entries in domain, path, and parameter matching in URLs. In List 6, wildcard entries were possible but limited, so URL matching behaved slightly differently. Therefore, we recommend that you recheck and reset the URLs that were added to your standard and custom categories.

To do this, go to the Advanced Settings – Category Setup – Edit URLs screen and select the category you want to change. In the Supplemental URLs or Custom URLs box, modify your URLs according to the List 7 rules. List 7 allows you to use the following wildcard rules to add multiple URLs simultaneously.

  1. Wildcards With Domain Matching. This URL matching method categorizes Web sites whose pages all contain the same type (category) of content, e.g., Shopping, News, and Sports. In these relatively simple cases, one category applies to the entire site. Under this method, if the Web log entries are in any of the following formats and the URL List contains a matching URL, the product will categorize the visit on the basis of the domain name.
    • www.mydomain.com
    • *.mydomain.com
    • www.mydomain.*
    • *.mydomain.*


    Note: For this method to work, and as reflected in the examples, the entry in the URL List must contain a complete domain name element. That is, the domain name between the periods (dots) must be complete and must not be augmented with an asterisk or any other character. For example, the list must not contain mydomain*.com or *mydomain.com.

  2. Wildcards With Domain and Path Matching. This URL matching method categorizes Web site visit-attempts at the path level, which enables individual pages to be categorized. If the URLs visited (as documented in the Web logs) are in any of the following formats and there is a corresponding entry in the URL List, the product will categorize the visit on the basis of the domain name and path.
    • www.mydomain.com/path/*
    • www.mydomain.com/*/path/*
    • *.mydomain.com/*/path/*
    • *.mydomain.com/path/

    Notes: For this method to work, the entry in the URL List must contain a complete path element. That is, the path element between the forward slashes must be complete and must not be augmented with an asterisk or any other character. For example, the list must not contain /path*/. As indicated at the end of the fourth example above, the asterisk is not always required, i.e., an exact path can be entered. However, as indicated in all four examples, forward slashes are always required.

  3. Wildcards With Parameter Matching. This method adds parameter matching to the two methods defined above (domain alone and domain-plus-path). It focuses more on syntax found in URL parameters than on content of the site being evaluated by the product. The parameter method works as follows. If the Web log entries are in any of the formats listed below, the product will categorize the visit on the basis of (a) the domain name plus the parameter, or (b) domain name plus path and parameter. Note that the first three bullets are examples of the former (no path included).
    • www.mydomain.com/*?keyword=value
    • www.mydomain.com/?keyword=value
    • www.mydomain.com/?id=*
    • www.mydomain.com/?id=*&sr=* (example of multiple pairs)
    • *.mydomain.com/*/path/*?id=*

    Notes: Parameter matching always requires the use of “?”. If a question mark is placed at the end of the domain or the path, the URL List will perform this matching method. The “/” is also required for this method. However, the “&” is optional and is only needed when more than one “keyword=value” pairing is involved (as indicated above). Note that the “&” is added between pairs, and the pairs do not have to be in any particular order.
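
Before pasting a long list of entries into the Supplemental URLs or Custom URLs box, it may help to screen them against the three notes above. The following Python sketch is one possible way to do that. It is not part of CyBlock or Cyfin, and the function name check_entry is hypothetical. It assumes that every domain element and path element is either complete or a lone asterisk, that a path ends with "/" or "/*" (our reading of "forward slashes are always required"), and that each parameter is written as a keyword=value pair. It checks syntax only and does not reproduce how the product actually matches URLs.

```python
def check_entry(entry: str) -> list[str]:
    """Return the rule violations found in one URL List entry (empty if none).

    Hypothetical helper: it encodes our reading of the List 7 notes,
    not the product's own validation logic.
    """
    problems = []

    # Split the entry into domain, optional path, and optional parameters.
    location, question_mark, query = entry.partition("?")
    domain, slash, path = location.partition("/")

    # Domain note: an element between dots is complete or a lone "*".
    for element in domain.split("."):
        if "*" in element and element != "*":
            problems.append(f"domain element '{element}' combines * with other characters")

    # Path note: an element between slashes is complete or a lone "*".
    segments = path.split("/") if path else []
    for segment in segments:
        if "*" in segment and segment != "*":
            problems.append(f"path element '{segment}' combines * with other characters")

    # Assumption: a path must be closed with "/" or end in "/*".
    if segments and segments[-1] not in ("", "*"):
        problems.append("path should end with '/' or '/*'")

    # Parameter note: "?" requires a "/" before it; pairs are joined with "&".
    if question_mark:
        if not slash:
            problems.append("parameter matching needs a '/' before the '?'")
        for pair in query.split("&"):
            keyword, equals, _value = pair.partition("=")
            if not keyword or not equals:
                problems.append(f"parameter '{pair}' is not a keyword=value pair")

    return problems


if __name__ == "__main__":
    for candidate in ["*.mydomain.com/*/path/*", "mydomain*.com", "www.mydomain.com/path*/"]:
        issues = check_entry(candidate)
        print(candidate, "->", "OK" if not issues else "; ".join(issues))
```

Under these assumptions, the first candidate passes, while mydomain*.com and www.mydomain.com/path*/ are reported because an asterisk is attached to a domain or path element.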

For additional assistance, please contact Technical Support at (321) 953-5351, ext. 4 or support@wavecrest.net.