An Outwit Hub Video Tutorial with @GlennGutmacher

Glenn Gutmacher joined us at SourceCon Fall 2013 and delivered a well-received presentation about gathering candidate information by scraping the web. Outwit Hub was one of the primary tools he covered. A number of people were impressed with what he shared but needed more clarification. This morning, Glenn and I recorded a quick demo to help you get started.

This is definitely for our more technical readers. A basic understanding of HTML and JavaScript will help you take full advantage of this tool.

The files discussed in the video can be downloaded here.

  • Glenn Gutmacher

    It was fun (thanks, Jeremy), and I think I made it easier to absorb by providing the
    step-by-step text instructions as well as the actual Outwit gear file, so definitely click that link above to download the 2 files, and then you can easily do everything I showed and more!

  • Randy Bailey

    Glenn, can you link to the Outwit Hub-specific syntax you referenced?
    Also, perhaps where to learn about regular expressions (regex)? Or specifically how you knew what you needed to change it to? Is there also an FAQ or something from Outwit?

  • Glenn Gutmacher

    Randy, when you load the 2 files from Dropbox attached to this
    article, you will see a lot of the syntax. A regex syntax example
    related to my demo for the “Apply If Page URL Contains” field would be
    /[\-A-Za-z0-9]*$/ which would
    tell Outwit to only run the scraper on pages whose URL contains the
    pattern between the opening and closing slashes. The pattern begins
    with how each partner page URL starts, but in regex, you have to
    precede each forward slash in the URL with a backslash to indicate
    that the next character should be escaped (i.e., treated as is, and
    not interpreted as a special regex command character). Other special
    characters like the dollar sign, period, pipe, etc., should also be
    escaped if you want them to be recognized as what they are, rather
    than as the command they also mean in regex. The next portion, inside
    the square brackets, indicates the set of characters that can follow
    next in the URL: a hyphen (note it is preceded by a backslash to
    escape it), a capital letter (range A-Z), a lowercase letter (a-z),
    or a digit (0-9). Immediately after the right square bracket is the
    asterisk, indicating there may be any number of such characters
    (including none), and then the final $ indicates the end of the
    string (i.e., if the URL took on a different pattern after this
    point, then it would not be a URL that I want to process). So that
    hopefully illustrates one of the more complex parts of one of my
    scrapers in the attached gear file. Unfortunately, if you have the
    free version of Outwit Hub, you can only share scrapers. To share
    macros, you would need the paid version (about $75 for a lifetime
    license), so you would not be able to use everything in my gear file
    until you did that.
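
    If you want to try the pattern Glenn walks through above outside of Outwit Hub, here is a minimal Python sketch. It is only an illustration: the URL prefix (http://example.com/partners/) and the function name should_scrape are made up for this example, since the real partner-page prefix does not appear in the comment, and Outwit's own regex handling may differ in details from Python's.

```python
import re

# HYPOTHETICAL prefix: the real partner-page URL is not given in the comment.
# Forward slashes and periods are escaped with a backslash, as described above.
# (Python does not require escaping "/", but it accepts it.)
PREFIX = r"http:\/\/example\.com\/partners\/"

# After the prefix: any number of hyphens, letters, or digits, then end of string.
PATTERN = re.compile(PREFIX + r"[\-A-Za-z0-9]*$")


def should_scrape(url: str) -> bool:
    """Return True if the URL contains the pattern (hypothetical helper)."""
    return PATTERN.search(url) is not None


if __name__ == "__main__":
    for url in (
        "http://example.com/partners/acme-corp-42",       # letters, digits, hyphens only
        "http://example.com/partners/acme-corp/contact",  # extra slash after the name
        "http://example.com/partners/acme.html",          # period is not in the brackets
    ):
        print(url, should_scrape(url))
```

    The second and third URLs are rejected because a slash or period appears after the prefix, and the final $ requires that nothing other than the bracketed characters comes before the end of the URL.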