I started this series by providing a list of some different types of pipes. Then, I went over the basic structure of a pipe. Now, we’ll roll up our sleeves and dig in a little deeper to improve the results we get out of a pipe.
The quality of a pipe’s results is generally described as its signal-to-noise ratio. Noisy pipes are full of useless results that eat up valuable attention. Improving the quality of a pipe’s results requires a deeper knowledge of the modules, as well as knowledge of the specific data your pipe deals with.
Filtering negative matches
Negative matches are results that do not match the intended output of a pipe. For example, a persistent search pipe for the acronym SOA would return results for Service Oriented Architecture, Society of Actuaries, School of the Americas, and the Sarbanes-Oxley Act. A person looking for Service Oriented Architecture would be constantly sifting through distracting results if they didn’t filter out the negative matches.
In the case of a persistent search, some engines support advanced operators, such as the “-” (minus) symbol in front of a term, to omit results that contain that term. However, if your source feed is not a search, or it comes from an engine that doesn’t support advanced operators, you can use the magical filter module.
Here are a couple of exploded views of the filter module (drop down menus are opened to reveal contents):
Filter modules can support multiple rules on the same module. The syntax for the rules is very similar to the ones used in MS Outlook (or Entourage for the Mac). In the case of our SOA example, here’s what a filter would look like for the person wanting information on Service Oriented Architecture:
I used the “any” condition because a match for any one of the three rules is enough to warrant excluding the results from my pipe’s output.
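Outside of Pipes, the same “block the item if any rule matches” logic is easy to sketch in code. Here is a minimal Python illustration of the SOA example above; the feed items and the exact block terms are illustrative assumptions, not output from a real pipe:

```python
# Terms that identify negative matches for someone tracking
# "SOA" as Service Oriented Architecture.
BLOCK_TERMS = ["Society of Actuaries", "School of the Americas", "Sarbanes-Oxley"]

def passes_filter(item):
    """Return True only if the item matches none of the block terms."""
    text = (item.get("title", "") + " " + item.get("description", "")).lower()
    return not any(term.lower() in text for term in BLOCK_TERMS)

items = [
    {"title": "SOA: Service Oriented Architecture patterns"},
    {"title": "Society of Actuaries annual meeting"},
]
filtered = [item for item in items if passes_filter(item)]
# Only the Service Oriented Architecture item survives the filter.
```

Like the “any” condition in the filter module, a match against any single rule is enough to drop the item; an “all” condition would correspond to `all(...)` instead of `any(...)`.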
Removing duplicates with the unique module

When combining data from multiple sources, it’s common to end up with duplicates in your feed. The unique module is a super easy way to eliminate duplicates and save time. I tend to use this module twice—once to filter items with the same link, and then a second time to filter items with the same title. I filter items with the same link because those items point to the same page, which I don’t need more than once. I filter items with the same title because it’s common for sites like Google News to pick up a story from another source and run the same headline under a different link. Filtering duplicates improves the signal-to-noise ratio of any multi-source feed.
Here’s what my dual unique module set-up looks like:
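The dual-pass set-up can be sketched in a few lines of Python. This is only an illustration of the idea behind chaining two unique modules; the sample feed items are made up:

```python
def unique_by(items, key):
    """Keep the first item seen for each distinct value of `key`."""
    seen, out = set(), []
    for item in items:
        value = item.get(key)
        if value not in seen:
            seen.add(value)
            out.append(item)
    return out

feed = [
    {"title": "Big Story", "link": "http://a.example/1"},
    {"title": "Big Story", "link": "http://b.example/1"},  # same headline, different site
    {"title": "Other headline", "link": "http://a.example/1"},  # same page, retitled
]

# Two passes, mirroring the two unique modules: first by link, then by title.
deduped = unique_by(unique_by(feed, "link"), "title")
# A single item remains after both passes.
```

Order matters only in which duplicate survives (the first one seen), not in how many items are removed.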
Normalizing dates to chronologically merge results for multiple sources
While it may seem strange to include this piece, it is very important. All pipes have a single output, yet most have multiple inputs, so merging results into a single feed is a critical step. Much of the time, people want the information from the various sources sorted in chronological order. The standards that define RSS were not put through a rigorous standards development process, so the format of the item dates sometimes differs from source to source, which means the sort module won’t work properly.
To normalize the dates, you’ll need to use the loop and string builder modules. Here is what the modules look like properly configured:
Not all aggregated feeds need this step for the sort module to do its magic. Only use this technique if your merged results are not collated. If you notice the results are stacking, take a look at the source for each feed and check for discrepancies in the item publication dates. You’ll need to copy the string parameter I have in the string builder module, which is %Y-%m-%dT%T (date parameters decoder).
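To show what the loop and string builder modules are accomplishing, here is a small Python sketch that parses dates in two different source formats and rewrites them into the %Y-%m-%dT%T shape so a plain string sort collates them. The two input formats and sample items are assumptions for illustration:

```python
from datetime import datetime

# Two illustrative pubDate formats a merged feed might mix together.
SOURCE_FORMATS = [
    "%a, %d %b %Y %H:%M:%S",  # common RSS 2.0 style (time zone stripped)
    "%Y-%m-%d %H:%M:%S",
]

def normalize(date_string):
    """Parse with whichever format matches, emit %Y-%m-%dT%H:%M:%S (%T)."""
    for fmt in SOURCE_FORMATS:
        try:
            parsed = datetime.strptime(date_string, fmt)
        except ValueError:
            continue
        # %T is shorthand for %H:%M:%S; spelled out here for portability.
        return parsed.strftime("%Y-%m-%dT%H:%M:%S")
    raise ValueError("unrecognized date format: " + date_string)

items = [
    {"title": "B", "pubDate": "2008-03-02 09:00:00"},
    {"title": "A", "pubDate": "Sat, 01 Mar 2008 12:00:00"},
]
for item in items:
    item["pubDate"] = normalize(item["pubDate"])
items.sort(key=lambda item: item["pubDate"])
# Items now collate chronologically: A (Mar 1), then B (Mar 2).
```

Once every date shares the same ISO-style format, lexicographic order and chronological order coincide, which is why the sort works after normalization.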
Using Dapper to create feeds where none previously existed
Pipes is a great way to aggregate, manipulate, and mash up feeds. But what if the data you want isn’t available as a feed, such as the Technorati Top Searches? Dapper is a great service that lets you pull the data from any web page into a feed. That means you are not limited to creating pipes from feeds, because now any web page can be a feed source. In the next post in this series I’m going to share models for monetizing pipes, and using Dapper to create a feed of valuable information may be the competitive advantage you need to profit from a pipe’s output.
For more information on Dapper, check out Dapper: The Quest To Unlock Web Data (Legally).
OK, that’s three of the five parts in this series. Don’t forget to check out the last two parts:
- Pipes for profit
- The future of pipes