Wednesday, February 28, 2007

L'Equipe French cycling news in English

Howto below. Digg it if you like it. I've created a new RSS feed for you. It takes cycling news from L'Equipe and Le Monde (French newspapers) and translates the RSS into English. Wait, there's more! I also automatically convert the WWW links, so when you click to read the article at L'Equipe or Le Monde, the article is automatically translated from French to English with no other intervention required on your part!
The RSS feed is here. To preview as a web page, here's the Yahoo Pipes page for this feed.

RegEx Module HOWTO: Geek notes follow

I've used the new Regular Expression Pipes Module to rewrite the link text and automatically rebuild the link so that it passes through Google's translate service. Here's how you do this.

  1. Your source is a foreign-language RSS feed. I fetch feeds from L'Equipe and Le Monde using the FETCH module.

  2. I pipe my sources to the BABELFISH module and specify "French to English." This translates the Titles and Descriptions in the feed from French to English. This is great for reading inside the feed reader, but these feeds provide summaries only. For the entire article, you must click on the link to the French-language article.

  3. I create a REGEX module and run a pipe from BABELFISH to REGEX.

  4. Create a rule within module REGEX that looks like this:
    In LINK replace ^ with|en&hl=en&ie=UTF8&u=

    "LINK" tells the REGEX module to replace something in the link text. The caret symbol (^) is the regular expression for "beginning of the line," so the rule matches the very start of the link -- that is, the start of the WWW URL pointing to the article at L'Equipe's website. Nothing is actually removed: the replacement text, which is the WWW URL for Google's translate service, is inserted in front of the original URL. The original URL -- the WWW link pointing to L'Equipe -- therefore ends up after Google's translate URL, following the "u=" parameter, which tells Google which URL to translate.

  5. Run a pipe from the REGEX module to the PIPE OUTPUT, Save and Test.

  6. Some RSS feeds (including L'Equipe) create redirections, which the translator has a hard time dealing with sometimes. You may need some additional regex magic to fix the redirection problem.
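
For anyone who wants to try the rule from step 4 outside of Pipes, here is a rough Python sketch of the same idea: replace the empty match at ^ with a translate prefix. Treat it as an illustration only. The first part of the prefix URL is my assumption about what Google's translate address looks like (only the tail "|en&hl=en&ie=UTF8&u=" comes from the rule above), and the function name and the src/dst parameters are made up for this example.

```python
import re

# ASSUMPTION: this prefix is a guess at the Google translate URL form.
# Only the tail "|en&hl=en&ie=UTF8&u=" appears in the rule from step 4.
TRANSLATE_PREFIX = (
    "http://translate.google.com/translate"
    "?langpair={src}|{dst}&hl={dst}&ie=UTF8&u="
)


def rewrite_link(link, src="fr", dst="en"):
    """Prepend the translate prefix to a feed item's link (hypothetical helper)."""
    prefix = TRANSLATE_PREFIX.format(src=src, dst=dst)
    # "^" matches the empty string at the start of the link, so "replacing"
    # it inserts the prefix in front of the original URL and removes nothing.
    # A function is used as the replacement so the prefix is taken literally.
    return re.sub(r"^", lambda match: prefix, link, count=1)


if __name__ == "__main__":
    # Hypothetical article URL, used only to show the rewritten form.
    print(rewrite_link("http://www.lequipe.fr/Cyclisme/article.html"))
```

If the assumed prefix is right, the part before the | is the source language, so a call such as rewrite_link(url, src="it") would be the starting point for an Italian feed. The sketch does not percent-encode the original URL and does nothing about the redirection problem from step 6.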

If this pipe is useful, if these directions are helpful, or if you've created an RSS feed of your own using Pipes and the RegEx module, please leave a comment here.

1 comment:

Ed W said...

I tried to use this as a template to translate La Gazzetta dello Sport, but frankly, I got lost. That line of code is for French to English, and I don't know what to change to get it to do Italian.

Still, it may have dubious value anyway because the Google translation pages return some very odd English!


“The exoneration me has saved the family”
The cosmoses today return in park bench with the Brescia on the field of the AlbinoLeffe: “Far away from the field I have suffered, but I have understood many things. E' be a period a diserbante, also to level personal”

Maybe this is the text source for some of the spam I get!