How can I grep http and ftp addresses? (URL matching)


I want to "grep" http and ftp addresses from newsgroup posts that I have kept, but I cannot work out how to do it with your program. I once tried a grep utility that seemed to do it, but I have forgotten how I did it.

I think the "find" was something like .*((ht|f)tp://www[\w./#-]+).* . If something similar works in your program, I don't know what the "replace" criteria should be. Also, that pattern doesn't cover links that start with neither ftp nor http, such as plain www addresses. So I need to find ftp AND http AND www "strings".
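For reference, here is how the pattern from the question behaves in a standard regex engine. This is a sketch in Python's `re` module, not in the program discussed here, and `grep_urls` is a hypothetical helper name. The usual replacement for such a whole-line pattern is `\1`, which keeps only the captured URL:

```python
import re

# The pattern quoted in the question. The leading and trailing ".*" consume
# the rest of the line, and group 1 captures the URL itself.
QUESTION_PATTERN = re.compile(r".*((ht|f)tp://www[\w./#-]+).*")


def grep_urls(lines):
    """Yield only the URL from each line that matches (grep, then replace with \\1)."""
    for line in lines:
        if QUESTION_PATTERN.search(line):
            # The match spans the whole line, so replacing it with group 1
            # leaves just the URL.
            yield QUESTION_PATTERN.sub(r"\1", line)


if __name__ == "__main__":
    sample = [
        "a good one is http://www.xxxxx.xx/remote/ which is",
        "the page at http://xxxx.xx/~xxxx/Python/ is my",
    ]
    for url in grep_urls(sample):
        print(url)
```

Only the first sample line is reported: the pattern requires `www` right after `://`, so the second address is skipped. Because the leading `.*` is greedy, a line containing several URLs would also yield only the last one.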

I start with e.g.

a good one is http://www.xxxxx.xx/remote/ which is
the page at http://xxxx.xx/~xxxx/Python/ is my
a faster ftp site is and
don't forget is also free.

I want:


Can you help please?


Notice that each URL in your example starts with ftp://, http:// or www. These prefixes set URLs apart from ordinary text: a protocol name is followed by ://, and www is followed by a dot. The rest of the host part may contain digits, letters, dots, underscores and dashes. So a pattern that matches your example is:

{{http}|{ftp}\:\/\/}|{www\.}[\d\w\_\.\-]+{\/\S+}?

where {\/\S+}? matches the optional relative (path) part of a URL, for example /dir/file.ext. This lets the pattern find URLs with or without a path or file name.
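A minimal sketch of the same pattern in Python's `re` syntax, assuming the program's `{...}` groups act like ordinary regex groups. In Perl-style engines such as Python's, `|` has the lowest precedence, so the alternatives for the start of the URL are wrapped in their own group here; `\w` already covers digits and the underscore. The helper `find_urls` and the `www.example.com` sample line are hypothetical additions for illustration:

```python
import re

# Standard-regex reading of the pattern above (assumption: {...} in the
# program corresponds to a non-capturing group (?:...) here).
#   (?:http|ftp)://  or  www\.   -> how a URL starts
#   [\w.-]+                      -> host part: letters, digits, _, ., -
#   (?:/\S+)?                    -> optional path such as /dir/file.ext
URL_PATTERN = re.compile(r"(?:(?:http|ftp)://|www\.)[\w.-]+(?:/\S+)?")


def find_urls(text):
    """Return every URL-like substring of text, in order of appearance."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return URL_PATTERN.findall(text)


if __name__ == "__main__":
    sample = (
        "a good one is http://www.xxxxx.xx/remote/ which is\n"
        "the page at http://xxxx.xx/~xxxx/Python/ is my\n"
        "see www.example.com for details"  # hypothetical www-only line
    )
    for url in find_urls(sample):
        print(url)
```

One caveat of this simple pattern: punctuation that directly follows a URL, such as a full stop at the end of a sentence, is matched as part of it.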

  1. Type the search expression {{http}|{ftp}\:\/\/}|{www\.}[\d\w\_\.\-]+{\/\S+}? in the Find What field. If you want to transform the URLs, modify the expression by enclosing the required URL parts in round brackets.
  2. If needed, type the text that will replace the match in the Replace With field.
  3. Check the Regular Expressions box.
  4. Click Search to find the URLs. To transform them, click Replace.
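The Replace step can be sketched in Python as well. Here the URL parts to keep are enclosed in round brackets (capturing groups) and reused in the replacement as `\1` and `\2`. The transformation itself (dropping the path so only the site address remains) and the name `strip_paths` are hypothetical examples, and the program's own replacement syntax may differ:

```python
import re

# The URL pattern with the parts we want to reuse enclosed in round
# brackets (capturing groups):
#   group 1: the "http://" / "ftp://" prefix or the "www." prefix
#   group 2: the host part
#   group 3: the optional path, e.g. /dir/file.ext
URL_PARTS = re.compile(r"((?:http|ftp)://|www\.)([\w.-]+)(/\S+)?")


def strip_paths(text):
    """Replace each URL with just its prefix and host (hypothetical transform)."""
    # \1\2 in the replacement reuses groups 1 and 2 and drops group 3.
    return URL_PARTS.sub(r"\1\2", text)


if __name__ == "__main__":
    print(strip_paths("the page at http://xxxx.xx/~xxxx/Python/ is my"))
```

To wrap each URL instead, you could enclose the whole pattern in a single group and use a replacement such as `<\1>`.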