HandyFile Find And Replace Online Help

KB0005: Grep http and ftp addresses (URL matching)

Question

I want to "grep" http and ftp addresses from newsgroup posts that I have kept, but I cannot work out how to do it with your program. I once tried a grep utility that seemed to do it, but I have forgotten how I did it.

I think the "find" was something like .*((ht|f)tp://www[\w./#-]+).* . If something similar works in your program, I don't know what the "replace" criteria should be. Also, that doesn't cover links that lack the ftp or http prefix, e.g. www.somewhere.com. So I need to find ftp, http and www "strings".

I start with e.g.

a good one is http://www.deltasoft.hr/remote/remadmin.zip which is
the page at http://fastq.com/~sckitching/Python/wordlen.zip is my
a faster ftp site is ftp://203.131.69.75/lorma_v.3.iso and
don't forget www.burrotech.com/pa_inst.exe is also free.

I want:

http://www.deltasoft.hr/remote/remadmin.zip
http://fastq.com/~sckitching/Python/wordlen.zip
ftp://203.131.69.75/lorma_v.3.iso
www.burrotech.com/pa_inst.exe

Can you help please?

Answer

In your examples, a URL starts with ftp://, http:// or www. Such URLs can be distinguished from ordinary text by what follows the prefix: :// after a protocol name, and a dot . after www. The first part of the URL (the host name) may then contain digits, letters, dots, underscores and dashes. A pattern that matches your examples is therefore:

{{http}|{ftp}\:\/\/}|{www\.}[\d\w\_\.\-]+{\/\S+}?

where {\/\S+}? matches the optional path part of a URL, for example /dir/file.ext in http://dom.com/dir/file.ext . This lets the pattern find URLs of these forms with or without a file name.
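
For readers more used to standard (Perl-style) regular expressions, the same idea can be sketched in Python. This is an illustration under assumptions, not HandyFile syntax: it uses ( ) for grouping where the pattern above uses { }, and the names URL_PATTERN, extract_urls and sample are hypothetical. The sample text is taken from the question.

```python
import re

# Assumed translation of the pattern above into Python's regex syntax:
#   prefix : http:// or ftp:// or www.
#   host   : one or more letters, digits, underscores, dots or dashes
#   path   : optional "/" followed by non-whitespace characters
URL_PATTERN = re.compile(r"(?:(?:http|ftp)://|www\.)[\w.-]+(?:/\S+)?")

def extract_urls(text):
    """Return every URL-like substring of text, in order of appearance."""
    return URL_PATTERN.findall(text)

# Sample lines copied from the question.
sample = """a good one is http://www.deltasoft.hr/remote/remadmin.zip which is
the page at http://fastq.com/~sckitching/Python/wordlen.zip is my
a faster ftp site is ftp://203.131.69.75/lorma_v.3.iso and
don't forget www.burrotech.com/pa_inst.exe is also free."""

for url in extract_urls(sample):
    print(url)
```

Run on the sample, this prints the four addresses listed under "I want", one per line. Because the path part ends only at whitespace, punctuation that directly follows a URL's path in running text (a trailing comma, for instance) would be included in the match.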