I want to "grep" http and ftp addresses from newsgroup posts that I have kept, but I cannot work out how to do it with your program. I once tried a grep utility that seemed to do it, but I have forgotten how I did it.
I think the "find" pattern was something like .*((ht|f)tp://www[\w./#-]+).* but even if something similar works in your program, I don't know what the "replace" criterion should be. Also, that pattern doesn't cover links that lack the ftp or http prefix, e.g. www.somewhere.com. So I need to find ftp, http AND www "strings".
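A quick check of the recalled pattern in Python's re module shows why it is not enough. This is an assumed translation: the surrounding .* are dropped because re.findall already scans the whole text, and the inner group is made non-capturing so findall returns plain strings. The sample is the text quoted below.

```python
import re

# The pattern recalled above, in Python re syntax (assumed translation).
# It requires "www" right after "://", so hosts like fastq.com or an IP are skipped,
# and bare "www." links without a protocol are never matched.
recalled = re.compile(r"((?:ht|f)tp://www[\w./#-]+)")

sample = ("a good one is http://www.deltasoft.hr/remote/remadmin.zip which is the page at "
          "http://fastq.com/~sckitching/Python/wordlen.zip is my a faster ftp site is "
          "ftp://203.131.69.75/lorma_v.3.iso and don't forget www.burrotech.com/pa_inst.exe "
          "is also free.")

print(recalled.findall(sample))
# → ['http://www.deltasoft.hr/remote/remadmin.zip']
```

Only one of the four links is found, which matches the concern raised above.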
For example, I start with:
a good one is http://www.deltasoft.hr/remote/remadmin.zip which is the page at http://fastq.com/~sckitching/Python/wordlen.zip is my a faster ftp site is ftp://203.131.69.75/lorma_v.3.iso and don't forget www.burrotech.com/pa_inst.exe is also free.
I want:
http://www.deltasoft.hr/remote/remadmin.zip http://fastq.com/~sckitching/Python/wordlen.zip ftp://203.131.69.75/lorma_v.3.iso www.burrotech.com/pa_inst.exe
Can you help please?
Notice that a URL in your case starts with ftp://, http:// or www. We can distinguish such URLs from ordinary text by what follows that prefix: :// after the protocol name, and the dot after www. The next part of a URL (the host) may contain digits, letters, dots, underscores and dashes. Thus, a pattern that matches your example would be:
{{http}|{ftp}\:\/\/}|{www\.}[\d\w\_\.\-]+{\/\S+}?
where {\/\S+}? optionally matches the path part of a URL, for example /dir/file.ext in http://dom.com/dir/file.ext . This finds URLs both with and without file names.
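For anyone who wants to try the same idea outside the program, here is a sketch in Python's re module. It is an assumed translation, not the program's own syntax: {…} grouping becomes (?:…), and the alternation is grouped explicitly so it applies only to the prefix (http://, ftp:// or www.). The helper name extract_urls is hypothetical. Instead of a "replace" step, it collects every match with re.findall and joins them with spaces, which gives the one-line result asked for.

```python
import re

# Prefix: "http://", "ftp://" or "www."; host: letters, digits, _, ., -;
# optional path: "/" followed by any non-space characters.
URL_PATTERN = re.compile(r"(?:(?:http|ftp)://|www\.)[\w.\-]+(?:/\S+)?")

def extract_urls(text: str) -> str:
    """Hypothetical helper: return all URLs found in text, separated by spaces."""
    return " ".join(URL_PATTERN.findall(text))

sample = ("a good one is http://www.deltasoft.hr/remote/remadmin.zip which is the page at "
          "http://fastq.com/~sckitching/Python/wordlen.zip is my a faster ftp site is "
          "ftp://203.131.69.75/lorma_v.3.iso and don't forget www.burrotech.com/pa_inst.exe "
          "is also free.")

print(extract_urls(sample))
```

One caveat of \S+ in the path: a URL at the end of a sentence will pick up the trailing full stop or comma, so a stricter path class may be needed for real posts.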