HOWTO: Collect Any Text from Multiple Files In A Single File
Sometimes you need not only to find a portion of text in files, but also to store the found text in a single file. HandyFile Find and Replace can act as a collector: it can collect either the found text or the result of applying a regular expression to the found text block (in other words, the replacement text), even in Search mode.
We shall consider a rather complex example: searching a directory for HTML files and extracting the heading tags from them, that is, creating a linked table of contents.
The Problem
We expect to receive a file with a table of contents built according to the following rules.
The Solution
- Specify the folder that contains the HTML files you want to create the contents for. For example:
C:\MyWebFiles\MyBook
- Set mask(s) of the files that you want to find. For example:
*.htm*
- As we want to search for irregular text blocks, we should use Regular Expressions. Enable them by checking the corresponding option.
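For readers who want to check the folder and mask outside the program, the file selection above can be sketched in Python. This is only an illustration: the function name find_files is hypothetical, and the assumptions that the mask is matched case-insensitively and that subfolders are included are ours, not documented behaviour of HandyFile Find and Replace.

```python
from fnmatch import fnmatch
from pathlib import Path


def find_files(folder, mask="*.htm*"):
    """Return files under `folder` whose names match `mask`.

    Assumptions (not taken from the tool): matching ignores case
    and subfolders are searched as well.
    """
    root = Path(folder)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and fnmatch(p.name.lower(), mask.lower())
    )


if __name__ == "__main__":
    # Example folder from the text; prints nothing if it does not exist.
    for path in find_files(r"C:\MyWebFiles\MyBook"):
        print(path)
```

The mask *.htm* matches both .htm and .html files, which is why a single mask is enough here.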
Phase 1. Bookmark Headings
- Now we have to construct a search expression. The best way to do this is to use the Regular Expression Laboratory, which lets you provide any sample text, enter an expression and see how it works.
For brevity, we shall omit the construction procedure. The expression that matches any heading tag, with or without attributes, looks as follows:
\<(H[1-6])(.@)\>([^\<]#)\<\/H[1-6]\>
expr1 = (H[1-6]), expr2 = (.@), expr3 = ([^\<]#)
- The Collector can use replacement expressions even when you are simply searching for text. If we only wanted to collect the headings, we could manage without any modifications. But before we collect the contents entries, we need to insert bookmarks into the headings, so we shall now perform a replacement operation.
- The replacement operator could be:
<\1\2><a name="#\R">\3</a></\1>
- Click the Replace button. Now we are through with tagging headings.
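To see what Phase 1 does to a page, here is a minimal Python sketch of the same replacement. It rests on assumptions: we read @ and # in the expression above as non-greedy "zero or more" and "one or more", and translate them to *? and +?. The anchor names are produced by a simple counter with a hypothetical prefix, as a stand-in for whatever \R generates in the program. The function bookmark_headings is an illustrative name, not part of the tool.

```python
import re
from itertools import count

# Python rendering of the heading expression:
# group 1 = tag name, group 2 = attributes, group 3 = heading text.
HEADING = re.compile(r"<(H[1-6])(.*?)>([^<]+)</H[1-6]>", re.IGNORECASE)


def bookmark_headings(html, prefix="toc"):
    """Wrap the text of every heading in a named anchor."""
    counter = count(1)

    def repl(match):
        tag, attrs, text = match.groups()
        # Hypothetical naming scheme: prefix plus a running number.
        name = f"{prefix}{next(counter)}"
        return f'<{tag}{attrs}><a name="{name}">{text}</a></{tag}>'

    return HEADING.sub(repl, html)


if __name__ == "__main__":
    sample = '<h1 class="x">Intro</h1><p>text</p><H2>Next</H2>'
    print(bookmark_headings(sample))
```

As in the original expression, headings whose text contains nested tags are not matched, because the third group stops at the first < character.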
Phase 2. Generate the Contents
- Go to the Collect tab and check the Collect... option. Set the path to the collector file: C:\MyWebFiles\toc.html. Set the Collected text to Replacement text, and the Text entry separator to New line.
- The search expression is:
\<(H[1-6]).@\>\<a name\=\"(.#)\"\>([^\<]#)\<\/a\>\<\/H[1-6]\>
expr1 = (H[1-6]), expr2 = (.#), expr3 = ([^\<]#)
- The replace expression, which in fact forms each contents entry, is:
<div class="toc\1"><a href="/pch:"c:/mywebfiles"#\2">\3</a></div>
- Click the Search button. After the contents file is generated, you can use it as a starting point for creating a richer web page.
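The collecting step of Phase 2 can be sketched in Python in the same spirit. Again this is an illustration under assumptions: the expression is translated to Python syntax with non-greedy quantifiers, the function collect_toc is a hypothetical name, and the link target is passed in as a plain relative URL, standing in for the path portion of the replace expression above.

```python
import re

# group 1 = tag name, group 2 = anchor name, group 3 = heading text.
BOOKMARKED = re.compile(
    r'<(H[1-6]).*?><a name="(.+?)">([^<]+)</a></H[1-6]>', re.IGNORECASE
)


def collect_toc(pages):
    """Build contents entries from a mapping of relative URL -> HTML text.

    Each bookmarked heading becomes one entry; entries are
    separated by new lines, as in the Collect settings.
    """
    entries = []
    for url, html in pages.items():
        for tag, name, text in BOOKMARKED.findall(html):
            entries.append(
                f'<div class="toc{tag}"><a href="{url}#{name}">{text}</a></div>'
            )
    return "\n".join(entries)


if __name__ == "__main__":
    pages = {"ch1.html": '<H1><a name="toc1">Intro</a></H1>'}
    print(collect_toc(pages))
```

The class names toc followed by the tag name make it possible to indent each heading level differently with a style sheet.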