Let's take the example of a custom logging system, one that avoids the complexity of hooking into the Windows Event Log. Sometimes we come across a situation where we need to log an event purely as temporary information, with no need to keep it permanently. After a specific period we review the log file and keep a clean, separate copy. We want to remove all the FOR INFORMATION blocks and keep the required text. For huge files, doing this by hand is tiresome and a complete waste of time.
Notepad++ is a very simple yet extremely powerful text editor. Actually it is much more than a text editor; it has been around for a long time and is still updated on a regular basis. Above all, it is open-source freeware. A product review of Notepad++ is not the topic of this post, so I shall try to sum it up in a single line:
Comparing (Windows) Notepad to Notepad++ is like comparing Internet Explorer 6 to Firefox
Yes, it is extensible like Firefox. You can get plenty of plugins to ease your task. A plugin manager is included in Notepad++, and you can get up-to-date information about the plugins from there.
So, back to our main point: removing duplicate data. Notepad++ offers an extended Find and Replace dialog:
The default behavior of the Find & Replace dialog is similar to the native Windows experience. To go beyond that, there are other options. At the bottom of the dialog is an option named Regular expression. A regular expression is a combination of characters that forms a search pattern. You can read more on this Wikipedia page.
In our case, we are going to use regular expressions to find the data that matches our search pattern. Let's have a look at the simplest case.
The log file contains the repeated message Failure sending mail. which could be due to a broken link to the SMTP server, an error in the SMTP server itself, or any other reason. This message is for information only and we do not want to preserve it. Consider a log file that is reviewed quarterly (or at any other interval) and holds thousands of records.
Now, our requirements are:
- Delete the message Failure sending mail.
- Delete the time stamp associated with it, and
- Clear the blank row left behind after removing the text
To achieve this goal, we open the Replace dialog (the Ctrl + H shortcut also works here), select the Regular expression mode, leave the replace field blank (we want to remove the rows), and enter this expression in the find field:
(.*)(Failure sending mail.)\r\n
Parentheses ( ) are not compulsory; they are used here for readability. The expression has three parts:
- (.*)
- (Failure sending mail.)
- \r\n
The first part matches all the text before the message, the second part specifies the redundant message text, and the last part specifies the carriage return and line feed. \r\n are special characters that mark the end of the line. Their details are beyond the scope of this article. You can check them individually on their referenced wiki pages. When these three parts are combined, the search criteria become:
Search for all the text before the redundant message, with the message itself at the end of the line.
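For readers who prefer to script the cleanup, the same three-part expression can be tried outside Notepad++. The following is only a minimal sketch using Python's re module, not part of the Notepad++ workflow itself; the log lines and timestamps are made up for illustration, and the file is assumed to use Windows (CRLF) line endings.

```python
import re

# Hypothetical log excerpt (invented timestamps), with CRLF line endings.
log = (
    "2012-01-05 10:15:00 Application started\r\n"
    "2012-01-05 10:16:12 Failure sending mail.\r\n"
    "2012-01-05 10:17:30 User logged in\r\n"
    "2012-01-05 10:18:45 Failure sending mail.\r\n"
)

# The same three parts as in the Replace dialog:
# leading text, the redundant message, and the line ending.
pattern = re.compile(r"(.*)(Failure sending mail.)\r\n")

# Replacing with an empty string removes the whole row,
# including its line ending, so no blank row is left behind.
cleaned = pattern.sub("", log)
print(cleaned)
```

Because the line ending is part of the match, the time stamp, the message, and the row break all disappear in one pass, which corresponds to the three requirements listed earlier.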
Let's have a look at a slightly more complex case:
In our first regular expression we required the redundant message to be at the end of the line. Now rows 18, 21 and 25, although they contain the same text, do not meet the search criteria. For this kind of situation we update our regular expression:
(.*)(Failure sending mail.)(.*)\r\n
Here, we have revised the search criteria and added another match-anything expression after the redundant message. Now our search criteria cover both types of redundant message.
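The difference between the two expressions can be checked with a short script. This is a sketch under the same assumptions as before: Python's re module stands in for the Replace dialog, and the sample rows (including the trailing "Retrying in 5 minutes" text) are invented for illustration.

```python
import re

# Hypothetical rows: one with the message at the end of the line,
# one with extra text after the message.
log = (
    "2012-01-05 10:16:12 Failure sending mail.\r\n"
    "2012-01-05 10:17:30 User logged in\r\n"
    "2012-01-05 10:18:45 Failure sending mail. Retrying in 5 minutes\r\n"
)

# First expression: the message must be at the end of the line.
end_only = re.compile(r"(.*)(Failure sending mail.)\r\n")
# Revised expression: anything may follow the message.
anywhere = re.compile(r"(.*)(Failure sending mail.)(.*)\r\n")

after_first = end_only.sub("", log)    # the "Retrying" row survives
after_second = anywhere.sub("", log)   # both message rows are removed

print(after_first)
print(after_second)
```

The first expression leaves the row with trailing text untouched, while the revised one removes both rows and keeps only the unrelated entry.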
This is just a tiny glimpse of the power of regular expressions. Using regular expression syntax, we can build as many combinations as we need. An extensive yet simple getting-started guide for regular expressions is available here.
Please remember that the expressions used in the examples are just one of the possible ways to achieve the desired goal. Other combinations can be built to achieve the same result.
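As one illustration of such an alternative, the sketch below (again in Python, with invented sample rows) makes two changes that are assumptions of this example and not part of the expressions above: the final dot of the message is escaped so that it matches only a literal full stop, and the line ending is made flexible so that \r\n, a bare \n, or the end of the file are all accepted.

```python
import re

# Hypothetical rows with mixed line endings; the last row has no line ending.
log = (
    "10:16:12 Failure sending mail.\n"
    "10:17:30 User logged in\r\n"
    "10:18:45 Failure sending mail. Retrying"
)

# ^ with MULTILINE anchors at the start of each row.
# [^\r\n]* matches text within a single row only.
# \. matches a literal full stop.
# (?:\r?\n|\Z) accepts CRLF, LF, or the end of the text.
pattern = re.compile(
    r"^[^\r\n]*Failure sending mail\.[^\r\n]*(?:\r?\n|\Z)",
    re.MULTILINE,
)

cleaned = pattern.sub("", log)
print(repr(cleaned))
```

Whether this stricter form is worth the extra syntax depends on the log file; for a file that is known to use \r\n throughout, the shorter expressions shown earlier are easier to read.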