4. Advanced sed Usage

4.1 How do I use the [N]ext command?

The [N]ext command is used for multiline pattern space manipulation: it lets you work on text that spans more than one line. Where the [n]ext command outputs the contents of the pattern space and then reads the next line in its place, the [N]ext command reads the next line and appends it to the pattern space, separating the first line from the second with a "\n" character. Let us suppose that we were about to pump out a "Surfing the Internet Guide" and wanted every instance of "Surfing the Internet Guide" to be personalized to "DREAMWVR.COM Surfing the Internet Guide". The pattern we are scanning for is sometimes spread over two lines, so we would like to call upon the great powers of sed. How might we use the [N]ext command to achieve the desired result with the following content?

We suggest using the Surfing the Internet
Guide that is provided at http://www.dreamwvr.com/webframe.htm

Checkout the Surfing the Internet Guide we provide for your usage.
The over 7 different methods to use this website include the Surfing the
Internet Guide we continue to mention here so that you might adventure here.

The Surfing the Internet Guide is provided to assist you in your travels.

Our first attempt at solving the problem is as follows:

/Internet$/{
N
s/Surfing the Internet\nGuide/DREAMWVR.COM Surfing the Internet Guide\
/
}

This method scans for occurrences of "Internet" at the end of a line. It then reads the next line of input and appends it to the pattern space. Then we do our substitution, replacing "Surfing the Internet Guide" with "DREAMWVR.COM Surfing the Internet Guide". Note the "\n" in our search pattern: it matches the newline that [N]ext embedded between the two lines, and it is essential in this example. In the replacement we end the line with a backslash ("\"), which escapes the newline that follows and so puts a real line break after the substituted text; otherwise we would have one very long line once our rules took effect. The rules are applied top down, as sed always likes to read like me :-)

The above [N]ext example is good, but not good enough for us to sleep at night, so let's see if we can improve it to cover all of the content above. What we wish to do is read the file line by line and convert every instance of "Surfing the Internet Guide" to "DREAMWVR.COM Surfing the Internet Guide". Here is how...
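
If you would like to watch the first attempt at work, here is a minimal sketch that feeds the first two sample lines through it from a shell. The variable name out is made up for this illustration, and a POSIX-style shell with sed on the PATH is assumed.

```shell
# Feed the first two sample lines through the first-attempt script.
# Inside single quotes the shell passes the backslash-newline to sed untouched.
out=$(printf '%s\n' \
  'We suggest using the Surfing the Internet' \
  'Guide that is provided at http://www.dreamwvr.com/webframe.htm' |
sed '/Internet$/{
N
s/Surfing the Internet\nGuide/DREAMWVR.COM Surfing the Internet Guide\
/
}')
printf '%s\n' "$out"
```

The second output line begins with a space, because the substitution consumes "Guide" but not the space that followed it.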

/Surfing/{
N
s/Surfing *\n*the *\n*Internet *\n*Guide/DREAMWVR.COM Surfing the Internet Guide/
}

The above example gets the ball rolling by searching for "Surfing", the common denominator that occurs in every instance we care about. The pattern then allows for zero or more spaces followed by zero or more '\n' characters between the words of the title, so it matches whether or not the title is broken across the two lines. It then replaces the match the usual way. As before, the 'N' is what lets us search across a multi-line pattern space. This handles a title split across the pair of lines just read and does a pretty good job to boot! Could we take the example further? Of course we could. As you will notice if you try the example, the replacement joins the two lines into one, so the first line comes out very long and one would want to shorten it. How would you do it? Let me know ;-) as there are a number of ways that I am aware of that will work...
At any rate this will get you started... there are at least 3 more improvements that could be made; let me know if you get around to solving them.
There are two different methods that I have found to vary in how well they work, and I will share both with you. The first assumes that your version of sed allows multiple commands inside curly braces. If your particular flavour of sed does not allow this, I provide a second method, which is in any case a good exercise in working out how your version of sed operates. So with no further verbiage, here is the scoop.

s/Surfing the Internet Guide/DREAMWVR.COM Surfing the Internet Guide/g
/Surfing/{
N
s/ *\n/ /
s/Surfing the Internet Guide/DREAMWVR.COM Surfing the Internet Guide\
/
}

s/Surfing the Internet Guide/DREAMWVR.COM Surfing the Internet Guide/g
/Surfing/{
N
s/ *\n/ /
}
s/Surfing the Internet Guide/DREAMWVR.COM Surfing the Internet Guide\
/

Both achieve the same result in my experience, though your mileage may vary depending on how your version of sed wants to play. What occurs here is that the first substitution scans for any lines that contain "Surfing the Internet Guide", replacing it with, can you guess??? Of course you can. Once this global substitution has occurred, things get interesting: we look for the pattern "Surfing", read the [N]ext line of input, and strip off the embedded '\n' character (along with any spaces before it), replacing it with a single space. Once that has been done we naturally have a double-length line to deal with; remember that 'N' causes the next input line to be appended to the line preceding it. Having gotten this far, it is up to us to replace the occurrences of "Surfing the Internet Guide" with the killer replacement string "DREAMWVR.COM Surfing the Internet Guide" followed by a newline, which is what we do. I show two slightly different methods because I have found that some versions of sed don't like playing nice with me when I put all the sed commands in curly braces. Solution??? I simply don't place commands in curly braces except those that need to be there ;-)
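
Pairing lines with N inside a /Surfing/ block has two weak spots: a title that begins on the second line of a pair and finishes on the line after it can slip through, and a line that has already been rewritten still contains "Surfing", so it can be prefixed a second time. One possible improvement, offered as a sketch rather than the definitive answer, is a two-line sliding window built from N, P and D. It assumes GNU sed, which recognizes \n inside a bracket expression; the names sample and out are made up for the illustration, and the clean-up of a doubled "DREAMWVR.COM" is a workaround in the spirit of the rest of this section.

```shell
# The sample text from this section.
sample() {
cat <<'EOF'
We suggest using the Surfing the Internet
Guide that is provided at http://www.dreamwvr.com/webframe.htm

Checkout the Surfing the Internet Guide we provide for your usage.
The over 7 different methods to use this website include the Surfing the
Internet Guide we continue to mention here so that you might adventure here.

The Surfing the Internet Guide is provided to assist you in your travels.
EOF
}

# $!N : append the next line unless we are already on the last one
# s   : prefix the title, keeping whichever separator (space or newline) was there
# s   : a line seen in two successive windows gets prefixed twice, so undo that
# P;D : print and drop the first line, then restart with the second line kept
out=$(sample | sed '$!N
s/Surfing\([ \n]\)the\([ \n]\)Internet\([ \n]\)Guide/DREAMWVR.COM Surfing\1the\2Internet\3Guide/g
s/DREAMWVR\.COM DREAMWVR\.COM/DREAMWVR.COM/g
P
D')
printf '%s\n' "$out"
```

Because the separators are put back as they were found, the line breaks of the input survive and no line grows beyond the added prefix.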

4.2 What is the difference between the [d]elete command and the [D]elete command?

When using the [d]elete command the entire pattern space is deleted, then the next line of input is read and the script starts over from the top. So what happens if you want more granular control over what gets deleted from the pattern space? That is where the [D]elete command comes into play: it deletes only up to the first embedded newline and restarts the script on whatever is left, without reading a new line. Let's say that you wish to squeeze every run of blank lines down to a single blank line. If you used the normal [d]elete in the script below (that is, /^\n$/d), it would not work as you might expect whenever it encountered an even number of blank lines. This is because the script locates the first blank line, reads the following line into the pattern space with [N]ext, and then, since [d]elete applies to the entire pattern space, removes both lines absolutely; a run of two or four blank lines vanishes altogether, while a run of three leaves one behind. That would be alright if you wanted no blank lines at all (a plain /^$/d does that). But how about a request for output with exactly one blank line between blocks of text? Here is how, using the new fangled [D]elete command...

/^$/{
N
/^\n$/D
}

The multi-line [D]elete command gives us better control of successive blank lines: the script reads both blank lines and deletes only the first of 2 consecutive blank lines. On the next pass the script starts over with the blank line that is left; upon reading the [N]ext line and determining that it is not blank, it simply outputs the entire 2 line pattern space. Hence our problem of varying runs of blank lines is solved and our content ends up with single blank lines between blocks. This is one that I often forget how to do, which is why I put it here in the first place :-)
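
To compare the two commands side by side, here is a small sketch that runs the same script once with [d]elete and once with [D]elete over an input holding a run of two blank lines and a run of three. The helper sample and the variables with_d and with_D are names invented for this illustration.

```shell
# Input: "a", two blank lines, "b", three blank lines, "c".
sample() { printf 'a\n\n\nb\n\n\n\nc\n'; }

# Lowercase d throws away the whole two-line pattern space.
with_d=$(sample | sed '/^$/{
N
/^\n$/d
}')

# Uppercase D removes only the first line and reruns the script on the rest.
with_D=$(sample | sed '/^$/{
N
/^\n$/D
}')

printf 'with d:\n%s\n\nwith D:\n%s\n' "$with_d" "$with_D"
```

With d the run of two blank lines disappears completely and only the run of three leaves a blank line behind; with D every run collapses to exactly one blank line.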

4.3 What is so different about the [P]rint command and is there an example of its use?

The [P]rint command prints the contents of the pattern space up to the first embedded newline and leaves the pattern space itself untouched, so the rest of the script can carry on working in the multi-line pattern space. After the last command is executed, sed prints whatever remains in the pattern space as usual (unless it has been deleted or sed was run with -n). Where [p]rint is very useful for debugging, since it shows the whole pattern space, [P]rint is used most often with other advanced commands like [N]ext and [D]elete, so that the first line of the pattern space, once it has received the once over, can be splashed to the screen before [D]elete removes it. This example may help explain things better, or make them muddier, depending on how you look at the topic. Below is the input text I began with:

Linux has gone a long ways from the early
days of the System. These days the
system can claim some unique company.
Companies such as Intel, Netscape, IBM, Oracle,
HP, Toshiba, NEC, Packard Bell, SGI, Corel,
Informix, and SUN have entered alliances with Linux.
This all obtained from a System that began
as the academic exercise of a student named
"Linus" who spearheaded it less than a decade
ago. The beauty of the System is that it is
very different than the marketing hype generated
by some less genuine companies because it is not
a company but rather a mindset umbrella for revolutionary
System that is constantly improving as it is the sum of
the minds involved creating a Greater Product that can be
shared by all. Its popularity has now entered the boardroom
and there is no stopping this baby!

Now here is how I initially solved the problem. Notice there is a peculiar way that I worked around an issue I wished to avoid the first time around. Can you spot it? The debugging took only a minute or two, so it is far from perfect I am sure, but hey, it works! That is where you come in... fix the issue and email me the solution and I promise to add it to this FAQ :-) Here goes the code that puts 'Operating' in front of every instance of [sS]ystem throughout the document and spreads the word, literally, about Linux all at the same time!

N
/[sS]ystem/{
s//Operating &/
s/Operating Operating/Operating/
P
D
}

Note, as I mentioned before, that the use of multiple advanced commands takes some getting used to, but it allows for more control of the text flow since the commands apply over more than one line. Above I used the [N]ext, [P]rint, and [D]elete commands, in that order. Learn to do this on the fly and you're over half way there!
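
To see [p]rint and [P]rint side by side without anything else going on, here is a minimal sketch. The variable names with_p and with_P are invented for the illustration; -n switches off the automatic print, so only p or P produce output.

```shell
# N joins "one" and "two" into a single two-line pattern space.
with_p=$(printf 'one\ntwo\n' | sed -n 'N;p')   # p prints the whole pattern space
with_P=$(printf 'one\ntwo\n' | sed -n 'N;P')   # P stops at the first embedded newline

printf 'p prints:\n%s\n\nP prints:\n%s\n' "$with_p" "$with_P"
```

Here p shows both lines while P shows only "one", which is why P pairs so naturally with D: print the first line, delete it, and keep working on the rest.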

4.4 How do I use [hH]old, [gG]et, and [x]change?

The [h]old command takes what you have in the pattern space and copies it into a special area called the hold space, replacing whatever was there. When you issue the [H]old command you are instead appending the pattern space, preceded by a newline, to the contents of the hold space. With the [g]et command you are getting the entire contents of the hold space and replacing the entire pattern space with it. Whereas with the [G]et command you are grabbing the contents of the hold space and appending them, preceded by a newline, to the contents of the pattern space. [x]change literally switches the contents of the pattern space with the contents of the hold space. Here is an example of content and how you might apply this knowledge:

hi
ho
hi
ho
it's off to switcheroo..
i go..

Now suppose I were so bold as to create a file called 'hiho.test' populated with the contents above, and a sed script called 'switcheroo.sed' to run against it; we might get some humorous results. So let's do that. Here is 'switcheroo.sed', the program that does all the work for us so we can get all the credit :-)

/hi/ {
h
d
}
/ho/{
G
}

This produces the results we would expect of a switcheroo. IOW the program detects each 'hi' line, copies it into the hold space, and deletes it from the pattern space so that it is not printed. Then, for every 'ho' it detects, it appends the contents of the hold space to the pattern space, so that each 'ho' is followed by a "\n" character and then the 'hi'. Since [H]old and [g]et do not do what we want here as cleanly, they are left for you to play around with. The output is:

ho
hi
ho
hi
it's off to switcheroo..
i go..
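
If you want to try the switcheroo without typing the files by hand, the following sketch builds both of them in a scratch directory and runs the script. The use of mktemp -d and the variable names dir and out are choices made for this illustration only.

```shell
# Work in a throwaway directory so nothing of yours is overwritten.
dir=$(mktemp -d)

cat > "$dir/hiho.test" <<'EOF'
hi
ho
hi
ho
it's off to switcheroo..
i go..
EOF

cat > "$dir/switcheroo.sed" <<'EOF'
/hi/ {
h
d
}
/ho/{
G
}
EOF

# -f reads the sed commands from the script file.
out=$(sed -f "$dir/switcheroo.sed" "$dir/hiho.test")
printf '%s\n' "$out"
rm -r "$dir"
```

Swapping the G for an x would print the held 'hi' in place of each 'ho' instead of after it, which is a quick way to get a feel for [x]change.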

4.5 How do I change the case of text from lower to UPPER?

At times it is convenient to transform selected lowercase text into UPPER CASE for emphasis (or make up your own reasons and justify them later ;-). Where might this be a handy ability? Say that you were reading email and noticed that you would routinely like to receive certain subject lines in CAPS to grab your attention. Since you are hopefully aware that Internet mail follows standards, unlike the mail systems of a certain large software giant that routinely lose attachments ;-} you might proceed as follows. Naturally this is only an example, so you might need to adjust the script to make it work in the real world :-)

Subject: FYI using capitals can be attention grabbing
here is my content which is not saying much:-)

Notice the 'FYI' in the Subject: line above; we will be using it as a flag to do something hopefully useful. What we want is that every time someone sends us an email whose subject line starts with 'Subject: FYI', the rest of that line is changed to all CAPS to draw attention to the email. If we just scanned for 'Subject:', every subject line would end up in CAPS, which would defeat the purpose. Here goes..

/^Subject: FYI/{
#First we copy the pattern space to the hold space so we can use it below.
h
#Then we strip the 'Subject: FYI' prefix, leaving only the rest of the line in the pattern space.
s/Subject: FYI\(.*\)/\1/
#Then we use the [y] transform command to change all lowercase letters to UPPERCASE.
y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
#Now we [G]et the original line from the hold space and append it, after a '\n', to the pattern space.
G
#This last step looks a bit confusing but it works. \(.*\) captures the uppercased text up to the
#newline, \(Subject: FYI\) captures the prefix of the original line, and the trailing .* swallows the
#original lowercase remainder so that it is dropped. We then emit the prefix followed by the uppercased
#text, which leaves us where we wanted to live in the first place. If you neglect the trailing .* you
#would still have the original lowercase text hanging off the end of the new pattern space. Now you know :-)
s/\(.*\)\n\(Subject: FYI\).*/\2\1/
}

For additional verbiage: this is an example that can be adjusted to your needs. You might take it further by automating your incoming mail, so that people who send you mail can use 'Subject: FYI' to really get your attention. Some tweaking will be required to do this in a production environment, but not much.
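
Here is a sketch of the uppercase trick run from a shell against the two sample lines. It uses a lightly trimmed variant of the script: the address is anchored with ^ and the final substitution has only the two groups it actually uses. The variable name out is invented for the illustration.

```shell
out=$(printf '%s\n' \
  'Subject: FYI using capitals can be attention grabbing' \
  'here is my content which is not saying much:-)' |
sed '/^Subject: FYI/{
h
s/Subject: FYI\(.*\)/\1/
y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
G
s/\(.*\)\n\(Subject: FYI\).*/\2\1/
}')
printf '%s\n' "$out"
```

Only the subject line is touched; the body line passes through unchanged because it never matches the address.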

4.6 How do I convert straight ASCII text to HTML so I can read it in a standards-based browser?

Now here is one that I could go on for miles about, since I am one of those you-know-whos that never bothered to learn how to WYSIWYG. Enough harping; here is the lowdown. Say you had a text-only document, like the one below, that you had never got around to converting to HTML. For example...

Here is the best paragraph i could come up with under the circumstances. Seems that i am definately speechless and perhaps could even be accused of being the silent type. How do i get what i am writing from this ASCII stuff to a more popular web enabled version. hmmm... tough call there could be far better ways and naturally this is only a template but once you finish with this text you will be at least half way there.

Now create a script with the following content:

${
/^$/!{
H
s/.*//
}
}

#So far you have a script that does nothing until it reaches the last line of the content. If that last line is not blank, it is appended to the [H]old space and the pattern space is emptied, so that the last routine below still fires for the final paragraph. It then spirals down to the next routine..

/^$/!{
H
d
}

#Then we gobble up every line that is not blank, appending it to the hold space and deleting it from the pattern space.

/^$/{
x
s/^\n/<HTML><BODY><P>/
s/$/<\/P><\/BODY><\/HTML>/
G
}

#This produces the desired result of transforming the above straight ASCII text into something far nobler, namely pure HTML :-) On a blank line the 'x' switches the paragraph collected in the hold space with the (empty) pattern space. We then do our substitutions: the first replaces the leading newline left by the first 'H' with the opening tags, and the second adds the closing tags at the end. Then we use the 'G' to append the now empty hold space to the newly switched pattern space, which puts a blank line after the paragraph.

<HTML><BODY><P>Here is the best paragraph i could come up with under the circumstances. Seems that i am definately speechless and perhaps could even be accused of being the silent type. How do i get what i am writing from this ASCII stuff to a more popular web enabled version. hmmm... tough call there could be far better ways and naturally this is only a template but once you finish with this text you will be at least half way there.</P></BODY></HTML>
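
To see how the hold space collects a paragraph before the tags are added, here is a sketch that runs the three routines over a short two-paragraph input. The input lines and the variable name out are invented for the illustration, and the pattern space is emptied with s/.*// so that the final routine fires on the last line.

```shell
out=$(printf '%s\n' 'First line' 'second line' '' 'Last paragraph' |
sed '${
/^$/!{
H
s/.*//
}
}
/^$/!{
H
d
}
/^$/{
x
s/^\n/<HTML><BODY><P>/
s/$/<\/P><\/BODY><\/HTML>/
G
}')
printf '%s\n' "$out"
```

Each paragraph comes out wrapped in its own set of tags, which is fine for a one-paragraph template. For a longer document you would probably wrap paragraphs in the P tags only and add the HTML and BODY tags once, at the top and bottom.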

