Programming patterns in sed
Philip S Tellis
Yahoo! Inc
Opensource Bridge 2009
Stream EDiting
- First there was ed - interactive
- ed begat g/re/p - which was Awesome!!!!
- but you couldn't really do g/re/d, g/re/s or g/re/foobar
- So ed begat sed as well - non-interactive
- And the world was good... even at 2400 baud
Enough history, where be dem Patterns???
Prerequisites
- You should already know a little sed
- You should know how to RTFM when in doubt
Is it an editor or a programming language?
flickr:mamk/2377536817
- Language to edit data (text) streams based on ed's command set
- No variables
- Primitive branching
- But it is Turing-complete
To write complex programs, all you need is if and goto
A useful structured programming language
flickr:adobemac/2895835834
- Sequence
- Selection
- Iteration
- Variables
- File handling
- Debugging
1. Sequence
sed scripts flow sequentially from top to bottom unless a branch is involved
2. Selection - if some condition, do something
flickr:kt/1118569929
/pattern/ command
s/pattern/replace/
t label
s/pattern/replace/
T label
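The s + t pair above works as an if/else. A minimal sketch, assuming log-style input; the function name tag_lines and the [!] / [ ] markers are made up for illustration. It sticks to t because T is a GNU sed extension:

```shell
# Hypothetical helper: mark lines that start with ERROR, tag everything else.
tag_lines() {
  sed '
# if the line starts with ERROR, mark it ...
s/^ERROR/[!] ERROR/
# ... and skip the else branch when that substitution succeeded
t done
# else: tag the line as ordinary
s/^/[ ] /
:done
'
}

printf 'ERROR disk full\nok\n' | tag_lines
```

The t command only branches if a substitution has succeeded since the current line was read (or since the last t), so the second s runs only when the first one did not match.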
2. Selection
flickr:kt/1118569929
/^hello/ s/^hello/hello world/
3 p
/^next\>/ {
N
s/\(.*\)\n\(.*\)/\2\n\1/
}
2. Selection - Sample input
flickr:kt/1118569929
hello
my name is sed
print this line out twice
next line first
this is line #5
2. Selection - Sample output
flickr:kt/1118569929
hello world
my name is sed
print this line out twice
print this line out twice
this is line #5
next line first
2. Selection
For readability, use a condition followed by a code block:
/condition/ {
command1
command2
command3
}
3. Iteration/Loops
flickr:bluesmoon/241327100
- Entry controlled loops
- Exit controlled loops
- Fixed counter iterations
- The b, t and T commands are used here
- We also use labels as branch targets
3a. Iteration - Entry controlled
flickr:bluesmoon/241327100
while(condition) {...}
- loop executed 0 or more times
:loopstart
/condition/ {
command1
command2
command3
b loopstart
}
3a. Iteration - Entry controlled
flickr:bluesmoon/241327100
while(condition) {...}
- loop executed 0 or more times
:loopstart
/==/ {
s/==//
r equals.txt
b loopstart
}
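The same while shape can be tried without a side file. A small sketch, assuming we want to strip nested empty parentheses, which a single s/()//g pass cannot do; the function name strip_empty_parens is made up for illustration:

```shell
# Hypothetical helper: keep deleting "()" until none is left on the line.
strip_empty_parens() {
  sed '
:loopstart
# while the pattern space still contains "()" ...
/()/ {
# ... remove one pair and test the condition again
s/()//
b loopstart
}
'
}

printf 'a((()))b\nno parens\n' | strip_empty_parens
```

The second input line never matches the condition, so the loop body runs zero times for it.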
3b. Iteration - Exit controlled
flickr:bluesmoon/241327100
do {...} while(condition) / repeat {...} until(condition)
- loop executed 1 or more times
:loopstart
command1
command2
command3
/condition/ b loopstart
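One possible instance of this shape is the classic "slurp the whole input" loop, where the condition is an address ($! means "not the last line") rather than a regex. A sketch, with the function name join_lines made up for illustration:

```shell
# Hypothetical helper: join all input lines into one comma-separated line.
join_lines() {
  sed '
:loopstart
# body: append the next input line to the pattern space (if there is one)
$!N
# condition checked after the body: loop again unless this is the last line
$!b loopstart
# after the loop: turn the embedded newlines into commas
s/\n/,/g
'
}

printf 'a\nb\nc\n' | join_lines
```

Because the test comes after the body, the commands between the label and the branch are reached at least once, even for single-line input.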
3c. Iteration - fixed counter
flickr:bluesmoon/241327100
for(counter) {...}
- loop executed exactly counter times
- No math in sed
- This is harder and ugly
- We need to use the Hold space
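3c. Iteration - fixed counter
flickr:bluesmoon/241327100
# Print the current line 10 times:
# 1. Grab the current line into the hold space
h
# 2. Replace the pattern space with one x per iteration we want
#    (s is used because c would print its text and end the cycle)
s/.*/xxxxxxxxxx/
# 3. Print the line as long as there are x left (T is a GNU sed extension):
:loopstart
s/^x//
T loopend
x
p
x
b loopstart
:loopend
# 4. The counter is used up: drop the now empty pattern space
d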
3c. Iteration - fixed counter
flickr:bluesmoon/241327100
Avoid fixed-counter for loops in sed. They can most often be replaced with an entry-controlled loop over real data, or with a more general-purpose programming language.
4. Variables
- An area in memory that can store variable data while the program executes
- The Hold space
- Only one exists
- Theoretically no size limit
- Simulate multiple variables using delimiters, key-value pairs perhaps
- Anyone want to try JSON?
4. Variables
# 1. Swap the hold and pattern space
x
# 2. Set the pattern space to the value of your variable
# using the s, g or G commands (c, i and a write to the output, not the pattern space):
s/$/\nfoo\n/
G
# 3. Swap the hold and pattern space again
x
4. Variables - key/value pairs
x
s/$/\nname:/
G
s/name:\n/name:/
s/$/\n/
x
variables1.sed
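To see the hold space acting as a single variable, here is a rough sketch that remembers the most recent [section] header and prefixes every following line with it. The INI-like input format and the function name qualify_keys are assumptions for illustration, not something from variables1.sed:

```shell
# Hypothetical helper: turn "[db]" + "host=a" into "db.host=a".
qualify_keys() {
  sed '
# a [section] line: store it in the hold space (our variable) and print nothing
/^\[.*]$/ {
h
d
}
# any other line: append the stored section, giving "key=value\n[section]"
G
# rearrange into "section.key=value"
s/^\(.*\)\n\[\(.*\)]$/\2.\1/
'
}

printf '[db]\nhost=a\n[web]\nport=80\n' | qualify_keys
```

Lines that appear before any section header are not handled specially here; a real script would need a default value for the variable.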
5. Debugging
- Two commands make debugging possible:
l: (lowercase L) print out the current pattern space in a visually unambiguous way
=: print out the current input line number
- Use the x command to swap the pattern and hold space to examine the hold space
5. Debugging
- At any point of execution, add this to dump everything:
=
l
x
l
x
variables2.sed
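A quick way to watch the dump in action, as a sketch: the first line is parked in the hold space, then the five-command dump runs on every later line. The function name dump_state and the sample input are made up for illustration, and -n is used so that only the debug output appears:

```shell
# Hypothetical helper: show line number, pattern space and hold space.
dump_state() {
  sed -n '
# park the first line in the hold space so there is something to inspect
1 {
h
d
}
# the dump itself: line number, pattern space, then hold space
=
l
x
l
x
'
}

printf 'first\nsecond\there\n' | dump_state
```

The l command shows the tab as \t and marks the end of each line with $, which is what makes stray whitespace easy to spot.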
6. File handling
- Sed can read and write files a line at a time or in slurp mode
r/R: queue the entire file (r) or the next line of the file (R) to be printed at the end of the current cycle
w/W: append the current pattern space (w) or the first line of the pattern space (W) to the file
- R and W are GNU sed extensions
- Files are overwritten when the program starts, but appended during execution
6. File handling - example
- Input consists of name of city followed by people from there
- Split into three files for PDX, SFO and others
- file-split.sed
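file-split.sed itself is not reproduced here. What follows is only a guess at how such a split could look, under loud assumptions: a city is a line of exactly three capital letters, the lines after it are people from that city, and the output names pdx.txt, sfo.txt and others.txt are invented. The current city lives in the hold space:

```shell
# Hypothetical helper: reads city/people lines on stdin, writes three files
# into the current directory. Not the script from the talk.
split_by_city() {
  sed '
# a city line (assumed: three capital letters): remember it, print nothing
/^[A-Z][A-Z][A-Z]$/ {
h
d
}
# a person line: swap to look at the remembered city
x
/^PDX$/ {
x
w pdx.txt
d
}
/^SFO$/ {
x
w sfo.txt
d
}
# any other city: swap back and write to the catch-all file
x
w others.txt
d
'
}

workdir=$(mktemp -d)
(cd "$workdir" && printf 'PDX\nalice\nbob\nSFO\ncarol\nLHR\ndave\n' | split_by_city)
cat "$workdir/pdx.txt"
```

The file name for w runs to the end of the line, which is why each w sits on a line of its own.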
Gaming
- No language is complete until you can write games with it
- sedtris