------------------------------------------------------------------------
Multi-line record
------------------------------------------------------------------------

In some databases, a logical record is spread over several lines,
and different records may have different numbers of fields. It helps
to consolidate the fields of each record on a single line, so that it
can be fed to line-oriented tools such as sed or awk.

Consider the file "Kitty.txt"

$ cat Kitty.txt
1 
hello kitty
hello mitty
2 
hello ditty
3 
hello hello hello
goodbye
night
night

We define a record as a line that starts with a number together with
all the lines that follow it, up to (but not including) the next line
that starts with a number.
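You can check which lines this definition treats as record starts by running grep with the same extended regular expression, e.g. grep -nE '^[0-9]+' Kitty.txt. The sketch below pipes the sample data in so that it is self-contained (GNU grep assumed; the trailing blanks of the numbered lines are omitted). Note that ^[0-9]+ matches any line that begins with a digit, so a hypothetical body line such as "2nd night" would also start a record.

```shell
#!/bin/bash
# The sample records from Kitty.txt, fed on stdin instead of read from the file.
# -n prefixes each match with its line number; -E selects extended regular
# expressions, the same dialect as "sed -E".
starts=$(printf '1\nhello kitty\nhello mitty\n2\nhello ditty\n3\nhello hello hello\ngoodbye\nnight\nnight\n' |
         grep -nE '^[0-9]+')
printf '%s\n' "$starts"    # 1:1, 4:2 and 6:3, one per line
```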

The sed code is easy once we get going: append each line that does
not match /^[0-9]+/ to the hold space. When a line matching
/^[0-9]+/ is encountered, exchange the pattern space with the hold
space, replace the embedded newlines (\n) with spaces and print the
pattern space. The trick, as always, is how you handle the first and
last lines, and the order of the commands matters.
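Two tiny inputs make the first/last-line trick concrete. This is a sketch that assumes GNU sed is available as "sed" (the commands in this note call it "gsed"); the variable names are only for illustration.

```shell
#!/bin/bash
# First line: the hold space starts out empty, so appending line 1 with "H"
# leaves a leading newline (which becomes a leading space). "1h;1!H" copies
# line 1 with "h" instead and appends only the later lines.
naive=$(printf '1\nhello kitty\n' | sed -n 'H;${x;s/\n/ /g;p;}')
fixed=$(printf '1\nhello kitty\n' | sed -n '1h;1!H;${x;s/\n/ /g;p;}')
printf '[%s]\n[%s]\n' "$naive" "$fixed"   # [ 1 hello kitty] then [1 hello kitty]

# Last line: if a record is printed only when the next numbered line arrives,
# the final record is never printed, because no numbered line follows it.
lost=$(printf '1\nhello kitty\n2\nhello ditty\n' |
       sed -E -n '1{h;b};/^[0-9]+/!{H;b};x;s/\n/ /g;p')
printf '%s\n' "$lost"                      # only "1 hello kitty"
```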

$ gsed -E -n '1{h;b};${H;x;s/\n/ /gp};/^[0-9]+/!{H;b};/^[0-9]+/{x;s/\n/ /gp}' Kitty.txt
1 hello kitty hello mitty
2 hello ditty
3 hello hello hello goodbye night night

Now let us dissect this command.
$ gsed -E -n '1{h;b};${H;x;s/\n/ /gp};/^[0-9]+/!{H;b};/^[0-9]+/{x;s/\n/ /gp}' Kitty.txt
              #1     #2               #3              #4

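#1 .. hold the first line (assumed to be a numbered line) with "h",
      not "H", so the record does not start with a newline.
      Notice the use of "b": with no label it branches to the end
      of the script, so the cycle ends and the next line is read
      without going through the rest of the commands (efficient!).
#2 .. deal with the last line: append it to the hold space,
      exchange, replace the \n characters and print. No numbered
      line follows the final record to trigger its output, so it
      must be flushed now; after the last line, sed simply exits.
      This rule must come before #3, whose "b" would skip it.
#3 .. non-numbered line: append it to the hold space (H). The "b"
      again skips the remaining commands (efficient!).
#4 .. numbered line: exchange the pattern space with the hold
      space, replace the \n characters with spaces and print. The
      numbered line stays in the hold space as the start of the
      next record.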
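The one-liner works for Kitty.txt, but two inputs trip it up. The "p" flag of "s/\n/ /gp" prints only when a substitution happened, so a record made of a numbered line alone is never printed; and if the last line is itself numbered, rule #2 appends it to the previous record. Below is a sketch of a variant that avoids both. It rearranges the commands rather than reproducing the original. It assumes GNU sed is installed as "sed" ("gsed" on macOS) and that the first line starts a record; join_records is a hypothetical helper name.

```shell
#!/bin/bash
# join_records [FILE...]: hypothetical helper. For each input line:
#   /^[0-9]+/!H          continuation line: append it to the hold space
#   /^[0-9]+/{x;...}     record start: swap it into the hold space and, except
#                        on line 1 (nothing held yet), print the old record
#   ${x;...}             end of input: print the record still being held
# "s/\n/ /g;p" prints even when there was no \n to replace.
join_records() {
  sed -E -n '/^[0-9]+/!H;/^[0-9]+/{x;1!{s/\n/ /g;p;};};${x;s/\n/ /g;p;}' "$@"
}

# Record 2 has no continuation lines, and the last line starts record 3.
printf '1\nhello kitty\n2\n3\n' | join_records
# 1 hello kitty
# 2
# 3
```

On Kitty.txt it prints the same three lines as the original one-liner.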

All this took a lot of time, but in the process I learned how to
solve the "last line" problem, and also the value of "b" for
efficiency.

I think this is useful enough that I converted the one-liner into a
stand-alone utility.

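$ cat multi2one
#!/bin/bash

# multi2one ... build a single logical record from several lines of fields
# usage: multi2one rectok fs file
#   rectok  start of a logical record: an extended regular expression
#           (without the slashes), anchored with ^ because the token
#           must be at the beginning of the line
#   fs      field separator that replaces each \n when lines are combined
#   file    input file; its first line must start a record
# rectok and fs must not contain "/", which delimits the sed commands.
# Needs GNU sed (gsed); "\t" as fs relies on GNU sed's escape handling.
# example:
#   multi2one "^[0-9]+" "\t" Kitty.txt

rectok="$1"; shift
fs="$1"; shift

# Stop if the first line does not start a record.
if ! head -n 1 "$1" | awk -v rectok="$rectok" '$0 !~ rectok { exit 1 }'; then
    echo "exiting - first line does not start with record token" >&2
    exit 1
fi

# Same four rules as the one-liner, with two fixes:
#  - "s///g;p" instead of "s///gp", so records without continuation
#    lines (nothing to substitute) are still printed
#  - at the last line, a record-start line is printed on its own instead
#    of being appended to the previous record; "$p" in rule 1 covers a
#    one-line file
gsed -E -n \
'1{h;$p;b};${/'"$rectok"'/{x;s/\n/'"$fs"'/g;p;x;p;b};H;x;s/\n/'"$fs"'/g;p;b};/'"$rectok"'/!{H;b};/'"$rectok"'/{x;s/\n/'"$fs"'/g;p}' "$1"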
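For comparison, the same consolidation can be sketched in awk alone. This is a swapped-in technique, not what multi2one does, and join_awk is a hypothetical name. The sketch keeps the current record in a variable instead of sed's hold space. It prints that record when the next one starts and flushes the last one in the END block, so it needs no separate first-line or last-line rules.

```shell
#!/bin/bash
# join_awk RECTOK FS [FILE...]: hypothetical helper, reads stdin if no FILE.
join_awk() {
  awk -v rectok="$1" -v fs="$2" '
    $0 ~ rectok { if (NR > 1) print rec; rec = $0; next }  # a new record starts
    { rec = rec fs $0 }                                   # continuation line
    END { if (NR > 0) print rec }                         # flush the last record
  ' "${@:3}"
}

printf '1\nhello kitty\nhello mitty\n2\nhello ditty\n3\nhello hello hello\ngoodbye\nnight\nnight\n' |
  join_awk '^[0-9]+' ' '
```

Because awk processes escape sequences in -v values, passing '\t' as FS yields a real tab, as with the GNU sed version.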