Anyway. The script. In the past post I showed you how to (basically) setup the urchin script and make the command line calls to it. In this part, I will show you what I did to automate the process for ytd 2007, including all the months up to the current month. I utilized the bash quite a bit here. I'm sure there is a faster way to automate this without using as many tmp files, but I like to keep my data in stages.
Here is the script:
#!/bin/bash
cd /Users/ChrisCopeland/Apps/scripts/urchin
currentMonth=`date '+%m'`
currentdate=`date '+%Y%m%d'`
currentYear=`date '+%Y'`
i=1
let loopVar=currentMonth+1
echo "Monthly Reports Available :"
while [ $i -lt $loopVar ]; do
echo "$i"
let i=i+1
done
read -p "please select the month or enter ytd for year to date: " -e input
case "$input" in
'ytd')
perl u5data_extractor.pl --begin 20070101 --end $currentdate --max 21000 --report 1201 >>fileTMP;
;;
'1')
perl u5data_extractor.pl --begin 20070101 --end 20070130 --max 21000 --report 1201 >>fileTMP;
;;
'2')
perl u5data_extractor.pl --begin 20070201 --end 20070230 --max 21000 --report 1201 >>fileTMP;
;;
'3')
perl u5data_extractor.pl --begin 20070301 --end 20070330 --max 21000 --report 1201 >>fileTMP;
;;
'4')
perl u5data_extractor.pl --begin 20070401 --end 20070430 --max 21000 --report 1201 >>fileTMP;
;;
'5')
perl u5data_extractor.pl --begin 20070501 --end 20070530 --max 21000 --report 1201 >>fileTMP;
;;
'6')
perl u5data_extractor.pl --begin 20070601 --end 20070630 --max 21000 --report 1201 >>fileTMP;
;;
'7')
perl u5data_extractor.pl --begin 20070701 --end 20070730 --max 21000 --report 1201 >>fileTMP;
;;
'8')
perl u5data_extractor.pl --begin 20070801 --end 20070830 --max 21000 --report 1201 >>fileTMP;
;;
'9')
perl u5data_extractor.pl --begin 20070901 --end 20070930 --max 21000 --report 1201 >>fileTMP;
;;
'10')
perl u5data_extractor.pl --begin 20071001 --end 20071030 --max 21000 --report 1201 >>fileTMP;
;;
'11')
perl u5data_extractor.pl --begin 20071101 --end 20071130 --max 21000 --report 1201 >>fileTMP;
;;
'12')
perl u5data_extractor.pl --begin 20071201 --end 20071230 --max 21000 --report 1201 >>fileTMP;
;;
esac
#cleans the slash for easier editing
tr "/" "_" <>fileTMP2
#start of line removal
sed '
s/_index.cfm//
s/_&safe=vss//
s/_&adlt=strict//
s/.cfm//
s/,//
s/\ $//' <>fileTMP3
cat cleanThese | while read line; do
sed -ie "/$line/d" fileTMP3
done
less supplierTMP3 | cut -c 2-500 | grep -v "^ " | grep -v "^_" | grep -v "^#" >>fileTMPws
tr -s " "
sed 's/ /,/'
#cleans temp directory
rm fileTMP*
echo "your report is complete"
Now we can go through the script. The first part should be self explanatory, I am setting the values for the current date, month, and year (year is not yet implemented in my script, but will be soon). Then I ask the user for which report (monthly) they would like to generate, and show them a numerical list of the reports available (from 1 to current month). It will be in this case function that I will implement the current year, so that a user can get the month and year data needed. At the moment we are only interested in the year 2007.
Next comes the hairy part - the cleaning of useless data. I have a file for the most common items I want filtered. The file is called cleanThese (simple name). Before I open that file though, I want to clean certain characters and items which get skipped over do to the fact that this log file which is generated, will have a list of urls and paths in it. Paths and urls have weird characters sometimes, like ";" ":" "/" etc. Try passing these into a command line sometime and you will see how troublesome they can be. So let's get them out of there.
tr "/" "_" - replaced "/" with an underscore, which will make it easier to clean the rest of the log.
Now we throw this whole thing to sed - a great program.
sed '
s/_index.cfm//
s/_&safe=vss//
s/_&adlt=strict//
s/.cfm//
s/,//
s/\ $//' <>fileTMP3
we have a lot of oler cold fusion files, and some items that have a problem in the clean file, this sed command, which is a chained command, one per line, cleans these things out and leave nothing in the pattern's place. You can see now why I cleaned out the extra "/", I would have been passing a /// to sed, which it doesn't understand.
Now the cleanThese file:
cat cleanThese | while read line; do
sed -ie "/$line/d" fileTMP3
done
This reads each line of the cleanThese file (which I can modify at my desire) and replaces the entire line of the patten with nothing, effectively removing the line.
Then I want to clean up the formatting from the output of the cleaning:
less fileTMP3 | cut -c 2-500 | grep -v "^ " | grep -v "^_" | grep -v "^#" >>fileTMPws
This line will cut certain characters out (based on the original output), pass these to grep with an inverted search 3 times looking for different patterns, then writes that out to yet another tmp file.
The next line:
tr -s " "
compresses all the space characters to one, outputs that to another file, which sed will take in, and replace the now single space, with a "," - effectively making this a csv file, which is names with the current month and set to a directory.
The next couple of lines clean the directory of tmp files and report to the user in the shell that the report is ready.
This will leave you a clean, importable, ready to query csv file just aching to be imported into a SQL engine of some sort.
Again if you want to run a different type of report, or use the case statement to generate a set of reports, you can visit my website to find a list of urchin reports available.
I want to find a good way to make this script available in a web interface at some point. I would also like to give the user a list of reports and years at the front of the application, just to help automate the process further.
Please Enjoy!
No comments:
Post a Comment