python - Splitting a server access log string
I have been trying to split a server access log string, to no avail. The string comes in this format:
10.223.157.186 - - [15/jul/2009:15:50:35 -0700] "get /assets/css/reset.css http/1.1" 200 1014
%h: IP address of the client
%l: identity of the client, or "-" if it's unavailable
%u: username of the client, or "-" if it's unavailable
%t: time the server finished processing the request, in the format [day/month/year:hour:minute:second zone]
%r: the request line the client sent (in double quotes). It contains the method, path, query string, and protocol of the request.
%>s: the status code the server sends to the client, e.g. 200 (OK, the request has succeeded), 304 (Not Modified), 404 (Not Found)
%b: size of the object returned to the client, in bytes, or "-" in case of status code 304

I have tried `str.split()` and `str.split("\t")`, to no avail. Any help?

You need to use a regular expression.
For instance, if you had this:
    import re

    line = '10.223.157.186 - - [15/jul/2009:15:50:35 -0700] "get /assets/css/reset.css http/1.1" 200 1014'
    # One named group per field of the log format
    regexstring = r'(?P<ip>[0-9.]+) (?P<id>[\w-]+) (?P<user>[\w-]+) (?P<time>\[.*\]) (?P<request>".*") (?P<status>\d+) (?P<size>\d+)'
    regex = re.compile(regexstring)
    match = regex.match(line)
    if match is not None:
        ip = match.group('ip')
        id = match.group('id')
        # etc.
If you want to extract each part of the time, i.e. day, month, year, etc., you can either run another regex on match.group('time') or be more explicit in regexstring about how to parse it. For instance, in place of the time group you could have:

    \[(?P<day>\d+)/(?P<month>[a-zA-Z]+)/(?P<year>\d+):(?P<hour>\d+):(?P<minute>\d+):(?P<second>\d+) -(?P<zone>\d+)\]
This gives you:

    regexstring = r'(?P<ip>[0-9.]+) (?P<id>[\w-]+) (?P<user>[\w-]+) \[(?P<day>\d+)/(?P<month>[a-zA-Z]+)/(?P<year>\d+):(?P<hour>\d+):(?P<minute>\d+):(?P<second>\d+) -(?P<zone>\d+)\] (?P<request>".*") (?P<status>\d+) (?P<size>\d+)'
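
The other option mentioned above, running a second regex on the captured time, could look roughly like the sketch below. It is not a definitive parser, and a few things in it are my own assumptions rather than part of the answer: the names LINE_RE, TIME_RE and parse_line are made up for illustration, the brackets and quotes are kept out of the time and request groups, the size group also accepts "-" (as the question says happens for a 304), and the zone group accepts either sign.

```python
import re

# Outer pattern for the whole line.
# Assumption: size may be "-" instead of a number, as described for status 304.
LINE_RE = re.compile(
    r'(?P<ip>[0-9.]+) (?P<id>[\w-]+) (?P<user>[\w-]+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d+) (?P<size>\d+|-)'
)

# Inner pattern, applied only to the text captured by the "time" group.
# Assumption: the zone offset may start with "+" or "-".
TIME_RE = re.compile(
    r'(?P<day>\d+)/(?P<month>[a-zA-Z]+)/(?P<year>\d+)'
    r':(?P<hour>\d+):(?P<minute>\d+):(?P<second>\d+) (?P<zone>[+-]\d+)'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Second pass: split the timestamp into its parts, if it has the expected shape.
    time_match = TIME_RE.fullmatch(fields['time'])
    if time_match is not None:
        fields.update(time_match.groupdict())
    return fields


line = '10.223.157.186 - - [15/jul/2009:15:50:35 -0700] "get /assets/css/reset.css http/1.1" 200 1014'
fields = parse_line(line)
print(fields['ip'], fields['month'], fields['zone'], fields['status'], fields['size'])
# prints: 10.223.157.186 jul -0700 200 1014
```

All values come back as strings, so convert status and size with int() where you need numbers, checking for "-" first.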