As Stack Overflow grows, we're starting to look closely at our IIS logs to identify problem HTTP clients -- things like rogue web spiders, users who have a large page set to refresh chaque second, poorly written one-off web scrapers, tricksy users who essayez de increment page count a zillion times, et so forth.
J'ai come up avec a peu de LogParser queries that help us identify le plus of le oddities et abnormalities quand pointed at an IIS log file.
Top bandwidth usage by URL
SELECT top 50 DISTINCT
SUBSTR(TO_LOWERCASE(cs-uri-stem), 0, 55) AS Url,
Count(*) AS Hits,
AVG(sc-bytes) AS AvgBytes,
SUM(sc-bytes) as ServedBytes
FROM {filename}
GROUP BY Url
HAVING Hits >= 20
ORDER BY ServedBytes DESC
url hits avgbyte served
/favicon.ico 16774 522 8756028
/content/img/search.png 15342 446 6842532
Top hits by URL
SELECT TOP 100
cs-uri-stem as Url,
COUNT(cs-uri-stem) AS Hits
FROM {filename}
GROUP BY cs-uri-stem
ORDER BY COUNT(cs-uri-stem) DESC
url hits
/content/img/sf/vote-arrow-down.png 14076
/content/img/sf/vote-arrow-up.png 14018
Top bandwidth et hits by IP / User-Agent
SELECT TOP 30
c-ip as Client,
SUBSTR(cs(User-Agent), 0, 70) as Agent,
Sum(sc-bytes) AS TotalBytes,
Count(*) as Hits
FROM {filename}
group by c-ip, cs(User-Agent)
ORDER BY TotalBytes desc
client user-agent totbytes hits
66.249.68.47 Mozilla/5.0+(compatible;+Googlebot/2.1; 135131089 16640
194.90.190.41 omgilibot/0.3++omgili.com 133805857 6447
Top bandwidth by hour by IP / User-Agent
SELECT TOP 30
TO_STRING(time, 'h') as Hour,
c-ip as Client,
SUBSTR(cs(User-Agent), 0, 70) as Agent,
Sum(sc-bytes) AS TotalBytes,
count(*) as Hits
FROM {filename}
group by c-ip, cs(User-Agent), hour
ORDER BY sum(sc-bytes) desc
hr client user-agent totbytes hits
9 194.90.190.41 omgilibot/0.3++omgili.com 30634860 1549
10 194.90.190.41 omgilibot/0.3++omgili.com 29070370 1503
Top hits by hour by IP / User-Agent
SELECT TOP 30
TO_STRING(time, 'h') as Hour,
c-ip as Client,
SUBSTR(cs(User-Agent), 0, 70) as Agent,
count(*) as Hits,
Sum(sc-bytes) AS TotalBytes
FROM {filename}
group by c-ip, cs(User-Agent), hour
ORDER BY Hits desc
hr client user-agent hits totbytes
10 194.90.190.41 omgilibot/0.3++omgili.com 1503 29070370
12 66.249.68.47 Mozilla/5.0+(compatible;+Googlebot/2.1 1363 13186302
The {filename} of course would be a path to an IIS logfile, such as
c:\working\sologs\u_ex090708.log
I did a lot of web searches for good IIS LogParser queries et found precious little. These 5, above, have helped us tremendously in identifying serious problem clients. But Je suis wondering -- what are we missing?
What autre ways are there to slice et dice le IIS logs (preferably with LogParser queries) to mine them for statistical anomalies? Do you have tout good IIS LogParser queries you run on votre servers?