Ok, interesting, thanks.
Ya, you are onto something. I looked at a different screen and see that the asterisk means something:
hits.png

Regards,
-Montana


On Tue, Apr 22, 2025 at 3:06 PM Adam Thompson <athompso@athompso.net> wrote:
OK!  So we can make a few guesses:
  1. The bot name isn't "Unknown robot identified by bot\*", the bot name is just "bot\*".  (Actually, even this is highly suspect.)
  2. AWStats is telling you it doesn't recognize the bot ("Unknown robot identified by...")
  3. The bot name is likely bot<something>, not a literal asterisk.  I think this is AWStats telling you it matched a bot by identifying the prefix "bot", i.e. AWStats did a substring match on 'bot\*'
  4. You'll have to go awk'ing and grep'ing your access_log files (or maybe tweaking awstats?) to get the actual bot name.
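
For step 4, here is a minimal sketch of the awk approach. It assumes the server writes Apache's "combined" log format, so the User-Agent is the sixth field when a line is split on double quotes. The function name list_bot_agents is made up for this example, and the log file name in the usage comment is a placeholder for wherever your logs actually live:

```shell
# List every distinct User-Agent containing "bot" (any case), most frequent first.
# Assumption: "combined" LogFormat, i.e.
#   host ident user [time] "request" status bytes "referer" "user-agent"
# Caveat: a request or referer containing an escaped quote (\") shifts the fields.
list_bot_agents() {
    awk -F'"' 'tolower($6) ~ /bot/ { print $6 }' "$@" | sort | uniq -c | sort -rn
}

# Usage (reads the named files, or stdin if none are given):
#   list_bot_agents access_log
```

Whatever shows up at the top of that list is the real string to match in RewriteCond, rather than the "bot\*" label from the AWStats report.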

If the bot name were truly "Unknown robot identified by bot\*", then
  1. you don't need the parentheses: RewriteCond expects a PCRE pattern, so ( ) are only needed for grouping
  2. the backslash+asterisk combination is pretty much a worst-case scenario for correct escaping. I would sidestep the issue by matching "Unknown robot identified by bot.." instead of "Unknown robot identified by bot\*".  A single period "." in a regex is like a "?" in filename globbing: it matches any single character.
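
To see the "." trick in action without touching the server, here is a quick check with grep -E. It's only an approximation: grep -E uses POSIX extended regex rather than PCRE, but "." means the same thing in both. The helper name ua_matches is made up for this example:

```shell
# ua_matches PATTERN STRING: succeed if STRING matches PATTERN, ignoring case (like [NC]).
ua_matches() {
    printf '%s\n' "$2" | grep -Eiq -- "$1"
}

# Single quotes keep the backslash and the asterisk literal in the test string.
ua='Unknown robot identified by bot\*'

# Two dots stand in for the backslash and the asterisk, so nothing needs escaping.
ua_matches 'Unknown robot identified by bot..' "$ua" && echo "two dots match"

# With only one character after "bot", the two-dot pattern does not match.
ua_matches 'Unknown robot identified by bot..' 'Unknown robot identified by botX' \
    || echo "too short, no match"
```

In .htaccess itself the pattern contains spaces, so the whole pattern would need to be wrapped in double quotes, with no parentheses outside the quotes.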

This is not a new thing with AWStats - see https://forums.classicpress.net/t/how-to-block-uknown-robots-identified-by-any-existent/2452/26 for discussion about what "bot*" actually means.

From that page, however, we can guess that you might be able to just write:
      RewriteCond %{HTTP_USER_AGENT} bot[\s_+:,\.\;\/\\\-] [NC]
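
Before putting that in .htaccess, it may be worth checking which User-Agent strings it would catch. A rough sketch with grep -E follows. Note the assumptions: grep -E doesn't understand \s inside brackets, so the class is respelled in POSIX form ([[:space:]] in place of \s, no backslash escapes), the helper name bot_class_matches is made up, and the sample User-Agent strings are illustrations rather than anything taken from your logs:

```shell
# POSIX respelling of bot[\s_+:,\.\;\/\\\-]; -i mimics the [NC] flag.
# Inside a POSIX bracket expression a backslash is literal, and so is a trailing "-".
bot_class_matches() {
    printf '%s\n' "$1" | grep -Eiq 'bot[[:space:]_+:,.;/\-]'
}

for ua in \
    'bot\*' \
    'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)' \
    'Mozilla/5.0 (X11; Linux x86_64; rv:137.0) Gecko/20100101 Firefox/137.0'
do
    if bot_class_matches "$ua"; then
        printf 'BLOCK: %s\n' "$ua"
    else
        printf 'allow: %s\n' "$ua"
    fi
done
```

Because the pattern isn't anchored, it also catches any crawler whose User-Agent contains "bot/" or "bot.", which includes well-behaved ones such as Googlebot, so check that this is really what you want.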


-Adam


From: Montana Quiring <montanaq@gmail.com>
Sent: April 22, 2025 14:05
To: Continuation of Round Table discussion <roundtable@muug.ca>
Subject: [RndTbl] Re: .htaccess file: stopping robot with escape character in name
 
Sorry man, excuse my ignorance, but not sure what you are asking.
I got the bot name from AWStats, which I assume is just ASCII.

Regards,
-Montana


On Tue, Apr 22, 2025 at 1:58 PM Adam Thompson <athompso@athompso.net> wrote:
Urlencode or octal?  Or if it's a regex just use ".".
-Adam


From: Montana Quiring <montanaq@gmail.com>
Sent: Tuesday, April 22, 2025 1:47:31 PM
To: Continuation of Round Table discussion <roundtable@muug.ca>
Subject: [RndTbl] .htaccess file: stopping robot with escape character in name
 
Hello Folks,

I'm trying to stop a bot from crawling a site using the .htaccess file. The problem is that it has a backslash character in its name. Grrr...
It's called: Unknown robot identified by bot\*
This generates an internal server error:
RewriteCond %{HTTP_USER_AGENT} ("Unknown robot identified by bot\*") [NC]
I tried this, but it didn't help:
RewriteCond %{HTTP_USER_AGENT} ("Unknown robot identified by bot\\*") [NC]

Any thoughts?

Regards,
-Montana
_______________________________________________
Roundtable mailing list -- roundtable@muug.ca
To unsubscribe send an email to roundtable-leave@muug.ca