Wednesday, January 19, 2011

Cisco PIX 515e dropping IPSEC tunnels to ASA 5505 over time

We have a Head-Office/Branch-Office WAN like this,

Server LAN <-> Cisco PIX 515e <-VPN tunnel-> Cisco ASA 5505 <-> Client LAN 1
                              <-VPN tunnel-> Cisco ASA 5505 <-> Client LAN 2
                              <-VPN tunnel-> Cisco ASA 5505 <-> Client LAN 3
                               ...  
                              <-VPN tunnel-> Cisco ASA 5505 <-> Client LAN 66

Problem:
5% of these VPN tunnels degrade over time.

Symptoms:

  • Clients respond to PING, but not to RPC or RDP.
  • On the ASA, VPN tunnels go from 1 x IKE, 2 x IPSec down to 1 x IKE, 1 x IPSec.
  • A restart of the ASA resolves the problem temporarily.

This PIX has been unreliable, and will probably be replaced with a more modern bit of gear. Although usually under 10%, the CPU on the PIX periodically hits 80-90% with traffic spikes, but I can't say I've been able to correlate dropped tunnels with these loads.

I have a few specific questions, but am grateful for any and all insights.

  1. Can I monitor (via SNMP) the total IPSec tunnels on the PIX? This should always be (at least?) twice the number of branch offices, and (at least?) twice the total IKE - if it drops then I probably have a problem.

  2. Is there an event I can alarm on in the PIX's own logging, when one of these tunnels is dropped? Maybe,

    snmp-server enable traps ipsec start stop
    
  3. Is there anything I can do to keep these tunnels alive until the PIX can be replaced? I was thinking of scriptable keep-alive traffic, since PING doesn't seem to cut it. I am also looking at idle time-out values and maybe re-keying intervals. Any other ideas?


PIX515E# show run isakmp
crypto isakmp identity address
crypto isakmp enable outside
crypto isakmp policy 10
 authentication pre-share
 encryption 3des
 hash sha
 group 2
 lifetime 86400
crypto isakmp nat-traversal  20


PIX515E# show run ipsec
crypto ipsec transform-set ESP-3DES-SHA esp-3des esp-sha-hmac


PIX515E# show version

Cisco PIX Security Appliance Software Version 7.2(4)
Device Manager Version 5.2(4)
  • I am seeing the same thing myself. I just set up a PIX515 running 8.04 IPSEC to my ASA5510 8.2. It works great and then the tunnel just dumps everyone. During this time, the internet keeps going just fine. So, it's just the tunnel that is having problems.

  • 1) You absolutely can monitor the number of IPSec tunnels, but we’ve found that not to be a truly reliable way of determining if connectivity is working. It’s always best to send and receive traffic via the tunnel to confirm connectivity (e.g. ping monitor).

    2) Same as #1 – it can be done, but may not give you usable information. Tunnels will start and stop in the normal course of operation depending on timeout intervals.

    3) While it’s not supposed to be necessary, we have seen improvement with tunnel connectivity in some situations by running a ping at frequent intervals (3-5 minutes). Hard to say whether that would help in this situation without in-depth analysis.

    Generally speaking, issues like this occur frequently due to VPN config mismatches between the head end and remote end VPN peers. Differing ACLs are often a problem.

    From Jim B
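
The symptom in the question (clients answer PING but not RDP) suggests probing the tunnel with the protocol that actually fails rather than with ICMP. The following is a minimal Python sketch of such a probe, not a tested monitoring solution. It assumes RDP listens on TCP port 3389 on a host behind each ASA; the function names and the idea of running it on a schedule are illustrative assumptions, not something from the thread.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def failing_hosts(hosts, port=3389, timeout=3.0):
    """Return the hosts whose TCP port could not be reached."""
    return [h for h in hosts if not tcp_port_open(h, port, timeout)]
```

Calling `failing_hosts` from the head-office side with one address per client LAN, every few minutes, would both generate traffic through each tunnel and report the branches where TCP no longer gets through even though ping still does.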
  • Does the tunnel ever come back up by itself, or do you manually intervene to get it back up?

    What is the lifetime set on the ASA?

    And do you have keepalives enabled/disabled on both devices?

    I've seen this issue before between a Cisco 6500 running IOS and an ASA. The IOS device is happy to run without an SA (if it expires for whatever reason), whereas the ASA is not, so the tunnel dies for a random period of time until it renegotiates. The tunnel then comes back up until the SA expires again.

    Ewan Leith : I've seen the same kind of thing as Conor, it certainly sounds like an SA timeout is different somewhere along the line
    From Conor

WebDav; Restrict to directory on a per-user or per-group basis?

Using Apache / WebDAV / mod_auth_mysql, is it possible to restrict users' (or groups') access to specific directories as defined in the MySQL auth table?

More specifically, I would like to specify which directories a WebDAV user is allowed to access based on data that exists inside the MySQL table. For example, if I had a user management interface built in Python or PHP that allowed me to specify the directory a user would be allowed to access, I would save this on the same row as the user's user / pass pair that mod_auth_mysql uses to auth WebDAV users. How do I configure the mod_auth_mysql block to specify the directory a user is allowed to access? Or would this be done elsewhere?

  • If you're using Apache, you can just use Apache's standard access control mechanisms ("require user", "require group", etc. inside of container blocks such as <Directory>). There's nothing particularly special about WebDAV.

    As for your second paragraph, that I'm not sure about. An ugly solution, if your permissions aren't changing all that often, would be to have an external script generate an Apache configuration from the database and then restart Apache.

    From larsks
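
The "external script" idea in the answer above could look roughly like the Python sketch below. It only shows the rendering step: it takes (user, directory) pairs, which are assumed to have been read from the MySQL auth table by some other code, and emits one `<Directory>` section per directory with a `Require user` line. The function names and sample paths are hypothetical, and the authentication directives themselves (AuthType, the mod_auth_mysql settings) are assumed to be configured elsewhere.

```python
def render_directory_block(directory, users):
    """Render one <Directory> section that admits only the given users."""
    require = "Require user " + " ".join(sorted(users))
    return "\n".join([
        f'<Directory "{directory}">',
        f"    {require}",
        "</Directory>",
    ])


def render_config(rows):
    """Group (user, directory) rows by directory and render every section."""
    by_dir = {}
    for user, directory in rows:
        by_dir.setdefault(directory, set()).add(user)
    sections = [render_directory_block(d, by_dir[d]) for d in sorted(by_dir)]
    return "\n\n".join(sections) + "\n"


# Hypothetical rows standing in for the result of a database query.
print(render_config([("alice", "/srv/dav/a"), ("bob", "/srv/dav/b")]))
```

The output would be written to a file that the main Apache configuration includes, followed by a reload. User names containing spaces are not handled by this sketch.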

How can I configure MySQL data folder in a Red Hat cluster suite running in VMware

Hi,

I have configured a Red Hat cluster in VMware. I created two nodes, Node00 and Node01, which are running CentOS 5.2. I have added a MySQL service to my cluster. When I suspend Node00, the instance is moved to Node01. The problem is that when the MySQL instance is moved to Node01, it uses the data folder of that local system.

I have installed Openfiler in another VMware machine called Node02 and configured an NFS share. The NFS share is working fine. I want the data folder to be kept in the NFS share, and MySQL to use the common data folder and configuration files from the NFS share.

Can anyone please help me configure my cluster to do this?

Warm Regards

Supratik

  • In my.cnf, change the datadir path to your NFS mount point.

    [mysqld]
    user=mysql_owner
    datadir=/path/to/datadir/mysql
    socket=/path/to/datadir/mysql/mysql.sock
    skip-innodb

    [mysql.server]
    user=mysql_owner
    basedir=/path/to/datadir

    [client]
    user=mysql_owner
    socket=/path/to/datadir/mysql/mysql.sock

    [safe_mysqld]
    err-log=/var/log/mysqld.log
    pid-file=/var/run/mysqld/mysqld.pid

    From Rajat
  • Where is the shared NFS storage mounted on Node01?

    I think it is better to mount the NFS share on /var/lib/mysql (the default for Red Hat) instead of configuring paths for server and clients.

    I don't know what you want to achieve, but if you are trying to cluster MySQL this way, it is wrong and could cause data inconsistency. This setup is only valid if the MySQL service on Node01 is started after Node00 fails.

    If you want two instances of MySQL accessing the data concurrently, you should use NDB Cluster. If you start two MySQL instances accessing the same datadir, you get data inconsistency.

    For standby purposes, you would be better off using DRBD replication or MySQL replication (probably with MMM).

    From ms

Change network settings based on which network I'm connected to

I need a program which, when I connect to different networks (wired, through eth0), will switch my network profile according to the MAC of the router I'm connecting to. Any ideas?

  • If the laptop is IBM/Lenovo, the "Access Connections" software does an acceptable job managing the connections. It can run programs, remember proxy settings, set default printer, and other things based on the network to which it is connected. I'm not sure if it uses the MAC address of the device or some kind of "port" identification.

    womble : EINCORRECTOPERATINGSYSTEM
    tomjedrz : @womble .. incorrect comment .. there is an Access Connections for Linux.
    From tomjedrz

How much memory can SQL Server 2005 x86 use when installed on Windows Server 2008 x64?

If I install SQL Server 2005 x86 on Windows Server 2008 x64, how much memory will SQL be able to use by default? How much after setting SQL Server's AWE switch?

This post talks about using /3gb, /PAE, and AWE to utilize SQL Server 2008 x86 memory appropriately on Windows Server 2008 x86.

My hypothesis, based on that post and related information I have seen elsewhere, is that by default a SQL Server 2005 x86 instance will be able to use 4GB of memory on Windows Server 2008 x64, and if I enable AWE then SQL will be able to use as much memory as the OS sees.

PS: Please note if your answer generalizes to other versions of SQL and/or Windows Server.

Thanks

  • x86 processes with the LargeAddressAware bit set (like sqlservr.exe) get a full 4 GB VAS for themselves. Enabling AWE (on OS and SQL editions that support it) allows the SQL server instance to map extra pages (up to 64GB) in and out of its VAS and use them for the buffer pool.

    That being said, x86 is a dead end and you should switch to an x64 instance ASAP.

    ObligatoryMoniker : What does enabling AWE on the OS mean? I haven't seen a way to do this and google isn't showing me any examples of enabling AWE in the context of the OS, only in the context of the SQL instance.
    Remus Rusanu : AWE is an OS concept, not a SQL one. http://msdn.microsoft.com/en-us/library/aa366527(VS.85).aspx
  • AWE is not needed for 64-bit systems; see this, SQL Server Standard or Enterprise will see whatever the operating system presents to it.

    You'll need Windows Server 2008 Enterprise edition to see more than 32GB of RAM, Standard is limited to 32GB. Refer here

    Remus Rusanu : The link refers to the x64 SQL. An x86 SQL process running in Wow64 still needs `sp_configure 'awe enabled', 1`
    SqlACID : Right you are, I misread the question. I would have to ask why install x86 SQL on x64 OS?
    ObligatoryMoniker : See my comment on the original question.
    From SqlACID
  • 4 GB per instance for SQL 2005 Standard/Enterprise.

    An x86 process on x64 can use a max of 2GB of RAM, or 4 GB if the application is compiled/linked with the /LARGEADDRESSAWARE switch.

    See: Memory Limits for Windows Releases:

    http://msdn.microsoft.com/en-us/library/aa366778%28VS.85%29.aspx

    http://www.wintellect.com/CS/blogs/jrobbins/archive/2009/04/02/link-32-bit-native-c-exes-with-largeaddressaware.aspx

    From Greg Askew

mysql.sock problem on Mac OS X, all Zend products

Hi folks. I posted this on the Zend forum, but I'm hoping I can get a speedier reply here. I've tried every solution provided on this forum with no luck. When I restart mysql, everything appears ok.

sudo /usr/local/zend/bin/zendctl.sh restart 
Password:
/usr/local/zend/bin/apachectl stop [OK]
/usr/local/zend/bin/apachectl start [OK]
Stopping Zend Server GUI [Lighttpd] [OK]
spawn-fcgi: child spawned successfully: PID: 7943
Starting Zend Server GUI [Lighttpd] [OK]
Stopping Java bridge [OK]
Starting Java bridge [OK]
Shutting down MySQL
. SUCCESS! 
Starting MySQL
. SUCCESS!

Pinging localhost also works, and the name resolves to an IP address.

ping localhost
PING localhost (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: icmp_seq=0 ttl=64 time=0.048 ms
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.064 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.066 ms
64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.076 ms
64 bytes from 127.0.0.1: icmp_seq=4 ttl=64 time=0.064 ms

But when I attempt to access the local url for my app, I get the dreaded:

Message: SQLSTATE[HY000] [2002] Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2).

This is a show-stopper for me. I appreciate any assistance. Thank you.

  • MySQL's idea of localhost isn't the IETF's idea of localhost. To MySQL, localhost means "a Unix socket of random location", which is the cause of endless frustration (and, in all likelihood, anal warts and global warming). If you want to use a Unix socket, then you'll need to configure Zend to use the correct socket (I don't know how, nobody seems to be able to decide on a standard location for the socket, nor a standard way to tell everything else where it is). Honestly just changing your server address to 127.0.0.1, which means "localhost, no really, I'm not kidding, yeah, the real one on the network" is a far less frustrating way to go (just make sure you've told MySQL itself to listen on the network by removing skip-networking).

    From womble
  • Check your /etc/mysql/my.cnf and see what socket is set to under your [mysqld] section. Then, you can either change that to /tmp/mysql.sock or just make a symlink to where it puts it, e.g.

    ln -s /var/run/mysqld/mysql.sock /tmp/mysql.sock
    

    Of course most *nix OS's wipe /tmp on shutdown so the first method is probably best, unless you're me and think that the socket has no business being in /tmp

    And on a side note, you should check out the Python framework Django, it made me happily leave Zend Framework in the dust.

    Michael Stelly : thx for the comment. but when the boss says "zend framework", then that's what we do -- nowatimsayin'. ;-)
    SleighBoy : Indeed I do. :)
    From SleighBoy
  • The default Mac Zend install creates the socket file here:

    /usr/local/zend/mysql/tmp/mysql.sock

    Take a look at the "socket" variable in your "my.cnf" file located in "/usr/local/zend/mysql/data" to identify where it is or move it to a more convenient spot.

    From
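
Since the answers above name several possible socket locations, a quick way to see which one actually exists on a given machine is to test each path for a Unix domain socket. This is a minimal Python sketch under that assumption; the candidate list simply repeats the paths mentioned in this thread, and the function names are illustrative.

```python
import os
import stat


def is_unix_socket(path):
    """Return True if path exists and is a Unix domain socket."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        # Missing path or no permission to stat it.
        return False


def find_socket(candidates):
    """Return the first candidate path that is a socket file, or None."""
    for path in candidates:
        if is_unix_socket(path):
            return path
    return None


# Paths mentioned in this thread; which one exists depends on the install.
CANDIDATES = [
    "/tmp/mysql.sock",
    "/var/run/mysqld/mysql.sock",
    "/usr/local/zend/mysql/tmp/mysql.sock",
]

print(find_socket(CANDIDATES))
```

If the path printed differs from the one in the error message, either the client needs to be pointed at that path or the server's "socket" setting needs to change, as the answers describe. A socket file that exists does not prove the server behind it is accepting connections.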

Unexplained spike in web traffic

Question

I am suspicious of an unexplained 1600% increase in traffic and massive slow-down that lasted about 10 minutes. I'm not sure if it was an attempted DoS attack, dictionary login attack, etc. Regardless, what actions should I take to monitor my server (which logs should I look at, what tools should I use, etc.) to make sure nothing nefarious happened? What steps should I take during future slowdowns such as these? Is there a standard way to have the server alert me during such a surge in traffic?

All the gory details:

One of my clients reported an unresponsive website (Ruby on Rails via Apache, Mongrel, and mongrel_cluster on a CentOS 5 box.) around 1:00 today.

I was in full troubleshooting mode when I got the email at 1:15. It was indeed exceptionally slow to ssh and load web pages, but ping output looked fine (78 ms), and traceroute from my workstation in Denver showed slow times on a particular hop mid-way from Dallas to the server in Phoenix (1611.978 ms 195.539 ms). 5 minutes later, the website was responsive & traceroute was now routing through San Jose to Phoenix. I couldn't find anything obviously wrong on my end--the system load looked quite reasonable (0.05 0.07 0.09) and I assumed it was just a networking problem somewhere. Just to be safe, I rebooted the machine anyway.

Several hours later, I logged in to Google Analytics to see how things looked for the day. I had a huge spike in hits: Usually this site averages 6 visits/hour, but at 1:00 I got 130 (a 1600% increase)! Nearly all of these hits appear to come from 101 different hosts spread across the world. Each visitor was on the website for 0 seconds and each visit was direct (i.e. it's not like the web page got slashdotted) and each visit was a bounce.

Ever since about 1:30, things are running smooth and I'm back to the average 6 visits per hour.

Disclaimer:

I am a code developer (not a sysadmin) who must maintain web servers for machines that run the code that I write.

  • It's unclear what you were pinging/tracing and from where. But if that was a hop in the middle of a traceroute's output, then the jump from 190 ms to 1600 ms probably means network congestion. If this correlates with your event and the switch of routing path, it is possible that part of your provider's network was attacked, including your server.

    There is no single solution to your problem. There are many tools and approaches, like Scout, Keynote, New Relic, Nagios, etc. It all depends. Whatever you decide to do, don't forget one thing: if you monitor something on a server from that same server, and that server becomes unavailable, you lose any means of being notified that it is down :)

    From monomyth
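
For the "which logs should I look at" part of the question, the web server's access log can be bucketed by hour to confirm a surge independently of Google Analytics. The Python sketch below assumes Apache common or combined log format timestamps such as `[14/Dec/2009:01:27:33 -0500]`; the baseline and factor values are illustrative assumptions. Raw requests are not the same thing as Analytics visits, so the numbers will not match exactly.

```python
import re
from collections import Counter

# Matches the bracketed timestamp in Apache common/combined log lines and
# captures the date and the hour, e.g. [14/Dec/2009:01:27:33 -0500].
TIMESTAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}):(\d{2}):\d{2}:\d{2} [+-]\d{4}\]")


def hits_per_hour(lines):
    """Count requests per (date, hour) bucket."""
    counts = Counter()
    for line in lines:
        match = TIMESTAMP.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


def spikes(counts, baseline, factor=10):
    """Return the buckets whose count exceeds factor * baseline, in key order."""
    return sorted(b for b, n in counts.items() if n > factor * baseline)
```

Feeding the lines of the access log to `hits_per_hour` and passing the result to `spikes` with the site's usual hourly rate lists the hours worth inspecting. In line with the warning above, any alerting built on this is more useful when it runs somewhere other than the server being watched.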

Require authoritative answer for DNS resolution

We have two forward lookup zones (intranet.com and mayberry.com) that aren't actually registered to us. Sometimes, our MS DNS server forwards the queries for network resources within these domains to OpenDNS, who is our forwarder.

OpenDNS then responds with the IP address of their "Not Found" page, therefore creating a problem until we flush the client's DNS and try again.

Is there any way to ensure that these domains are only resolved by our DNS server? Perhaps a way to block forwards for these domains or only allow an authoritative answer for them?

Thanks for the help!

  • Yeah, use a real DNS server (instead of MS).

    Graeme Donaldson : -1, that's not very helpful. This isn't Slashdot.
    Alnitak : it's actually a sane answer, although it could have been put better. A "real" DNS server (e.g. BIND or unbound) would allow you to configure authoritative answers for your local zones whilst forwarding queries for other zones to offsite recursive servers.
    Evan Anderson : @Alnitak: And the Microsoft DNS server doesn't allow you to do that how? What you're saying sounds like "BIND lets you host forward lookup zones and use forwarders", which the MS DNS server does, too. Aside from a lack of "views", Microsoft's DNS server has a pretty reasonable set of functionality. For the average DNS server on a LAN serving recursive resolution requests for client computers it does fine.
    phoebus : @Alnitak MS DNS will allow you to do the same thing.
  • If OpenDNS is returning an answer (its "Not Found" page) instead of saying there's no answer then it is speaking lies, but you cannot control that.

    If your DNS server is authoritative for a domain (a zone), it will only return what it knows. I've never seen any DNS server forward requests when it is authoritative.

    If your client has multiple DNS servers that include your DNS servers and also other DNS servers, then it is possible for the client to pick one of those other servers and thus get back answers when your servers would have said "no name" or similar.

    All of the above is true for all DNS servers, MSFT or otherwise.

    From Beau Geste
  • You can turn that feature off (somewhat) in your OpenDNS control panel, or change to a forwarder that doesn't do that. Google DNS works well (reportedly), or you could just run your own recursive server that doesn't rely on an upstream.

    From Bill Weiss
  • Whilst changing your forwarders to ones that don't do NXDOMAIN rewriting may achieve your ends, that would only be addressing the symptoms of the problem and not the root cause.

    To fix the root cause you need to prevent MS DNS from forwarding queries for your internal domains offsite.

    If that proves to be impossible, there are several nameservers that will happily (and reliably) serve local data authoritatively, whilst forwarding queries for other domain names offsite.

    My personal favourites are BIND and Unbound, both of which are available for Windows servers.

    From Alnitak
  • This seems to be a name resolution issue in your DNS server. For some reason, when someone queries your DNS server for addresses in the two zones above, it replies with 'I don't know, go ask OpenDNS'. This could be because your forward lookup zones are not completely current with all of the addresses in the two domains in your question. I would compare your zone listings with their domains to ensure that your information is current.

    You can do a bit of double redundancy to allow for multiple query paths for the domains in question. You have a couple of forward look-up zones, and that is good, but if those domains have their own DNS Servers you might also try putting forwarders into your DNS server specifically for those domains.

    If you go to the forwarders tab in DNS you can hit the 'New' button under the 'DNS Domain' box and add a specific entry for the domains you are trying to hit. This will add those entries under the 'For all other DNS domain' listing. You can then go to each one and specify the IP address of a DNS server in those domains.

    From Laranostz
  • The Microsoft DNS server won't forward requests for domains it's authoritative for. I suspect that you've specified a "secondary" DNS server on your client computers that refers to another DNS server (like, say, OpenDNS) and you're periodically getting resolution from this secondary DNS server.

    If you're in an Active Directory environment no domain-joined computer should have any DNS server specified in its NIC properties (either hard-set or delivered via DHCP) that refers to a DNS server that isn't running on one of your domain controllers. Your DNS servers running on your DCs should be resolving external-to-the-forest names either via forwarders to another DNS server, or via root hints.

    Edit:

    It sounds like you're saying that you have a DNS server specified on the clients that's not a domain controller (i.e. "my gateway").

    It's unclear what you mean by "is a slave for DC". Assuming the IP address of the DC is "X.X.X.X", the IP address of the "gateway" specified as a secondary DNS server is "Y.Y.Y.Y", and one of the internal domain names that isn't resolving properly is "foo.com", run the following commands and compare the output:

    nslookup foo.com X.X.X.X
    nslookup foo.com Y.Y.Y.Y
    

    The output should match. If it doesn't, then the "gateway" is resolving the internal domain name differently than the domain controller and that's your problem.

    As long as the "gateway" resolves names exactly like a domain controller it's not a problem to use it as a secondary DNS server. If it doesn't resolve names exactly the same way, though, you shouldn't be using it as a secondary DNS server. Every time you add an AD-integrated DNS zone to your DC you'll need to configure the "gateway" to resolve names in that zone the same way.

    Scott Forsyth - MVP : I agree. As long as the zones are entered on all of the DNS servers, MS DNS shouldn't look elsewhere for the answer. There must be something else in play.
    blank3 : re: no domain-joined computer should have any DNS server specified in its NIC properties... -- I have only one DC and I push through DHCP my gateway as secondary DNS (which is as a slave for DC and a forwarder).
    Evan Anderson : @blank3: I'll drop on an edit asking for some clarification.

Force Fresh Images on IIS 7

How do I force IIS 7 to not cache images on a certain page?

  • I don't think the IIS web server is the one caching pages - it's the client's browser.

    You can add a meta tag to the pages you don't want the client side to cache, and there are ways to do this for different older browsers and such.

    If you write in ASP and want the same non-cache effect, here's the header information.

    <% Response.CacheControl = "no-cache" %>
    <% Response.AddHeader "Pragma", "no-cache" %>
    <% Response.Expires = -1 %>
    
    Chris W. Rea : Those Response directives will only set the headers for ASP pages, not for static images. IIS settings need to be adjusted for static images.
    From Mike
  • The thing you are looking for is the Cache-Control header value (note that this only works for browsers that respect HTTP 1.1).

    For asp the code is:

    <% @Language="VBScript" %>
    <% Response.CacheControl = "no-cache" %>
    

    You can also set this directly on a folder using the metabase:

    Here's how you would set the folder pix on the default website. Open a command prompt, change to your C:\InetPub\AdminScripts folder, and run the following command:

    CSCRIPT ADSUTIL.VBS SET W3SVC/1/ROOT/pix/CacheControlCustom "no-cache"

    Note the possible values are "no-cache" , "Public", "Private"

    You can also set this via ADSI:

    Option Explicit
    Dim objCache
    Set objCache = GetObject("IIS://localhost/w3svc/1/root/pix")
    objCache.CacheControlCustom = "no-cache"
    objCache.SetInfo
    

    So far these approaches will work on IIS6 and IIS7 so long as you have the IIS6 admin tools installed. For a pure IIS7 environment here are the appcmd commands:

    First unlock the config section

    appcmd unlock config /section:staticContent
    

    Now you're good to change the caching options for static content. Make static content non-cacheable by setting "Cache-Control: no-cache":

    appcmd set config "Default Web Site/<Application>/<Folder>" /section:staticContent /clientCache.cacheControlMode:DisableCache
    

    Where <Application>/<Folder> is the path to your folder

    See also IIS 7.0: clientCache Element for staticContent (IIS Settings Schema)

    From Jim B

Cannot connect to FTP 7 server other than via localhost.

I've just built an FTP site using the FTP 7 package on Windows Server 2008. I've configured it to use IIS Manager Authentication following this article. When at the console of the Windows Server 2008 machine, I can FTP to localhost, login using an account I created in the IIS Management tool, and get to a user isolated directory. When I try to connect to the FTP site from any other computer, whether it is on the local network (trying ftp 10.1.10.2) or from a public computer (trying ftp ), I cannot even get to a login prompt. Instead I get "ftp: connect :Connection timed out". What might I need to configure on the FTP server so that at least a machine on the local network with no routers in between the client and server can connect?

  • Sounds like a firewall problem to me. Is Windows Firewall running on the machine?

Unusual Apache->Tomcat caching issue.

Right now, I have an Apache setup sitting in front of Tomcat to handle caching. This setup has been given to an external service to manage, and since the transition, I've noticed odd behavior. Specifically, when I request a swf file from the web server, I hit the Apache cache (good), but occasionally I'll receive a truncated file. Once I receive this truncated file, the cache will NOT refresh until I manually delete the cache and let the swf pull down from tomcat again.

The external service claims that the configuration is fine, but I don't see any way this could be happening aside from improper configuration. Now, there are two apache and two tomcat servers under a load balancer, and occasionally one apache cache will break while another does not (leading to 50% of all requests getting bad, truncated data).

Where should I start looking to debug this issue? What could POSSIBLY be causing this odd behavior?

Edit: Inspecting the logs, tomcat throws this:

java.io.IOException: Bad file number
        at java.io.FileInputStream.readBytes(Native Method)
        at java.io.FileInputStream.read(FileInputStream.java:199)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
        at java.io.FilterInputStream.read(FilterInputStream.java:90)
        at org.apache.catalina.servlets.DefaultServlet.copyRange(DefaultServlet.java:1968)
        at org.apache.catalina.servlets.DefaultServlet.copy(DefaultServlet.java:1714)
        at org.apache.catalina.servlets.DefaultServlet.serveResource(DefaultServlet.java:809)
        at org.apache.catalina.servlets.DefaultServlet.doGet(DefaultServlet.java:325)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:690)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:568)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.ha.session.JvmRouteBinderValve.invoke(JvmRouteBinderValve.java:209)
        at org.apache.catalina.ha.tcp.ReplicationValve.invoke(ReplicationValve.java:347)
        at org.terracotta.modules.tomcat.tomcat_5_5.SessionValve55.invoke(SessionValve55.java:57)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
        at org.apache.jk.server.JkCoyoteHandler.invoke(JkCoyoteHandler.java:190)
        at org.apache.jk.common.HandlerRequest.invoke(HandlerRequest.java:283)
        at org.apache.jk.common.ChannelSocket.invoke(ChannelSocket.java:767)
        at org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:697)
        at org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:889)
        at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:690)
        at java.lang.Thread.run(Thread.java:619)

followed by 

access_log.2009-12-14.txt:1.2.3.4 - - [14/Dec/2009:00:27:32 -0500] "GET /myApp/mySwf.swf HTTP/1.1" 304 -
access_log.2009-12-14.txt:1.2.3.4 - - [14/Dec/2009:01:27:33 -0500] "GET /myApp/mySwf.swf HTTP/1.1" 304 -
access_log.2009-12-14.txt:1.2.3.4 - - [14/Dec/2009:01:39:53 -0500] "GET /myApp/mySwf.swf HTTP/1.1" 304 -
access_log.2009-12-14.txt:1.2.3.4 - - [14/Dec/2009:02:27:38 -0500] "GET /myApp/mySwf.swf HTTP/1.1" 304 -

So apache is caching the bad file size. What could possibly be causing this, and possibly separate, how do I ensure that this exception does not get written to cache?

  • Does fetching this file with wget, curl, or the browser's "Save As" give the same result?

    Do other large files work?

    If there's a support contract, I would consider requesting the vendor debug this collaboratively.

    Barry : This seems to be the answer: http://mail-archives.apache.org/mod_mbox/tomcat-dev/200808.mbox/%3Cbug-45601-78@https.issues.apache.org/bugzilla/%3E
    iftrue : Is that still not fixed in 6.0.20?
    iftrue : wget and save as both pull down the cached apache file and fail. There aren't many other files on the server that get hit often as large as this file (in fact this file probably gets hit most frequently).
    From Barry
  • Tomcat's inability to serve large files has been fixed in trunk and 5.5.x, and the fix has been committed to 6.0.27.

    From iftrue

Weird htaccess file

I am working on a client's site, and I spot this htaccess file.

# -FrontPage-

IndexIgnore .htaccess */.??* *~ *# */HEADER* */README* */_vti*

<Limit GET POST>
order deny,allow
deny from all
allow from all
</Limit>
<Limit POST PUT DELETE>
order deny,allow
deny from all
</Limit>
AuthName [cut]
AuthUserFile  /web/u354/www70620/www/_vti_pvt/service.pwd
AuthGroupFile /web/u354/www70620/www/_vti_pvt/service.grp

Can someone explain what is going on?

So, in the first limit, they are allowing all post, but in the second limit they deny it?

And what is up with the Auth stuff at the bottom? I know it deals with basic web auth, but the site doesn't require a login to view it.

  • This is the crud that the Microsoft FrontPage Apache module puts in. You can safely remove it.

Number of effective spindles in a RAID array?

Ok, simple question here. When doing performance calculations on RAID arrays that involve number of spindles (such as measuring disk queue length), how many "spindles" do I use?

The array in question is RAID 6. Should I use N-2 spindles? N-1 if it were RAID 5?

  • It depends on your formula. Assuming it's a good one, you should use the total number of spindles in the array, minus hot spares.

    Here's a good calculator btw: http://wmarow.com/strcalc/

    xeon : Nice calculator! Thanks for that link.
    Boden : What if the formula isn't taking RAID level into account? It seems like I wouldn't include parity drives in my spindle count... ?
    cagenut : Then it's not much of a formula. MikeyB's answer below is a great explanation of why it matters. Also remember that battery-backed write cache makes a massive difference for intermittent/bursty writes. So a RAID-6 with bbwc will likely do much better on a relatively write-heavy database than a RAID-10 without it.
    From cagenut
  • You can't simply subtract the number of parity disks from number of total spindles and end up with useful results if you're doing IOP transaction calculations. There's more going on behind the scenes:

    RAID 10:

    • 1 frontend read translates into 1 backend IOP (d0 read)
    • 1 frontend write translates into 2 backend IOPS (d0 write, d1 write)

    RAID 5:

    • 1 frontend read translates into 1 backend IOP (d0 read)
    • 1 frontend write translates into 4 backend IOPS (d0 read, parity read, d0 write, parity write)

    RAID 6: (not 100% sure on these numbers - someone please correct me if they are off)

    • 1 frontend read translates into 1 backend IOP (d0 read)
    • 1 frontend write translates into 6 backend IOPS (d0 read, parity read, qarity read, d0 write, parity write, qarity write)

    So for example in a RAID set with 8 disks:

    • RAID10: 100 frontend writes translates to 200 IOPS on the backend (or 25/drive)
    • RAID5: 100 frontend writes translates to 400 IOPS on the backend (or 50/drive)
    • RAID6: 100 frontend writes translates to 600 IOPS on the backend (or 75/drive)

    Note that for the RAID10 calculations the method of "subtract number of parity spindles from total number of spindles" gives you the right answer. However, this falls over in RAID5/RAID6 calculations.
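The write-penalty arithmetic above can be sketched in a few lines of Python. The function name and the penalty table are illustrative only, and the RAID 6 penalty of 6 is the figure given above, which the answer itself flags as unconfirmed.

```python
# Backend IOPS generated per frontend write, as listed above.
# The RAID 6 value is the answer's own estimate, not a confirmed figure.
WRITE_PENALTY = {"RAID10": 2, "RAID5": 4, "RAID6": 6}


def backend_write_load(level, frontend_writes, disks):
    """Return (total backend IOPS, backend IOPS per drive) for a write-only load."""
    total = frontend_writes * WRITE_PENALTY[level]
    return total, total / disks


# Reproduce the 8-disk example: 100 frontend writes on each RAID level.
for level in ("RAID10", "RAID5", "RAID6"):
    total, per_drive = backend_write_load(level, 100, 8)
    print(f"{level}: {total} backend IOPS, {per_drive:g} per drive")
```

This only models writes; reads cost one backend IOP on every level listed, so a mixed workload would need the read share added separately.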

    From MikeyB

Does a file system lose performance as it fills up?

The context of the question is a Windows computer (so the filesystem in question is NTFS) that is filling up with data that can probably be removed. But I don't know whether it's worth the time to weed through it, or whether we should just defrag and move on.

Basically, does the 'fullness' of the filesystem cause a decrease in performance, or is it just fragmentation that slows things down? And if it does, does it make a meaningful difference?

  • Fragmentation would lead to some slowness. Overall it probably won't be anything your user notices unless they're doing a lot of video work or working with huge files.

    Actually, I think it would slow down if there's a ton of seek operations, thousands of tiny files that are hit a lot.

    In most cases with decent memory and a routine of only a few files in use, the OS will cache things in memory and you won't notice too much difference. Only benchmarks will tell.

    In the end...this is another "it depends" question. Depends on large files vs. small, usage patterns on the computer, and just how fragmented fragmented is and how perceptive your users are to a few seconds difference in performance.

    Won't hurt anything if you run MyDefrag. Freeware; it also tries to "optimize" some of the layout of files to areas of the disk where access will be a bit faster.

  • Defrag and move on. It isn't worth the time to save a few dozen GB. But to answer your question, the only thing that a new disk has is all the files at the start so seek times are less. But once it has been used, files can be anywhere, so defrag will help.

  • Many things can impact a server's file-serving performance. Fullness of the file-system is but one of many things that can contribute.

    • Raw disk throughput. If the number of I/Os being thrown at your disks exceeds their ability to keep up, it'll get slow.
    • Disk I/O patterns. Some disks behave better with massively random I/O than others. SATA, for instance, doesn't perform as well with massively-random I/O as SAS or SCSI drives.
    • Disk controller resource exhaustion. Whatever you're using for RAID (presuming you are, and this isn't just a single disk) has its own resources. If you're using a parity RAID, it's controller CPU that limits how fast you can commit data to disk. Also, most hardware controllers have their own onboard cache. This is used for many things, but includes reordering writes for improved efficiency. If I/O gets too random, your RAID card may not be able to optimize as well.
    • File-cache memory resources. File-servers perform best when they can fully cache 100% of the open files in memory. This allows them to accept writes from clients and reorder commits to disk in such a way as to make them more efficient. If you can't fit your entire open file set in memory, it'll have to go direct to disk for those I/Os and you'll lose this performance enhancement.
    • Client-local memory resources. Through the use of OpLocks, clients can cache open files locally on themselves. Once more than one client opens the same file, the server tells the client to flush its cache, and this goes away. However, for some workloads it can be a real savings. If the client doesn't have enough file-cache space to cache open files, performance can degrade noticeably when opening files exclusively.
    • File-system fragmentation. A massively fragmented file-system by its very nature induces a massively random I/O pattern on the disk subsystem. If that sub-system can't tolerate that sort of I/O pattern, things get real slow.
    • User-generated I/O patterns. If your users are working on millions of office documents (generally under 2MB in size) your access patterns are going to be very random. If your users are working on large files such as video files, geospatial data, or AutoCAD files, your users will be generating a lot of sequential operations.

    Some of these interrelate and many times it'll be multiple issues driving a performance problem. In general, NTFS filesystem fragmentation does have an impact. The impact is worst when doing large sequential reads from such a file-system, such as happens during a backup. The impact to general file-serving performance is not as significant for typical office-server loads since those are largely random I/O anyway; and in some cases you can even see some performance improvements with a fragmented system over a fully defragged one.

    For a file-server storing a lot of AutoCAD files, NTFS fragmentation will be perceptible to the end users. That user-generated I/O pattern is significantly sequential, and is therefore vulnerable to degradation by fragmentation. How much it'll really be impacted depends on how much RAM the server has for caching, and how fast the underlying storage is with random I/O patterns. It could very well be that the underlying storage is fast enough that end-users won't notice a volume with 60% fragmentation. Or it could cause I/O saturation with only 15% frag.

    For a file-server storing a lot of plain old office files, NTFS fragmentation will not be as perceptible to end users. That user I/O pattern is significantly random as it is, and is minimally impacted by fragmentation. Where the problems will emerge is in the backup process, as the time to backup each GB will increase as fragmentation increases.

    Which brings me to my final point. The one I/O operation that is most affected by fragmentation is sequential I/O. Most servers undergo large scale sequential I/O patterns as part of the backup process. If you're having trouble fitting your backup into your backup window, defragging can help make things go faster. Your underlying storage systems will determine how much of an impact fragmentation can have, and your fragmentation numbers will determine how much of an impact it actually has. Know your storage.

  • TL;DR: Not till you get more than 75% full.

    For most intents and purposes, filling up a drive has no performance implications until you get over 75% full. This can be off a bit depending on usage, but for a typical workstation load this is true.

    Fragmentation is minimized when all files have space to be placed. The only types of files that get fragmented on a largely empty NTFS partition are Logfiles and directory metadata, because they are constantly expanding. If you frequently search through logs or have a large throughput of created and deleted files, regular defragmentation may be beneficial even when the drive is less full.
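As a rough way to check where a volume stands against the 75% figure above, here is a minimal Python sketch using the standard library's `shutil.disk_usage`. The helper names and the `"."` path are placeholders, and the threshold is this answer's rule of thumb, not a hard limit.

```python
import shutil


def fraction_used(total_bytes, free_bytes):
    """Fraction of the volume in use, between 0.0 and 1.0."""
    return (total_bytes - free_bytes) / total_bytes


def over_threshold(path, threshold=0.75):
    """True if the volume holding `path` is fuller than `threshold`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return fraction_used(usage.total, usage.free) > threshold


if __name__ == "__main__":
    # "." is a placeholder; on the Windows machine in question it might be "C:\\".
    print(over_threshold("."))
```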

  • If you're under 80% usage or so, don't worry, just defrag.

    When it starts to get close to 100%, any filesystem will start to slow down.

  • If you are using Windows 2008, you can use the deduplication facility to free up space taken by unnecessary files that fill up your hard disk.

    From jakarta512

VB6 Scheduled tasks on Windows Server 2008 Standard

Hello, this is my first time using this forum. Here is my situation:

We seem to be having issues with specific tasks written in VB6. I am not a developer, but I am told these tasks' executables are written in VB6.

The task is initiated by Task Scheduler and the process begins to run (you can view the task in Task Manager, but no resources are used: 00 CPU, 760 K RAM), but nothing occurs. In a normal operating situation, the task will use 25% CPU and up to 20 MB RAM. When the task fails to run, you can still end and start it via Task Scheduler, but nothing happens. If you run just the process via the exe, it runs fine. The problem just seems to be when it is initiated via Task Scheduler. And this is a random issue, which always disappears after a server reboot. All of these tasks are VB6 applications on Windows Server 2008 Standard; some servers are SP1, some are SP2, but both versions experience the issue. The task has been configured to run with highest privileges, and to run whether logged on or not. Setting compatibility mode on the exe to 2003 does not make a difference.

Situation 1: 51 - ERROR - Program did not appear to complete, check server!! (Desc: Input past end of file). In this situation, the task is running in Task Scheduler and you can view the process in Task Manager. In the log file, all that is logged is "12/17/2009 03:16 Starting T2 Populator version - 1.0.12". You can just end the task via Task Scheduler and start it again via Task Scheduler, and away it goes.

Situation 2: 36 - ERROR - Program last ran on 16-Dec-2009. In this situation the task is running in Task Scheduler and you can view the process in Task Manager, but no resources are used: 00 CPU, 760 K RAM. Nothing is logged in the log file. You can end the task via Task Scheduler, but you must manually run the exe for it to complete.

I was wondering if anyone else has experienced issues with VB6 tasks, or any tasks for that matter, on Server 2008?

  • I am also facing the same situation. Please post if you find a solution to it.

  • Are the error values in your examples from the VB app or from Windows?

    51 and 36 are network sharing errors in Windows (net helpmsg ##). Where are the exes located?

    Codezy : What parts of the code did you find poorly written?

SQL Server 2008 Cluster Installation - First network name always fails

I'm testing failover clustering in Windows Server 2008 to host a SQL Server 2008 installation using this installation guide. My base cluster is installed and working properly, as well as clustering the DTC service. However, when it comes time to install SQL Server, my first attempt at installation always fails with the same message and seems to "taint" the network name.

For example, with my previous cluster attempt, I was installing SQL Server as VSQL. After approximately 15 attempts of installation and trying to resolve the errors, e.g. changing domain accounts for SQL, setting SPNs, etc., I typoed the network name as VQSL and the installation worked. Similarly on my current cluster, I tried installing with the SQL service named PROD-C1-DB and got the same errors as last time until I tried changing the name to anything else, e.g. PROD-C1-DB1, SQL, TEST, etc., at which point the install works. It will even install to VSQL now.

While testing, my install routine was:

  1. Run setup.exe from patched media, selecting appropriate options
  2. After the install fails, I'd choose "Remove node from a SQL Server failover cluster" and remove the single, failed, node
  3. Attempt to diagnose problem, inspect event logs, etc.
  4. Delete the computer account that was created for the SQL Service from Active Directory
  5. Delete the MSSQL10.MSSQLSERVER folder from the shared data drive

The error message I receive from the SQL Server installer is:

The following error has occurred: The cluster resource 'SQL Server' could not be brought online. Error: The group or resource is not in the correct state to perform the requested operation. (Exception from HRESULT: 0x8007139F)

Along with hundreds of the following errors in the Application event log:

[sqsrvres] checkODBCConnectError: sqlstate = 28000; native error = 4818; message = [Microsoft][SQL Server Native Client 10.0][SQL Server]Login failed for user 'NT AUTHORITY\ANONYMOUS LOGON'.

System configuration notes:

  • Windows Server 2008 Enterprise Edition x64
  • SQL Server 2008 Enterprise Edition x64 using slipstreamed SP1+CU1 media
  • Dell PowerEdge servers
  • Fibre attached storage

  • What user account are you running the cluster under? I wonder if its credentials are causing an issue with SQL.

    From Tatas
  • While reading your installation routine, what struck me was that you were removing just one node from the failed cluster installation. In my experience, if a clustered setup fails, it is better to clean it up and start fresh. So when your first setup failed, and you removed the node and deleted the account and files, that left your clustered instance in an invalid state and caused problems for the next setup run. When you mistyped the virtual server name, you started fresh and your setup succeeded.

    From
  • You aren't installing as a domain admin while the account you are using to run SQL services has no rights except on the SQL server, are you? At least this was my situation.

    I found that delegating the ability to modify serviceprincipalnames on Computer objects in the Computers OU + granting Read/Write all attributes to the service account by the service account resolves these problems with establishing the correct SPNs for cluster failover. This allows the SQL service account to dynamically register and deregister SPNs as instances are taken offline and brought online.

  • I am having a very similar issue. And yes, my installation account is in Domain Admins and the SQL service runs as a domain account; that's all it needs. It doesn't need anything else (i.e. local admin), as the installation makes sure the account has the right policy rights etc. Interestingly, after the installation has failed it is actually possible to start SQL Server outside of the cluster. It's the cluster service that is having problems starting the instance. In Windows 2008 the cluster service doesn't use a domain account; instead the concept of virtual named objects is used in AD. Interestingly, when making the installation as a default instance I didn't get a problem. It was only when I came to install a named instance that I had installation problems. Registering SPNs etc. should all be handled by the installation process; I don't believe anything manual needs to be done. Have you made any progress?

  • I am having a similar issue. I also start the instance outside the cluster manager, then I connect to the instance and add the SQL Agent and SQL Server service accounts to the sysadmin role, and after that everything works fine. Only I don't think that from that moment I am using the service SIDs (like the nt service\Mssql accounts) anymore, and I would like to use this functionality. So who knows why SQL Server is not capable of mapping the Windows SQL Server service accounts to the SQL service SID accounts? These service SID accounts are available (after installation) in the instance, and they have sysadmin rights, but it seems the mapping is not working.

  • I am also having the same problem. I used a domain account with admin privileges to install the cluster, but it never creates the virtual SQL name, and it always fails at one step or another. I do not see any complete documentation for this. Does anyone have any idea where I can find a complete guide?

    From John

How do I limit the number of remote desktop connections per user (Win 2008)

When I remote desktop into my server I would like to be connected to an existing session or be prevented from connecting as that user so I don't have multiple unknown sessions floating around.

I have set inactive session to expire after 30 minutes but I am not willing to end session on disconnect.

Is there a way to specify a limit per user?

Thanks.

  • I'm pretty sure that this isn't possible. Most of the time you will connect with the existing session if you are the same user, although I have not figured out the pattern of when you don't.

Firewall or other solution for automatic fail-over to a second server?

Assume a situation like this:

  • Server 1 - FreeBSD, Apache -- serves all web traffic

  • Server 2 - FreeBSD, Apache -- just sits there idling

Is there an easy way to set things up so that if Server 1 fails, traffic is automatically routed to Server 2 instead?

A quick brain-storm about it makes me think there must be some sort of trivial firewall or hardware appliance I can set up in front of both boxes that would do a:

  • Receive request on port XXX
  • Try to forward request to Server 1
  • If SUCCESS Return response
  • Otherwise try to forward request to Server 2, return response
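The behaviour listed above amounts to ordered failover: try the primary, and only on failure try the secondary. A minimal Python sketch of that logic follows; the function names and the stand-in backends are hypothetical, and a real proxy would also need health checks and timeouts.

```python
def forward_with_failover(request, backends):
    """Try each backend in order; return the first successful response."""
    last_error = None
    for send in backends:
        try:
            return send(request)
        except OSError as exc:  # connection errors and timeouts are OSError subclasses
            last_error = exc
    raise RuntimeError("all backends failed") from last_error


# Hypothetical stand-ins for "forward to Server 1" and "forward to Server 2".
def server1(request):
    raise ConnectionRefusedError("Server 1 is down")


def server2(request):
    return f"Server 2 handled {request}"


print(forward_with_failover("GET /", [server1, server2]))
```

Server 2 is only contacted when Server 1 raises an error, so this is strictly active/passive rather than load balancing.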

Additional question: I'm familiar with pfSense... can this be done in pfSense?

  • You can get a Barracuda load balancer or set up load balancing on your firewall.

    : OK, to clarify though - I *do not* want to load-balance here. I want to only have anything hit Server 2 *if Server 1 has failed*. Is your answer still applicable?
    From Rob
  • You have a few options:

    • Establish a "floating" IP address and a mechanism for moving it from one host to another in the event of a failure. This sort of feature is provided by "high availability" solutions such as the Linux HA project and Pacemaker.

      This solution requires no extra hardware.

    • Put a load-balancing proxy in front of the two servers. This is a system that accepts connections from clients and then passes them on to the backend server. Typically, a proxy can be configured either to balance the load between the two or to treat one as a failover target (to be used only if the primary system fails). You have lots of options in this category:

      • Apache includes a load balancer; see the mod_proxy_balancer documentation.
      • Pound is a simple-to-configure and flexible HTTP/HTTPS proxy.
      • Balance is a simple TCP proxy (which means it will work for protocols other than HTTP).

      And there are many, many others. In general, most software that can act as a reverse HTTP proxy (Squid, nginx, Varnish, etc.) can do this sort of active/passive web cluster.

    • The Linux Virtual Server Project provides a kernel-level load balancing solution.

    Something in this list should help you out or at least get you headed in the right direction.

    You asked specifically about pfSense. From the pfSense web site:

    Limitations

    • Equally distributes load between all available servers - unable to unequally distribute load between servers at this time.
    • Only checks if the server responds to pings or TCP port connections. Cannot check if the server is returning valid content.

    So unless the docs are out of date pfSense will not do what you want.

    From larsks
  • If you don't want to invest any extra cash in this project, Linux HA would be an excellent fit. It works best because it doesn't require you to build out extensive and expensive extra infrastructure (like a separate DB cluster).

    Also, speaking of DB: unless your web server hosts only static data, you need to make sure the two servers stay in sync. Can you describe your setup in a little more detail?

    From Vitaliy