Friday, February 2, 2007

Panel Comm errors -> C-Cure crash

One goal for this blog is to help people learn from the experience of others. This tip is based on a recent painful incident at a high-profile site. The site was down for several hours even though it has redundant servers that were functioning properly.

When C-Cure person records are edited, imported, or purged, the changes are downloaded to online panels. If a panel is online, but not communicating due to a hardware or line failure, C-Cure stores the changes for that panel in a download table so they can be sent when comm is restored.

Over time, these records can take up a lot of space in the database, and ultimately kill the driver. If you have a redundant system, the same database will exist on the backup system, so the failure will occur there as well. As far as I know, there is no clear indication of why the system won't work, and to recover, you need to restore a backup of a good database or have SH TSG do some database magic.
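To get a feel for how quickly this backlog can grow, here is a rough toy model in Python. It is only a sketch of the store-and-forward idea described above, not C-Cure's actual download table or driver logic. The class name, the panel names, and the one-pending-record-per-card-per-failed-panel assumption are all made up for illustration.

```python
class DownloadQueue:
    """Toy model of a per-panel pending-download table (hypothetical, not C-Cure's schema)."""

    def __init__(self, comm_ok):
        # Panel name -> True if the online panel is actually communicating.
        self.comm_ok = dict(comm_ok)
        # Panel name -> number of card changes waiting to be downloaded.
        self.pending = {name: 0 for name in self.comm_ok}

    def import_cards(self, count):
        """Apply `count` card changes; queue them for every online panel in comm fail."""
        for name, ok in self.comm_ok.items():
            if not ok:
                self.pending[name] += count

    def set_offline(self, name):
        """Assumption: an offline panel no longer has new changes queued for it."""
        self.comm_ok.pop(name, None)

    def backlog(self):
        """Total pending records across all panels."""
        return sum(self.pending.values())


queue = DownloadQueue({"panel-1": True, "panel-2": False, "panel-3": False})
for _ in range(5):
    queue.import_cards(1000)   # five imports of 1000 cards each
print(queue.backlog())         # 10000: 2 failed panels x 5 imports x 1000 cards

queue.set_offline("panel-3")
queue.import_cards(1000)
print(queue.backlog())         # 11000: only panel-2 still accumulates
```

In this model the backlog scales with the number of failed panels multiplied by the number of card changes, which is why large or repeated imports make the problem show up much sooner.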

Moral of the story: communication failures should be dealt with immediately, and panels (or comm ports) should be set offline if the fault cannot be repaired promptly.


Anonymous said...


How many apCs need to be in comm fail, and for how long, for this to be a concern? What was the case for the incident in your example? How much activity does that system get/see?

Craig Delgado

Jeff Bennett said...

It is related to both the number of apCs and the number of cards being downloaded, and I think the retry frequency. I don't recall the exact numbers, but usually the problem crops up because of large or frequent imports. I'm not sure about recent C-Cure versions, but importing from a text file would trigger an import for each card, even if the records were identical.