HAAGE&PARTNER Computer GmbH


Sawmill Analytics 8 | Log Analysis

Sawmill Tutorial

Creating A Rolling 30-day Database


Typically, a Sawmill profile imports data periodically (most often daily) from a growing log source. This is simple and fine for many purposes, but the database grows along with the log data and will eventually consume all available disk space. For smaller datasets, that may take so long that it causes no problem (if your disk will fill up in 350 years, it's probably not a pressing issue for you), but for a very large dataset it may be necessary to restrict the size of Sawmill's database.

There are various ways of reducing the size of the database, but one of the simplest is to restrict data to a certain age. If a database covers only the past 30 days, it will be about 1/10th the size of a database covering the past 300 days (assuming no growth in daily data size). In this article, we will discuss ways to create a database which always shows the past 30 days of data. The same techniques can be used with any other age, for instance to create a 90-day database, or a 60-day database.
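The size comparison above is plain proportional arithmetic. As a rough sketch in Python (the function name and the 2 GB per day figure are hypothetical, and a real Sawmill database will not necessarily scale exactly linearly with the number of retained days):

```python
def estimated_db_size(daily_size_gb: float, retention_days: int) -> float:
    """Estimate database size, assuming it grows linearly with retained days."""
    return daily_size_gb * retention_days


daily_gb = 2.0  # hypothetical database growth per day of log data
size_30 = estimated_db_size(daily_gb, 30)
size_300 = estimated_db_size(daily_gb, 300)

# Ratio of the 30-day database to the 300-day database (about 1/10).
print(size_30, size_300, size_30 / size_300)
```

Changing the retention argument to 60 or 90 gives the corresponding estimate for a 60-day or 90-day database under the same assumption.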

If database updates are scheduled to occur daily, then each day, the latest day of data will be added to the database. That's the easy part; the hard part is getting rid of the oldest day of data (the 31st day of data). There are two ways to do this: with a "Remove Database Data" action, or by rebuilding the database with a log filter.
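The add-then-remove cycle can be illustrated with a small simulation. This is only a conceptual sketch, not how Sawmill stores data: the database is modeled as a Python set holding one date per day of data, and `nightly_update` is a hypothetical name.

```python
from datetime import date, timedelta

RETENTION_DAYS = 30  # the window size used throughout this article


def nightly_update(database: set, today: date) -> None:
    """Add today's data, then drop every day outside the retention window."""
    database.add(today)  # the easy part: the daily update
    cutoff = today - timedelta(days=RETENTION_DAYS)
    for day in [d for d in database if d <= cutoff]:
        database.discard(day)  # the hard part: removing the oldest day(s)


database = set()
start = date(2011, 1, 1)  # arbitrary example start date
for offset in range(100):
    nightly_update(database, start + timedelta(days=offset))

# After 100 nightly runs the set holds only the most recent 30 days.
print(len(database), min(database), max(database))
```

However long the simulation runs, the set never holds more than `RETENTION_DAYS` entries, which is the property a rolling database is meant to have.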


Using "Remove Database Data" To Discard Data Older Than 30 Days From An Existing Database

The "Remove Database Data" action removes data matching a certain filter from an existing database. It is most often used to remove data older than a certain number of days, and the Scheduler has an easy option for using it this way. This section describes how to set up a scheduled task to remove data older than 30 days from the database, every night at midnight.

In the Admin page of the web interface, click Scheduler to see the scheduled tasks, then click New Action in the upper right to create a new action. Choose "Remove database data" for the action, choose the profile you want to limit from the "Profile" menu, and enter "30" in the "Remove database data older than" field. It will look like this:

Remove Database Data Screenshot

Now click Save and Close to save the task. From now on, every night at midnight, all data older than 30 days will be removed from the selected profile's database. (To run the task more or less frequently, or at another time of day, change the schedule at the bottom of the window.)


Using A Log Filter To Reject Data Older Than 30 Days During Log Processing

If the data is not yet in the database (e.g., if you're rebuilding the database), you can also remove it as you process it, using a Log Filter. This section describes creating a log filter to reject log data older than 30 days.

Go to Config -> Log Data -> Log Filters, and click New Log Filter in the upper right. Click the Filter tab, and enter a name such as "Remove data older than 30 days" in the Name field. Click New Condition, and set up the following condition to detect log data older than 30 days:

New Log Filter Screenshot

Click OK, then click New Action and set up the following action to reject the log data detected by the condition above:

Reject Action Screenshot

Click OK, then click the Sort Filters tab and click the Up button until the new filter is at the top of the list. The filter will probably work at the bottom too, but it is faster at the top, and results could differ if a filter higher in the list accepts log entries (which is rare). The final filter should look like this:

Final Filter Screenshot

From now on, whenever you rebuild the database for this profile, this filter will reject all log entries older than 30 days. This means you could maintain a rolling 30-day database with a nightly rebuild, instead of a nightly update combined with a nightly "remove database data" action. This is reasonable unless the dataset is too large to rebuild nightly. Database updates are much faster than rebuilds, and "remove database data" operations are also faster than rebuilds (though not by much, since they still have to rebuild all xref tables and indices), so a nightly update plus remove will generally perform better than a nightly rebuild.
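To make the reject-on-import idea concrete, here is a minimal Python illustration of rebuilding from the full log while skipping entries older than 30 days. It is not Sawmill's filter language or internals; the `rebuild` function, the tuple layout, and the sample log lines are all assumptions made for the example.

```python
from datetime import datetime, timedelta

MAX_AGE = timedelta(days=30)


def rebuild(log_entries, now):
    """Rebuild from scratch, rejecting entries older than MAX_AGE."""
    cutoff = now - MAX_AGE
    database = []
    for timestamp, line in log_entries:
        if timestamp < cutoff:
            continue  # reject: this entry is older than 30 days
        database.append((timestamp, line))
    return database


now = datetime(2011, 6, 30, 0, 0, 0)  # arbitrary example "current time"
log_entries = [
    (now - timedelta(days=45), "GET /old.html"),
    (now - timedelta(days=30, seconds=1), "GET /just-too-old.html"),
    (now - timedelta(days=29), "GET /recent.html"),
    (now - timedelta(hours=1), "GET /latest.html"),
]

kept = rebuild(log_entries, now)
print([line for _, line in kept])
```

The sketch also shows why a nightly rebuild costs more than an update: every log entry is read again on each run, even though most of them were already processed the night before.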


Advanced Topic: Removing Data From The Command Line

It is also possible to run a "remove database data" action at any time from the command line, using the "-a rdd" option. Any report filter can be specified with the -f option, to remove all events matching that filter. The following command removes all data older than 30 days, from the database for the profile profilename:

  SawmillCL -p profilename -a rdd -f "(date_time < now() - 60*60*24*30)"

(On non-Windows systems, use the name of the Sawmill binary, e.g. "sawmill", instead of "SawmillCL".)
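For other retention periods, only the last factor of the filter expression changes. The following Python sketch builds the same command line for an arbitrary number of days. It uses only the options shown above (`-p`, `-a rdd`, `-f`); the helper name `build_rdd_command` and its parameters are hypothetical, and the command is printed rather than executed.

```python
import shlex


def build_rdd_command(profile: str, days: int, binary: str = "SawmillCL") -> list:
    """Build the argument list for a "remove database data" run."""
    if days <= 0:
        raise ValueError("days must be positive")
    # Same filter as in the article, with the day count made configurable.
    report_filter = f"(date_time < now() - 60*60*24*{days})"
    return [binary, "-p", profile, "-a", "rdd", "-f", report_filter]


command = build_rdd_command("profilename", 90)
print(shlex.join(command))
# To run it for real, pass the list to subprocess.run(command, check=True).
```

Passing the arguments as a list avoids shell quoting problems with the parentheses and spaces in the filter expression; on non-Windows systems, pass `binary="sawmill"` (or the actual name of your Sawmill binary).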





© 1995-2011 HAAGE & PARTNER Computer GmbH · www.haage-partner.de