Sawmill-Tutorial
Creating A Rolling 30-day Database
Typically, a profile in Sawmill imports data from a growing log source,
periodically (most often, daily). This is simple, and fine for many
purposes, but as the size of the log data increases, so does the size
of the database, and this will eventually consume all available disk
space. For smaller datasets, the time to consume all disk space may be
so long that it causes no problem (if your disk will fill up in 350
years, it's probably not a pressing issue for you), but if the dataset
is very large, it may be necessary to restrict the size of Sawmill's
database.
There are various ways of reducing the size of the database, but one of
the simplest is to restrict data to a certain age. If a database covers
only the past 30 days, it will be about 1/10th the size of a database
covering the past 300 days (assuming no growth in daily data size). In
this article, we will discuss ways to create a database which always
shows the past 30 days of data. The same techniques can be used with
any other age, for instance to create a 90-day database, or a 60-day
database.
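The size comparison above is simple proportional arithmetic. A minimal Python sketch, assuming (as the text does) that every day contributes the same amount of data; the function name is illustrative only and is not part of Sawmill:

```python
def relative_db_size(window_days: int, baseline_days: int) -> float:
    """Size of a window_days database relative to a baseline_days one.

    Assumes every day contributes the same amount of data (no growth).
    """
    if window_days <= 0 or baseline_days <= 0:
        raise ValueError("day counts must be positive")
    return window_days / baseline_days


# Compare a few rolling windows against a 300-day database.
for days in (30, 60, 90):
    share = relative_db_size(days, 300)
    print(f"{days}-day database: {share:.0%} of a 300-day database")
```

If daily volume grows over time, the recent days weigh more than the old ones, so a 30-day window would save somewhat less than this estimate suggests.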
If database updates are scheduled to occur daily, then each day, the
latest day of data will be added to the database. That's the easy
part; the hard part is getting rid of the oldest day of data (the 31st
day of data). There are two ways to do this: with a "remove database
data" action, or by rebuilding the database with a log filter.
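The nightly cycle described above (add the newest day, drop the 31st) can be pictured with a small simulation. This is only a conceptual sketch in Python: the dictionary stands in for the database, and `nightly_cycle` is a hypothetical helper, not a Sawmill function.

```python
from datetime import date, timedelta

WINDOW_DAYS = 30  # retention period used throughout this article


def nightly_cycle(database: dict, today: date, new_events: int) -> None:
    """Add today's data, then drop every day that falls outside the window."""
    database[today] = new_events  # the "update" step
    oldest_kept = today - timedelta(days=WINDOW_DAYS - 1)
    for day in [d for d in database if d < oldest_kept]:  # the "remove" step
        del database[day]


database = {}
start = date(2024, 1, 1)
for offset in range(100):  # simulate 100 consecutive nights
    nightly_cycle(database, start + timedelta(days=offset), new_events=1000)

print(len(database), "days retained, from", min(database), "to", max(database))
```

However long the simulation runs, the number of retained days never exceeds 30, which is the property that keeps the database size bounded.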
Using "Remove Database Data" To Discard Data Older Than 30 Days From An Existing Database
The "Remove Database Data" action removes data matching a certain
filter from an existing database. It is most often used to remove data
older than a certain number of days, and the Scheduler has an easy
option for using it this way. This section describes how to set up a
scheduled task to remove data older than 30 days from the database,
every night at midnight.
In the Admin page of the web interface, click Scheduler to look at the
scheduled tasks, and then click New Action in the upper right to create
a new action. Choose "Remove database data" for the action; choose the
profile you want to limit from the "Profile" menu; and enter "30" in
the "Remove database data older than" field. It will look like this:
[screenshot of the completed "Remove database data" action]
Now click Save and Close to save the task. From now on, at midnight,
all data older than 30 days will be removed from the database for the
selected profile. (If you want it to occur more frequently, or less
frequently, or at another time of day, you can change it at the bottom
of the window.)
Using A Log Filter To Reject Data Older Than 30 Days During Log Processing
If the data is not yet in the database (e.g., if you're rebuilding the
database), you can also remove it as you process it, using a Log
Filter. This section describes creating a log filter to reject log
data older than 30 days.
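Conceptually, such a filter compares each log entry's timestamp with a cutoff 30 days before the current time and rejects anything older. The Python sketch below shows only that logic; it is not Sawmill's filter language, and the entry names and timestamps are made up for illustration.

```python
from datetime import datetime, timedelta

MAX_AGE = timedelta(days=30)


def keep_entry(entry_time: datetime, now: datetime) -> bool:
    """Return False (reject) for entries older than MAX_AGE, True otherwise."""
    return entry_time >= now - MAX_AGE


now = datetime(2024, 4, 9, 0, 0, 0)
entries = [
    ("recent hit", datetime(2024, 4, 8, 13, 5)),
    ("edge of window", datetime(2024, 3, 10, 0, 0)),  # exactly 30 days old
    ("stale hit", datetime(2024, 1, 15, 9, 30)),
]

# Only entries inside the 30-day window reach the "database".
kept = [name for name, timestamp in entries if keep_entry(timestamp, now)]
print(kept)
```

Because the test runs while the log is being read, rejected entries never enter the database at all, so nothing needs to be removed afterwards.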
Go to Config -> Log Data -> Log Filters, and click New Log Filter in
the upper right. Click the Filter tab, and give it a name in the Name
field, such as "Remove data older than 30 days". Click New Condition,
and set up the following condition, to detect log data older than 30
days:
[screenshot of the condition settings]
Click OK, and click New Action, and set up the following action, to
reject the log data detected by the condition above:
[screenshot of the action settings]
Click OK, and use the Sort Filters tab to move this Log Filter to the
top of the list, by clicking Sort Filters, and then clicking the Up
button until the new filter is at the top. It will probably work fine
at the bottom, but it's faster to have it at the top, and it could give
different results if there is a filter higher in the list which accepts
log entries (which is rare). The final filter should look like this:
[screenshot of the finished filter]
From now on, whenever you rebuild the database for this profile, this
filter will reject all log entries older than 30 days. This means that
you could actually do a nightly rebuild, instead of a nightly update
together with a nightly "remove database data," to maintain a rolling
30-day database. This is reasonable unless the dataset is too large to
rebuild nightly. Database updates are much faster than rebuilds, and
"remove database data" operations are faster than rebuilds (though not
that much faster, since they still have to rebuild all xref tables and
indices), so you'll generally get better performance with an
update+remove every night, versus a rebuild.
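The earlier remark about filter order (an accepting filter higher in the list can change the result) can be illustrated with a toy filter chain. The sketch below assumes that filters run top to bottom and that the first explicit accept or reject ends evaluation for that entry; this is a simplified model for illustration, and the accepting filter shown is hypothetical.

```python
from typing import Callable, List, Optional

Verdict = Optional[str]  # "accept", "reject", or None for "no decision"


def reject_old(entry: dict) -> Verdict:
    """The age filter from this section: reject entries older than 30 days."""
    return "reject" if entry["age_days"] > 30 else None


def accept_images(entry: dict) -> Verdict:
    """Hypothetical filter that explicitly accepts some entries."""
    return "accept" if entry["path"].endswith(".png") else None


def run_filters(entry: dict, filters: List[Callable[[dict], Verdict]]) -> str:
    """Apply filters in order; the first explicit verdict wins (assumed)."""
    for log_filter in filters:
        verdict = log_filter(entry)
        if verdict is not None:
            return verdict
    return "accept"  # entries that no filter rejects are kept


old_image = {"path": "/logo.png", "age_days": 45}
print(run_filters(old_image, [reject_old, accept_images]))  # age filter on top
print(run_filters(old_image, [accept_images, reject_old]))  # age filter below
```

With the age filter on top, the 45-day-old entry is rejected; placed below the accepting filter, the same entry slips through, which is why the top of the list is the safer position.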
Advanced Topic: Removing Data From The Command Line
It is also possible to run a "remove database data" action at any time
from the command line, using the "-a rdd" option. Any report filter can
be specified with the -f option, to remove all events matching that
filter. The following command removes all data older than 30 days from
the database for the profile profilename:
SawmillCL -p profilename -a rdd -f "(date_time < now() - 60*60*24*30)"
(On non-Windows systems, use the name of the Sawmill binary, e.g.
"sawmill", instead of "SawmillCL".)
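To use the same command with a different retention period, only the number of days in the filter needs to change. The Python sketch below builds the argument list for an arbitrary number of days; it uses only the options shown above (-p, -a rdd, -f), and `build_rdd_command` is a hypothetical helper name, not part of Sawmill.

```python
import shlex
from typing import List


def build_rdd_command(profile: str, days: int, binary: str = "SawmillCL") -> List[str]:
    """Build a "remove database data" command for data older than `days` days.

    The filter has the same form as the example in this article:
    (date_time < now() - 60*60*24*<days>)
    """
    if days <= 0:
        raise ValueError("days must be a positive integer")
    age_filter = f"(date_time < now() - 60*60*24*{days})"
    return [binary, "-p", profile, "-a", "rdd", "-f", age_filter]


# Example: a 90-day rolling database on a non-Windows system.
command = build_rdd_command("profilename", 90, binary="sawmill")
print(shlex.join(command))  # shell-quoted form, suitable for a cron entry
```

The list form can be passed directly to `subprocess.run(command, check=True)` on a machine where Sawmill is installed, which avoids shell-quoting problems with the parentheses and asterisks in the filter.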
Professional Services
If you would rather not customize Sawmill Analytics yourself, we can do this for you as a service. Our experts will gladly get in touch with you to adapt the reports or other aspects of Sawmill Analytics to your circumstances and requirements.