Sawmill-Tutorial
Using Cross-Reference Tables
Cross-reference tables (sometimes called cross-reference groups, or
xrefs), are tables created by Sawmill in its back-end database.
Cross-reference tables are generated during database builds and
updates, and contain aggregated information from the main table of the
database. For instance, a particular cross-reference table might
contain one row for each day in a media server log dataset, with the
number of accesses, play duration, bytes transferred, unique IPs,
sessions, etc. for that day. When Sawmill generates an unfiltered Days
report, it can generate it directly from this table, which is much
smaller than the main table of the database; this allows it to generate
this report, and other top-level reports, very quickly.
Cross-reference tables are created by default for each non-aggregating
field in the database, which means there is roughly one cross-reference
table for each report, so all top-level reports are boosted by
cross-reference tables. Furthermore, every default cross-reference
table also contains the date/time field, so any report can be filtered
by date, and still use a cross-reference table. So all default
unfiltered reports, and all default date-filtered reports, are
accelerated using cross-reference tables.
If you add a new database field, however, or create a new report, or if
you often use a particular combination of filters on a particular
report, you may need to modify the cross-reference tables to ensure
that this custom or filtered report is also fast. This newsletter gives
an example of this.
Building The Report
We'll start with a spam filtering server dataset. Clicking Senders
shows the top sender domains in the data:
Senders Report
If we also want to know, in a single report, the top Recipients for
each Sender, we can use a Pivot Table, by clicking Customize, then the
Pivot Table tab, then "Show pivot table", and then selecting Recipients
as the drill-down:
Customize Report Element: Add A Pivot
Clicking OK begins to generate this report. But it takes several
minutes, and while it's thinking, it displays this progress bar:
Progress Of Report Using Main Table
The key phrase to watch for is "Querying main table." The main table is
the primary table of the database, with one row for each, even in the
log data. It can be millions or billions of lines long, depending on
the dataset. Querying the main table can take many minutes, especially
for an unfiltered report (where every single row must be aggregated to
create the report), so it is to be avoided for any reports you use
frequently. That's what xref tables are for, so let's cancel this
report and head to Config to make it faster. In Config, click More
Options, then Cross Reference Groups, and duplicate the existing
Recipient group, calling it "Sender x Recipient", and add the Sender
field to it:
Adding A Cross-Reference Group
After adding a cross-reference group, we need to rebuild the database.
After the build is complete, we can return to the reports, and again do
the Recipients x Senders pivot table report. This time, the report
comes up quickly:
The Final Report
That's it--from now on, that report will be fast. As an added benefit,
this also speeds up Recipient reports filtered on a particular Sender,
or Sender reports filtered on a particular Recipient.
Which Fields Do You Need?
In order to get a particular report to use an xref group, you need all
the fields in the report, plus all the fields in the filters. So if you
have a report with three columns: Sender, Messages, and Bytes; and if
you also have two filters on that report, a Date/time (date range)
filter and a Recipient filter, you will need all those fields in the
xref table: Sender, Recipient, Date/time, Messages, and Bytes.
Professional Services
This newsletter describes optimizing reports with cross-reference
tables. This is one example of many types of optimization which are
possible to make Sawmill build databases faster, and generate reports
faster. Other possibilities include selective indexing, log splitting,
field simplification, horizontal shrinkage (eliminating fields),
vertical shrinkage (eliminating rows), and hierarchical cross-reference
optimization. For large environments where performance is important, we
recommend Sawmill Professional Services to help you quickly optimize
your installation.
[Article revision v1.0]
Professionelle Dienstleistungen
Sollten Sie die Anpassung von Sawmill Analytics nicht selbst vornehmen wollen, können wir Ihnen dies als Dienstleisung anbieten. Unsere Experten setzen sich gerne mit Ihnen in Verbindung, um die Reports oder sonstige Aspekte von Sawmill Analytics an Ihre Gegebenheiten und Wünsche anzupassen.
Kontakt
Zur Tutorial-Übersicht