Sawmill Analytics

Analyse und Reporting
für Web | Netzwerk | Sicherheit

Zugriffs- und Datenanalyse von Server-Logs (Proxy, Mailserver, Firewall, Webserver) und Überwachung der Sicherheit & Performance, Schwachstellenanalyse.

Sawmill Analytics 8 | Loganalyse


Using Cross-Reference Tables

Cross-reference tables (sometimes called cross-reference groups, or xrefs), are tables created by Sawmill in its back-end database. Cross-reference tables are generated during database builds and updates, and contain aggregated information from the main table of the database. For instance, a particular cross-reference table might contain one row for each day in a media server log dataset, with the number of accesses, play duration, bytes transferred, unique IPs, sessions, etc. for that day. When Sawmill generates an unfiltered Days report, it can generate it directly from this table, which is much smaller than the main table of the database; this allows it to generate this report, and other top-level reports, very quickly.

Cross-reference tables are created by default for each non-aggregating field in the database, which means there is roughly one cross-reference table for each report, so all top-level reports are boosted by cross-reference tables. Furthermore, every default cross-reference table also contains the date/time field, so any report can be filtered by date, and still use a cross-reference table. So all default unfiltered reports, and all default date-filtered reports, are accelerated using cross-reference tables.

If you add a new database field, however, or create a new report, or if you often use a particular combination of filters on a particular report, you may need to modify the cross-reference tables to ensure that this custom or filtered report is also fast. This newsletter gives an example of this.

Building The Report

We'll start with a spam filtering server dataset. Clicking Senders shows the top sender domains in the data:


Senders Report

If we also want to know, in a single report, the top Recipients for each Sender, we can use a Pivot Table, by clicking Customize, then the Pivot Table tab, then "Show pivot table", and then selecting Recipients as the drill-down:


Customize Report Element: Add A Pivot

Clicking OK begins to generate this report. But it takes several minutes, and while it's thinking, it displays this progress bar:

Progress, Main Table

Progress Of Report Using Main Table

The key phrase to watch for is "Querying main table." The main table is the primary table of the database, with one row for each, even in the log data. It can be millions or billions of lines long, depending on the dataset. Querying the main table can take many minutes, especially for an unfiltered report (where every single row must be aggregated to create the report), so it is to be avoided for any reports you use frequently. That's what xref tables are for, so let's cancel this report and head to Config to make it faster. In Config, click More Options, then Cross Reference Groups, and duplicate the existing Recipient group, calling it "Sender x Recipient", and add the Sender field to it:


Adding A Cross-Reference Group

After adding a cross-reference group, we need to rebuild the database. After the build is complete, we can return to the reports, and again do the Recipients x Senders pivot table report. This time, the report comes up quickly:


The Final Report

That's it--from now on, that report will be fast. As an added benefit, this also speeds up Recipient reports filtered on a particular Sender, or Sender reports filtered on a particular Recipient.

Which Fields Do You Need?

In order to get a particular report to use an xref group, you need all the fields in the report, plus all the fields in the filters. So if you have a report with three columns: Sender, Messages, and Bytes; and if you also have two filters on that report, a Date/time (date range) filter and a Recipient filter, you will need all those fields in the xref table: Sender, Recipient, Date/time, Messages, and Bytes.

Professional Services

This newsletter describes optimizing reports with cross-reference tables. This is one example of many types of optimization which are possible to make Sawmill build databases faster, and generate reports faster. Other possibilities include selective indexing, log splitting, field simplification, horizontal shrinkage (eliminating fields), vertical shrinkage (eliminating rows), and hierarchical cross-reference optimization. For large environments where performance is important, we recommend Sawmill Professional Services to help you quickly optimize your installation.

[Article revision v1.0]

Professionelle Dienstleistungen

Sollten Sie die Anpassung von Sawmill Analytics nicht selbst vornehmen wollen, können wir Ihnen dies als Dienstleisung anbieten. Unsere Experten setzen sich gerne mit Ihnen in Verbindung, um die Reports oder sonstige Aspekte von Sawmill Analytics an Ihre Gegebenheiten und Wünsche anzupassen. Kontakt

Zur Tutorial-Übersicht

Weitere Informationen

      Live-Demonstrationen »    
© 1995-2011 HAAGE & PARTNER Computer GmbH · Impressum · Datenschutz · www.haage-partner.de