Create graphs of website users - full example with Google Charts

Using:

- The offline package Google Charts (see other page)

- The access.log from apache webserver (I use a json format version, but with some tweaking you could use the default apache logging)

- A SQL database like mysql/mariadb

- Some scripting and a scheduler (like cron) for having it automated 

 

1. Create a new config for writing the apache logging in json format (which I also use for ingesting into ElasticSearch).

vi /etc/apache2/conf-available/access-log.confLogFormat "{\"index\":{}}\n { \"@timestamp\":\"%{%Y-%m-%d}tT%{%T}t.%{msec_frac}tZ\", \"apache\": { \"date\":\"%{%Y-%m-%d}tT%{%T}t.%{msec_frac}tZ\", \"process\":\"%D\", \"filename\":\"%f\", \"remoteIP\":\"%a
\", \"host\":\"%V\", \"request\":\"%U\", \"query\":\"%q\", \"method\":\"%m\", \"status\":\"%>s\", \"userAgent\":\"%{User-agent}i\", \"referer\":\"%{Referer}i\" }}" jsonlog

CustomLog /var/log/apachelog/jsonlog.log jsonlog

 

2. [Optional] Create a filesystem in memory (because of the possible high number of IO's)

# Temporary for apache logs (also think of cleanup after a few days)
mount -t tmpfs -o size=50M tmpfs /var/log/apachelog

 

3. Enable the Apache config

a2enconf access-log
systemctl restart apache2

Check the logfile, you should see something like this:

{"index":{}}
 { "@timestamp":"2022-09-14T16:55:00.463Z", "apache": { "date":"2022-09-14T16:55:00.463Z", "process":"90", "filename":"-", "remoteIP":"2.168.xxx.xxx", "host":"192.168.xxx.xxx", "request":"-", "query":"", "method":"-", "status":"408", "userAgent":"-", "referer":"-" }}
{"index":{}}
 { "@timestamp":"2022-09-14T16:55:16.994Z", "apache": { "date":"2022-09-14T16:55:16.994Z", "process":"61422", "filename":"/usr/local/nagios/sbin/tac.cgi", "remoteIP":"2.168.xxx.xxx", "host":"192.168.xxx.xxx", "request":"/nagios/cgi-bin/tac.cgi", "query":"", "method":"GET", "status":"200", "userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36 Edg/105.0.1343.33", "referer":"https://192.168.xxx.xxx/nagios/cgi-bin/tac.cgi" }}

4. Create the database in mysql/mariadb

mysql -u root --password=<PASSWORD>
CREATE DATABASE apachelog;
CREATE USER 'alread'@'localhost' IDENTIFIED BY '<A SELECT PASSWORD>';
GRANT SELECT ON apachelog.* to 'alread'@'localhost';

use apachelog;
CREATE TABLE apachelog (id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY, date DATETIME, ip TINYTEXT, status INT , continent TINYTEXT, country TINYTEXT,province TINYTEXT,city TINYTEXT,postal TINYTEXT,request TEXT );

CREATE TABLE etlapachelog (id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY, date DATETIME , ip TINYTEXT, country TINYTEXT, sumip INT, sumipnok INT );

MariaDB [apachelog]> describe apachelog;
+-----------+------------+------+-----+---------+----------------+
| Field     | Type       | Null | Key | Default | Extra          |
+-----------+------------+------+-----+---------+----------------+
| id        | bigint(20) | NO   | PRI | NULL    | auto_increment |
| date      | datetime   | YES  |     | NULL    |                |
| ip        | tinytext   | YES  |     | NULL    |                |
| status    | int(11)    | YES  |     | NULL    |                |
| continent | tinytext   | YES  |     | NULL    |                |
| country   | tinytext   | YES  |     | NULL    |                |
| province  | tinytext   | YES  |     | NULL    |                |
| city      | tinytext   | YES  |     | NULL    |                |
| postal    | tinytext   | YES  |     | NULL    |                |
| request   | text       | YES  |     | NULL    |                |
+-----------+------------+------+-----+---------+----------------+

5. Create a script which is scheduled once a day (after midnight at f.e. 04:00)

Copy the next code and make a script (chmod 755 executable;-). Also put in your own password..

I am using mmdblookup from MaxMind for GeoIP location information which I update once a week via:

-------------------------------------------------------------------------------

export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
LICENSEKEY="<Your own license code...you get it for free after registration>"

cd /tmp
wget "https://download.maxmind.com/app/geoip_download?edition_id=GeoLite2-City&license_key="${LICENSEKEY}"&suffix=tar.gz" -O GeoLite2-City.tar.gz 2>/dev/null
wget "https://download.maxmind.com/app/geoip_download?edition_id=GeoLite2-ASN&license_key="${LICENSEKEY}"&suffix=tar.gz" -O GeoLite2-ASN.tar.gz 2>/dev/null
wget "https://download.maxmind.com/app/geoip_download?edition_id=GeoLite2-Country&license_key="${LICENSEKEY}"&suffix=tar.gz" -O GeoLite2-Country.tar.gz 2>/dev/null

for FILE in GeoLite2-City.tar.gz GeoLite2-ASN.tar.gz GeoLite2-Country.tar.gz
do
        FILENAME=`tar tvzf ${FILE} |grep mmdb |awk '{ print $6 }'`
        tar xvzf ${FILE} --strip-components=1 ${FILENAME} >/dev/null 2>&1
        EXTFN=`basename ${FILENAME}`
        find /tmp/${EXTFN} -size +5M -exec mv {} /usr/share/GeoIP/${EXTFN} \;
        rm ${FILE}
done

exit

----------------------------------------------------------------------------------------------

#!/bin/bash
#
# Description: Script to have apachelog converted to sessions by geo location in database per day

# variables
PATH=/sbin:/usr/sbin:/usr/ccs/bin:/bin:/usr/bin:/etc:/usr/contrib/bin:/users/root:/usr/lib:/usr/lib/acct:/usr/ucb:/usr/dt/bin:/usr/bin/X11
DAY=`date --date=yesterday +%Y-%m-%d`
#DAY="2022-09-09"
DBTABLE=apachelog

ls /var/log/apachelog/jsonlog.log_ES_${DAY}T* >/dev/null 2>&1
if [ "$?" -gt "0" ]
then
        echo "The imputfiles do not exist!!!!"
        exit 0
fi

zcat /var/log/apachelog/jsonlog.log_ES_${DAY}T* |grep remoteIP | sed "s/\\\/[ESC]/g" |sed "s/'/./g" |sed "s/;/./g" | while read LINE
do
                DT=`echo ${LINE} | sed 's/.*@timestamp\":\"//g'| cut -d "," -f 1,1 | sed 's/[TZ]/\ /g' |tr -d '@' | tr -d '"'`
                DATE=`date +%Y-%m-%d\ %H:%M:%S --date "${DT}"`
                IPLOOKUP=`echo ${LINE} | sed 's/.*remoteIP\":\"//g'| cut -d "," -f 1,1 | tr -d '"'`
                STAT=`echo ${LINE} | sed 's/.*status\":\"//g'| cut -d "," -f 1,1 | tr -d '"'`
                REQ=`echo ${LINE} | sed 's/.*request\":\"//g'| cut -d '"' -f 1,1 |sed "s/,/./g" | tr -d '"'`
                CONT=`mmdblookup --ip ${IPLOOKUP} --file /usr/share/GeoIP/GeoLite2-City.mmdb continent names en 2>/dev/null |grep "\"" |cut -d "\"" -f 2,2 `
                if [ -z "${CONT}" ]
                then
                        CONT="NULL"
                else
                        CONT="'${CONT}'"
                fi
                COUNTRY=`mmdblookup --ip ${IPLOOKUP} --file /usr/share/GeoIP/GeoLite2-City.mmdb country names en 2>/dev/null |grep "\"" |cut -d "\"" -f 2,2 `
                if [ -z "${COUNTRY}" ]
                then
                        COUNTRY="NULL"
                else
                        COUNTRY="'${COUNTRY}'"
                fi
                PROV=`mmdblookup --ip ${IPLOOKUP} --file /usr/share/GeoIP/GeoLite2-City.mmdb subdivisions 2>/dev/null |grep "\"" |grep "utf8_string" |head -n 1 |cut -d "\"" -f 2,2 `
                if [ -z "${PROV}" ]
                then
                        PROV="NULL"
                else
                        PROV="'${PROV}'"
                fi
                CITY=`mmdblookup --ip ${IPLOOKUP} --file /usr/share/GeoIP/GeoLite2-City.mmdb city names en 2>/dev/null |grep "\"" |cut -d "\"" -f 2,2 |sed "s/'/\ /g"`
                if [ -z "${CITY}" ]
                then
                        CITY="NULL"
                else
                        CITY="'${CITY}'"
                fi
                POSTAL=`mmdblookup --ip ${IPLOOKUP} --file /usr/share/GeoIP/GeoLite2-City.mmdb postal code 2>/dev/null |grep "\"" |cut -d "\"" -f 2,2 `
                if [ -z "${POSTAL}" ]
                then
                        POSTAL="NULL"
                else
                        POSTAL="'${POSTAL}'"
                fi
                SQLDATE="'${DATE}'"
                IPADDRESS="'${IPLOOKUP}'"
                STATUS="'${STAT}'"
                REQUEST="'${REQ}'"
                echo "INSERT INTO ${DBTABLE} VALUES (NULL,$SQLDATE,$IPADDRESS,$STATUS,$CONT,$COUNTRY,$PROV,$CITY,$POSTAL,$REQUEST);" |  mysql -u root --password=<PASSWORD> ${DBTABLE} >/dev/null 2>&1
done

sleep 5

# Making the Database ETL exports FOR graphs fast retrieval!!!!

QUERYDB=apachelog

TABLE=apachelog.etlapachelog
CLEANUP="DELETE FROM ${TABLE} "
echo "${CLEANUP} ;" | mysql -u root --password=<PASSWORD> ${QUERYDB}
QUERY="INSERT IGNORE INTO ${TABLE} (date, ip, country, sumip, sumipnok ) SELECT date, ip, country, count(if(status>='200' and status < 300,1,null)) AS sumip, count(if(status>='400',1,null)) AS sumipnok FROM ${QUERYDB} WHERE date_format(date, '%Y-%m-%d') <> CURDATE() AND country IS NOT NULL GROUP BY date_format(date, '%Y-%m-%d'),ip"

exit

6. Now create a index.php on your website

Copy and paste the next code into this index.php

--------------------------------------------------------------------

<?php
                $con = mysqli_connect('localhost','alread','<PASSWORD>','apachelog');
                $minpv = 0;
                $maxpv = 999;
?>
<!DOCTYPE HTML>
<html>
<head>
                <meta charset="utf-8">
                <link rel="shortcut icon" href="/images/favicon.ico">
                <link rel="stylesheet" href="/stylesheets/base.css">
                <link rel="stylesheet" href="/stylesheets/skeleton.css">
                <link rel="stylesheet" href="/stylesheets/layout.css">
             <?php
                                $query = "SELECT COUNT(*) FROM apachelog";
                                $exec = mysqli_query($con,$query);
                                $numsel = mysqli_fetch_row($exec);

             ?>
                <title>
                               Apache Users at Unix4Life
                </title>

<script type="text/javascript" src="/charts/loader.js"></script>
<script type="text/javascript" src="/charts/maps/api/js?key=:293"></script>
<script type="text/javascript">
  google.charts.load('current', {packages: ['geochart']});
  google.charts.setOnLoadCallback(drawRegionsMap);
      function drawRegionsMap() {
        var data = google.visualization.arrayToDataTable([
                ['Country', 'Total unique users'],
             <?php
                                if (isset($_POST['period']))
                                 {
                                $period = $_POST['period'];
                                 }
                                else
                                 {
                                  $period = 365;
                                 }
                                  $query = "SELECT country, COUNT( DISTINCT(ip) ) AS cip FROM etlapachelog WHERE sumip > $minpv AND date > DATE_SUB(NOW(), INTERVAL $period DAY) GROUP BY country";
                                  $users = "SELECT COUNT( DISTINCT(ip) ) FROM etlapachelog WHERE sumip > $minpv AND date > DATE_SUB(NOW(), INTERVAL $period DAY)";
                                $texec = mysqli_query($con,$users);
                                $tusers = mysqli_fetch_row($texec);

                                $exec = mysqli_query($con,$query);
                                while($row = mysqli_fetch_array($exec)){

                                               echo "['".$row['country']."',".$row['cip']."],";
                                }
                   ?>
        ]);
        var options = {
          resolution: 'countries'
        };
        var chart = new google.visualization.GeoChart(document.getElementById('mapperiod'));
        chart.draw(data, options);
      }
    </script>

<script type="text/javascript">
  google.charts.load('current', {packages: ['corechart']});
  google.charts.setOnLoadCallback(drawChart);
      function drawChart() {
        var data = google.visualization.arrayToDataTable([
                ['Day', 'Total users'],
             <?php
                                $query = "SELECT date_format(date, '%Y-%m-%d') as day, COUNT(ip) AS cip FROM etlapachelog WHERE sumip > $minpv GROUP BY date_format(date, '%Y-%m-%d')";
                                $exec = mysqli_query($con,$query);
                                $numrules = 0;
                                while($row = mysqli_fetch_array($exec)){
                                $numrules = $numrules + 1;

                                               echo "['".$row['day']."',".$row['cip']."],";
                                }
                   ?>
        ]);

        var options = {
             <?php
                echo "title: 'Total website unique users per day',";
             ?>
          titleTextStyle: {
            color: '#1a237e',
            fontSize: 24,
            bold: true
          },
        hAxis: {
          title: 'Date (yyyy-mm-dd)',
          textStyle: {
            color: '#01579b',
            fontSize: 12,
            fontName: 'Arial',
            bold: true,
            italic: true
          },
          titleTextStyle: {
            color: '#01579b',
            fontSize: 20,
            fontName: 'Arial',
            bold: false,
            italic: true
          }
        },
        vAxis: {
          title: 'Users',
          textStyle: {
            color: '#1a237e',
            fontSize: 20,
            bold: false
          },
          titleTextStyle: {
            color: '#1a237e',
            fontSize: 20,
            bold: true
          }
        },
        colors: ['#a52714']
        };

        var chart = new google.visualization.LineChart(document.getElementById('totalusers'));

        chart.draw(data, options);
      }
    </script>

</head>
<body>
        <div class="container">
                <div class="sixteen columns">
                        <h1 class="remove-bottom" style="margin-top: 40px"><img src="/./images/UnixPrompt.png"> Unix4Life</h1>
                        <?php
                                                  $thisyear = date('Y');
                                                  echo "<pre>At this moment..(samples: $numsel[0])</pre>";
                        ?>
        <hr/>
                </div>
        </div><!-- container -->
   <?php
   echo "<H3><pre>Map of period: $period days from today - Total unique users: $tusers[0]</pre></H3>";
   ?>
   <div id="mapperiod" style="width: 1024px; height: 768px;"></div>
   <div id="totalusers" style="width: 1500px; height: 500px;"></div>
</body>
</html>

------------------------------------------------------------------------------------------------

Why use a second table etlapachelog next to the apachelog? =>Think of millions of lines per year in the apachelog and then doing a query for information per day....that will take some time ;-)

With the ETL/extraction (doing the work once a day instead of every time you look at the overview) on the raw table you get less lines to query.

By-the-way: You see some things like sumip&sumipnok being stored..you can use that for making a graph about the pages being viewed per day...