Create graphs of website users - full example

Using:

- The offline package Google Charts (see other page)

- The access.log from apache webserver (I use a json format version, but with some tweaking you could use the default apache logging)

- A SQL database like mysql/mariadb

- Some scripting and a scheduler (like cron) for having it automated 

 

1. Create a new config for writing the apache logging in json format (which I also use for ingesting into ElasticSearch).

vi /etc/apache2/conf-available/access-log.confLogFormat "{\"index\":{}}\n { \"@timestamp\":\"%{%Y-%m-%d}tT%{%T}t.%{msec_frac}tZ\", \"apache\": { \"date\":\"%{%Y-%m-%d}tT%{%T}t.%{msec_frac}tZ\", \"process\":\"%D\", \"filename\":\"%f\", \"remoteIP\":\"%a
\", \"host\":\"%V\", \"request\":\"%U\", \"query\":\"%q\", \"method\":\"%m\", \"status\":\"%>s\", \"userAgent\":\"%{User-agent}i\", \"referer\":\"%{Referer}i\" }}" jsonlog

CustomLog /var/log/apachelog/jsonlog.log jsonlog

 

2. [Optional] Create a filesystem in memory (because of the possible high number of IO's)

# Temporary for apache logs (also think of cleanup after a few days)
mount -t tmpfs -o size=50M tmpfs /var/log/apachelog

 

3. Enable the Apache config

a2enconf access-log
systemctl restart apache2

Check the logfile, you should see something like this:

{"index":{}}
 { "@timestamp":"2022-09-14T16:55:00.463Z", "apache": { "date":"2022-09-14T16:55:00.463Z", "process":"90", "filename":"-", "remoteIP":"2.168.xxx.xxx", "host":"192.168.xxx.xxx", "request":"-", "query":"", "method":"-", "status":"408", "userAgent":"-", "referer":"-" }}
{"index":{}}
 { "@timestamp":"2022-09-14T16:55:16.994Z", "apache": { "date":"2022-09-14T16:55:16.994Z", "process":"61422", "filename":"/usr/local/nagios/sbin/tac.cgi", "remoteIP":"2.168.xxx.xxx", "host":"192.168.xxx.xxx", "request":"/nagios/cgi-bin/tac.cgi", "query":"", "method":"GET", "status":"200", "userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36 Edg/105.0.1343.33", "referer":"https://192.168.xxx.xxx/nagios/cgi-bin/tac.cgi" }}

4. Create the database in mysql/mariadb

mysql -u root --password=<PASSWORD>
CREATE DATABASE apachelog;
CREATE USER 'alread'@'localhost' IDENTIFIED BY '<A SELECT PASSWORD>';
GRANT SELECT ON apachelog.* to 'alread'@'localhost';

use apachelog;
CREATE TABLE apachelog (id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY, date DATETIME, ip TINYTEXT, status INT , continent TINYTEXT, country TINYTEXT,province TINYTEXT,city TINYTEXT,postal TINYTEXT,request TEXT );

CREATE TABLE etlapachelog (id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY, date DATETIME , ip TINYTEXT, country TINYTEXT, sumip INT, sumipnok INT );

MariaDB [apachelog]> describe apachelog;
+-----------+------------+------+-----+---------+----------------+
| Field     | Type       | Null | Key | Default | Extra          |
+-----------+------------+------+-----+---------+----------------+
| id        | bigint(20) | NO   | PRI | NULL    | auto_increment |
| date      | datetime   | YES  |     | NULL    |                |
| ip        | tinytext   | YES  |     | NULL    |                |
| status    | int(11)    | YES  |     | NULL    |                |
| continent | tinytext   | YES  |     | NULL    |                |
| country   | tinytext   | YES  |     | NULL    |                |
| province  | tinytext   | YES  |     | NULL    |                |
| city      | tinytext   | YES  |     | NULL    |                |
| postal    | tinytext   | YES  |     | NULL    |                |
| request   | text       | YES  |     | NULL    |                |
+-----------+------------+------+-----+---------+----------------+

5. Create a script which is scheduled once a day (after midnight at f.e. 04:00)

Copy the next code and make a script (chmod 755 executable;-). Also put in your own password..

I am using mmdblookup from MaxMind for GeoIP location information which I update once a week via:

-------------------------------------------------------------------------------

export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
LICENSEKEY="<Your own license code...you get it for free after registration>"

cd /tmp
wget "https://download.maxmind.com/app/geoip_download?edition_id=GeoLite2-City&license_key="${LICENSEKEY}"&suffix=tar.gz" -O GeoLite2-City.tar.gz 2>/dev/null
wget "https://download.maxmind.com/app/geoip_download?edition_id=GeoLite2-ASN&license_key="${LICENSEKEY}"&suffix=tar.gz" -O GeoLite2-ASN.tar.gz 2>/dev/null
wget "https://download.maxmind.com/app/geoip_download?edition_id=GeoLite2-Country&license_key="${LICENSEKEY}"&suffix=tar.gz" -O GeoLite2-Country.tar.gz 2>/dev/null

for FILE in GeoLite2-City.tar.gz GeoLite2-ASN.tar.gz GeoLite2-Country.tar.gz
do
        FILENAME=`tar tvzf ${FILE} |grep mmdb |awk '{ print $6 }'`
        tar xvzf ${FILE} --strip-components=1 ${FILENAME} >/dev/null 2>&1
        EXTFN=`basename ${FILENAME}`
        find /tmp/${EXTFN} -size +5M -exec mv {} /usr/share/GeoIP/${EXTFN} \;
        rm ${FILE}
done

exit

----------------------------------------------------------------------------------------------

#!/bin/bash
#
# Description: Script to have apachelog converted to sessions by geo location in database per day

# variables
PATH=/sbin:/usr/sbin:/usr/ccs/bin:/bin:/usr/bin:/etc:/usr/contrib/bin:/users/root:/usr/lib:/usr/lib/acct:/usr/ucb:/usr/dt/bin:/usr/bin/X11
DAY=`date --date=yesterday +%Y-%m-%d`
#DAY="2022-09-09"
DBTABLE=apachelog

ls /var/log/apachelog/jsonlog.log_ES_${DAY}T* >/dev/null 2>&1
if [ "$?" -gt "0" ]
then
        echo "The imputfiles do not exist!!!!"
        exit 0
fi

zcat /var/log/apachelog/jsonlog.log_ES_${DAY}T* |grep remoteIP | sed "s/\\\/[ESC]/g" |sed "s/'/./g" |sed "s/;/./g" | while read LINE
do
                DT=`echo ${LINE} | sed 's/.*@timestamp\":\"//g'| cut -d "," -f 1,1 | sed 's/[TZ]/\ /g' |tr -d '@' | tr -d '"'`
                DATE=`date +%Y-%m-%d\ %H:%M:%S --date "${DT}"`
                IPLOOKUP=`echo ${LINE} | sed 's/.*remoteIP\":\"//g'| cut -d "," -f 1,1 | tr -d '"'`
                STAT=`echo ${LINE} | sed 's/.*status\":\"//g'| cut -d "," -f 1,1 | tr -d '"'`
                REQ=`echo ${LINE} | sed 's/.*request\":\"//g'| cut -d '"' -f 1,1 |sed "s/,/./g" | tr -d '"'`
                CONT=`mmdblookup --ip ${IPLOOKUP} --file /usr/share/GeoIP/GeoLite2-City.mmdb continent names en 2>/dev/null |grep "\"" |cut -d "\"" -f 2,2 `
                if [ -z "${CONT}" ]
                then
                        CONT="NULL"
                else
                        CONT="'${CONT}'"
                fi
                COUNTRY=`mmdblookup --ip ${IPLOOKUP} --file /usr/share/GeoIP/GeoLite2-City.mmdb country names en 2>/dev/null |grep "\"" |cut -d "\"" -f 2,2 `
                if [ -z "${COUNTRY}" ]
                then
                        COUNTRY="NULL"
                else
                        COUNTRY="'${COUNTRY}'"
                fi
                PROV=`mmdblookup --ip ${IPLOOKUP} --file /usr/share/GeoIP/GeoLite2-City.mmdb subdivisions 2>/dev/null |grep "\"" |grep "utf8_string" |head -n 1 |cut -d "\"" -f 2,2 `
                if [ -z "${PROV}" ]
                then
                        PROV="NULL"
                else
                        PROV="'${PROV}'"
                fi
                CITY=`mmdblookup --ip ${IPLOOKUP} --file /usr/share/GeoIP/GeoLite2-City.mmdb city names en 2>/dev/null |grep "\"" |cut -d "\"" -f 2,2 |sed "s/'/\ /g"`
                if [ -z "${CITY}" ]
                then
                        CITY="NULL"
                else
                        CITY="'${CITY}'"
                fi
                POSTAL=`mmdblookup --ip ${IPLOOKUP} --file /usr/share/GeoIP/GeoLite2-City.mmdb postal code 2>/dev/null |grep "\"" |cut -d "\"" -f 2,2 `
                if [ -z "${POSTAL}" ]
                then
                        POSTAL="NULL"
                else
                        POSTAL="'${POSTAL}'"
                fi
                SQLDATE="'${DATE}'"
                IPADDRESS="'${IPLOOKUP}'"
                STATUS="'${STAT}'"
                REQUEST="'${REQ}'"
                echo "INSERT INTO ${DBTABLE} VALUES (NULL,$SQLDATE,$IPADDRESS,$STATUS,$CONT,$COUNTRY,$PROV,$CITY,$POSTAL,$REQUEST);" |  mysql -u root --password=<PASSWORD> ${DBTABLE} >/dev/null 2>&1
done

sleep 5

# Making the Database ETL exports FOR graphs fast retrieval!!!!

QUERYDB=apachelog

TABLE=apachelog.etlapachelog
CLEANUP="DELETE FROM ${TABLE} "
echo "${CLEANUP} ;" | mysql -u root --password=<PASSWORD> ${QUERYDB}
QUERY="INSERT IGNORE INTO ${TABLE} (date, ip, country, sumip, sumipnok ) SELECT date, ip, country, count(if(status>='200' and status < 300,1,null)) AS sumip, count(if(status>='400',1,null)) AS sumipnok FROM ${QUERYDB} WHERE date_format(date, '%Y-%m-%d') <> CURDATE() AND country IS NOT NULL GROUP BY date_format(date, '%Y-%m-%d'),ip"

exit

6. Now create a index.php on your website

Copy and paste the next code into this index.php

--------------------------------------------------------------------

<?php
                $con = mysqli_connect('localhost','alread','<PASSWORD>','apachelog');
                $minpv = 0;
                $maxpv = 999;
?>
<!DOCTYPE HTML>
<html>
<head>
                <meta charset="utf-8">
                <link rel="shortcut icon" href="/images/favicon.ico">
                <link rel="stylesheet" href="/stylesheets/base.css">
                <link rel="stylesheet" href="/stylesheets/skeleton.css">
                <link rel="stylesheet" href="/stylesheets/layout.css">
             <?php
                                $query = "SELECT COUNT(*) FROM apachelog";
                                $exec = mysqli_query($con,$query);
                                $numsel = mysqli_fetch_row($exec);

             ?>
                <title>
                               Apache Users at Unix4Life
                </title>

<script type="text/javascript" src="/charts/loader.js"></script>
<script type="text/javascript" src="/charts/maps/api/js?key=:293"></script>
<script type="text/javascript">
  google.charts.load('current', {packages: ['geochart']});
  google.charts.setOnLoadCallback(drawRegionsMap);
      function drawRegionsMap() {
        var data = google.visualization.arrayToDataTable([
                ['Country', 'Total unique users'],
             <?php
                                if (isset($_POST['period']))
                                 {
                                $period = $_POST['period'];
                                 }
                                else
                                 {
                                  $period = 365;
                                 }
                                  $query = "SELECT country, COUNT( DISTINCT(ip) ) AS cip FROM etlapachelog WHERE sumip > $minpv AND date > DATE_SUB(NOW(), INTERVAL $period DAY) GROUP BY country";
                                  $users = "SELECT COUNT( DISTINCT(ip) ) FROM etlapachelog WHERE sumip > $minpv AND date > DATE_SUB(NOW(), INTERVAL $period DAY)";
                                $texec = mysqli_query($con,$users);
                                $tusers = mysqli_fetch_row($texec);

                                $exec = mysqli_query($con,$query);
                                while($row = mysqli_fetch_array($exec)){

                                               echo "['".$row['country']."',".$row['cip']."],";
                                }
                   ?>
        ]);
        var options = {
          resolution: 'countries'
        };
        var chart = new google.visualization.GeoChart(document.getElementById('mapperiod'));
        chart.draw(data, options);
      }
    </script>

<script type="text/javascript">
  google.charts.load('current', {packages: ['corechart']});
  google.charts.setOnLoadCallback(drawChart);
      function drawChart() {
        var data = google.visualization.arrayToDataTable([
                ['Day', 'Total users'],
             <?php
                                $query = "SELECT date_format(date, '%Y-%m-%d') as day, COUNT(ip) AS cip FROM etlapachelog WHERE sumip > $minpv GROUP BY date_format(date, '%Y-%m-%d')";
                                $exec = mysqli_query($con,$query);
                                $numrules = 0;
                                while($row = mysqli_fetch_array($exec)){
                                $numrules = $numrules + 1;

                                               echo "['".$row['day']."',".$row['cip']."],";
                                }
                   ?>
        ]);

        var options = {
             <?php
                echo "title: 'Total website unique users per day',";
             ?>
          titleTextStyle: {
            color: '#1a237e',
            fontSize: 24,
            bold: true
          },
        hAxis: {
          title: 'Date (yyyy-mm-dd)',
          textStyle: {
            color: '#01579b',
            fontSize: 12,
            fontName: 'Arial',
            bold: true,
            italic: true
          },
          titleTextStyle: {
            color: '#01579b',
            fontSize: 20,
            fontName: 'Arial',
            bold: false,
            italic: true
          }
        },
        vAxis: {
          title: 'Users',
          textStyle: {
            color: '#1a237e',
            fontSize: 20,
            bold: false
          },
          titleTextStyle: {
            color: '#1a237e',
            fontSize: 20,
            bold: true
          }
        },
        colors: ['#a52714']
        };

        var chart = new google.visualization.LineChart(document.getElementById('totalusers'));

        chart.draw(data, options);
      }
    </script>

</head>
<body>
        <div class="container">
                <div class="sixteen columns">
                        <h1 class="remove-bottom" style="margin-top: 40px"><img src="/./images/UnixPrompt.png"> Unix4Life</h1>
                        <?php
                                                  $thisyear = date('Y');
                                                  echo "<pre>At this moment..(samples: $numsel[0])</pre>";
                        ?>
        <hr/>
                </div>
        </div><!-- container -->
   <?php
   echo "<H3><pre>Map of period: $period days from today - Total unique users: $tusers[0]</pre></H3>";
   ?>
   <div id="mapperiod" style="width: 1024px; height: 768px;"></div>
   <div id="totalusers" style="width: 1500px; height: 500px;"></div>
</body>
</html>

------------------------------------------------------------------------------------------------

Why use a second table etlapachelog next to the apachelog? =>Think of millions of lines per year in the apachelog and then doing a query for information per day....that will take some time ;-)

With the ETL/extraction (doing the work once a day instead of every time you look at the overview) on the raw table you get less lines to query.

By-the-way: You see some things like sumip&sumipnok being stored..you can use that for making a graph about the pages being viewed per day...

 

 

What do you think or any questions?

Send us feedback!

Enter the sum of the numbers.

Menu

  If you like my website, feel free to donate via the Paypal button... A small amount for a cup of coffee is enough ;-) Thank you!