Hacking ElasticSearch python client to work with AWS OpenSearch

Because of various disagreements between AWS and ElasticSearch, AWS released a fork called OpenSearch but and Elastic updated their clients to throw errors if you try to use them with this product.

This is obviously really annoying if you are using 3rd party software which uses the Elastic libraries, but trying to run them against AWS managed services. The following hack fixes this for Python elasticsearch v7.17 at least, by disabling the unnecessary version check:

# Hack elasticsearch to work with AWS
from elasticsearch import Transport
Transport._do_verify_elasticsearch = lambda self, headers, timeout: None

Screen corruption on KDE with Ubuntu 20.04

At some point in the past couple of weeks I guess my laptop updated libraries or something, because when I had to reboot my laptop yesterday I started seeing massive screen corruption in some applications. It manifested itself in horizontal white/black lines remaining especially when selecting text. This was especially visible in konsole. I couldn’t see any package in particular which had updated recently and I tried looking at a few different Xorg or kernel options but to no avail.

Eventually I looked at the KDE compositor settings. I noticed that the “Rendering Backend” was set to XRender, which as far as I understand it is very old. Updating it to ‘OpenGL 3.1 instantly fixed the issue.

I’m leaving this mostly as a note to myself for if it happens again in the future, but at the same time perhaps it is a wider regression in the ubuntu 20.04 KDE packages so it would help someone else.

Best wordle starter words

I recently, like pretty much everyone else got into Wordle. One of the most important things in getting the correct answer is to find the best first word or two to start with which will help guide you to the correct answer. The ideal first word(s) should use one each of the most common letters so for example in the first 2 guesses you can test the top 10 characters.

My first (relatively uneducated) guesses based on what I vaguely remembered about letter frequency in English were ‘spear’ and ‘mount’ – 4 vowels and some of the most common consonants. However it’s pretty much a random guess so I was wondering if we could figure out a better approach.

It’s pretty straight forward to look at the source code of Wordle, which contains two word lists. The first one contains 2315 5-letter words which can be the answer, the second contains a further 10,000 of all possible 5 letter words in English.

So, I wrote a small script to analyse the frequency of letters in the list of possible answers, and then based on that filter the possible words to find the best starting (and subsequent) guesses which would work.

I’ve put the simple python script I used at the bottom of the article, but the output is:

Matching 5 new letters (39%) are: [‘arose’]
Matching 5 new letters (66%) are: [‘unlit’, ‘until’]
Matching 4 new letters (81%) are: [‘duchy’]
Matching 3 new letters (89%) are: [‘pygmy’]

What this means is that if you start with the word ‘arose’, and then ‘until’ (or ‘unlit’), even though it’s only 10 unique letters (38% of the alphabet) because they are the most frequent ones they will cover 2/3 (66%) of the possible words.

In terms of letter frequency overall we get the following ordered detail:

[(‘e’, 1233), (‘a’, 979), (‘r’, 899), (‘o’, 754), (‘t’, 729), (‘l’, 719), (‘i’, 671), (‘s’, 669), (‘n’, 575), (‘c’, 477), (‘u’, 467), (‘y’, 425), (‘d’, 393), (‘h’, 389),
(‘p’, 367), (‘m’, 316), (‘g’, 311), (‘b’, 281), (‘f’, 230), (‘k’, 210), (‘w’, 195), (‘v’, 153), (‘z’, 40), (‘x’, 37), (‘q’, 29), (‘j’, 27)]

The script I wrote is not perfect but it’s at least a start at finding some optimum words

import sys
with open(sys.argv[1]) as fh:
    words = [l.strip() for l in fh]

chars = {}
for char in ''.join(words):
    chars[char] = chars.get(char, 0) + 1
frequency = sorted(chars.keys(), key=lambda c: -chars[c])
print(sorted(chars.items(), key=lambda c: -c[1]))

total_freq = 0
while len(frequency) > 5:
    matching = words
    letters = []
    for char in frequency:
        new_matching = [w for w in matching if char in w]
        if new_matching:
            matching = new_matching
            letters.append(char)
        if len(letters) == 5:
            break

    total_freq += sum([chars[c] for c in letters])
    print("Matching %d new letters (%d%%) are: %r" % (len(letters), total_freq / sum(chars.values()) * 100, matching))
    frequency = [c for c in frequency if c not in letters]

Simple mitigation for the new DNS cache poisoning attack

As reported in many places, a new attack has been presented which can allow an attacker to poison caching and forwarding DNS server entries. The PDF is an interesting read and contains many different ideas which chained together can lead to this attack. I believe the following firewall rule should defend against the attack on caching servers with very little side effect by preventing sending of ICMP messages saying that the given UDP port was unreachable:

iptables -I OUTPUT -p icmp --icmp-type port-unreachable -m u32 --u32 '34 & 0xFF = 17' -j DROP

Running systemd inside a Centos 8 Docker container

Support for systemd in Docker has improved a lot since this 2016 article, but it’s still not obvious quite how to make it work. Why would you want this? Mostly for testing full-server deploys (for example we test ansible deployments against various docker containers to ensure there are no bugs). Here’s a systemd-based centos 8 Dockerfile that also includes an ssh server:
FROM centos:8

# Set up base packages that are expected
RUN dnf -y install openssh-server crontabs NetworkManager firewalld selinux-policy

RUN systemctl mask dev-mqueue.mount dev-hugepages.mount \
     systemd-remount-fs.service sys-kernel-config.mount \
     sys-kernel-debug.mount sys-fs-fuse-connections.mount \
     graphical.target systemd-logind.service \
     NetworkManager.service systemd-hostnamed.service

STOPSIGNAL SIGRTMIN+3

# SSHd setup
EXPOSE 22
COPY docker.pub /root/.ssh/authorized_keys
RUN chmod 600 /etc/ssh/ssh_host* /root/.ssh/authorized_keys

CMD ["/sbin/init"]
You can then launch this like:
docker run -v /sys/fs/cgroup:/sys/fs/cgroup:ro --tmpfs /run container-name
For the latest centos 7 you can use the following Dockerfile:
FROM centos:7

RUN yum -y install openssh-server NetworkManager firewalld && \
    systemctl disable NetworkManager && systemctl enable sshd

EXPOSE 22
COPY docker.pub /root/.ssh/authorized_keys
RUN chmod 600 /etc/ssh/ssh_host* /root/.ssh/authorized_keys

STOPSIGNAL SIGRTMIN+3

CMD ["/sbin/init"]

Complex network setup on Centos/Redhat 8 via Network Manager

Here are some notes, mostly for my future self about how to set up bond + vlan + bridge networks in Centos 8 in order to create a highly resilient virtual machine host server.

Firstly, create the bond interface and set the options:

nmcli con add type bond ifname bond0 bond.options "mode=802.3ad,miimon=100" ipv4.method disabled ipv6.method ignore connection.zone trusted

Note the 802.3ad option – this says we are going to be using the LACP protocol which requires router support, however you can look at the different mode options in the kernel documentation.

Then, we add the required interfaces into the bond – in this case enp131s0f0 and enp131s0f1

nmcli con add type ethernet ifname enp131s0f0 master bond0 slave-type bond connection.zone trusted
nmcli con add type ethernet ifname enp131s0f1 master bond0 slave-type bond connection.zone trusted

If you want to create a standard vlan interface over the top of the bond for vlan tag 123 we can do this just with 1 more command:

nmcli con add type vlan ifname bond0.123 dev bond0 id 123 ipv4.method manual ipv4.addresses a.b.c.d/24 ipv4.gateway a.b.c.d ipv4.dns 8.8.8.8 connection.zone public

Alternatively if you want to set up a bridge on this interface:

nmcli con add ifname br.123 type bridge ipv4.method manual ipv4.addresses a.b.c.d/24 ipv4.gateway a.b.c.d ipv4.dns 8.8.8.8 \
        bridge.stp no bridge.forward-delay 0 connection.zone public
nmcli con add type vlan ifname bond0.123 dev bond0 id 123 master br.123 slave-type bridge connection.zone trusted

If you don’t want the default route or DNS, simply remove the ipv4.gateway and ipv4.dns options above.

If you want to create a bridge which doesn’t have any IP address on the host (just bridges through to the host vm’s) then replace all the ipv4 settings with: ipv4.method disabled ipv6.method ignore.

Ripping unfinalized DVDs from Linux

Recently we had some DVD’s with old family videos that had been recorded directly but never finalized, the device that recorded them then broke so it was impossible to finalize them, and it seemed to be pretty much impossible for anything else to read them. So, I figured out a way to recover them using linux…

First I installed the dvd+rw-tools package on ubuntu and used that to get various info and prove that the DVD itself was readable even though nothing on the system could see it as a video disk or filesystem.

$ dvd+rw-mediainfo /dev/sr0 
INQUIRY:                [HL-DT-ST][DVDRAM GTB0N    ][1.00]
GET [CURRENT] CONFIGURATION:
 Mounted Media:         11h, DVD-R Sequential
 Media ID:              CMC MAG. AE1
 Current Write Speed:   8.0x1385=11080KB/s
 Write Speed #0:        8.0x1385=11080KB/s
 Write Speed #1:        4.0x1385=5540KB/s
 Speed Descriptor#0:    00/0 R@8.0x1385=11080KB/s W@8.0x1385=11080KB/s
 Speed Descriptor#1:    00/0 R@8.0x1385=11080KB/s W@4.0x1385=5540KB/s
READ DVD STRUCTURE[#10h]:
 Media Book Type:       00h, DVD-ROM book [revision 0]
 Legacy lead-out at:    2298496*2KB=4707319808
READ DVD STRUCTURE[#0h]:
 Media Book Type:       25h, DVD-R book [revision 5]
 Last border-out at:    8390653*2KB=17184057344
READ DISC INFORMATION:
 Disc status:           appendable
 Number of Sessions:    1
 State of Last Session: incomplete
 "Next" Track:          1
 Number of Tracks:      19
READ TRACK INFORMATION[#1]:
 Track State:           reserved
 Track Start Address:   0*2KB
 Next Writable Address: 0*2KB
 Free Blocks:           2544*2KB
 Track Size:            2544*2KB
READ TRACK INFORMATION[#2]:
 Track State:           complete incremental
 Track Start Address:   2560*2KB
 Free Blocks:           0*2KB
 Track Size:            240*2KB
 Last Recorded Address: 2591*2KB

I then pulled the entire DVD into a file on the local computer for easier processing later. The command will produce lots of errors (as there are parts of the DVD that are not readable as they were never written to), but the output file (image.iso) will contain a full dump of the DVD eventually

dd if=/dev/sr0 of=image.iso bs=2048 conv=noerror,notrunc iflag=nonblock

I then put together a short perl script to search through this file for 1kb blocks beginning with the magic tag DVDVIDEO – these seemed to be the starts of individual chapters, which avconv (also called ffmpeg on some distributions) can then extract into proper video/audio.

#!/usr/bin/perl
use v5.16;
use strict;
use warnings;
my $off = 0;
my $file = $ARGV[0];
open my $fh, '<:bytes', $file or die;
my $buf;
my @pos;

# Search through each block for one beginning with the header text and store these in array of offsets - I think it's one for each track
while( my $len = read $fh, $buf, 1024 ) {
        die if $len != 1024;

        if( $buf =~ /^DVDVIDEO/ ) {
                push @pos, $off;
        }
        $off++;
}
push @pos, $off;

my $chap = 0;
for( my $i = 0; $i < @pos - 1; $i++ ) {
        my $length = $pos[$i+1] - $pos[$i];
        next if $length < 1000;
        $chap++;
        say "dd if=$file bs=1024 skip=$pos[$i] count=$length | avconv -i - -acodec copy -vcodec copy out$chap.mp4";
}

Save that as extract_dvd_tracks.pl and then run it to extract them as files named like out0.mp4. Note that it won’t process tracks that are less than about 1Mb because there seemed to be a number of small sections like this which we wanted to skip over.

perl extract_dvd_tracks.pl image.iso

Job done! Note that there are probably many different formats and layouts for unfinalized DVD’s, this may be just one of many but hopefully the principle remains the same.

Extracting all PHP code from a file

In checking over a project recently, I wanted to extract all PHP code from a set of files and combine it into a single output so I could easily assess what was being used. The eventual command I ended up with was as follows, hopefully it will be useful to someone else in the future:

find -name \*.php | xargs perl -nE 'BEGIN{ undef $/ } say for /<\?php\s*((?:(?!\s*\?>).)+)/sg'

Implementing correct modal navigation with react-router

I’ve recently been converting a big project from jquery mobile into React and Material-UI. One area that seems pretty weak compared to other frameworks is with regards to having a proper mobile-focused infrastructure for navigation. One of the big issues for me was that of modals (for example a dialog, alert or popup page components). For example when you see an alert you want to be able to click on the ok button to dismiss it. If you are on android you also expect to be able to press the back button to dismiss. However if you have separate routes for your modals such that the url changes when they are open, if you refresh the page at that point you will have a modal but no previous state to explore. There seem to be certain hacks with react-router-dom which allow you to change the page but keep the URL the same, assuming you are using BrowserRouter, however because this was a legacy project I want to keep on using HashRouter. So, I whipped up a quick HOC hack to wrap around a modal which will allow both the back navigation to work as expected, and also for refreshes to go to the main page rather than opening the modal.

I already had a standard base class which had the handleClose() method to close off a dialog and signal to the parent that it was done, so I expanded it to include a state listener as below

function getDisplayName(WrappedComponent) {
    return WrappedComponent.displayName || WrappedComponent.name || 'Component';
}

// Shows some sort of dialog which has a handleClose method to close it,
// optionally calling props.onClose and also handles history appropriately
export const DialogMixin = Base => class extends Base {
    displayName = `withDialogMixin(${getDisplayName(Base)})`;

    __close = () => {
        if( window.location.hash == this.__start_hash ) {
            window.removeEventListener('popstate', this.__close);
            this.setState({ closed: true });
            if( this.props.onClose )
                this.props.onClose();
        }
    }

    handleClose = () => window.history.back();

    render() {
        // A component could be mounted but not showing any dialog - only hook
        // the history if actually showed something.
        const render = super.render();
        if( !this.state.closed && !this.__start_hash && render ) {
            this.__start_hash = window.location.hash;
            window.location.hash += '?dialog';

            window.addEventListener('popstate', this.__close);
        }
        return render;
    }
}

You then just wrap your component with this – for example

export const Alert = DialogMixin(_Alert);

This works because react-router-dom doesn’t attempt to parse query strings etc, so when the app is first loaded you want to just have something to remove any of the ?dialog path hacks:

window.location.hash = window.location.hash.replace(/\?.*/, '');

Dumping all remote mysql queries to a server

For a project recently we needed to see which users were accessing which databases on a remote mysql server. After a little bit of reading and experimenting I came up with this command – hopefully it will be useful for someone in the future.

tshark -i any -n \
    -d tcp.port==3306,mysql \
    -T fields -e timestamp -e ip.src -e mysql.user -e mysql.schema -e mysql.query \
    -f "dst port 3306" \
    -Y "len(mysql.user) > 0 || len(mysql.query) > 0" 

This uses the excellent wireshark tool to capture traffic. The -f flag specifies a filter to only inspect inbound traffic to the server (as we don’t care about the responses).

The line beginning -T specifies the fields we wish to dump (in a tab-separated fashion), -d ensures that all traffic will attempt to be decoded as mysql. The final -Y flag ensures that only packets with queries or logins will be dumped, rather than TCP overhead or placeholder binding etc.

Note that user information is only given at the beginning of a session. Schema (database name) information may be on that same line, or you may have to scan the queries to look for a USE database type command as there are two different ways of setting the current database that will be accessed.