Figuring out which session is blocking a query on Microsoft SQL Server (query suspended)

I was recently debugging a case where a customer’s installation was hanging due to a database lock. The issue turned out to be that the database was not set to a default transaction level of READ COMMITTED SNAPSHOT, which the product expects. Regardless, while troubleshooting the issue, I stumbled upon a very useful SQL query that I needed to save somewhere, so here goes.

The following query lists which SQL session is blocked by which other session:

SELECT
db.name DBName,
tl.request_session_id,
wt.blocking_session_id,
OBJECT_NAME(p.OBJECT_ID) BlockedObjectName,
tl.resource_type,
h1.TEXT AS RequestingText,
h2.TEXT AS BlockingTest,
tl.request_mode
FROM sys.dm_tran_locks AS tl
INNER JOIN sys.databases db ON db.database_id = tl.resource_database_id
INNER JOIN sys.dm_os_waiting_tasks AS wt ON tl.lock_owner_address = wt.resource_address
INNER JOIN sys.partitions AS p ON p.hobt_id = tl.resource_associated_entity_id
INNER JOIN sys.dm_exec_connections ec1 ON ec1.session_id = tl.request_session_id
INNER JOIN sys.dm_exec_connections ec2 ON ec2.session_id = wt.blocking_session_id
CROSS APPLY sys.dm_exec_sql_text(ec1.most_recent_sql_handle) AS h1
CROSS APPLY sys.dm_exec_sql_text(ec2.most_recent_sql_handle) AS h2
ORDER BY tl.request_session_id
GO

…and by following the tree (X blocked by Y, blocked by Z, etc..) one can see which session is responsible for the mess.

Other interesting queries:

sp_who2 Shows sessions connected to the database, how much time they’ve spent, as well as several other stats
select cmd,* from sys.sysprocesses
where blocked > 0
Shows currently blocked queries

 

Using IPv6 SLAAC with IP forwarding on Debian Stretch (“connect: network is unreachable”, no default gateway)

When configured as a router, for instance for hosting virtual machines with KVM, a Debian machine with a somewhat recent kernel will not listen to router advertisements from others. This makes sense in many cases, but when you have a VM host in your internal network, which only router function is to allow communication to and from the virtual machines it hosts, the default behaviour is less than optimal.

ping6 reports the network as unreachable. Looking at ip -6 route reveals the problem. There’s no “default” line configured.

# ping6 google.com
connect: Network is unreachable
# ip -6 route
2001:4fa0:baaa:1::/64 dev eth0 proto kernel metric 256 expires 86394sec pref medium
fe80::/64 dev eth0 proto kernel metric 256 pref medium

After verifying with tcpdump -i eth0 ip6 that router advertisements were indeed being received, I looked around for a bit and found this post by Andy Smith that explains the problem in detail.

In short, accepting router advertisements is governed by /proc/sys/net/ipv6/conf/$IFACE/accept_ra. These values are documented in ip-sysctl.txt (local archive here).

accept_ra - INTEGER
	Accept Router Advertisements; autoconfigure using them.

	It also determines whether or not to transmit Router
	Solicitations. If and only if the functional setting is to
	accept Router Advertisements, Router Solicitations will be
	transmitted.

	Possible values are:
		0 Do not accept Router Advertisements.
		1 Accept Router Advertisements if forwarding is disabled.
		2 Overrule forwarding behaviour. Accept Router Advertisements
		  even if forwarding is enabled.

	Functional default: enabled if local forwarding is disabled.
			    disabled if local forwarding is enabled.

The solution is thus to set accept_ra to 2 to allow for accepting router advertisements even if forwarding is set to 1.

Since I use Shorewall to enable the forwarding, I ended up with a somewhat different config than Andy, but I still use the pre-up statements in /etc/network/interfaces to resolve the issue.

allow-hotplug eth0
iface eth0 inet static
	address 192.168.6.10
	netmask	255.255.255.0
	gateway 192.168.6.1
	dns-nameservers 192.168.6.1
	pre-up echo 2 > /proc/sys/net/ipv6/conf/$IFACE/accept_ra

The rest of the config is only shown for reference. The important part is the pre-up line. Setting this, and then manually writing to /proc/sys/net/ipv6/conf/eth0/accept_ra to apply the configuration immediately, I had a default route within seconds, and ping to Google worked like a charm.

# ip -6 route
2001:4fa0:baaa:1::/64 dev eth0 proto kernel metric 256 expires 86394sec pref medium
fe80::/64 dev eth0 proto kernel metric 256 pref medium
default via fe80::1:1 dev eth0 proto ra metric 1024  expires 50sec hoplimit 64 pref medium
# ping6 -c4 google.com
PING google.com(arn09s11-in-x0e.1e100.net (2a00:1450:400f:807::200e)) 56 data bytes
64 bytes from arn09s11-in-x0e.1e100.net (2a00:1450:400f:807::200e): icmp_seq=1 ttl=55 time=19.0 ms
64 bytes from arn09s11-in-x0e.1e100.net (2a00:1450:400f:807::200e): icmp_seq=2 ttl=55 time=18.6 ms
64 bytes from arn09s11-in-x0e.1e100.net (2a00:1450:400f:807::200e): icmp_seq=3 ttl=55 time=22.3 ms
64 bytes from arn09s11-in-x0e.1e100.net (2a00:1450:400f:807::200e): icmp_seq=4 ttl=55 time=17.8 ms

--- google.com ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3004ms
rtt min/avg/max/mdev = 17.823/19.474/22.359/1.725 ms

Exim4 line length in Debian Stretch – “Mail delivery failed: returning message to sender”

Exim4 introducted new behaviour in the versions included in Debian Stretch. Suddenly, the RFC max line length of 998 characters is enforced, and emails with lines exceeding this length are returned to sender. For the interested, part of the discussion about this feature is over here: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=839147

Anyway, the solution to this issue in Debian is to add the following to /etc/exim4/update-exim4.conf.conf

IGNORE_SMTP_LINE_LENGTH_LIMIT='true'

..then restart Exim4

service exim4 restart

You can test the new configuration by sending an email to ‘root’ with an absurdly long line in it:

$ { echo "Test begins"; for ((i=0; i<1500; i++)); do echo -n "0"; done; echo -e "\nTest ends"; } | mail -s "Very long line" root

If you get that email, you did this correctly 🙂

Understanding mcelog ECC errors – Which stick of RAM is broken?

Before reading this, please note that much of the information in mcelog is hardware dependent. Your mileage may vary.

Memory gone bad

So one of the servers, running an X99-WS/IPMI board from Asus, began putting errors into /var/log/mcelog. Thankfully, they were all the same, telling me the following:

mcelog: failed to prefill DIMM database from DMI data
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 11 
MISC 90840080008228c ADDR 9ce494000 
TIME 1499161840 Tue Jul 4 09:50:40 2017
MCG status:
MCi status:
Corrected error
MCi_MISC register valid
MCi_ADDR register valid
MCA: MEMORY CONTROLLER MS_CHANNEL2_ERR
Transaction: Memory scrubbing error
MemCtrl: Corrected patrol scrub error
STATUS 8c000051000800c2 MCGSTATUS 0
MCGCAP 7000c16 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 79

So, what does this mean?

The first few lines tell us this happened from CPU 0 on BANK 11. This wasn’t much help, as the board only has 8 memory banks. It was suggested in #debian on freenode that the high bank number might be due to dual channel memory, but then how do I pinpoint the physical stick?

Enter dmidecode

mcelog tells us a weird bank number, but it has something else that’s vitally important; the address. ADDR 9ce494000 is a memory address on the faulty stick, and dmidecode can tell us which stick is responsible for that address:

# dmidecode -t 20
[...snip...]
Handle 0x005E, DMI type 20, 35 bytes
Memory Device Mapped Address
    Starting Address: 0x00800000000
    Ending Address: 0x00BFFFFFFFF
    Range Size: 16 GB
    Physical Device Handle: 0x005D
    Memory Array Mapped Address Handle: 0x0058
    Partition Row Position: 1
[...snip...]

This should be the problematic RAM stick, as address 0x009CE494000 is between 0x00800000000 and 0x00BFFFFFFFF. The stick has “Physical Device Handle” 0x005D. dmidecode can show us more information about this handle:

# dmidecode -t 17
[...snip...]
Handle 0x005D, DMI type 17, 40 bytes
Memory Device
    Array Handle: 0x0057
    Error Information Handle: Not Provided
    Total Width: 72 bits
    Data Width: 72 bits
    Size: 16384 MB
    Form Factor: RIMM
    Set: None
    Locator: DIMM_B1
    Bank Locator: NODE 1
    Type: DDR4
    Type Detail: Synchronous
    Speed: 2133 MHz
    Manufacturer: Samsung
    Serial Number: 32BFE65D
    Asset Tag: DIMM_B1_AssetTag
    Part Number: M393A2G40DB0-CPB 
    Rank: 2
    Configured Clock Speed: 2133 MHz
    Minimum Voltage: Unknown
    Maximum Voltage: Unknown
    Configured Voltage: Unknown
[...snip...]

Here you can look at the Locator, or the Asset Tag fields. Both show the memory slot as DIMM_B1. Now that’s something we can use! Looking in the motherboard manual, available online, one can see where DIMM_B1 is:

So that’s the bad stick, which will be going back to the supplier with an RMA.

Finding duplicate files with Bash

Someone recently asked, in #bash on Freenode, how to find duplicate files with Bash. Several options were suggested, and the user ended up installing and running “fdupes“. However, this sort of thing should be reasonably easy to do using “find” and a few pipes.

As a quick overview, what you want to achieve is to find all the files, list their sizes and names followed by a NUL ($’\0′) separator to allow for wonky filenames. You start with this, as sizes are the quickest values to compare. You then get all the duplicates that have at least one other file with the same size, and discard the rest of them. Then we want to run some sort of a checksum tool. I chose sha1sum. Again we discard the files with no duplicates. And that’s really all there is to it.

Feeling bored one night, I decided to try this. As usual, the whole thing grew entirely out of proportion, and I ended up duplicating most of fdupes’ functionality in Bash. Still, the resulting script is surprisingly fast, looks nice, has a decent help menu and doesn’t rely on anything not found on virtually every GNU/Linux machine out there. It also supports a few things fdupes does not, such as (as of 2017-07-01) null terminated output. Thus it found its way into my toolbox.

Let me present “cdupes”:

$ cdupes
cdupes: no directories specified

Description:
  Bash script that's functionally similar to the tool "fdupes"

Usage: cdupes [options] 
Options:
  -0
    Null termination. Allows piping filenames with weird characters into other tools
  -5
    md5sum. Uses md5sum instead of the default sha1sum
  -A
    NoHidden. Excludes files which names start with --> . <--
  -c
    Checksum. Show checksum of duplicate files
  -f
    Omit first match from each group. Useful with -m for cleanup scripts
  -m
    Machine readable. No empty lines between groups
    Probably only useful in conjunction with -c or -f
  -n
    NoEmpty. Ignore empty files (size 0)
  -p
    Permissions. Don't consider files with different owner/group or permission bits as duplicates
  -r
    Recurse. For every directory given, follow subdirectories encountered within.
  -S
    Size. Show size of duplicate files
  -q
    Quiet. Hides progress indicators
  -Q
    Quiet errors. Errors will not be printed. Does not hide progress indicators

$ cdupes -r /tmp
Files: 21
Same size: 13
Checksum: 13

/tmp/.fp1
/tmp/fp3

/tmp/foo/arse
/tmp/oh/file-OICAwX
/tmp/oh/file-WUJvo7

Duplicate search exited with error status: 1

ERRORS:
find: ‘/tmp/nonono’: Permission denied
find: ‘/tmp/root’: Permission denied
find: ‘/tmp/.cathedral’: Permission denied
sha1sum: /tmp/.startup.lock: Permission denied
sha1sum: /tmp/dabba: Permission denied
sha1sum: /tmp/fabba: Permission denied

$ sudo cdupes -rcSq /tmp
2ab06f95377aecc42e5a0e85573a3e7e3efa0961 157286400 /tmp/.fp1
2ab06f95377aecc42e5a0e85573a3e7e3efa0961 157286400 /tmp/fp3
da39a3ee5e6b4b0d3255bfef95601890afd80709 0 /tmp/.startup.lock
da39a3ee5e6b4b0d3255bfef95601890afd80709 0 /tmp/dabba
da39a3ee5e6b4b0d3255bfef95601890afd80709 0 /tmp/fabba
da39a3ee5e6b4b0d3255bfef95601890afd80709 0 /tmp/foo/arse
da39a3ee5e6b4b0d3255bfef95601890afd80709 0 /tmp/test/file-3muc5s
da39a3ee5e6b4b0d3255bfef95601890afd80709 0 /tmp/test/file-8ZjpE9
da39a3ee5e6b4b0d3255bfef95601890afd80709 0 /tmp/root/file-ox8qJb

If you’re interested in trying my script, or including it in your own set of SysOp tools,
you can –> find it here <–
Have fun!

Cygwin for Windows XP – fast mirror

Cygwin stopped being compatible with Windows XP in 2016. Following this post on stackoverflow, you can find the setup files and a (very slow) mirror that’s compatible.

I made my own, considerably faster, mirror of the last Windows XP compatible Cygwin release, at http://cygwinxp.cathedral-networks.org/

The mirror date is 2016-08-30, the 30th of August 2016 AD.

The setup file is available in the cathedral subdirectory, http://cygwinxp.cathedral-networks.org/cathedral/, along with a batch file that contains the command line options required to install from that mirror. I suggest you download both and run the batch file. If you only want the setup file, the essential option is the –no-verify one, which disables the use of package signatures. The signatures were not available in the Cygwin archive mirror, so I don’t have them.

Enjoy.

The Intel 82579V on Hyper-V Server 2016, installing unsupported network adapters

Repurposing an old developer machine to run some light VM’s, I was tasked with getting Hyper-V Server running on the Asus Maximus V GENE motherboard. Simple enough. Download Hyper-V for free from Microsoft, format a USB stick with FAT32, put all the files from the .ISO on the stick, install.

Diskpart Cheat Sheet

 1) Open the command prompt as Administrator
 2) "diskpart"
 3) "list disk"
 4) Enter the number of the USB drive
 5) "clean"
 6) "create partition primary"
 7) "active"
 8) "format fs=fat32 quick"
 9) "assign"
10) "exit"

But hey, no network! 🙁


“No active network adapters found.” eh?

Tuns out Intel isn’t a particularly big fan of people running servers on their consumer gaming boards, but not wanting to wait for a new NIC to be delivered, it was time to hit Google.

I came up with these links:
https://www.iarp.ca/blog/8-hyper-v-and-the-intel-82579
https://jayrbarrios.com/2014/11/19/intel-nuc-d54250wykh-installing-lan-driver-on-windows-hyper-v-server-2012-r2/

So I followed the instructions, but switched out the drivers for the latest ones and the OS for Hyper-V 2016.

NDIS reference

A short reference to Intel’s Windows driver naming scheme

Directory Desktop OS Server OS
NDIS62 Windows 7 Server 2008
NDIS63 Windows 8 Server 2012
NDIS64 Windows 8.1 Server 2012 R2
NDIS65 Windows 10 Server 2016

Driver Installation

1 => The latest Intel networking drivers, intended for Windows Server 2016, are available here (local archive). Downloaded and extracted with 7-Zip.

2 => Edited the file PROWinx64\PRO1000\Winx64\NDIS65\e1c65x64.inf, removing the lines under [ControlFlags] and copying the E1502NC lines to the next section as described in the links I found. It ended up looking like this:

3 => Disabled the driver signing, as the signature will now be invalid

bcdedit /set LOADOPTIONS DISABLE_INTEGRITY_CHECKS
bcdedit /set TESTSIGNING ON
bcdedit /set NOINTEGRITYCHECKS ON

4 => Rebooted (IMPORTANT)

5 => Put the driver folder on a USB stick, and installed it from the shell

X:
cd \PROWinx64\PRO1000\Winx64\NDIS65
pnputil -i -a e1c65x64.inf

6 => Agreed to the popup warning about installation of unsigned drivers

7 => Rebooted (IMPORTANT)

8 => Considered restoring the driver signing options with bcdedit, and then ignored it as I had better things to do

That’s it. If you’re uncomfortable modifying the .inf file yourself, you can download my already “unlocked” copy here: PROWinx64.zip

Killing a process tree, killing a PID and all children

Here are a couple of useful functions I wrote to kill a process tree. It’s useful when dealing with tcpserver, which refuses to disconnect a client for as long as a process it started, or a child of that process, still runs. It’s also good for tackling runaway unintentional fork bombs, rogue daemons and other mishaps.

The following script provides two functions:

sf_killtree

sends SIGSTOP to a process, kills all its children recursively, kills the process, sends SIGCONT and waits for it, to avoid “Terminated:” messages

sf_killchildren

kills all children of the current script without killing the script itself

 

Script:

function sf_killtree
{
    local ppid=$1 sig=${2:-"TERM"}
    if ! kill -0 "$ppid"; then
        builtin echo "killtree: ($ppid) - No such process" >&2
        return 1
    fi
    (( ppid == $$ )) || kill -STOP "$ppid" 2>/dev/null || :
    local pids=( $({ pgrep -P $ppid || ps -o pid= --ppid $ppid || :; } 2>/dev/null ) )
    if [[ -n "${pids[@]:-}" ]]; then
        for pid in "${pids[@]}"; do
            sf_killtree "$pid" "$sig" 2>/dev/null || :
        done
    fi
    kill "-${sig}" "$ppid" 2>/dev/null || :
    (( ppid == $$ )) || kill -CONT "$ppid" 2>/dev/null || :
    wait "$ppid" 2>/dev/null || :
}

function sf_killchildren
{
    local _term=$(builtin trap -p TERM)
    builtin trap : TERM
    sf_killtree $$ TERM;
    builtin trap - TERM
    builtin eval "$_term"
}

Addendum

Please also note the following from the KILL (2) man page:
“If pid equals 0, then sig is sent to every process in the process group of the calling process.”
While this does kill the script itself, unlike sf_killchildren, it can be a useful thing to end a script with, or to put in an EXIT trap.

Making a telnet server with Bash and tcpserver

Telnet servers are everywhere. They are simple, insecure by nature, outdated, and often badly implemented. Let’s do that, shall we? 🙂

A simple way to make any command line interface listen to incoming TCP connections is tcpserver, provided as part of the ucspi-tcp or ucspi-tcp-ipv6 package on Debian. Using this, we can make any Bash script available on the network.

Take a look at the following command:

# /usr/bin/tcpserver -c 10 -HR -u 65534 -g 65534 0.0.0.0 2048 /usr/local/bin/script

Here, we’re running tcpserver as root. This is to allow it to later change to a lower privilege level by switching to UID 65534 and GID 65534, nobody:nogroup. We’re also limiting the number of simultaneous connections to 10 and skipping a few reverse lookups. We’re listening on all available interfaces on port 2048.

Each incoming connection will start one instance of the script /usr/local/bin/script, which will be running as the user and group provided. If you do not specify -u and -g, the script will be running as the user that started tcpserver. Be careful with this.

Everything printed by the script to stdout will go to the client, and attempting to read from stdin will read from the client. We can test this with a short script:

#!/usr/bin/env bash
echo -n "Please enter your name: "
read -r name
echo "Hello, $name"
echo

Running this script through tcpserver, I can now connect to it with telnet:

$ telnet localhost 2048
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Please enter your name: Testy McTest
Hello, Testy McTest

Connection closed by foreign host.

That was easy. Now for the annoying stuff. RFC 854 specifies all sorts of things we should pay attention to, such as option negotiation and escape sequences. Most of this isn’t important, but we do need to pay attention to some of it if we want our telnet server to be smarter than a bag of hammers. Handling ^C (ctrl+c) would be good, as would proper telnet line endings, as if you try to connect to the above script with telnet on Windows, it won’t be pretty.

Responding properly to escape sequences requires reading incoming characters one by one – we can’t wait for a line ending. Properly formatting our line endings requires either printing every line like printf “Hello!\r\n”. Alternatively, all output could be run though another command that replaces Linux style newlines (\n) with Telnet style line endings (\r\n).

I wrote a set of functions to handle all of this for me. I included them in the Cathedral ShellFunc library, but they can run on their own if a few supporting functions are exported as well.
You can get shellfunc_telnet here if you’re interested.

With it, our script will work with Windows clients, line editing functions correctly, ctrl+c will disconnect properly, line endings are corrected, and we have login support with characters displayed as asterisks.

Here’s an example script that uses shellfunc_telnet:

#!/usr/bin/env bash
source shellfunc_telnet

echo "$TCPREMOTEIP:$TCPREMOTEPORT ++ ($$)" >&2
trap 'sf_killchildren || :; echo "$TCPREMOTEIP:$TCPREMOTEPORT -- ($$)" >&2' EXIT

# the password is "test"
testhash="9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08"

sf_tn_init echo

if ! sf_tn_read -t 120 -P "Please enter your name: " name || [[ -z "$name" ]]; then
    echo -e "\nNo name, no entry"
    exit 0
fi
echo "Hello, $name"

if ! sf_tn_read -t 120 -P "Password: " -p pass; then
    echo
    exit 0
fi
echo "Your pass hash is $pass"

if [[ "$pass" != "$testhash" ]]; then
    echo "Password incorrect. Go away."
    exit 0
fi

echo "Password correct. Welcome."
echo "I will now loop and print your lines back to you."
echo "Press ctrl+c or give me an empty line to quit"

while true; do
    sf_tn_read -t 120 line
    if (( $? == 69 )); then # interrupt
        echo "Interrupted!"
    elif (( $? > 128 )); then # timeout
        echo "Too slow!"
    fi

    [[ -n "$line" ]] || break

    echo "Your line: $line"
done

echo "Bye."
echo
sleep 1

Here’s a sample session:

$ telnet localhost 2048
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Please enter your name: Testy McTest
Hello, Testy McTest
Password: ****
Your pass hash is 9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08
Password correct. Welcome.
I will now loop and print your lines back to you.
Press ctrl+c or give me an empty line to quit
line 1
Your line: line 1
foo
Your line: foo
Q
Your line: Q
Interrupted!
Bye.

Connection closed by foreign host.

It even works from Windows!

We can make the script run at boot time by adding it to root’s crontab as such:

@reboot /usr/bin/tcpserver -c 10 -HR -u 65534 -g 65534 0.0.0.0 2048 /usr/local/bin/script 2>&1 | logger -t tcp-2048 -p user.notice

Here, we’re running our tcpserver at boot time, we’re taking anything our script prints to standard error, which is currently a log of new connections and disconnects, and putting it in the system log.

If you’re curious about sf_killchildren, you can read more about it here.

Have fun. Drop me a line if you do something cool with this.