Archive for the 'Technical Guidance' Category

Ubuntu 14.04 upgrade

Posted in Technical Guidance on August 21st, 2014 by MrCranky

After the kerfuffle with Heartbleed earlier in the year, and finding out our server installation was way out of date, I resolved to keep it more current. That meant upgrading from 12.04 to 14.04. Not as painless as previous upgrades sadly, and left me with three notable problems. I’m posting my notes here in case they’re helpful to anyone else upgrading. Basically our server box acts as a DHCP server, file server (using Samba) and gateway for the internal network, as well as hosting a couple of websites which we use both internally and externally. After upgrading, I noted:

  1. Two of the hosted websites were no longer working: they were giving 404 not found messages.
  2. A persistent message being posted at startup and various points during shell sessions: “no talloc stackframe at ../source3/param/loadparm.c:4864, leaking memory”
  3. DHCP server stopped responding properly the day after the upgrade

I’d had to merge a few configuration files, including those for Samba and dhcpd, so my initial thought was that I’d botched the merge. However, there wasn’t anything obvious in the merge results that would explain the problems. Anyway, issue by issue:

No talloc stackframe

This one was the most easily resolved. This post points the finger at libpam-smbpass, a Samba module which seems to have an outstanding bug. Fine, it’s not functionality we rely on, so uninstalling the module makes the problem go away:

sudo apt-get remove libpam-smbpass

DHCP server stopped responding properly

This one didn’t bite me until I was mid-way through diagnosing the Apache issues: my DHCP lease ran out, and suddenly the laptop I was using to remote in to the server was without network. Super annoying. Setting the IP address / DNS of the laptop manually got me network connectivity to search for solutions; otherwise I would have been guessing blindly from the server terminal (which doesn’t have web browsing).

While there wasn’t anything hinting at a DHCP problem in the logs, I noted at server reboot time a line along the lines of the following:

Apparmor parser error for /etc/apparmor.d/usr.sbin.dhcpd at line 69 Could not open /etc/apparmor.d/dhcpd.d

I couldn’t find that line anywhere in my logs, but I probably wasn’t looking in the right place. The configuration file that is complaining is basically trying to #include the dhcpd.d subfolder, and failing. Still, it suggested some sort of permissions or configuration problem with AppArmor and dhcpd. Oddly though, DHCP had been working after the upgrade the afternoon before, and I could see successful DHCP negotiation going on from this morning, but it all ceased an hour or two before my DHCP lease expired. All of my searches were throwing up results for the package isc-dhcp-server though, whereas I was pretty sure the package was dhcp3-server. On checking, isc-dhcp-server was not installed. Installing it:

sudo apt-get install isc-dhcp-server

Lo and behold, DHCP was functional again, using our already existing configuration. So, I’m guessing, the packages on our legacy machine (upgraded using do-release-upgrade from 10.10) aren’t properly handled by the release upgrade procedure, and were left with folder permissions set incorrectly; which was fixed by installing the correct DHCP server package.

Apache website issues

Ubuntu 14.04 brings with it a fairly major upgrade from Apache 2.2 to 2.4. While the web server was still functional, and I could access pages resting directly under the DocumentRoot, our two sites set up using Alias directives were no longer accessible. Both returned 404 errors. Using a symbolic link in the filesystem under the DocumentRoot would allow them to be accessed, but that wouldn’t allow us to enable/disable the site at will. While there are changes to the permissions system in 2.4, we don’t use those with our sites. So all very odd.

Our setup was very simple: each site had a configuration file that only contained a single Alias line, remapping the appropriate site folder to the folder on the local disk. Further experimentation showed that we could shift the same Alias line into the default site configuration, and have it work. It gave a 403 Forbidden error, but not a 404 any more. Adding an appropriate Directory element with a “Require all granted” directive inside fixes the 403. So presumably the default permissions for an aliased directory have changed to deny by default instead of grant.
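As a rough sketch of what ended up working, the relevant part of the default site configuration would look something like the fragment below. The URL prefix /mysite and the disk path /srv/www/mysite are placeholders I’ve made up for illustration, not our real paths; Alias, Directory and “Require all granted” are the standard Apache 2.4 directives mentioned above.

```apache
# Hypothetical URL prefix and disk path; substitute your own.
Alias /mysite /srv/www/mysite

<Directory /srv/www/mysite>
    # Apache 2.4 access control. Without this, the aliased folder
    # returned 403 Forbidden for us.
    Require all granted
</Directory>
```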

So from that I can only conclude that Apache 2.2 was more forgiving of having Alias directives standing alone in their own site .conf files, for whatever reason. I’m probably missing some nuance of the setup as to why it worked before. Rather than spend too much time figuring it out, I’m going to just go with having the sites as a part of the main site instead of as sites on their own.

Conflicting ideas about the size of STL strings

Posted in Coding, Technical Guidance on July 18th, 2012 by MrCranky

This post is one of those “I couldn’t find it when I was Googling, so here’s a succinct description of the problem / solution so other people can avoid the same round-about research.”

Symptom:

You have one bit of code (perhaps a library or a DLL) which thinks that sizeof(std::string) is 28 bytes, and another bit of code which thinks that it is 32 bytes. In Release mode they both agree that the size is 28 bytes. In our case it was actually std::wstring, but both string objects are actually the same size and exhibit the same problem.

Diagnosis:

You have a mismatch in your configuration between the two projects: essentially you’re trying to mix Debug code and Release code, which is just fundamentally not allowed. This much information is readily available on the Internet with some basic searching, but crucially most of those places don’t tell you the one piece of information you really need: exactly what setting is different? Which one of the dozens of settings that typically differ between Debug and Release is the STL code actually paying attention to?

The real answer lies in the details. It is not a Debug vs Release problem (well it is, but only indirectly). If you’re like me, the first thing you checked was the presence (or absence) of the _DEBUG or NDEBUG pre-processor definitions. After all, they’re the defines most often used to get differing behaviour between the debug and release builds. You’ll find however that those definitions have no bearing at all on the size of std::string.

Now is probably a good time to visit this Stack Overflow question which links to good information on the subject.

In fact, the root cause is the presence and value of the preprocessor definitions _SECURE_SCL and/or _HAS_ITERATOR_DEBUGGING. If these are defined and set to 1, then sizeof(std::string) will be 32. If they are defined and set to 0, sizeof(std::string) will be 28.

More troubling is that even if those definitions aren’t explicitly listed in the set of pre-processor definitions, I believe the compiler (the Visual Studio compiler at least) will define them for you, based on its own internal logic. _SECURE_SCL will always be 1 for both debug and release builds, but _HAS_ITERATOR_DEBUGGING will be 1 for debug builds, 0 for release builds (as it has a tangible performance impact). You can explicitly turn off _SECURE_SCL to get more performance if you want, but you should understand the drawbacks before you do so.

I will update this post if I find out more about the internal setup of those definitions, but simply knowing that they are the cause of the size difference is usually enough to get to a resolution. I would certainly recommend adding some logging to both code modules that spits out the value of these two defines so it’s clear to you what the values are on both sides.
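A minimal sketch of that logging might look like the following. The function name describe_stl_config is my own invention for illustration; the idea is just to compile the same function into each module and compare the output. It includes <string> before testing the macros, on the assumption that the standard library headers are what supply the default values, and it reports “undefined” for any macro the toolchain doesn’t define (as will be the case on non-Microsoft compilers).

```cpp
#include <sstream>
#include <string>

// Builds a one-line report of the settings that affect std::string layout
// in the translation unit this function is compiled into.
// module_name is just a label so the two sides can be told apart in a log.
std::string describe_stl_config(const std::string& module_name)
{
    std::ostringstream out;
    out << module_name << ": sizeof(std::string)=" << sizeof(std::string);

#ifdef _SECURE_SCL
    out << " _SECURE_SCL=" << _SECURE_SCL;
#else
    out << " _SECURE_SCL=undefined";
#endif

#ifdef _HAS_ITERATOR_DEBUGGING
    out << " _HAS_ITERATOR_DEBUGGING=" << _HAS_ITERATOR_DEBUGGING;
#else
    out << " _HAS_ITERATOR_DEBUGGING=undefined";
#endif

    return out.str();
}
```

Call it once from each module (for example describe_stl_config("MyLibrary") and describe_stl_config("MyApp")) and write the results to whatever log you already have; any difference between the two lines points at the mismatched setting.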

Resolution:

For most, an immediate solution is to simply manually define iterator debugging to be on or off in both projects so that they are consistent. To do that, simply add _HAS_ITERATOR_DEBUGGING=1 (or 0) to your project’s preprocessor definitions.

You may want to avoid setting it explicitly (ideally you’d simply rely on the compiler defaults), in which case you’ll need to figure out why iterator debugging is enabled for one module but not the other. For that I’m afraid you need more information about how the compiler decides to set those defines, but presumably another one of your project settings is indirectly making the compiler decide that iterator debugging should be enabled or not, and it is that setting which is different between your two modules.
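Whichever route you take, it can be worth having the two modules check each other at start-up, so a future mismatch fails loudly rather than corrupting memory. The sketch below is one way to do that, under my own assumptions: library_string_size and verify_string_layout are hypothetical names, and the library is assumed to be able to export a plain C function. Matching sizes don’t prove the layouts are identical, so treat this as a cheap sanity check rather than a guarantee.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Library side (hypothetical export): the size of std::string as seen
// with the library's own compiler settings. The extern "C" signature
// contains no STL types, so it is safe to call across the boundary
// even when the two sides disagree about the layout.
extern "C" std::size_t library_string_size()
{
    return sizeof(std::string);
}

// Host side: compare the library's view with our own and refuse to
// continue if they differ.
void verify_string_layout(std::size_t library_size)
{
    if (library_size != sizeof(std::string)) {
        throw std::runtime_error(
            "std::string size mismatch: library=" + std::to_string(library_size) +
            ", host=" + std::to_string(sizeof(std::string)));
    }
}
```

The host would call verify_string_layout(library_string_size()) once, before passing any strings to the library.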

Migrating drives

Posted in Technical Guidance on June 17th, 2011 by MrCranky

So one of the most annoying things about the internet for computer fixing is that a) a lot of the people asking questions aren’t technical, so the problem reports are spotty at best, and b) a lot of the people providing answers think they know more than they do, so the answers often either conflict, or are just plain misleading. Worse, they’re usually just a list of commands, without any context as to why you’re doing these things, so it’s hard to know if they’re even appropriate. Often-times what might appear to be the same situation is in fact caused by a completely different underlying problem, and following instructions blindly will just make things worse.

So here is a guide, intended for those readers who want to try to understand exactly what is going on with their computer, and why it’s gone wrong. You’ll have to be prepared to stomach a bit of technical jargon, but I’ll try to be clear. I can’t claim full knowledge on this, but I’ve been working with PCs for over 15 years, and I’m pretty confident I understand what is going on.

My problem arose when I was shifting my existing stuff to a new hard drive, as the old one was reaching the end of its life. I’d like to write up my situation and how I fixed it, in the hope it will be more useful for others than the internet search results I came across while figuring out what I needed to do.

The Situation

Some time ago, I upgraded from Windows XP to Windows 7, and at the same time bought a small Intel SSD to put it on. I’d heard bad things about the upgrade procedure, and felt it was time for a clean install anyway, so I installed W7 on the SSD from clean, no upgrade. The fact that it was an SSD isn’t relevant here, this would happen with regular hard disks too. But what that meant was that I had two operating systems available on the computer. To its credit, the W7 installer was fine with this, and once installed, I had the option of booting either operating system. I got my W7 installation set up the way I liked it, and eventually deleted the XP install. Again, all was well.

I have several disks on my machine, but only two are relevant here: 1) the SSD with a single partition on it (C:), and 2) an HD with two partitions (D and F). Crucially, the D partition was where the XP installation used to reside, and the C partition is where the W7 installation lives. I wanted to migrate the D and F partitions to a larger new disk, and simply remove the old drive. This I did, with the help of Norton Ghost and its drive copying functionality. So at this point I had partitions C (SSD), D & F (HD1) and K & M (HD2). I would then reassign drive letters such that the new drive would have partitions called D & F, and the old drive would have no letters at all (and could be quietly removed from the system).

The Problem

As soon as I removed the old HD (HD1) with the D and F partitions on it, the machine would no longer boot, prompting me to insert a system disk. Explicitly choosing the SSD from the machine’s boot menu or reordering the boot order made no difference. Re-connecting the old disk, everything was fine again.

Diagnosis

To boot from a hard disk, a computer needs an ‘active’ partition (active or not being a property set in the partition, usually when it’s created). Normally there is only one active partition on a machine, but if there is more than one, the order in which the computer looks at the disk becomes important (hence the setting in the BIOS to change the boot order). On an active partition, the computer expects to find a Master Boot Record (MBR), which will tell it where it should look for a program which can start an operating system. In a typical, simple setup, the MBR lives on partition 0, the C drive, along with the operating system. But there’s no reason it has to. You can have an MBR on one partition pointing to another partition altogether. And that is what happened here.

Originally, the MBR lived on the partition now called D (it was C back then), with the XP operating system. When it came time to install Windows 7, the W7 installer put itself on C, but it didn’t make a new MBR (because there was already one available). Instead, it simply modified the existing MBR so that it could boot either W7 or XP. Whenever the machine booted, it would look at the SSD, find no active partition, and move on to HD1, where it would find an active partition and MBR, which then pointed it back towards partition C and the W7 install. Everything happy.

When I removed HD1, I was left with the SSD and HD2, neither of which had an active partition or an MBR. So the system did not know where it could boot from, and complained.
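For anyone curious what the ‘active’ flag actually looks like on disk, here is a small sketch. It assumes the classic BIOS/MBR disk layout, in which the first 512-byte sector of the disk ends with the signature bytes 0x55 0xAA and holds a four-entry partition table starting at byte 446; each 16-byte entry begins with a status byte, where 0x80 means active, and has its partition type at byte 4 (0 meaning unused). The function and constant names are my own, and actually reading the sector from a drive (which needs administrator rights) is deliberately left out.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Layout of a classic MBR sector and its partition table.
constexpr std::size_t kSectorSize = 512;
constexpr std::size_t kTableOffset = 446;  // 0x1BE
constexpr std::size_t kEntrySize = 16;
constexpr std::size_t kEntryCount = 4;
constexpr std::uint8_t kActiveFlag = 0x80;

using Sector = std::array<std::uint8_t, kSectorSize>;

// True if the sector ends with the 0x55 0xAA boot signature.
bool has_boot_signature(const Sector& sector)
{
    return sector[510] == 0x55 && sector[511] == 0xAA;
}

// Returns the indices (0 to 3) of the partition table entries that are
// in use and marked active. An empty result means nothing on this disk
// is flagged as bootable.
std::vector<int> active_partitions(const Sector& sector)
{
    std::vector<int> result;
    if (!has_boot_signature(sector)) {
        return result;
    }
    for (std::size_t i = 0; i < kEntryCount; ++i) {
        const std::size_t entry = kTableOffset + i * kEntrySize;
        const std::uint8_t status = sector[entry];
        const std::uint8_t type = sector[entry + 4];
        if (type != 0 && status == kActiveFlag) {
            result.push_back(static_cast<int>(i));
        }
    }
    return result;
}
```

In my situation, running something like this against the first sector of the SSD and of HD2 would have returned an empty list for both, which is exactly why the machine had nowhere to boot from once HD1 was removed.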

Solution

I needed to make the W7 partition active and bootable again, so that the system would operate even if the old disk was disconnected.

To do that, I dug out my W7 installation CD, and put that in the drive (making sure that the BIOS will boot from the CD before trying the HDDs). After starting and selecting a language, you are presented with the installer, but under that is a repair mode. Selecting that, I could tinker with the existing setup. It tried to find a viable OS to repair, but said there were none available (even though I knew the W7 install was still there). I’ve deduced this is because an OS not installed on an active partition doesn’t count. However it still lets you click Next, and gives you various options to work with. Startup Repair (the user friendly option) didn’t work, basically because there wasn’t anything to repair because the repair system didn’t realise the W7 install was there. Again, advice on the internet seems to be just ‘run Startup Repair a few times and it will fix it.’  That’s bad advice, you’re much better off trying to understand what’s currently wrong, because that will guide you as to how to fix it.

From the command prompt, you can run various tools to interrogate the current setup. With other MBR related problems, the advice is usually to just run ‘bootrec /fixmbr’ and ‘bootrec /fixboot’. For me, /fixmbr did nothing, and /fixboot gave the error ‘Element not found’. I think the former is because there wasn’t an MBR available to fix, and the latter because bootrec relies on knowing which partition to put the new MBR on. Because the W7 install hadn’t been detected, bootrec had a choice of several partitions, and didn’t know which one to use. It may be that it would work just fine if the W7 install had been detected (if the partition was marked active).

However, from the command prompt you get access to the diskpart and bootsect tools, which are more helpful, even if they do require more technical savvy. I had two immediate problems, 1) the C partition wasn’t active, and 2) there was no MBR on the C partition even if it was active. Both problems needed fixed before I could progress.

Bear in mind, when running from the installation/repair CD, the drive letters your drives are assigned may not correspond to their normal assignments. So I’d advise the following steps:

  • At the repair command prompt, run ‘diskpart’
  • Type ‘list volume’ to get a list of volumes. One of these will be the CD/DVD drive, note which one (for me it was H); another will be the partition you want to boot from (for me it was D), note that one as well.
  • Type ‘list disk’ to get a list of disks. One of them will be the disk you want to boot from (you’ll have to recognise it based on size / brand).
  • Type ‘select disk X’ (replace X with the correct disk number).
  • Type ‘list partition’ to get a list of partitions. Again, one of them you want to boot to.
  • Type ‘select partition X’ (replace X with the correct partition number).
  • Type ‘active’ to make the right partition active.
  • Type ‘exit’.

Now your boot partition should be active, but it doesn’t yet have an MBR on it. To get that, you need the bootsect tool. That tool is on the installation DVD, but in a subfolder.

  • Type ‘H:’ (or whatever your DVD drive was called).
  • Type ‘cd boot’ to move to the subfolder containing the bootsect tool.
  • Type ‘bootsect /nt60 D: /mbr’. This will write a new boot sector / MBR to the partition called D. The /nt60 is for Vista or later operating systems.

This should result in the computer now finally being able to boot from the local disk rather than the CD.

New problem

If you now try to boot from the disk you just made active / bootable, the message ‘BOOTMGR not found’ is displayed.

New diagnosis

Now we have progressed a stage. Instead of the BIOS telling us that it didn’t even know which drive to boot from, it is now telling us that the drive we told it to boot from isn’t as bootable as we claimed it was. Booting a drive is really just running a particular program: the information in the MBR is not just ‘what partition do I boot from’, it’s also ‘what program do I run from that drive’. For Windows, that program is BOOTMGR, which it expects to find in the root of the bootable partition.

So when I installed W7, not only did it not make a new active partition or MBR, it also didn’t put the bootable software in the new partition. Instead it just modified the configuration for the old (XP) BOOTMGR which used to live on D, and told it about the W7 installation on C instead.

New solution

We need to get a copy of the boot software onto the bootable partition. Thankfully, this is the job of the operating system, and if we boot into the repair disk one last time, we can get it to help us.

Boot from the installation CD, and go to the repair menu. Now we’ve made the W7 partition bootable and active, it should be correctly found by the repair option, and show up in the list of operating systems. For me it was marked as ‘recovered’. There was also another ‘recovery partition’ recovered as well (I believe this is used for other sorts of system recovery, although it was useless in this situation), which I ignored.

On selecting Next, we get the same list of repair options as before. This time, we can select ‘Startup Repair’, and let it do its thing. If you click on the option to view more details about what the Startup Repair is going to do, it should list all the things it checked. For me, the file system and various other things were fine (reported as error code 0x0), but it correctly detected that the boot software was damaged/missing, and needed replaced. Allowing it to proceed, and restart, and hey presto: after a reboot, the W7 partition is correctly booted from, and normal operation is resumed.


Email: info@blackcompanystudios.co.uk
Black Company Studios Limited, The Melting Pot, 5 Rose Street, Edinburgh, EH2 2PR
Registered in Scotland (SC283017) VAT Reg. No.: 886 4592 64
Last modified: February 06 2020.