CALIFORNIA.SYSTEMS
California, IT, Vintage Computing, Earthquakes
This is a personal website unaffiliated with any office, agency, or service of the State of California.
Challenging IT Questions
Here are some challenging IT questions to stretch the brain of any technician. The questions are rooted either in concepts that seem difficult for most technicians (e.g.: subnetting math... even if most understand it conceptually, few can do it on paper) or in the strangest real-world challenges I've faced. These questions were initially meant to foster growth and critical thinking in my employer's service desk, which was made up of aspiring sysadmins and network engineers. They're here now for the pleasure of sharing, in the hope they will be interesting and useful, to force myself to brush up on this knowledge, and so that my knowledge might be refined should I get any of this wrong.
Check back periodically, as more are coming.
1) Subnetting.
Why is it that 172.16.0.1/12 and 172.31.255.255/12 are in the same subnet, but 172.32.0.1/12 is in a different subnet? Ignore for a moment the inanity of creating a /12 subnet and accept the mask length.
172.16.0.1      10101100.00010000.00000000.00000001  <-- The subnet mask zeroes, or "masks," the host ID, leaving the subnet ID
255.240.0.0     11111111.11110000.00000000.00000000  <-- 12 mask bits, so the subnet ID is the first 12 bits

172.31.255.255  10101100.00011111.11111111.11111111  <-- The masked (zeroed) part changed, but the subnet portion matches the address above
255.240.0.0     11111111.11110000.00000000.00000000  <-- 12 mask bits

172.32.0.1      10101100.00100000.00000000.00000001  <-- The unmasked (subnet) part is different, so this is another subnet
255.240.0.0     11111111.11110000.00000000.00000000  <-- 12 mask bits
Directly examining binary IP addresses and subnet masks makes this far simpler to understand than reading an explanation does, but I will attempt to clarify this anyway. I'll try to be more economical with my words than Houghton Mifflin.
An IP address consists of a subnet ID and a host ID. A subnet mask tells you which portion of the address to read as the subnet ID, and which to read as the host ID. A subnet mask is a run of 1 bits starting from the high-order bit, and it is commonly written as the length of that run in bits (e.g.: a subnet mask of /1 is 128.0.0.0 in dotted-decimal notation, /2 is 192.0.0.0, and so on). A computer logically ANDs the subnet mask with its own IP address, and then again with the IP address of a communication peer, zeroing the host ID so it can compare the subnet IDs. If they match, the computer attempts to contact its communication peer directly on layer 2; if not, it contacts its router.
In the above case, the subnet ID is 10101100.0001, which covers addresses from 172.16.0.0 through 172.31.255.255. 172.32.0.1 belongs to a new subnet, because logically ANDing the mask with that address gives 10101100.0010.
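The AND-and-compare check described above can be sketched in a few lines of Python using plain integer arithmetic. This is a minimal illustration, assuming IPv4 dotted-quad input; the helper names (ip_to_int, int_to_ip, prefix_to_mask, same_subnet) are hypothetical, and a real IP stack performs this comparison internally rather than through anything you would call.

```python
def ip_to_int(address: str) -> int:
    """Convert a dotted-quad IPv4 address to a 32-bit integer."""
    a, b, c, d = (int(octet) for octet in address.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def int_to_ip(value: int) -> str:
    """Convert a 32-bit integer back to dotted-quad notation."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def prefix_to_mask(prefix_len: int) -> int:
    """Build a mask of prefix_len 1-bits starting from the high-order bit."""
    return (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF


def same_subnet(local: str, peer: str, prefix_len: int) -> bool:
    """AND both addresses with the mask (zeroing the host ID) and compare subnet IDs."""
    mask = prefix_to_mask(prefix_len)
    return (ip_to_int(local) & mask) == (ip_to_int(peer) & mask)


print(int_to_ip(prefix_to_mask(12)))                     # 255.240.0.0
print(same_subnet("172.16.0.1", "172.31.255.255", 12))  # True: talk to the peer directly
print(same_subnet("172.16.0.1", "172.32.0.1", 12))      # False: send it to the router
```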
I mainly mention this because I've known techs to struggle with how shortening the subnet mask by 1 bit creates a larger subnet: each bit removed from the mask doubles the number of addresses in the subnet. Using round masks (255.255.255.0, 255.255.0.0, 255.0.0.0) is sensible to avoid confusion, though subnet trickery has its uses.
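To see each removed mask bit double the subnet, Python's standard ipaddress module can report the subnet containing 172.16.0.1 at three adjacent mask lengths. A short sketch; subnet_summary is a hypothetical helper name, and the printed lines are shown as comments at the end.

```python
import ipaddress


def subnet_summary(address: str, prefix_len: int) -> ipaddress.IPv4Network:
    """Return the subnet that contains `address` at the given mask length."""
    # strict=False accepts a host address and zeroes its host bits.
    return ipaddress.ip_network(f"{address}/{prefix_len}", strict=False)


for prefix_len in (11, 12, 13):
    net = subnet_summary("172.16.0.1", prefix_len)
    print(f"/{prefix_len}  mask {net.netmask}  "
          f"{net.network_address} - {net.broadcast_address}  "
          f"({net.num_addresses:,} addresses)")

# /11  mask 255.224.0.0  172.0.0.0 - 172.31.255.255  (2,097,152 addresses)
# /12  mask 255.240.0.0  172.16.0.0 - 172.31.255.255  (1,048,576 addresses)
# /13  mask 255.248.0.0  172.16.0.0 - 172.23.255.255  (524,288 addresses)
```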
As a final casual observation, I almost never see the /12 private IP block fully utilized. Most techs seem to think it's limited to a single class B-sized range (172.16.0.0/16); in fact you can use 172.17.0.0/16, 172.18.0.0/16, and so on, all the way up to 172.31.0.0/16.
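Python's ipaddress module can confirm how much room the /12 block offers by splitting it into /16 networks. A short sketch; the variable names are illustrative.

```python
import ipaddress

# RFC 1918 reserves the whole 172.16.0.0/12 block for private use.
private_block = ipaddress.ip_network("172.16.0.0/12")

# Split it into /16 networks: 2 ** (16 - 12) = 16 of them.
class_b_sized = list(private_block.subnets(new_prefix=16))

print(len(class_b_sized))   # 16
print(class_b_sized[0])     # 172.16.0.0/16
print(class_b_sized[-1])    # 172.31.0.0/16

# Every one of them is private; the next /16 past the block is not.
print(all(net.is_private for net in class_b_sized))      # True
print(ipaddress.ip_network("172.32.0.0/16").is_private)  # False
```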
2) Anomalous Disk Usage.
You are copying 100GB of legacy application data to a separate folder on the same volume. This you do as a precaution before making a risky change to the legacy application. The system has 150GB free. But, about 90% of the way through the copy, it fails with the error, "there is not enough space on the disk." You check, and there are still several tens of gigabytes free, plenty to accommodate the remaining dataset. How would you troubleshoot this? What are some potential causes?
Under the circumstances there are many things to check; finding the cause is a process of elimination, or troubleshooting. My process won't match yours, and it's almost certainly missing some great ideas. What matters here is the ability to reason.
Naturally the below steps are mainly useful in a Windows environment, but the concepts are applicable universally.
- Run a disk usage analysis, preferably with a visual analyzer like WinDirStat or SpaceMonger.
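- If you find shadow copies, inspect shadow copy health with vssadmin list writers and vssadmin list providers. VSS troubleshooting is its own topic, but for our purposes, let's assume shadow copies are not the problem and move on.
- Check for a disk quota. There are a few ways to do this: NTFS volume quotas appear on the Quota tab of the volume's Properties (or via fsutil quota query C:), and File Server Resource Manager folder quotas can be listed with the Get-FsrmQuota PowerShell cmdlet.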
- Check the filesystem type. It's unlikely today, but FAT32 filesystems put a 4GB limit on file size (you may run into this if using an external flash drive).
- Check the filesystem table (master file table/MFT on Windows, inode table on *nix systems). The MFT holds a record for every file: its size, its attributes, and a reference to its security descriptor (ACL and SACL). Each record takes at least 1024 bytes. By default NTFS reserves 12.5% of the volume for the MFT (the MFT zone), but the MFT can grow beyond that reservation when needed.
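Several checks in the list above (how much space is really free, how many files there are, how many are tiny, and whether any single file is too large for FAT32) can be gathered in one pass with a short Python script. This is a sketch assuming Python 3 is available on the machine; scan_tree, free_gib, and the 512-byte "tiny" threshold are hypothetical choices, not part of WinDirStat, vssadmin, or any other tool named here. Note that os.path.getsize reports logical size, not the space actually allocated on disk.

```python
import os
import shutil

FAT32_MAX_FILE_BYTES = 4 * 1024**3 - 1  # FAT32's per-file limit: 4 GiB minus 1 byte
TINY_FILE_BYTES = 512                   # hypothetical cut-off for a "tiny" file


def scan_tree(root: str) -> dict:
    """Walk `root`, tallying file count, logical bytes, tiny files, and FAT32-breaking files."""
    stats = {"files": 0, "bytes": 0, "tiny_files": 0, "too_big_for_fat32": []}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)  # logical size, not space allocated on disk
            except OSError:
                continue  # locked, deleted mid-scan, or access denied: skip it
            stats["files"] += 1
            stats["bytes"] += size
            if size < TINY_FILE_BYTES:
                stats["tiny_files"] += 1
            if size > FAT32_MAX_FILE_BYTES:
                stats["too_big_for_fat32"].append(path)
    return stats


def free_gib(path: str) -> float:
    """Free space, in GiB, on the volume that holds `path`."""
    return shutil.disk_usage(path).free / 1024**3
```

For example, comparing scan_tree("D:\\LegacyApp") with free_gib("D:\\") (the path is hypothetical) shows at a glance whether the dataset is dominated by tiny files; a very high tiny_files count relative to total bytes may point toward the filesystem table rather than raw free space.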
My last suggested troubleshooting step delves into the exact, and extremely unusual, situation I encountered. We had a Windows application which stored its data in millions and millions of files mere bytes in size: 10 bytes here, three bytes there. From a filesystem perspective this is a highly inefficient way to store things, as every file created occupies a record in the filesystem table (256 bytes on ext4, 512 bytes on XFS, and 1024 bytes on NTFS), plus at least one cluster (today usually 4096 bytes) on the disk. A combination of extreme disk fragmentation, overallocation, and cluster exhaustion culminated in our inability to create new files on the disk in spite of having several gigabytes free.
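The per-file overhead described above can be put into rough numbers. This is a simplified model, assuming one 1024-byte record per file plus data rounded up to whole 4096-byte clusters with a minimum of one cluster; on_disk_footprint is a hypothetical helper, and the one-million-file workload is invented for illustration. NTFS can keep a sufficiently small file's data inside its MFT record instead of a cluster, which this model deliberately ignores, so treat the result as a rough worst case.

```python
import math


def on_disk_footprint(file_sizes, record_bytes=1024, cluster_bytes=4096):
    """Return (logical bytes, bytes consumed) for a collection of file sizes.

    Simplified model: every file costs one filesystem record plus its data
    rounded up to whole clusters, with at least one cluster per file.
    Ignores NTFS resident data, compression, and sparse files.
    """
    logical = 0
    consumed = 0
    for size in file_sizes:
        logical += size
        clusters = max(1, math.ceil(size / cluster_bytes))
        consumed += record_bytes + clusters * cluster_bytes
    return logical, consumed


# Hypothetical workload: one million files of 10 bytes each.
logical, consumed = on_disk_footprint([10] * 1_000_000)
print(f"logical data:   {logical / 1024**2:.0f} MiB")   # 10 MiB
print(f"space consumed: {consumed / 1024**3:.1f} GiB")  # 4.8 GiB
```

Under these assumptions, roughly 10 MiB of actual data ties up close to 5 GiB, about 500 times its logical size.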
3) Network Loop.
Why do network loops bring down simple networks? Explain in detail.
To understand why a loop crashes a simple network, we need to examine how layer 2 works.
Layer 2 handles communication between nodes on the same physical network, e.g.: two computers in the same building. Unless a router sits between them, computers use MAC addresses as sources and destinations. Since it's the switch's job to forward frames to specific MAC addresses, it watches traffic to learn which MAC address is connected to which switch port.
All devices on the network must learn the MAC addresses of their peers, however. This is accomplished through the Address Resolution Protocol (ARP) by way of Ethernet broadcasts. Say computerA, 172.24.24.16, wants to communicate with computerB, 172.24.24.18. computerA sends out a frame to FF:FF:FF:FF:FF:FF asking who has 172.24.24.18. All connected systems receive the request, but all except computerB ignore it. computerB recognizes its own address and responds with its MAC address. computerA stores this in its ARP table, and from then on sends its frames directly to computerB's MAC address.
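The broadcast computerA sends can be made concrete by assembling the frame bytes with Python's struct module. This is a sketch only: it builds the 42-byte Ethernet-plus-ARP request laid out in RFC 826 but does not send it (that would need a raw socket and administrative rights), and the MAC address 02:00:00:00:00:16 is a made-up, locally administered address standing in for computerA.

```python
import socket
import struct

BROADCAST_MAC = b"\xff" * 6
ETHERTYPE_ARP = 0x0806


def mac_to_bytes(mac: str) -> bytes:
    """Convert "aa:bb:cc:dd:ee:ff" notation to 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def build_arp_request(sender_mac: str, sender_ip: str, target_ip: str) -> bytes:
    """Build the frame that asks every host on the segment "who has target_ip?"."""
    # Ethernet header: destination (broadcast), source, EtherType.
    ethernet = BROADCAST_MAC + mac_to_bytes(sender_mac) + struct.pack("!H", ETHERTYPE_ARP)
    # ARP header: hardware type 1 (Ethernet), protocol type 0x0800 (IPv4),
    # address lengths 6 and 4, opcode 1 (request).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += mac_to_bytes(sender_mac) + socket.inet_aton(sender_ip)
    arp += b"\x00" * 6 + socket.inet_aton(target_ip)  # target MAC is unknown: zeros
    return ethernet + arp


frame = build_arp_request("02:00:00:00:00:16", "172.24.24.16", "172.24.24.18")
print(len(frame))          # 42 (a NIC pads it to Ethernet's 60-byte minimum)
print(frame[:6].hex(":"))  # ff:ff:ff:ff:ff:ff
```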
The switch also maintains its own MAC address table, usually by observing traffic. If it receives a frame with a source MAC address on interface 12, it records that MAC address as using interface 12, and any frames destined for that MAC address are now sent over interface 12. However, should the switch receive a frame destined for an unknown MAC address, it floods the frame: it sends a copy out every port except the one the frame arrived on, much as computerA's broadcast above reaches every connected system. When the destination replies, the switch learns its port from the reply's source address.
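The learn-then-forward behavior can be modeled as a small class. This is a toy sketch, assuming a single switch with no VLANs, table aging, or spanning tree; LearningSwitch and the placeholder MAC addresses are hypothetical, not any vendor's implementation.

```python
BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"


class LearningSwitch:
    """Toy model of a switch's MAC address table (no VLANs, aging, or spanning tree)."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.mac_table = {}  # MAC address -> port it was last seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Handle one frame and return the set of ports it is sent out of."""
        self.mac_table[src_mac] = in_port  # learn (or re-learn) the sender's port
        if dst_mac != BROADCAST_MAC and dst_mac in self.mac_table:
            out_port = self.mac_table[dst_mac]
            # Known unicast: one port, or none if the destination is on the ingress port.
            return set() if out_port == in_port else {out_port}
        return self.ports - {in_port}  # broadcast or unknown unicast: flood


switch = LearningSwitch(range(1, 13))
print(switch.receive(3, "aa:aa:aa:aa:aa:aa", "bb:bb:bb:bb:bb:bb"))   # flooded: every port but 3
print(switch.receive(12, "bb:bb:bb:bb:bb:bb", "aa:aa:aa:aa:aa:aa"))  # {3}
print(switch.receive(3, "aa:aa:aa:aa:aa:aa", "bb:bb:bb:bb:bb:bb"))   # {12}
```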
There are other Ethernet broadcast sources, but flooding and ARP are the main ones. The crucial thing to understand, however, is that they are repeated on every port of the switch except the one they arrived on, and if the switch is connected to another switch, these broadcasts are repeated on every port of the neighboring switch as well.
It's when a broadcast finds its way back to the switch that sent it that the situation runs away with itself. Suppose you have an 8-port network switch, with interfaces 1 and 8 connected to each other, and 2 through 6 connected to endpoints. A computer on 2 sends a frame to a MAC address unknown to the switch. The switch floods it out every port except the one it arrived on: 1, 3, 4, 5, 6, 7, and 8. And since it's the switch's job to repeat any flood, the copy coming in on 1 is repeated out 8, and vice versa. Ethernet frames carry no TTL, so nothing retires these copies: they circulate continuously, every new broadcast adds more, and the network is overwhelmed nearly instantly, until 1 or 8 is disconnected. The same thing happens if any computer on the network sends out an Ethernet broadcast. The resulting noise is called a broadcast storm.
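The 8-port example can be simulated to show why the storm never dies down on its own. This is a toy model under stated assumptions: ports 2 through 6 hold endpoints, port 7 is empty, ports 1 and 8 are cabled together, there is no spanning tree, and, as real switches do, a frame is never flooded back out the port it arrived on. simulate_loop is a hypothetical helper that follows a single broadcast sent from port 2.

```python
def simulate_loop(rounds: int) -> tuple:
    """Follow one broadcast on an 8-port switch whose ports 1 and 8 are cabled together.

    Returns (copies still circulating, total copies delivered to endpoints).
    """
    ports = set(range(1, 9))
    cable = {1: 8, 8: 1}         # a frame sent out one end re-enters on the other
    endpoints = {2, 3, 4, 5, 6}  # port 7 is empty
    deliveries = 0
    arriving = [2]               # ingress port of each copy the switch must handle
    for _ in range(rounds):
        next_arriving = []
        for in_port in arriving:
            for out_port in ports - {in_port}:  # flood: every port except ingress
                if out_port in endpoints:
                    deliveries += 1
                elif out_port in cable:
                    next_arriving.append(cable[out_port])
        arriving = next_arriving
    return len(arriving), deliveries


print(simulate_loop(1))     # (2, 4)
print(simulate_loop(1000))  # (2, 9994)
```

In this model one broadcast leaves two copies circling the loop, one in each direction, indefinitely: the first pass delivers 4 copies (every endpoint except the sender) and each later pass delivers 10 more (all five endpoints, once per circulating copy). On real hardware a pass takes on the order of microseconds, and every further broadcast adds its own pair of circulating copies.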
Think of connected switches as acting as one. This situation is therefore possible if you connect a switch to itself, give a switch two uplinks, or daisy-chain switches and connect the first and last switch in the chain to each other. Other culprits can include VoIP phones plugged into themselves inappropriately and probably a few other wild scenarios I've not encountered.
It's crucial to note that plugging a switch into itself doesn't always result in an immediate broadcast storm, as sometimes the switch already knows every connected MAC address and the computers' ARP tables are fairly stable. But if you suspect a loop, you can send a network over the edge by pinging an unused address on the local subnet, which forces the pinging computer to send ARP broadcasts.
4) Low-level System Change In Group Policy.
I once messed up badly. I take solace knowing this was in my greenhorn days.
I created a GPO targeting a handful of application servers. It added folders to the system PATH variable.
Within a few weeks, the servers were completely inoperable. While they responded to ping, they failed all logons, no applications could launch, and the issue persisted through reboots. Restoring to a backup was only so helpful; the issue would always return. Oddly, though, the older the backup, the longer it took for the issue to surface.
My GPO was the problem, but disabling it was not an option.
How would you troubleshoot this? What happened to the servers? And what should have gone into the disciplinary write-up I never received?
5) Encoding.
This question is not directly practical for IT workers, but I think it's useful for anyone who works with computers to understand. The stronger your foundation, the more patterns you recognize.
A signed 16-bit integer is stored in memory. Its decimal value is 604.
The high bit is flipped.
What is the value now?
6) No Logon Servers Available, But Network Up.
This question resembles question 4, and in fact, it affected the same application servers. However, the issue was completely different.
An application server has a recurring issue: after a few weeks of running, it slows considerably over a few days before locking up and becoming completely unusable. When it fails, you find you can still ping the server, but it does not respond to any connection attempt. You tried the application client, RDP, VNC, RPC, and even a proprietary remote console software.
Rebooting the server clears the issue, but it always comes back.
You finally decide to troubleshoot the issue live before rebooting. You connect to the server's physical console and try to log in with your new domain admin (never used before), and the process fails with "no logon servers are available." You switch to the old shared admin, which is locally cached, and log in just fine.
How would you troubleshoot this, and what are some possible causes? Justify your reasoning.