As you know, the PowerShell Oneliner Contest 2017 has its winner: Ka-Splam, from the UK. Today I am proud to announce that Ka-Splam has agreed to answer a few questions on this blog.
Take the time to read it all and, if the PowerShell monk in you is able to solve ‘The 25 chars contest’ hidden in his answers, leave a comment for greater glory.
1. Ka-Splam, it’s an honor to have you here today to talk about the impressive oneliners you provided. I know that you are very active on the PowerShell subreddit and regularly compete and win in Shortest Script challenges. Let’s get straight to the point: how does one get good at those challenges?
Oneliners take a long time to write, they’re often frustrating, they run slowly, they’re hard to read, you shouldn’t use them in production scripts, and they don’t handle edge cases or errors. There is no reason you should want to get involved in them.
Except that they’re fun.
If you like that kind of thing – if you enjoy nitpicking over details .. hold up a moment! Here’s a detail to nitpick: shortest code challenges (‘codegolf’) are what produces hard to read code. Oneliners are clean, clear, readable – what you get when you trim away the fluff and use the tools skillfully.
Happysysadm hinted that I should try and write something the community could learn from, and that’s a bit scary; what I’m going to do is solve a simple problem and then shorten it. In detail. I don’t know whether the work of writing short code has any real-world use, but I’m convinced that if you are engaged in something (anything), if you are thinking about it, experimenting with it, trying to shape it to your will – you will learn something.
Short-code challenges keep me engaged, keep me digging into edge cases in my understanding of the language in a way that classic problems like “write three pages of code to simulate an object oriented deck of cards” don’t. Nobody talked me into this kind of thing by saying it would be good for me; I have always liked using built-in tools with no need for 3rd party dependencies, utilities that consist of a no-install .exe, and reduced “bloat”.
Except writing. If you skim read this article and think it’s too long – look how much effort it takes me to shorten this puzzle. Doing that to shorten this blog post would take weeks!
Problem: “things in the root of my C:\ drive, where the name is more than ten characters, just their names”, and the long form code is:
$items = Get-ChildItem -Path 'C:\'
$longNames = @()
$items | ForEach-Object -Process {
    If ($_.Name.Length -gt 10)
    {
        $longNames += $_.Name
    }
}
Foreach ($name in $longNames)
{
    Write-Output $name
}
It is 12 lines and 241 characters, which I will keep track of as we reduce it, and the output is four strings:
OpenSSL-Win32
Program Files
Program Files (x86)
Python27-32bit
I have made this code a bit laborious, but people familiar with PowerShell should easily follow what it’s doing – there is clear separation of the major steps: gathering data into a variable. Stop. Checking the name length. Stop. Collecting the results. Stop. Displaying the output. Stop.
(NB. Beginner programmers will still struggle – there’s nothing intuitive from everyday life about `Get-ChildItem` or syntax like `+=` or `$_.Name.Length` or piping into `ForEach-Object`, but they are common patterns in PowerShell. As we shorten the code, we move from common patterns towards rare, novelty patterns).
But I don’t want to type so much code to satisfy a momentary curiosity about folder name length and this is not an engaging or interesting puzzle on its own – once you’ve got past the basics of PowerShell there is no challenge. Instead of looking for a harder puzzle, we can make this more challenging by writing it in less code. And it’s an open ended challenge: there’s no fixed place to get to, no pass or fail, and you’re mostly competing against yourself. That’s something I like about it.
How do we write it in less code, what is that process?
You know how putting five numbers in order is easy and people learn how to sort numbers into order some time during childhood, learning by example? Telling someone steps to sort five numbers in order is much harder – that’s computer science academic work. Well, I know how to write shorter code but I don’t know how to tell other people how to write shorter code. My examples here are step by step progress, and hopefully you can learn by example.
Many of you will look at the last four lines of the long solution and think they “do nothing”. Great, just writing `$longNames` is enough for PowerShell itself to pull the items out of the array and write them to the output, and they will end up on screen. It was going to do that anyway, there’s no need for us to explicitly write that:
$items = Get-ChildItem -Path 'C:\'
$longNames = @()
$items | ForEach-Object -Process {
    If ($_.Name.Length -gt 10)
    {
        $longNames += $_.Name
    }
}
$longNames
9 lines, 188 characters.
But it needs a bit more understanding of the language to know why writing a variable name on its own does anything, and what it does. But that is a common pattern for PowerShell users familiar with functions.
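A minimal sketch of that behaviour, with made-up names (`Get-LongWord` and its sample words are purely illustrative, not part of the contest): any value that is not assigned or redirected goes to the output stream, so a bare variable on its own line behaves much like `Write-Output`.

```powershell
# Hypothetical helper: emits the words longer than 4 characters.
function Get-LongWord {
    $found = @()
    foreach ($word in 'tree', 'forest', 'pond', 'mountain') {
        if ($word.Length -gt 4) { $found += $word }
    }
    # No Write-Output and no return: the bare variable is sent to the output.
    $found
}

$result = Get-LongWord
$result
```

Run at the console, the last line prints `forest` and `mountain`, for the same reason `$longNames` does above.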
As well as offloading work onto the computer, we can offload work to the programmer’s brain – this code stores the directory listing in a variable named `$items` and pulls it straight out again, a needless double step going A to B to C. We can connect the listing output to the loop input with a pipe and skip right from A to C, but then understanding the code requires that the programmer is comfortable enough with the way the language works to follow A to C with nothing in between to hold on to:
$longNames = @()
Get-ChildItem -Path 'C:\' | ForEach-Object -Process {
    If ($_.Name.Length -gt 10)
    {
        $longNames += $_.Name
    }
}
$longNames
8 lines, 171 characters.
That doesn’t need more understanding – we already used the pipeline – but it’s my observation from code and questions on the internet that this change is hard for people. This changes the nature of the program from “taking clear, distinct steps, one at a time” to “a flow from start to finish, however far it goes, all at once”. Knowing that pipelining things is possible doesn’t seem to be enough – it needs quite a lot of practice for this “let the data slide through” code-shape to become comfortable and familiar.
The same thing happens from here on down – more understanding means greater leaps. A to C becomes A to E, then A to J. The heart of all “code readability” arguments might be whether the reader has enough familiarity with the patterns used in the code, rather than whether the code itself is “readable”, too long, or too short.
From here, part of me wants to squash the `if` into a single line, part of me wants to get rid of the `@() / +=` combination and have loop output to array in one go – that’s an example of what I was just writing, sending loop output into a variable without any intermediate steps is a pattern that looks weird from other languages, but gets more familiar with use – and part of me wants to get rid of the separation of “storing the names, then displaying them” by merging that into one “find and display them” step:
$longNames = Get-ChildItem -Path 'C:\' | ForEach-Object -Process {
    If ($_.Name.Length -gt 10) { $_.Name }
}
$longNames
4 lines, 129 characters.
This uses the same understanding from earlier (“why $longNames does something when written on its own”) to understand how writing `$_.Name` does something on its own, even though the context is different. From directory listing to variable, no stops along the way. A to D.
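To make the two patterns easy to compare side by side, here is a small sketch on an in-memory list (the sample words are an assumption for illustration, so it does not depend on what is in anyone's `C:\`):

```powershell
$words = 'tree', 'forest', 'pond', 'mountain'

# Pattern 1: create an empty array, then append inside the loop.
$collected = @()
$words | ForEach-Object { if ($_.Length -gt 4) { $collected += $_ } }

# Pattern 2: whatever the loop emits lands in the variable directly.
$captured = $words | ForEach-Object { if ($_.Length -gt 4) { $_ } }

$collected -join ','
$captured -join ','
```

Both variables end up holding the same two words. One difference worth knowing: if only a single item passes the test, pattern 2 stores a plain string rather than a one-element array, so wrap the pipeline in `@( )` when the code afterwards needs an array.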
Have you noticed these changes work on different layers of the code? Some of them are purely visual – removing line breaks from the `if` made no difference to the way the code works. Others change what’s happening behind the scenes – connecting `Get-ChildItem` to `ForEach-Object` removed an array and a variable name, uses less memory, without affecting the output. That’s something else I like – shortening the code involves understanding up and down the layers, from PS reading the code, to what it does behind the scenes, to how those parts interact with each other, to what exactly needs to happen to the data to solve the problem.
If we can remove newlines around the `if`, let’s do that a bit more and put `$longNames` up on the previous line, with the rest of the code:
$longNames = Get-ChildItem -Path 'C:\' | ForEach-Object -Process {
    If ($_.Name.Length -gt 10) { $_.Name }
} $longNames
Oh no!
ForEach-Object : Cannot bind parameter 'RemainingScripts'
Nothing happened when I took the newlines away from the `if` statement, but take this one away and it won’t run. This specific problem plagued me for a while with short code challenges, and following up on that error and why it happens was interesting and useful. Something else I’m not showing in this article is how many times I try things which don’t work. We’ll undo that change, and let’s get rid of the names array instead, and output directly:
Get-ChildItem -Path 'C:\' | ForEach-Object -Process {
    If ($_.Name.Length -gt 10) { $_.Name }
}
3 lines, 98 characters.
Some of you have been hitting your heads on why I’m using a loop and a test, instead of merging them together and using `Where-Object` – why two steps instead of one?? OK let’s merge those:
Get-ChildItem -Path 'C:\' | Where-Object -FilterScript {
    $_.Name.Length -gt 10
} | ForEach-Object -Process {
    $_.Name
}
5 lines, 136 characters. Longer.
I joke, you probably expected this shape:
Get-ChildItem -Path 'C:\' |Where-Object {$_.Name.Length -gt 10} |Select-Object -ExpandProperty Name
1 line, 99 characters (yay! Dropping below a 100 char cutoff is pleasing)
This is another dimension of it – my early changes made it shorter, no question. But we’re now at the point where some changes to make things shorter don’t quite do the same thing, and so they need another change elsewhere to compensate, and the whole thing ends up longer. Squashing things into a small space is easy at first, but after a while every push in one place makes things pop up somewhere else, progress slows down, and that’s a hint that you’re getting past the easy gains. Maybe it’s a good place to stop?
I had to do something: the `if` test wasn’t just checking the length, it was also extracting the `.Name` property. `Where-Object` can only do the length test, so the Name property needs to be handled in new code. That’s another dimension: going from “script” to “oneliner” towards “codegolf” means testing the rules of what’s allowed as output. If the full directory listing output is OK, then we can cut the entire last chunk from this code; if it’s mandatory to output just the names, then we have to get the Name out. (Tip: argue with whoever set the puzzle that your rule-bending output should be valid ;-))
And the shape of the code changed by merging the loop with the `if` test – it now uses `Select-Object`. Writing short code requires knowing several different ways to do things, so practicing writing code in different ways means you can “choose the shortest way you can think of” compared to people who only know one way.
Back to my original nitpick right up at the top, I’m going to call that last example a oneliner: there’s no variables, no loops, just three pipeline blocks neatly connected, one each for the three stages of the problem – get data, process it, output it. No stops. It’s 8% of the lines and 40% of the typing, and does the same thing.
Let’s not end here, let’s start here – assuming I still didn’t want to type all that at the console, let’s go through the same tricks again:
1) Offload work to the computer
2) Offload work to the programmer
3) Use greater understanding of the language to do (1) and (2)
That means:
• `-ExpandProperty` – parameter can be shortened; if PowerShell can match what you type to a single parameter, it will. Tiny bit of language knowledge, quite common.
• Aliases: `where` and `select`, very common knowledge.
• Default parameters: with `Get-ChildItem` `-Path` is assumed for the first parameter, if you don’t type it. So commonly used, I used it all the time before I even knew that’s what was happening. Don’t tell the computer to go that way, if it was going that way anyway.
Now:
Get-ChildItem 'C:\' |Where {$_.Name.Length -gt 10} |Select -Expand Name
71 characters.
We just knocked a quarter of it off, and it’s still almost the same. I’m happy this is the kind of code I’d write at the command line, off the top of my head, share with people, but not use in production scripts.
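A small sketch of those three shortcuts, run against stand-in objects rather than a real drive (the `$items` list and its names are assumptions for illustration). `Get-Alias` is a handy way to see what an alias expands to:

```powershell
# Stand-in objects with a Name property, instead of a real directory listing.
$items = 'short', 'a-much-longer-name' | ForEach-Object { [pscustomobject]@{ Name = $_ } }

# Fully spelled out: cmdlet names, parameter names, everything.
$full = $items | Where-Object -FilterScript { $_.Name.Length -gt 10 } | Select-Object -ExpandProperty Name

# Aliases, a positional script block, and an abbreviated parameter name.
$aliased = $items | where { $_.Name.Length -gt 10 } | select -Expand Name

# Ask PowerShell what an alias stands for.
$target = (Get-Alias -Name where).Definition

$full
$aliased
$target
```

Both pipelines return `a-much-longer-name`. Abbreviating works only while the prefix is unambiguous: `-Ex` could mean `-ExcludeProperty` or `-ExpandProperty`, so `Select-Object` needs at least `-Exp`.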
Repeat: same again – make the computer do work, use greater knowledge of the language, etc. etc.
• Aliases: `?` and `gci`, quite common.
• In parameter parsing mode strings without spaces don’t need quotes, again common.
• `-Exp` is still unique, instead of `-Expand`, also common.
These changes make it:
gci c:\ |? {$_.Name.Length -gt 10} |select -exp Name
52 characters.
Another 30% reduction. Looking like a traditional “oneliner” now, getting unreadable, approaching “codegolf” territory. There isn’t a distinct cutoff that I know of, but you can see it’s nothing like the earlier code, and yet you followed the blog post down this far step by step and you can also see it’s exactly like the earlier code.
Code so far has enough whitespace to be clear and readable, but “shortest code” means delete all the spaces. The space between `c:\ |` can go and everything will work. Remove the space between `gci c:\` and it won’t work. Removing the space from `} |select` is fine, removing the space from `-exp Name` isn’t. Trying these will give you errors, and exploring why the errors are coming involves learning something about PowerShell.
Now we’re at the point where there’s a couple of spaces I could trim to drop just below 50 chars, but that must be almost as far as it goes, right? We list the directory contents, check the name length, expand the property. What else is there to get rid of?
How far can it go? This is where it gets fun and challenging.
# remove the easy spaces, and gci has an alias 'ls' for -4 chars
ls c:\|?{$_.Name.Length -gt 10}|select -Exp Name #48 chars
# select can take a pattern for the name as long as it's unique -2
ls c:\|?{$_.Name.Length -gt 10}|select -Exp N* #46 chars
# comparisons and other operators don't always need spaces -2
ls c:\|?{$_.Name.Length-gt10}|select -Exp N* #44 chars
# If you know how PowerShell handles properties on Arrays, rewrite for -8
(ls c:\|?{$_.Name.Length-gt10}).Name # 36 chars
# Or, ForEach-Object can expand a property, a little known feature, -2
ls c:\|?{$_.Name.Length-gt10}|% N* #34 chars
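The last two rewrites lean on two language features that are easier to see in long form. This is a sketch on stand-in objects (the `$items` list is an assumption for illustration): asking an array for a property returns that property from each element, and `ForEach-Object` accepts a member name, wildcards included, in place of a script block.

```powershell
# Stand-in objects with a Name property.
$items = 'short', 'a-much-longer-name', 'another-long-name' |
    ForEach-Object { [pscustomobject]@{ Name = $_ } }

# Property access on an array is applied to every element (PowerShell 3.0 and later).
$viaDot = ($items | Where-Object { $_.Name.Length -gt 10 }).Name

# ForEach-Object given a member name instead of a script block.
$viaForEach = $items | Where-Object { $_.Name.Length -gt 10 } | ForEach-Object Name

# The member name may be a wildcard, as long as it resolves to exactly one member.
$viaWildcard = $items | Where-Object { $_.Name.Length -gt 10 } | ForEach-Object N*

$viaDot -join ','
```

All three variables hold the same two names. The wildcard form is the fragile one: on an object with a second member starting with `N` it would stop being unique and fail.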
35% gone. And we dropped below three blocks, momentarily, with a big rewrite – after going towards wildcard patterns, changing to a parentheses wrapper and back to the full `.Name` still saved a lot. Come on brain, what else can we dig up? TYPES! I haven’t mentioned casting yet, and that is a huge part of it. Look at this:
gci c:\ | ForEach-Object { $_ }   # directory listing output
gci c:\ | ForEach-Object { "$_" } # names only !
If you force the output items to be a string, they become just the name, not the whole directory listing, or the full path. For this problem, that’s convenient. For others, it isn’t. In other code, casting between types is incredibly common and tricks to cast between arrays, strings, numbers, are very useful. Let’s abuse the string cast and get rid of calling `.Name` entirely:
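A few of the everyday casts mentioned here, sketched on plain values so the results do not depend on the file system. One caveat, stated as my understanding rather than something from the interview: the "string cast gives just the name" behaviour belongs to Windows PowerShell, and PowerShell 6 and later appear to turn the same objects into full paths, so test that trick on the version you are golfing in.

```powershell
# Putting a value inside double quotes converts it to a string.
$n = 42
$asText = "$n"             # the string '42'
$joined = "$(1, 2, 3)"     # array elements are joined with a space by default

# The left-hand operand decides what + means.
$concat = '5' + 5          # string on the left: concatenation
$sum    = 5 + '5'          # number on the left: addition

# Explicit casts between strings, numbers and arrays.
$number = [int]'12' + 1
$chars  = [char[]]'abc'

$asText, $joined, $concat, $sum, $number, $chars.Count
```

The values are `42`, `1 2 3`, `55`, `10`, `13` and `3` respectively.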
# Now $_ is a string name
ls c:\|%{"$_"}|?{$_.Length-gt10} # 32 chars
# but wait, there's a trick with `Where-Object` to avoid the scriptblock
# we couldn't use it for a double lookup Name.Length but now
# we have unlocked it, because we're working with one property, -2
ls c:\|%{"$_"}|? Length -gt 10 # 30 chars
# and that trick can take patterns for the property name, -3
ls c:\|%{"$_"}|? Le* -gt 10 # 27 chars
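The `Where-Object` trick in long form, as a sketch on a plain list of strings (the sample names are an assumption for illustration). The script-block form and the property-plus-operator form filter identically, but the second only takes a single property name, which is why it was out of reach while the test was `Name.Length`.

```powershell
$names = 'Windows', 'Program Files', 'Users', 'Python27-32bit'

# Script block form: any expression is allowed inside the braces.
$viaScriptBlock = $names | Where-Object { $_.Length -gt 10 }

# Comparison form (PowerShell 3.0 and later): one property, one operator, one value.
$viaProperty = $names | Where-Object Length -gt 10

$viaScriptBlock -join ','
$viaProperty -join ','
```

Both lines print `Program Files,Python27-32bit`.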
What else do I happen to know about strings, casting, types, pattern matching? Regular Expressions! I skip over another pile of background knowledge and edge case behavior, and show:
# Completely different array filtering approach, based on using a regular expression to count
(ls c:\)-match'.{10,}'|%{"$_"} # 30 chars (boo)
# use a previous short version of `%` again, -3
# "going back to something I was using" happens a lot
(ls c:\|% n*)-match'.{10,}' # 27 chars (Q: why are the parens needed?)
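The background being skipped is that `-match` behaves differently depending on what is on its left: with a single string it answers true or false, with an array it returns the elements that match. A short sketch on made-up strings of known length:

```powershell
$names = 'abcdefghi', 'abcdefghij', 'abcdefghijk'   # 9, 10 and 11 characters

# Array on the left: -match acts as a filter.
$tenOrMore    = $names -match '.{10,}'
$elevenOrMore = $names -match '.{11,}'

# Single string on the left: -match returns a boolean.
$single = 'abcdefghijk' -match '.{11,}'

$tenOrMore.Count
$elevenOrMore.Count
$single
```

The counts are 2 and 1, and `$single` is `True`. That also shows a small edge case: `.{10,}` accepts a name of exactly ten characters, which `-gt 10` rejects, so `.{11,}` is the strict equivalent at the same character count. With the four folder names in this article the output is identical either way.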
Wait. This is getting silly. Almost halved the 52. And if you just saw one of these answers pasted into a web page, you wouldn’t see the pages of “getting it a bit shorter each time” happening earlier. It would look like “What, who can just write that, one-liners are awful”.
Anyway, I try a lot of things. I spend a lot of time on it. I drag in as much knowledge as I can find, all that matters is getting the right output. It’s a form of minimalism, if you can scrape by with a skeleton crew of code and pieces falling off everywhere, as long as it gets past the finish line once, everything else can go. At the same time, it’s not minimalism – if you can spend 1GB of memory and 5 seconds of timewasting to save 1 character, do it and be grateful for it.
Two completely different approaches, both hitting 27 characters.
This fascinates me; the early code and the tiny code are so different to humans, but do the same thing.
This
$longNames = @()
Get-ChildItem -Path 'C:\' | ForEach-Object -Process {
    If ($_.Name.Length -gt 10)
    {
        $longNames += $_.Name
    }
}
$longNames
Is the same as this
ls c:\|%{"$_"}|? Le* -gt 10
and this
(ls c:\|% n*)-match'.{10,}'
People find one more readable, more writable, but the computer is never confused. This makes me feel there’s something really interesting underneath this about “expressing computation”. How can they be so different and “do the same”? How much of the code is important, how much is fluff? What features could this language be missing that could make that computation shorter or clearer?
Is that it? Can it go below 27 characters? (hint: yes, I have a 25 .. so far)
One last tip for those willing to compete in codegolf competitions: spend a lot of time on them, and then present them as a finished script that looks effortless. And practice rewriting and rewriting in different ways. And shamelessly steal every code-shortening idea from other people that you can possibly find, put them ALL in.
2. Tell us a bit about yourself and about the way you got to PowerShell.
I have a long dislike of writing code as an amateur and sharing it with people, only for them to say “I haven’t got Python, or a Java runtime” or “how do I install that” or “what’s the Visual Basic runtime?”. I envied the Linux distributions with their built-in C compilers and Perl and Python, almost as much as I didn’t enjoy VBScript.
When PowerShell came to Windows I jumped on it. Windows Vista, PowerShell 1 or 2. At last a powerful scripting language everyone would have by default!
I didn’t understand it, and I dropped it.
A few years later, 2012 ish, working in IT with Exchange requiring it, newer versions getting better and better, it grew on me. Then it took over from Python as my everyday playing around language, now I’m a J. Snover’s Witness.
3. Can you tell us your approach to Task 1?
Task 1 mandated the use of `Win32_Share` and the shortest way I know is with `gwmi`. After a few trials and errors, I thought of the forced-cast-to-string approach mentioned above and checked to see what happens – one output object becomes:
\\Computer01\root\cimv2:Win32_Share.Name="ADMIN$"
That’s so close to the required output format, string cutting to get rid of the middle bit has got to be one of the shortest possible routes to an answer. And trim the annoying trailing quote. Luckily I like Regular Expressions, so a bit of `-replace` experiments later and I have something that seems roughly as good as I could ever get it.
4. How did you cope with Task 2?
I stared at it, and I saw it needed `$question` to be used and print this output:
The B in Benoit B. Mandelbrot stands for Benoit B. Mandelbrot.
The words “in, stands, for” are not in $question so they must be in my code. “Benoit B. Mandelbrot” is in the answer twice, remove that duplication, fin.
In my head was the shape “output string, get the first `Benoit B. Mandelbrot` from `$question` with string manipulation (== probably regex), use it and store and re-use for the second place it appears, probably in a sub-expression in the string”.
And then trial and error until I had it quite short. As it happened, SubString came out shorter than regex. This rarely happens. The most pleasing adjustment in my answer was taking the space in front of `Benoit` from the input as well.
5. Your oneliner for Cosine Similarity is impressive. Can you explain how you got to such a short solution?
Nope. The whole of this article is trying to answer this question. All that – trial and error, codegolf experience, weird language behavior edge cases, looking at the language specification, spending lots of time, knowing a pile of short-code tricks – that’s how.
This one was scary. I hadn’t heard of it before, I’m not a skilled mathematician, and the Wikipedia page math-terminology-explanation was no help. It took me quite a while of Googling before I decided to look at the Reddit discussion, expecting other people to have finished it already. Other people were puzzled and that was a bit of a relief.
After finding C# examples, explanations, discussions, I started to get a clue. It’s word counting, adding, dividing, then I could start to make sense of the Wikipedia equation – A1*B1 + A2*B2 … An*Bn on the top. A1 squared + A2 squared … An squared on the bottom left. Same for B on the bottom right. I can make those work.
Something which isn’t covered in the previous few pages, is the way I split it into smaller parts – lots of testing how to split the input strings into words, lots of testing ways of counting words in sentence 1 vs sentence 2, before I started combining the code together and trying for a full answer. Then a lot of struggling to get past the “no semi-colons” restriction.
Particular techniques in this answer:
$a=1    # assign a variable
($a=1)  # assign variable, and use value 1 right here in the code as well
foreach () { } $t            # because, as noted earlier
... | Foreach-object { } $t  # this structure doesn't work
Instead of writing these:
$lowerLeft += $a * $a
$lowerRight += $b * $b
with a semicolon:
$lowerLeft += $a * $a; $lowerRight += $b * $b
It was a moment of insight to use:
$throwawayVar = ($lowerLeft += $a * $a), ($lowerRight += $b * $b)
Programming by side effect, making an array of the results and ignoring it. It still looks like a very redundant answer to me, three big repeating patterns, I was expecting someone else to get rid of them and be 1/3rd shorter.
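Both techniques in a runnable sketch, with small made-up numbers so the results are easy to check by hand. I assign the throwaway array to `$null` here, which is a substitution of mine; the variable name in the contest answer was whatever was shortest.

```powershell
$a = 3
$b = 4
$lowerLeft = 0
$lowerRight = 0

# Parentheses turn an assignment into an expression that also yields the value.
$echo = ($c = $a + $b)

# Two updates in one statement, no semicolon: the comma builds an array of both
# results, and the array itself is discarded.
$null = ($lowerLeft += $a * $a), ($lowerRight += $b * $b)

$c, $echo, $lowerLeft, $lowerRight
```

This prints `7`, `7`, `9` and `16`: both running totals were updated even though the only visible statement is an assignment to a throwaway.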
It was only by coding it that I came to understand what it did, and spent a while pacing up and down explaining to myself what an N-dimensional cosine means, why it makes any sense at all relating to documents and words, how it measures document similarity, and imagining dogs pulling in the ‘bacon’ direction vs the ‘dog park’ direction. So that was fun.
TL;DR:
One dimension:
0 — 1 — 2 — 3 — 4 — > Bacon.
A dog which pulls strength 3 in the bacon direction is similar to another dog which pulls strength 3 in the bacon direction. They are different from a dog which pulls strength 1 in the bacon direction.
A document which says “bacon bacon bacon” is similar to another “bacon bacon bacon”, but they are different from “bacon”.
Two dimensions:
/\ Dog Park (up)
|
0 — -> Bacon (right)
Dogs which pull strongly towards bacon are similar. Dogs which pull in both directions equally, can’t make up their minds, and go off at an angle – are similar. Dogs pulling in the Dog park dimension are similar.
Dogs pulling towards bacon are a bit different from dogs pulling in both directions and going at an angle.
Dogs pulling towards bacon only are very different from dogs which pull towards dog park only.
Exactly how strongly they pull in each direction, determines which angle they go off at, mostly bacon, mostly dog park, or split the difference.
A document which says “bacon bacon park park bacon park” is similar to “park park park bacon bacon bacon”. It pulls in both dimensions and goes off at an angle. It’s a bit different from “bacon park park park” which pulls more towards the park. It’s very different from “park park park park park” which pulls only towards the park.
Each word is a direction, a dimension. The word count is how strongly the document “pulls” in that direction. After all the pulls combine, the document goes off at an angle. Different word counts make different angles. This equation works out “how different”. I can’t visualize more than 3 dimensions, but I can up to 3 and it now makes sense that it works. More words, more dimensions, same idea.
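For readers who would rather see the dogs as code, here is a long-form sketch of the calculation. It is deliberately not golfed and is not the contest answer; `Get-CosineSimilarity` and its parameter names are hypothetical. It follows the standard formula: the sum of A×B on top, and the square root of (sum of A squared times sum of B squared) on the bottom.

```powershell
# Hypothetical helper, written for readability rather than length.
function Get-CosineSimilarity {
    param([string]$First, [string]$Second)

    # Count how often each word occurs in each document.
    $countA = @{}
    $countB = @{}
    foreach ($word in -split $First.ToLower())  { $countA[$word] = 1 + $countA[$word] }
    foreach ($word in -split $Second.ToLower()) { $countB[$word] = 1 + $countB[$word] }

    # Every distinct word across both documents is one dimension.
    $words = @($countA.Keys) + @($countB.Keys) | Select-Object -Unique

    $dot = 0
    $sumA = 0
    $sumB = 0
    foreach ($word in $words) {
        $a = [int]$countA[$word]   # a missing word counts as 0
        $b = [int]$countB[$word]
        $dot  += $a * $b
        $sumA += $a * $a
        $sumB += $b * $b
    }

    # Assumes both documents contain at least one word.
    $dot / [math]::Sqrt($sumA * $sumB)
}

Get-CosineSimilarity 'bacon bacon park park bacon park' 'park park park bacon bacon bacon'
Get-CosineSimilarity 'bacon park park park' 'bacon bacon park park bacon park'
Get-CosineSimilarity 'bacon bacon bacon' 'park park'
```

The three calls give 1 (same angle), about 0.894 (a bit different) and 0 (pulling in unrelated directions). One thing the sketch makes visible: the measure compares direction only, so `'bacon bacon bacon'` against `'bacon'` also scores 1; how hard a document pulls matters only relative to its other dimensions.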
6. What’s your take on PowerShell for one-liners compared to other languages you might know?
For shell use oneliners, there isn’t any other language quite in the same category so it’s great. For general purpose programming one-liners I like it, plenty of convenience syntaxes. Some things that annoy me. I tend to forget it’s an admin scripting language, and instead wish it pulled in every convenience feature from every language I’m dimly aware of.
http://codegolf.stackexchange.com is a multi-language site, and in my experience the answers (other people’s answers, I don’t know most of the languages!) separate themselves into tiers:
• Really short: single-purpose languages designed for golfing – GolfScript, CJam, many others.
• Short: the major scripting languages – Perl, Python, JavaScript, PowerShell, Ruby, etc.
• Medium: mainstream languages, older languages with fewer built-in conveniences (C#, Java, Lisps)
• Long: novelty answers – SQL, etc.
PowerShell has a reputation for being long and wordy, but the language designers did a fantastic job with the “elastic syntax” to make those things optional, and with things like automatic module imports and type accelerators, being a shell and having direct access to the filesystem it is strong. Other languages save not needing `$` on variable names, but then can’t put variables in strings easily. Or they have better string slicing but worse regex subroutines. PowerShell tends to be slightly behind the other popular languages, often because you need quotes and parens so much or don’t have quite as easy int->char->string conversion.
What really seems to make the difference is whether language X happens to have a convenience feature that fits the question. Often the languages trade places for different problems, so JavaScript might have a short way of doing one thing, but C# has a lambda expression and Linq combo which surpasses it for other things, and Mathematica has an inbuilt list of country names so it takes a winning place in some specific question, etc. PowerShell is fun, competitive enough.