• Matthew Kaufman

Getting Credit in Code and Life

When all you have is 1s and 0s, how do you prove you’re the one who put them there and not someone else?

One of the scariest things for anyone who creates anything is the concept of having that work stolen.

Theft of intellectual property is such a widespread problem that the United States, like most other countries, enforces intellectual property rights strictly. The terms "trademark," "patent," and "copyright" are now familiar even to young children.

We have a few mainstream ways of confronting this problem. In many cases, we protect our work by watermarking images, copyrighting materials, and branding our ownership into the work itself.

What about programmers though? How can we personalize our mechanical numbers and equations? Ironically, the answer arises from a far older genre of created work: The wonderful world of books.

Within the realm of writing, there is a study called Authorship Attribution.

Authorship Attribution is all about stylometry.

Alright, keep breathing. Before you leave, let me tell you that stylometry isn’t nearly as complex as it may sound.

Stylometry is the study of the measurable quirks in how someone writes. Authorship attribution uses it to credit the right person for work that is anonymous or in dispute. So if two people are arguing over who wrote something, you would look at the stylometry of the writing to decide who gets the author credit.

Now with the vocabulary lesson out of the way, what can this mean for software engineers? The answer is a lot, actually.

When we are trying to determine who wrote a work, some of the big things we look at are how they structure their grammar, the vocabulary and frequency of the words they use, and the way they space and punctuate.

This all may seem like common sense. If one person puts two spaces after a period and another puts one, that habit helps tell them apart. Likewise, if someone writes very long sentences, uses a lot of big words, repeats the same word very often, or speaks in a slightly unique fashion, those habits narrow it down too. When Matthew Emerson writes an article for this blog, it reads differently, and not just because of the subject matter. Our stylometry is different.
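To make that a little more concrete, here is a minimal Python sketch that measures a few of those habits. The function name `style_features` and the particular features it picks are my own assumptions for illustration; real attribution work uses far richer feature sets and proper statistics.

```python
import re
from collections import Counter


def style_features(text: str) -> dict:
    """Compute a few simple, illustrative stylometric features."""
    # Rough sentence split: break on whitespace that follows ., !, or ?
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Lowercased words (letters and apostrophes only)
    words = re.findall(r"[A-Za-z']+", text.lower())
    counts = Counter(words)
    return {
        # Average number of words per sentence
        "avg_sentence_length": len(words) / len(sentences) if sentences else 0.0,
        # Average number of characters per word
        "avg_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
        # How often a period is followed by exactly two spaces vs. one
        "double_space_after_period": len(re.findall(r"\.  (?=\S)", text)),
        "single_space_after_period": len(re.findall(r"\. (?=\S)", text)),
        # The writer's most repeated word, if any
        "most_common_word": counts.most_common(1)[0][0] if counts else None,
    }


two_spacer = "I like cats.  I like dogs.  Cats are great."
one_spacer = "I like cats. I like dogs."
print(style_features(two_spacer))
print(style_features(one_spacer))
```

Comparing the dictionaries from two writing samples gives a crude fingerprint. No single number settles anything, but several of them together start to point at one writer.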

Today, we apply stylometry to programming as well.

Programmers utilize different vocabularies depending on their style (see “Actually, Please Do Reinvent the Wheel - Lessons From a Computer Programmer” for details), and the way they use various parts of their programming vernacular is a dead giveaway.

As for how we space things, I would kindly advise you never to ask programmers about a certain holy war called “Spaces vs Tabs”. If you do ask someone about this, I advise you to nod, agree with them, and not point out how frivolous the whole ordeal is.

I have my own opinions on the subject. If I try to go into them now, though, Dan will politely ask me to leave the site...


Okay, I can't resist!

Check out the following pieces of code.

Even to someone who doesn’t understand the awkward languages of code, these probably look pretty different. They both do the same thing but they are completely different in styling. The one on the left uses spaces less often, and is just a heck of a lot bigger.

Meanwhile the one on the right does almost everything in a completely opposite manner. One of them even counts up to five while the other counts down.

The crazy thing is, they both perform the exact same function. They're just written very, very differently.
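The same contrast can be sketched in Python. The two functions below are hypothetical stand-ins of my own, not the samples described above: both add up the numbers one through five, but one counts up with long names and no spacing, while the other counts down with short names and spaced operators.

```python
def sum_verbose():
    # Style 1: long names, no spaces around operators, counts up to five.
    running_total=0
    current_number=1
    while current_number<=5:
        running_total=running_total+current_number
        current_number=current_number+1
    return running_total


def sum_terse():
    # Style 2: short names, spaced operators, counts down from five.
    t, n = 0, 5
    while n > 0:
        t += n
        n -= 1
    return t


# Same result, very different fingerprints.
print(sum_verbose(), sum_terse())
```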

If we study these sorts of quirks in style, then we can usually isolate who wrote a given work--even if it's in a wildly different genre.
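Here is a rough sketch of what "studying the quirks" could look like for source code. The function `code_style_features` and the handful of features it counts are assumptions made for illustration, and the regular expressions are deliberately crude. Actual code stylometry tools look at much more than this.

```python
import re


def code_style_features(source: str) -> dict:
    """Count a few surface-level style habits in a piece of source code."""
    lines = source.splitlines()
    identifiers = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source)
    return {
        # The holy war: lines indented with tabs vs. lines indented with spaces
        "tab_indented_lines": sum(1 for line in lines if line.startswith("\t")),
        "space_indented_lines": sum(1 for line in lines if line.startswith(" ")),
        # Assignments written as `a = b` vs. `a=b` (rough heuristic)
        "spaced_assignments": len(re.findall(r"\w = \w", source)),
        "tight_assignments": len(re.findall(r"\w=\w", source)),
        # Long descriptive names vs. short ones (keywords are counted too)
        "avg_identifier_length": (
            sum(len(name) for name in identifiers) / len(identifiers)
            if identifiers else 0.0
        ),
    }


tight_style = "total=0\nfor i in range(5):\n\ttotal=total+i\n"
spaced_style = "t = 0\nfor i in range(5):\n    t = t + i\n"
print(code_style_features(tight_style))
print(code_style_features(spaced_style))
```

Two programmers who never change their habits will produce consistently different numbers here, which is the whole idea.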

It was techniques like these that Dr. Patrick Juola applied to determine that J.K. Rowling did in fact write The Cuckoo’s Calling. Similar techniques have been used to argue that Sidney Rigdon had a hand in writing the Book of Mormon, and there are many more examples.

Today, as AI-generated content like deepfakes continues to sound more and more human, these techniques serve an even more important role: making sure we know when we're being talked to by an artificial intelligence.

Nor is this approach limited to code. A devout Beatles fan recently used these same techniques to show that the popular song “In My Life” was most likely composed by John Lennon, despite many believing for years that it was Paul McCartney.

Maybe you're not a musician. Still, if you go through the seemingly common experience of having your social media hacked, this could be used to vindicate you and prove that those horrible messages to your boss weren’t actually from you. Now let’s go even further beyond (cue Super Saiyan).

This is a central aspect of being human, even if you are a cave dweller with no electricity, no inventions, no fire, not even the wheel.

When you go to carve into a wall or develop a technique for hunting your prey, you are still inventing a style innately your own. Unless you want Ooga Booga over there to steal it from you, it’ll still be important to you. So lucky for you that we have things like stylometry to help keep your property yours.

So next time you go to speak, write, or do anything really, pay very close attention to how you’re doing it. Notice every pause you take and every little nitty-gritty detail, because it is those things that make the way you do whatever it is unique to you and only you.

While you’re at it, don’t forget to thank good ol’ Cross-Entropy, Kolmogorov Complexity, and empirical testing for keeping it that way.

Okay, maybe don't thank them aloud. That'd be a bit weird.
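For the curious, here is one way Kolmogorov complexity shows up in practice. True Kolmogorov complexity can't be computed, so a common stand-in is the size of a text after compression. The sketch below uses Python's `zlib` and the normalized compression distance; the helper names (`compressed_size`, `ncd`, `closest_author`) and the toy samples are my own assumptions, not any particular researcher's method.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size of the data after zlib compression, a rough proxy for complexity."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance: small when x and y share patterns."""
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


def closest_author(unknown: str, samples: dict) -> str:
    """Return the author whose known sample is nearest to the unknown text."""
    return min(samples, key=lambda name: ncd(unknown, samples[name]))


# Toy, made-up samples. Real use needs long, genuine writing samples.
samples = {
    "alice": "the quick brown fox jumps over the lazy dog " * 20,
    "bob": "lorem ipsum dolor sit amet consectetur adipiscing elit " * 20,
}
unknown = "the quick brown fox jumps over the lazy dog " * 5
print(closest_author(unknown, samples))
```

The intuition is that a compressor squeezes an unknown text better when it has already seen writing with the same habits, so the known sample that shrinks the distance the most is the best guess at the author.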
