WhitneyRip 👾

Crash

Sorry I crashed Whitney’s website

The project was to build a web scraper and get as much data as possible from anywhere, so I wrote a scraper to get all the available art piece from Whitney Museum of American Art.

Click Here to see the Leaks / Full Archive

The Hack

Scraping Logs

found whitney image url - http://collectionimages.whitney.org/standard/176757/largepage.jpg
seems to be randomized number naming
wrote a bash script to download all images by running numbers from 1 to 300000 (./rip.sh)
divided the work load into three days since the progress took too long to complete

obtained a total of 18154 folders, 387.2 MB, with number naming of the image as the folder name

found an easier way to scrape whitney's collection
they index their objects with - http://collection.whitney.org/object/(number)
plugged in the total number of items, 18154, obtained from the previous process
http://collection.whitney.org/object/18154

realized that the previous process missed one item

to total collection is 1-18155

began development for a node scraping tool

figured out a way to loop from 1 to 18155 with node module - lupus

discovered whitney's DDOS prevention system

got soft banned from the server for 10 minutes

solved the problem with node module - system-sleep

saving data to data.json file with node module - jsonfile

discovered a problem with the node scraping tool,
found out that whitney rendered some pages with a special vm binding structure
that prevents my node scraping tool from getting the html elements (ex:http://collection.whitney.org/object/9)

"<div id="main-webapp">
<noscript>
&lt;a href="/artists/by-letter/A"&gt;Browse artists&lt;/a&gt;
</noscript>
</div>"

------------------------------------------------------------------------------------------------------------------------

Visualization One

began working on a visualization with images obtained from bash scrape. (/public/index.html)

wrote a bash_script (./count.sh) to obtain whitney's image number naming from folder names and save to data .txt in /img

created a data.json file with data.txt

ran into a problem of too many files that crashed the browser, maximum serving is 100 images now

looking for some javascript solutions to load all images



Visualization Two

Since I have too many images, and the browser cannot handle the workload, so I decided to take a different approach and show one image at a time

inspired by Fifty Shades of Grey, I created 18154 Shades of Whitney with a show me button
when the button is clicked the site will show you one shade of whitney from the 18154 shades.

found error - http://collectionimages.whitney.org/standard/13663/largepage.jpg whitney took out access to this image from the public domain.

“Forbidden

You don't have permission to access /standard/13663/largepage.jpg on this server.”

Scraping with node server

node

Scraping with bash scripe

bash

Code

scraper.js

var request = require("request"),
  cheerio = require("cheerio"),
  lupus = require("lupus"),
  sleep = require("system-sleep"),
  jsonfile = require("jsonfile"),
  data = [],
  file = "data.json",
  port = 9000;

//1-18155 (18156 for search) 25.216 hrs to complete on6226
lupus(13620, 18156, function (n) {
  //console.log(n);
  request(
    {
      method: "GET",
      url: "http://collection.whitney.org/object/" + n,
    },
    function (err, response, body) {
      if (err) return console.error(err);
      $ = cheerio.load(body);
      console.log(n);
      //if($('h1.xlarge').length == 0){
      $("div#main-webapp").each(function () {
        if ($("div.detail.artist").length > 0) {
          var detailArtist = $("div.detail.artist", this),
            detailTitle = $("div.detail.title", this),
            detailDate = $("div.detail.date", this),
            detailOM = $("div.detail.object-medium", this),
            detailDim = $("div.detail.dimensions", this),
            detailCL = $("div.detail.credit-line", this),
            detailAcce = $("div.detail.accession", this),
            detailCR = $("div.detail.copyright", this),
            artImg = $("img.artwork", this).attr("src"),
            holder = "p.info";

          console.log({
            id: n,
            img: artImg,
            artist: $(holder, detailArtist).text(),
            title: $(holder, detailTitle).text(),
            date: $(holder, detailDate).text(),
            medium: $(holder, detailOM).text(),
            dimension: $(holder, detailDim).text(),
            credit: $(holder, detailCL).text(),
            accession: $(holder, detailAcce).text(),
            copyright: $(holder, detailCR).text(),
          });

          data.push({
            id: n,
            img: artImg,
            artist: $(holder, detailArtist).text(),
            title: $(holder, detailTitle).text(),
            date: $(holder, detailDate).text(),
            medium: $(holder, detailOM).text(),
            dimension: $(holder, detailDim).text(),
            credit: $(holder, detailCL).text(),
            accession: $(holder, detailAcce).text(),
            copyright: $(holder, detailCR).text(),
          });

          //save to json
          jsonfile.writeFile(file, data, { spaces: 2 }, function (err) {
            console.error(err);
          });
        }
        //console.log(data);
      });
    }
  );

  //sleep delay prevent socket hungup
  sleep(5000);
});

scrape.sh

#!/bin/bash

for i in {194876..300000}

do
wget http://collectionimages.whitney.org/standard/$i/largepage.jpg -P ./img/\$i

done

Art Piece Data

view all

[
  {
    "id": 3586,
    "img": "http://collectionimages.whitney.org/standard/102386/largepage.jpg",
    "artist": "Pamela Bianco (1906-1995)",
    "title": "Zinnias",
    "date": "1927",
    "medium": "Lithograph",
    "dimension": "Sheet: 14 1/2 × 9 11/16 in. (36.8 × 24.6 cm) Image: 11 7/16 × 6 7/8 in. (29.1 × 17.5 cm)",
    "credit": "Whitney Museum of American Art, New York; Gift of Gertrude Vanderbilt Whitney ",
    "accession": "31.606",
    "copyright": "© artist or artist’s estate"
  },
  {
    "id": 4129,
    "img": "http://collectionimages.whitney.org/standard/102387/largepage.jpg",
    "artist": "Emil Ganso (1895-1941)",
    "title": "Hilda #2",
    "date": "c. 1930",
    "medium": "Drypoint, etching, and aquatint",
    "dimension": "Sheet (Irregular): 11 13/16 × 12 3/8 in. (30 × 31.4 cm) Plate: 6 1/4 × 6 15/16 in. (15.9 × 17.6 cm)",
    "credit": "Whitney Museum of American Art, New York; Purchase ",
    "accession": "31.674",
    "copyright": "© artist or artist’s estate"
  },
  {
    "id": 4135,
    "img": "http://collectionimages.whitney.org/standard/102388/largepage.jpg",
    "artist": "Emil Ganso (1895-1941)",
    "title": "Still Life With Bottle",
    "date": "1930",
    "medium": "Wood engraving",
    "dimension": "Sheet (Irregular): 16 3/8 × 13 in. (41.6 × 33 cm) Image: 12 × 10 in. (30.5 × 25.4 cm)",
    "credit": "Whitney Museum of American Art, New York; Purchase ",
    "accession": "31.734",
    "copyright": "© artist or artist’s estate"
  }
]