
Warhop top-to-tail

A Good Article this time, which calls for a slightly different approach to editing from that taken for stub and Start-class articles. Less than 0.5% of articles on Wikipedia are rated Good or above. As there is much more content than a random page usually throws up, I may or may not edit the whole article. If I choose not to, I will often take the lead or last section and edit that before moving on (or sometimes a section at random if it looks interesting). Also, rather than making all the edits in one or two sessions, I will try to break them down so that reasoning can be given for each group, and so that another editor can revert something they don't like (or that is wrong) without having to trash all of the edits. If you are going to get feedback from another editor, articles like this are often where it happens, so it is sensible to be on your best (editing) behaviour.

With all that in mind, let's get to Jack Warhop's article. Warhop was an early 20th-century baseball player. The page stats show about 60 editors but only 125 edits, and the page gets about 5 views per day. There was, at least, something on the talk page (these are so often blank), but it was just an article peer review (the process of getting other editors to mark your homework) from 2013, when the last major set of edits created the bulk of the page and earned the article its Good Article status.

I decided to start at the top with the opening paragraph, intending probably to tackle only that section and then move on, as the article's topic isn't that interesting to me (although the content itself was quite engaging). An undefined abbreviation, a wiki-link, and then I reached the last sentence of the opening paragraph. It ran to a few lines on whether Jack was 5'8" or 5'9". Well, does this really matter, and if it does, is it important enough to be in the lead paragraph? I thought not on both counts. It also looks a bit like original research, which is frowned upon. A deletion is probably in order.

I really dislike doing this, but from personal experience I know that some editors have no qualms about it at all. If the text I am removing is large, interesting or debatable, I will often copy it to the article's talk page with a query about whether it should be included. In this case, however, I can't see any merit in it (and the editor who supplied it has since been permanently banned from editing…). So I will radically shorten it and preserve the references, so that a really curious reader can explore the issue of Jack Warhop's height in more depth.

Unfortunately this deletion then pricked my conscience, and I felt that I should go through the whole article as penance. Overall it was a fairly straightforward edit: a series of references to be expanded and polished, and a string of baseball terminology wiki-links to be added. Two references were just page references to a third, so they could be condensed into a single reference cited with {{r|refname|pagenumber}} templates. I also found a few dead links, and had to tag all the NY Times citations with |url-access=subscription, which displays a red padlock to indicate that they are not open access.

For the last dead link, it was straightforward to find the article on another part of the referenced website, Genealogy Trails, and for the first one the Wayback Machine came up trumps, as the page on Newspapers.com had been archived.
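For anyone wanting to script this kind of check, the Wayback Machine also offers a public availability API. The sketch below is a minimal Python example, assuming the `https://archive.org/wayback/available` endpoint and its `archived_snapshots`/`closest` JSON response shape as documented by the Internet Archive; the helper names are hypothetical, and this is an alternative to searching via the web interface, not a record of how these edits were made.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint: the Internet Archive's Wayback Machine availability API.
AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_query(url: str, timestamp: Optional[str] = None) -> str:
    """Build the API request URL; timestamp is YYYYMMDDhhmmss (may be truncated)."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp  # ask for the capture closest to this time
    return AVAILABILITY_API + "?" + urlencode(params)


def closest_snapshot(response: dict) -> Optional[str]:
    """Return the closest archived copy's URL, or None if nothing was captured."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def find_snapshot(url: str, timeout: float = 30.0) -> Optional[str]:
    """Query the API over the network and return the closest snapshot URL, if any."""
    with urllib.request.urlopen(availability_query(url), timeout=timeout) as resp:
        return closest_snapshot(json.load(resp))
```

An empty `archived_snapshots` object in the response means the page was never captured, so `closest_snapshot` returns `None` and you are back to hunting for another copy.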

The middle one, however, used a Google News link, and although archived pages were present, the actual page image hadn't been captured. Google News's newspaper archive wasn't a service I was aware of, but judging by its formatting it appears to be an old one. I'm not sure how many titles are archived; many hundreds at least. The dead link's reference was to The Milwaukee Sentinel, a title since subsumed by merger into The Milwaukee Journal Sentinel. Google did have a copy of the paper's archives but has since been told to take it down. As usual, grubby commercial interests (NewsBank - I won't provide a link) are responsible, and they are currently trying to extort the public library system in Milwaukee for access to archives that they provided the material for in the first place. Bottom line: I can't find access to this one.

That left the handful of NY Times articles that may have been accessible in the past but are now hidden behind a paywall (though not a very high one - more later), even though they are long out of copyright. For most of the stories the Internet Archive has the relevant issues of the paper imaged, so I will happily replace all the URL links to the NY Times website with Internet Archive links and make them open access. That dealt with five of the seven articles.

I had to get more devious for the last two NY Times issues, which for some reason the Internet Archive doesn't hold. The first reference is to the 6th July 1912 issue of the paper; the NY Times webpage gives the title of the story, but the rest is obscured by a demand for payment. Not a vast amount, but I don't have a US$-denominated account, so it's not much use to me or to most of the rest of the world.

New York Times paywall

But as the link is on Wikipedia, the Internet Archive's web-spiders will have crawled it, so we can view the page without the demand. Here at least we can see the précis and a Download PDF link, but we still can't see the article.

Page as viewed from the Internet Archive

Clicking the PDF link leads to a page stating that the webpage at the URL is available but not archived. The second reference (23rd Aug 1912) follows the same path, leading to an unarchived page.

So a dead end?

Well, actually, I had tried this route before finding the Internet Archive's NY Times archive. As you might imagine, I started at the top of the list of seven articles and worked my way down. For the first reference (10th May 1911), for some reason, the Download PDF button worked and passed through to an archived copy of the story. Now that we have the web address of a working PDF copy, we can see that the files are organised into year/month/day folders and that each story is given a nine-digit ID number. With this archive structure, all I need are the nine-digit IDs of the two articles, and I might be able to use the same layout to stroll around the paywall and get to the content.

Returning to the 6th July 1912 story, the NY Times site doesn't give us this number, as we can't get round the banner, and neither does its page source HTML. The Internet Archive webpage is also unhelpful, but the source code of that page gives the target of the button as article ID number 104899650.

Thus, changing

      https://timesmachine.nytimes.com/timesmachine/1911/05/10/105027109.pdf?pdf_redirect=true&ip=0

to

      https://timesmachine.nytimes.com/timesmachine/1912/07/06/104899650.pdf?pdf_redirect=true&ip=0

and

      https://timesmachine.nytimes.com/timesmachine/1912/08/23/104905152.pdf?pdf_redirect=true&ip=0

drops us straight into the required stories. Simples!
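The URL surgery above can be expressed as a small Python sketch. It assumes that the `timesmachine/YYYY/MM/DD/<nine-digit ID>.pdf` layout inferred from these three links holds for other issues too, and it copies the `pdf_redirect=true&ip=0` query string verbatim without knowing what it does; the function names are hypothetical. You still need the article ID from the archived page's source, as described above.

```python
import re
from datetime import date

# Layout inferred from the three links above; an assumption, not a documented API.
PDF_URL = re.compile(
    r"https://timesmachine\.nytimes\.com/timesmachine/"
    r"(\d{4})/(\d{2})/(\d{2})/(\d{9})\.pdf"
)


def timesmachine_pdf_url(issue: date, article_id: str) -> str:
    """Build a TimesMachine PDF link from an issue date and a nine-digit article ID."""
    if not re.fullmatch(r"\d{9}", article_id):
        raise ValueError(f"expected a nine-digit article ID, got {article_id!r}")
    # %Y/%m/%d gives zero-padded year/month/day folders, e.g. 1912/07/06
    return (
        "https://timesmachine.nytimes.com/timesmachine/"
        f"{issue:%Y/%m/%d}/{article_id}.pdf?pdf_redirect=true&ip=0"
    )


def parse_timesmachine_url(url: str) -> tuple[date, str]:
    """Split a working PDF link back into its issue date and article ID."""
    m = PDF_URL.match(url)
    if m is None:
        raise ValueError(f"not a TimesMachine PDF link: {url}")
    year, month, day, article_id = m.groups()
    return date(int(year), int(month), int(day)), article_id


if __name__ == "__main__":
    known = ("https://timesmachine.nytimes.com/timesmachine/"
             "1911/05/10/105027109.pdf?pdf_redirect=true&ip=0")
    print(parse_timesmachine_url(known))  # (datetime.date(1911, 5, 10), '105027109')
    print(timesmachine_pdf_url(date(1912, 7, 6), "104899650"))
```

Parsing the one working link first is a cheap way to confirm the pattern before swapping in a new date and ID.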

Just in case this is not a permanent state of affairs, I submitted both links to the Internet Archive's Save Page Now service so that they are captured. That disposes of all the red padlocks.

Finally, as I scanned some of the references, I discovered that Jack wasn't a coal miner but a coal-shoveller, or fireman. I did wonder why the Chesapeake & Ohio Railroad employed coal miners. So my last task was to remove the Category:American coal miners link, and that was that.

Overall it took five days and ten edits, moving me up to third place in authorship contribution and adding about 10% of the text of the article.
