

Earthli gets OpenGraph and Twitter metadata


Most tools that scrape web pages use the OpenGraph metadata embedded in them. Some fall back to older, more general metadata, like the <c>description</c> tag or the <c><title></c> element, but this leads to a rather limited embedding. Almost no one extracts pictures from pages unless the metadata explicitly requests it. Until recently, earthli didn't include this metadata, so any links pasted to social media rendered in a somewhat substandard way.

<h>Sample Metadata</h>

As an example, the article <a href="{app}/view_article.php?id=3974">NY Times Spelling Bee</a> now includes the following OpenGraph metadata (along with the <c>twitter:image</c> tag):

<code>
<meta name="twitter:image" content="https://.../forthwith.png">
<meta property="og:url" content="https://...view_article.php?id=3974">
<meta property="og:title" content="NY Times Spelling Bee">
<meta property="og:type" content="website">
<meta property="og:description" content="I recently wrote that Kath and I have a one-year streak going in the NYT Crossword Puzzle. While that is still ongoing, we've also recently discovered a little gem called Spelling Bee. The concept is ...">
<meta property="article:author" content="marco">
<meta property="article:published_time" content="2020-05-16 20:39:52">
<meta property="article:modified_time" content="2020-05-21 21:15:08">
<meta property="og:image" content="https://.../forthwith.png">
<meta property="og:image:width" content="2562">
<meta property="og:image:height" content="1566">
</code>

The same article also now has Twitter metadata:

<code>
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:site" content="">
<meta name="twitter:creator" content="@mvonballmo">
<meta name="twitter:title" content="NY Times Spelling Bee">
<meta name="twitter:description" content="I recently wrote that Kath and I have a one-year streak going in the NYT Crossword Puzzle. While that is still ongoing, we've also recently discovered a little gem called Spelling Bee.
The concept is ...">
</code>

Twitter doesn't build a card from the OpenGraph information alone (at a minimum, it requires its own <c>twitter:card</c> tag), so it's safest to include both copies.

<h>Implementation</h>

Some of the properties aren't strictly required, but it was easy enough to generate them all from earthli's general facilities.

<ul>
The <a href="{root}/software/webcore/docs/developer/webcore/obj/AUDITABLE.html">AUDITABLE</a> object provides the creator, the creation date, and the last-modification date.
The <a href="{root}/software/webcore/docs/developer/webcore/obj/CONTENT_OBJECT.html">CONTENT_OBJECT</a> object provides the title and the description.
The <a href="{root}/software/webcore/docs/developer/webcore/obj/DRAFTABLE_ENTRY.html">DRAFTABLE_ENTRY</a> updates the creation date to the publication date for articles.
The <a href="{root}/software/webcore/docs/developer/webcore/obj/ATTACHMENT_HOST.html">ATTACHMENT_HOST</a> sets the image to the first attachment.
The <a href="{root}/software/webcore/docs/developer/webcore/util/IMAGE_METRICS.html">IMAGE_METRICS</a> provides size information for the default image.
The <a href="{root}/software/webcore/docs/developer/webcore/text/MUNGER_STRIPPER.html">MUNGER_STRIPPER</a> extracts and formats text for the description so that it is legal HTML.
The <a href="{root}/software/webcore/docs/developer/albums/obj/ALBUM.html">ALBUM</a> sets the image to the main picture for the album, if available.
The <a href="{root}/software/webcore/docs/developer/albums/obj/PICTURE.html">PICTURE</a> sets the image to the picture's URL.
</ul>

In addition, I added support for <c>SOCIAL_PAGE_OPTIONS</c> and introduced a method on the data hierarchy called <c>set_social_options()</c>, which allows the data objects to enrich the social options before they're formatted into the metadata area of the page header. A page must enable the social options and explicitly request that they be generated; so far, I've enabled this only for the <c>view_entry.php</c> and <c>view_folder.php</c> pages.
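To make the flow a bit more concrete, here is a minimal sketch of the idea in Python rather than earthli's PHP. Only the name <c>set_social_options()</c> comes from the description above; the classes, fields, and the <c>render_social_meta()</c> helper are hypothetical stand-ins, not WebCore's actual API. The choice of the <c>summary</c> card for pages without an image is also an assumption.

```python
from dataclasses import dataclass
from html import escape
from typing import List, Optional, Sequence, Tuple


@dataclass
class SocialPageOptions:
    """Hypothetical stand-in for SOCIAL_PAGE_OPTIONS (field names are assumptions)."""
    enabled: bool = False  # the page must opt in explicitly
    url: str = ""
    title: str = ""
    description: str = ""
    image_url: Optional[str] = None
    image_width: Optional[int] = None
    image_height: Optional[int] = None


class ContentObject:
    """Supplies the title and description, in the spirit of CONTENT_OBJECT."""

    def __init__(self, title: str, description: str) -> None:
        self.title = title
        self.description = description

    def set_social_options(self, options: SocialPageOptions) -> None:
        options.title = self.title
        options.description = self.description


class Article(ContentObject):
    """Adds the first attachment as the image, in the spirit of ATTACHMENT_HOST."""

    def __init__(self, title: str, description: str,
                 attachments: Sequence[Tuple[str, int, int]] = ()) -> None:
        super().__init__(title, description)
        self.attachments = list(attachments)  # (url, width, height) tuples

    def set_social_options(self, options: SocialPageOptions) -> None:
        super().set_social_options(options)  # let the base class go first
        if self.attachments:
            (options.image_url,
             options.image_width,
             options.image_height) = self.attachments[0]


def meta(attribute: str, key: str, value: object) -> str:
    """Format one meta tag, escaping the content so that it is legal HTML."""
    return f'<meta {attribute}="{key}" content="{escape(str(value), quote=True)}">'


def render_social_meta(options: SocialPageOptions) -> List[str]:
    """Return the OpenGraph and Twitter tags, or nothing if the page did not opt in."""
    if not options.enabled:
        return []
    has_image = options.image_url is not None
    tags = [
        meta("property", "og:url", options.url),
        meta("property", "og:title", options.title),
        meta("property", "og:description", options.description),
        # Assumption: fall back to the plain "summary" card when there is no image.
        meta("name", "twitter:card", "summary_large_image" if has_image else "summary"),
        meta("name", "twitter:title", options.title),
        meta("name", "twitter:description", options.description),
    ]
    if has_image:
        tags += [
            meta("property", "og:image", options.image_url),
            meta("property", "og:image:width", options.image_width),
            meta("property", "og:image:height", options.image_height),
            meta("name", "twitter:image", options.image_url),
        ]
    return tags


if __name__ == "__main__":
    options = SocialPageOptions(enabled=True, url="https://example.com/view_article.php?id=1")
    article = Article("NY Times Spelling Bee", "A little gem called Spelling Bee.",
                      [("https://example.com/forthwith.png", 2562, 1566)])
    article.set_social_options(options)
    print("\n".join(render_social_meta(options)))
```

The point of the pattern is that each level of the data hierarchy only fills in what it knows about, and the page decides whether anything is emitted at all.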
The results are shown below.

<h>Apple Messages</h>

Apple Messages now uses the OpenGraph tags to make nicely formatted previews.

<img attachment="apple_messages_news_article.png" align="left" caption="Apple Messages news article"><img attachment="apple_messages_photo.jpg" align="left" caption="Apple Messages photo"><img attachment="apple_messages_photo_album.jpg" align="left" caption="Apple Messages photo album">

<clear><h>Facebook</h>

I haven't actually posted anything to Facebook, but I was able to use the Social-graph Testing tools to see how posts would look.

<img attachment="facebook_article_with_attachment.png" align="left" caption="Facebook article with attachment"><img attachment="facebook_photo.jpg" align="left" caption="Facebook photo"><img attachment="facebook_photo_album.jpg" align="left" caption="Facebook photo album">

<clear><h>Twitter</h>

I only tested articles with Twitter because I don't anticipate ever tweeting photos or albums. The tweet is nicely formatted now, with or without an attachment. Previously, Twitter displayed links to earthli as only a simple title, with no description or image.

<img attachment="twitter_article_with_attachment.png" align="left" caption="Twitter article with attachment"><img attachment="twitter_article_without_attachment.png" align="left" caption="Twitter article without attachment">
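For a rough idea of what the consuming side does with these tags, here is a small Python sketch of the fallback behavior described at the top of this article: prefer the OpenGraph properties, then fall back to the older <c>description</c> tag and the <c><title></c> element. The <c>PreviewParser</c> class and <c>build_preview()</c> function are hypothetical names for illustration; this is not how Apple Messages, Facebook, or Twitter actually implement their scrapers.

```python
from html.parser import HTMLParser
from typing import Dict, Optional


class PreviewParser(HTMLParser):
    """Collects meta tags and the <title> text from a page (illustrative only)."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: Dict[str, Optional[str]] = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta":
            # OpenGraph uses "property"; Twitter and older tags use "name".
            key = attributes.get("property") or attributes.get("name")
            if key and "content" in attributes:
                # Tolerate stray whitespace in the key; the first occurrence wins.
                self.meta.setdefault(key.strip(), attributes["content"])
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def build_preview(html_text: str) -> Dict[str, Optional[str]]:
    """Prefer OpenGraph values and fall back to the older, more general tags."""
    parser = PreviewParser()
    parser.feed(html_text)
    meta = parser.meta
    return {
        "title": meta.get("og:title") or parser.title.strip(),
        "description": meta.get("og:description") or meta.get("description") or "",
        # No image unless the metadata explicitly names one.
        "image": meta.get("og:image"),
    }


if __name__ == "__main__":
    page = (
        "<head><title>earthli</title>"
        '<meta property="og:title" content="NY Times Spelling Bee">'
        '<meta property="og:image" content="https://example.com/forthwith.png">'
        "</head>"
    )
    print(build_preview(page))
```

A page without any OpenGraph tags still yields a title and possibly a description, but never an image, which matches the limited previews that earthli links used to get.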