
Improve gdoc-to-markdown conversion process #2236

Closed
coreycaitlin opened this issue Jan 25, 2017 · 23 comments
Labels
Parking lot (Not enough resources but need to get done), type:blog-feature


@coreycaitlin

When a blog post is completed, approved, and ready to post, it has to be converted into markdown for the site. That process is documented here, but is very bumpy.

Current pain points:

  • all links are italicized
  • dollar signs don't come out as dollar signs
  • lists where an item is longer than one line come out as blockquotes
  • if you use custom styles in a Google Doc (e.g. orange headings), some links come out as font-file references instead of links
  • headings aren't reliable
  • frontmatter goes okay, but people are bad at typing it correctly
  • images don't show up
    • we have to export the gdoc as a webpage, then extract the images and re-add them manually
  • code snippets get mangled
  • add space between list items
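A minimal post-processing sketch for three of these pain points (italicized links, dollar signs, list spacing). It assumes pandoc's output wraps links in single-asterisk emphasis (`*[text](url)*`) and escapes dollar signs as `\$`; neither pattern is confirmed in this thread, and all function names are hypothetical.

```python
import re

# ASSUMPTION: pandoc wraps every link in single-asterisk emphasis, e.g. *[text](url)*
ITALIC_LINK = re.compile(r"(?<!\*)\*(\[[^\]]+\]\([^)]+\))\*(?!\*)")
# ASSUMPTION: dollar signs arrive escaped as \$
ESCAPED_DOLLAR = re.compile(r"\\\$")
# A line that starts a bullet ("-", "*", "+") or numbered ("1.") list item
LIST_ITEM = re.compile(r"^\s*(?:[-*+]|\d+\.)\s")


def unitalicize_links(text: str) -> str:
    """Drop single-asterisk emphasis that wraps a whole Markdown link."""
    return ITALIC_LINK.sub(r"\1", text)


def unescape_dollars(text: str) -> str:
    """Turn every escaped \\$ back into a plain $."""
    return ESCAPED_DOLLAR.sub("$", text)


def space_list_items(text: str) -> str:
    """Put a blank line before each list item that directly follows list content."""
    out: list[str] = []
    in_list = False
    for line in text.splitlines():
        if LIST_ITEM.match(line):
            if in_list and out and out[-1].strip():
                out.append("")
            in_list = True
        elif line.strip() and not line[0].isspace():
            in_list = False  # unindented paragraph text ends the list
        out.append(line)
    result = "\n".join(out)
    return result + "\n" if text.endswith("\n") else result


def clean_markdown(text: str) -> str:
    """Hypothetical post-pandoc cleanup pass combining the three fixes."""
    return space_list_items(unescape_dollars(unitalicize_links(text)))
```

This is deliberately naive: it does not skip fenced code, so it won't help with mangled code snippets and could alter a literal `\$` inside one.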
@awfrancisco

Note that these bugs are produced when we use pandoc. Find open pandoc issues here.

@elainekamlley @gboone got any more bugs to add?

@gboone
Contributor

gboone commented Jan 25, 2017

Here's a list of all the things I look for — when I remember.

Number 6 in this list is an alternate way of grabbing the images.

Also worth noting: this script works great for exporting Google Docs directly to Markdown. It's a bit cumbersome to get it working every time, but it's possible I'm just not doing it right.

@coreycaitlin
Author

Scripts seem like a good option to explore — this variant looks promising, too.

Do you have any estimates of how long this currently takes for simple blog posts? For more complex posts?

@coreycaitlin
Author

More info! Using Pandoc via odt instead of docx seems to resolve some of the issues, particularly the weird list formatting.

I suspect that many of these issues are being introduced in the GDocs --> Word part of the process, rather than the Word --> markdown step, so I think we should explore ways to remove the interim step.
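A minimal sketch of the ODT route, assuming pandoc is installed and on the PATH and the doc has already been downloaded from Google Docs as `.odt`. The function names and the `images` media folder are hypothetical; `--extract-media` is a real pandoc option that might also ease the images problem, though nobody in this thread has tested it.

```python
import subprocess
from pathlib import Path


def build_pandoc_command(src: str, dest: str, media_dir: str = "images") -> list[str]:
    """Assemble a pandoc call that reads an ODT export and writes pandoc Markdown."""
    return [
        "pandoc",
        "--from=odt",
        "--to=markdown",
        # Writes embedded images into media_dir and links to them from the output
        f"--extract-media={media_dir}",
        f"--output={dest}",
        src,
    ]


def convert(src: str, dest: str) -> None:
    """Run pandoc on an ODT file downloaded from Google Docs."""
    if Path(src).suffix.lower() != ".odt":
        raise ValueError(f"expected an .odt export, got {src!r}")
    # check=True raises CalledProcessError if pandoc exits non-zero
    subprocess.run(build_pandoc_command(src, dest), check=True)
```

Keeping command construction separate from execution makes it easy to compare ODT and DOCX runs by swapping only the `--from` value.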

@gemfarmer
Contributor

Has this Google script been tried as an alternative to pandoc? I tried it just now and it seemed to solve some of the issues, but mangled the frontmatter.

Would the convert to markdown --> check email --> amend any errors be an acceptable new workflow, or is that even worse than before?

@gemfarmer
Contributor

gemfarmer commented Jan 27, 2017

Whoops! Should have read this thread before posting. Looks like this option has been/is being explored 😀

@awfrancisco

@gboone would know more, but I think either we aren't allowed to use Google Scripts, or you have to reinstall them for each document. Either way, the overhead makes it kind of a pain.

@coreycaitlin
Author

coreycaitlin commented Jan 27, 2017

I've submitted the OCIO vetting form for at least one of the Google Scripts, to see if that process can be worked through. Hopefully if that works, it will give us the ability to use that script again.

@gboone
Contributor

gboone commented Jan 27, 2017

Yeah, @awfrancisco's description matches my experience — I'd have to reinstall the script every time I wanted to use it. I wasn't aware of this form for getting them approved, but I did open an IT helpdesk issue about it a long time ago that went largely unanswered. Fingers crossed that the form works for us.

@coreycaitlin
Author

coreycaitlin commented Jan 31, 2017

@awfrancisco When we talk next, I want to ask about the overall blog workflow and whether there are upstream opportunities to improve this process (like writing in Markdown within gdocs). Understanding the whole process will help me see what we can and can't change.

@coreycaitlin
Author

I'm marking this as blocked until I hear back about the Google Script.

@coreycaitlin
Author

First hypothesis about what "better" looks like:

  • NOT having to fix list styling (pandoc via ODT might help)
  • copy/paste frontmatter instead of editing linebreaks
  • better keyboard shortcuts to help speed things up — documented
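One way to get copy/paste frontmatter is to generate it rather than hand-type it. A minimal sketch, assuming the frontmatter is YAML between `---` lines; the field names (`title`, `date`, `authors`, `tags`) and values are hypothetical stand-ins, since the site's real schema isn't spelled out in this thread.

```python
import json


def render_frontmatter(fields: dict) -> str:
    """Render a flat dict of strings / lists of strings as a YAML frontmatter block."""
    lines = ["---"]
    for key, value in fields.items():
        if isinstance(value, list):
            lines.append(f"{key}:")
            # JSON string literals are valid YAML double-quoted scalars
            lines.extend(f"  - {json.dumps(item)}" for item in value)
        else:
            lines.append(f"{key}: {json.dumps(value)}")
    lines.append("---")
    return "\n".join(lines) + "\n"


print(render_frontmatter({
    "title": "Improving our blog workflow",  # hypothetical example values
    "date": "2017-02-01",
    "authors": ["coreycaitlin"],
    "tags": ["workflow", "markdown"],
}))
```

Quoting every value sidesteps a common hand-typing mistake: an unquoted title containing `: ` is invalid YAML.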

Things we are unlikely to fix:

  • images
  • special characters? (check odt option)
  • code snippets (ask people to put them in GitHub gists so we can copy/paste directly)

@gemfarmer
Contributor

gemfarmer commented Mar 24, 2017

Update on where this is at:

A few weeks ago I submitted a request to GSA IT to get a Google script approved for use.

I thought that getting it approved would mean that it was available as an add-on, but it turns out that approval just means that we won't get in trouble, not that using it is functionally different.

To make sure it would work as an add-on, I tested it on my personal Gmail account and posted it for reference.

>>Functioning add-on script <<

Steps that worked for me (make sure to configure the SDK):

  1. add-ons#before_you_publish
  2. add-ons

It is important to note that this wasn't a plug 'n play type of thing. There was a bunch of account configuration that will probably need to be taken care of by #infrastructure :(. Fortunately, @gboone is the infra lead for Outreach, so we might be able to get approval.

I mentioned the add-on to the guy I was talking to from GSA IT, and he didn't know about approval for add-ons. With that in mind, the next steps are:

  • Mention that I have a working script that will create an add-on similar to Export as Markdown
    • See if anyone has created an add-on
    • If not, see if anyone knows how the approval process would be different for an add-on vs a standalone script
  • Try to get help through the configuration process
  • PLAN B: stick to using pandoc, but update documentation on how to streamline the process

@gemfarmer gemfarmer self-assigned this Mar 24, 2017
@gemfarmer
Contributor

We are currently pivoting to tackle this issue in a few different ways:

  • @gboone is going to document the best practices that are currently used for converting gdocs --> odt --> markdown. GIFs ftw!
  • I will look into making a bookmarklet to do the same thing
  • Greg and I will continue working with GSA IT to figure out how we can get the script we wrote or the existing Export as Markdown add-on approved for use org-wide

@gemfarmer gemfarmer assigned gboone and gemfarmer and unassigned gemfarmer Apr 6, 2017
@gemfarmer
Contributor

cc @toolness @jeremyzilar

@gboone
Contributor

gboone commented Apr 19, 2017

Work in progress on the best practices wiki page.

@elainekamlley
Copy link
Contributor

@gboone is this good to close?

@jeremyzilar

@elainekamlley If you could keep this open for the time being... I sent this thread to the Google Docs team.

@evbacher

Try gd2md-html, a free Google Docs add-on that I developed after using Renato Mangini's standalone script.

@jeremyzilar

Great news @evbacher 🎉 We'll test it out and then get it run through our approval process here at the GSA. So excited about this

@lsgitter
Member

lsgitter commented Jun 26, 2019

Don't have the resources to do this. Would be nice to have if we do get the resources. Moving this to the parking lot. Might need engineering resources.

@Dahianna Dahianna added the Parking lot (Not enough resources but need to get done) label Jun 26, 2019
@jeremyzilar

I just submitted gd2md to GSA IT for approval https://github.com/evbacher/gd2md-html/wiki

@lsgitter
Member

Not enough resources to work on at this time.


9 participants