What is the future of Rodeo? #655

Open
ghost opened this issue Mar 3, 2018 · 65 comments

Comments

@ghost

ghost commented Mar 3, 2018

This is an issue currently faced by many users of Rodeo, so please don't close this.

I have been a user of Rodeo for a year or more. Even though Rodeo has many bugs and problems, I stayed with it because of my love for RStudio and believed Rodeo could become the "RStudio" of Python. The biggest problems for me in Rodeo are its memory use and overheating, and the culprit, I think, is Electron.

For a year there seems to have been no update or development happening in this project. Everything is breaking in the Rodeo IDE, the forum has been removed, and there are problems with the Rodeo website itself. As a user of this software, I want to know whether this project is abandoned. Can someone from Yhat give some details about Rodeo's development? What is happening there?

@abalter

abalter commented Mar 14, 2018

I second the question.

@m-macaskill

m-macaskill commented Apr 10, 2018

The Yhat founders moved on after an acquisition by Alteryx in June 2017:
http://blog.yhat.com/posts/alteryx-acquires-yhat.html

and their other technology is apparently being readily adopted there:
https://techcrunch.com/2017/09/12/alteryx-promote-puts-data-science-to-work-across-the-company/

As no pull request to this project has been merged since Jan 2017, I think it is reasonably clear that it has been abandoned in this form at least. It is open-source, of course, so development could continue from anyone else who wishes to contribute. But a strong community of contributors other than the founders has not developed:
https://github.com/yhat/rodeo/graphs/contributors

so that doesn't give me much confidence that the project will continue, barring the entry of a new enthusiastic and committed developer(s). Such people wouldn't necessarily have administrative rights to this particular repository however, so it would probably need to be forked.

This is a shame, since Rodeo was a very nice implementation of a scientific Python IDE. But as it definitely requires further, and ongoing, development, I think it would be unwise to invest time and energy in trying to keep running it as an end-user. Existing performance and bug issues will only increase over time, and new or improved feature developments are not likely.

@TakenPilot
Contributor

☝️ This is true. And since I worked for less than a year at Yhat, I moved on to another company when that was happening.

If someone does pick up this project, that would be amazing and I would gladly walk them through the code or help them out.

@m-macaskill

That's a great offer, Dane, I hope someone will take it up. So much was achieved in this very nice project, and it would be an amazing asset to the Python community if its development could be continued, in the same way that RStudio has gone such a long way to making R accessible to a broad audience, but with an even cleaner look and feel.

@emprovements

I am a total noob with all JavaScript-related stuff, but it is very sad that a project like this, which I think is the cleanest and nicest-looking IDE for Python data science I have seen, is slowly dying.

@TakenPilot
Contributor

I could pick it up again if I get permission from @glamp, but otherwise it would feel weird. 🙂

@emprovements

I will try to contribute as well when I gain some JS knowledge.

@abalter

abalter commented Jul 3, 2018

I have pretty good JavaScript skills, and I'm a solid programmer. But I have no experience with large software projects. What would be the next task? I'd be happy to take a stab at it.

@abalter

abalter commented Jul 5, 2018

Honestly, I wonder if the best thing would be to start from scratch. JupyterLab has some of the functionality already. It is built on a JS library called Phosphor that is designed to create panels and menus etc. Jupyter is already developing a variable explorer.

Also, the Eclipse project is abandoning Che and Orion in favor of developing an IDE meant to work on both cloud and desktop called Theia. They are using Phosphor for this project. Between that and Jupyter, this suggests Phosphor will have a long well-supported life.

However, Phosphor has very little recent activity. Also, unfortunately, there is almost no documentation for how to get started using Phosphor :(

On the other hand, there is also a panel layout library called Golden-Layout that is well documented and looks super sharp and useful. It is very active with 45 contributors and lots of recent commits. But no guarantees on its future.

@asampat3090

Is there another similar project that is being developed? Basically it seems like all of the users of Rodeo are primarily looking for an "RStudio" for Python. Seems like RStudio is currently natively built for each platform but not sure where it started. I wonder if it would be good to start with one OS and go from there. Thoughts? @abalter @potholiday

If it makes most sense to continue with the development here, I would be glad to learn from you @TakenPilot, this project is pretty amazing and I've been a big fan for a while.

@ghost
Author

ghost commented Sep 16, 2018

I also agree with you on starting with one OS. I love RStudio because it's really fast and doesn't clog memory when working with big data. That's probably because it is natively built for each platform. Most Python IDEs I have used, including Rodeo, clog memory and slow the computer down. I would prefer Rodeo rebuilt using C or C++, but it would require a lot of time and hard work. To my knowledge there is no good alternative to RStudio for Python, so I have moved to lower-level editors such as vim. It's hard to work with big data in it, but it doesn't clog my memory.

@m-macaskill

RStudio development occurs cross-platform, as described here:

https://github.com/rstudio/rstudio/wiki/RStudio-Development

@ghost

ghost commented Sep 27, 2018

I could pick it up again if I get permission from @glamp, but otherwise it would feel weird. 🙂

If that's so, please let me know. I'm in for a rebrand also.

@abalter

abalter commented Sep 27, 2018

@potholiday -- how do you feel about Atom and VS Code which are built on electron?

@xguse

xguse commented Oct 11, 2018

@asampat3090 you might take a look at Spyder. It ships with anaconda and also aims to provide a similar environment to rstudio. https://www.spyder-ide.org/

@boralprophecy

In for development. This is a really good project and it will help millions of people if done.

@CAM-Gerlach

CAM-Gerlach commented Nov 15, 2018

As an obligatory plug, Spyder seems to be a clear upgrade path for Rodeo users, IMO. As a huge R/Rstudio fan who moved to Python around a year and a half ago (right after Rodeo had reached the peak of its development), I actually waffled between choosing it and Rodeo, due to Rodeo's similarity to my favorite IDE vs. Spyder's greater maturity and features, and I don't regret my final decision. The fantastic thing about Spyder is that it's written in pure Python (the entire IDE is coded in the very same language it's primarily used for, and in fact Spyder itself is primarily developed in Spyder), and it uses the Qt framework for its GUI (C++, through its Python bindings), so it looks like any other native application. Of course, I'm a little biased: I liked it so much that I recently became one of the developers myself, since it's a 100% community-developed project. The pace of development has continued to increase over the years since its first release in 2009, and our community of hundreds of thousands of users on every continent continues to grow along with our number of contributors.

It's got essentially a strict superset of the features and functionality in Rodeo and is at parity with or even more sophisticated than Rstudio in most respects, with an advanced editor with all the core features commonly found in other mainstream IDEs and editors (we're currently integrating the same completion/introspection/analysis architecture as Atom and VSCode for Spyder 4, to replace our in-house one); a Console that supports an unlimited number of tabs/sessions and can launch and connect to kernels in any local or remote Python installation and that has IPython, Matplotlib, Cython, Sympy and Pylab support built right in; a Help pane that interactively renders rich documentation for both your own and external objects on demand; a file explorer, project system, outline viewer, interactive debugger, history log, static code analyzer, online help browser, GUI data import tool that supports reading a variety of filetypes and can convert to lists, numpy arrays or pandas dataframes, session save and restore, a new plot module for Spyder 4 inspired by Rodeo's own UI; and of course our marquee feature, the Variable Explorer, with full support for interactive viewing, editing, manipulation and visualization of not only scalar types, lists/dicts/tuples/sets, 1/2/3D numpy arrays, Pandas dataframes, Pillow images and more, but also functions, modules, classes and arbitrary Python objects.

We also have a growing ecosystem of first- and third-party plugins, which should keep growing, and making your own (all in Python, of course) should get much easier with the major refactoring and public API coming in Spyder 4. Officially supported examples include full Jupyter notebook integration inside Spyder, an R Markdown/knitr/sweave equivalent, a cross-platform system shell plugin for the console, a Vim plugin that emulates its editing functionality and shortcuts, line and memory profilers, a unit tests plugin supporting all the popular frameworks (Pytest, Unittest and nose), a GUI package and environment manager (the basis for Anaconda Navigator, in fact), and a plugin to automatically PEP8-ify your code. Active development on most of them is temporarily paused to focus on Spyder 4, which itself has a large number of new features and improvements as well as a major rework to the plugin system and API, but should resume when that goes final in the next couple months. We're active on Gitter, Stack Exchange, GitHub and our Google Groups mailing list as well as social media, so feel free to ask if you have questions.

@OldGuyInTheClub

Does it have dockable plots and/or tabs for figures? The last time I tried Spyder I couldn't get past plot windows disappearing when using the Spyder GUI and the Spyder GUI did not work well when tiled with plot windows.

@CAM-Gerlach

CAM-Gerlach commented Nov 30, 2018

Does it have dockable plots and/or tabs for figures?

It currently has the option for displaying plots inline in the Console as well as standard plot windows, and a full Plot pane built into the UI (that can be moved around, resized and docked at will, like any other pane) is already implemented in Spyder 4 and available in the released Beta (Spyder 4 goes final in a couple months). The latter is actually directly inspired by Rodeo's Plot viewer, with some added features, customization options and UI functionality. Here's a quick preview:

image

@OldGuyInTheClub

Thanks. Is any interactivity possible either with inline plots or this pane? One of the many things I like about Matlab's and RStudio's UIs is the ability to keep code, variables, plots, debugger, etc. in view. Rodeo was close but didn't have the debugger.

@CAM-Gerlach

Is any interactivity possible either with inline plots or this pane?

Not with inline plots (by design), as they are intended as immutable output. It's rather limited with the pane at the moment: just zooming, panning, and showing/hiding an outline/border. However, more can likely be added in the future, given that Spyder is written in pure Python, the same language you use it for, and it's open source and developed entirely by and for the scientific Python community. You can also easily swap the backend to "Automatic" in preferences to open the plot in the dedicated viewer with expanded functionality (e.g. interacting with 3D plots); we might be able to add an option to do that directly from the plot pane for a given plot.

One of the many things I like about Matlab's and RStudio's UIs is the ability to keep code, variables, plots, debugger, etc. in view.

Yeah, I really like that. I dunno about MATLAB but Spyder's UI is much more customizable than Rstudio's limited one; you can open multiple vertical and horizontally split editor panes, can move around any pane to any area of the screen, and customize the layout, size and position nearly infinitely to show as much or as little as you want. For example, there's a number of built-in window layout presets to match Rstudio or MATLAB:

image

and you can save, customize and reset your own:

image

(These are all older versions of Spyder 3)

Rodeo was close but didn't have the debugger.

At least with regard to viewing the current state, Spyder's Variable Explorer is a lot like an always-on debugger on steroids; when I first tried it, it really blew Rstudio's viewer out of the water. Not only can you recursively inspect collections, functions, modules, and arbitrary objects, but you can actually edit most of them in memory, including specialized viewers to interact with arrays, dataframes, lists/dicts/sets/tuples, images and more.

Currently, the GUI and Console are integrated with ipdb for the debugger, but we're finishing the implementation of our own powerful, full-featured debugging kernel, similar to that found in heavy-duty IDEs, to make it vastly more capable, along with UI/UX improvements.

@OldGuyInTheClub

Thank you. The last time I tried it was a couple of years ago (Rev. 2?) and had problems with the debugger crashing along with the difficulty in swapping back and forth between the GUI and plot windows. I will give Spyder 3 a try and check out the new features. The Rev. 4 debugger sounds very interesting. Out of curiosity, what happens in the console with, say, Bokeh plots? Would they retain their interactivity?

@CAM-Gerlach

Out of curiosity, what happens in the console with, say, Bokeh plots? Would they retain their interactivity?

My guess is maybe not, if they are HTML/CSS/JS based, but I really don't know for sure—you might be able to set the backend properly so they work.

@OldGuyInTheClub

Spyder 3 feels better than the prior version I tried. The upcoming changes are intriguing and I look forward to them. I am sorry to hear about the situation with Anaconda. I made a small donation to the cause. I'll start tracking the Spyder github page(s) for further support.

@versipellis

versipellis commented Dec 1, 2018 via email

@OldGuyInTheClub

I think there's a difference between tools for software development and tools for technical computing/algorithm exploration/data analysis. I can see PyCharm's power for the former but my needs are in the latter camp.

@OldGuyInTheClub

Out of curiosity, what happens in the console with, say, Bokeh plots? Would they retain their interactivity?

My guess is maybe not, if they are HTML/CSS/JS based, but I really don't know for sure—you might be able to set the backend properly so they work.

I installed the Jupyter Notebook plugin and tried a simple x-y demo Bokeh plot. It rendered with its interactive tools. Panning, zooming, and reset all worked.

@versipellis

versipellis commented Dec 1, 2018 via email

@abalter

abalter commented May 11, 2019

Hi all,

TL;DR: There are packages under active development for Electron-based IDEs that run both on the desktop and on the web. We should build off one of those to build an IDE modeled after RStudio or combining the best of RStudio and Spyder.

I want to put a completely different idea out there. The future of IDEs is that they are based on a toolkit such as electron that shares the bulk of its code base between desktop and web. I understand that you can do this with Qt as well, but electron seems to be the choice for most HTML/CSS/JS-based IDEs.

Currently the leaders are Atom and VSCode. VSCode has been ported to the web in the Coder project.

However, there is another, created by the folks who made Eclipse, that I suspect will be the most open source and likely longest maintained candidate. This is the Theia IDE. Unlike Atom it runs on web. Unlike VSCode + Coder, it 1) runs on both desktop and web natively with the same code base 2) Web and desktop are simultaneously maintained 3) It is created by a non-profit, FOSS group.

I'm game to dive in despite the fact that I'm a scientific programmer not a developer. But I can follow directions and figure things out.

Anyone out there willing to take the lead?

@abalter

abalter commented May 11, 2019

Just want to correct one thing. I just realized that Theia, although it is put forth by the Eclipse community, is based on VSCode as well. But unlike Coder, desktop and web are co-maintained, as I said before. I also trust the Eclipse community to keep the project going regardless of what happens to Atom, VSCode, or Coder.

@CAM-Gerlach

While I don't want to discourage you, I want to explore your rationale a little more, as well as provide you a bit of real-world insight on the modern IDE development business.

The future of IDEs is that they are based on a toolkit such as electron that shares the bulk of its code base between desktop and web.

I've never really understood the benefits of a fully web-based IDE for serious data science work over a desktop alternative. The only one that immediately comes to my mind is being able to run code and models on a remote server (say, a cluster) from one's individual workstation, but the same thing can be accomplished without all the severe limitations of a web-based environment by simply connecting to a remote kernel on the same server with e.g. Spyder, which can be used mostly just like a local one (with greatly expanded features in this department being the main focus of Spyder 5, which is currently in the process of being funded).

However, given there clearly seems to be continued interest in this vein, I'm sure there's just something I'm missing that the web-based environment offers over the desktop for this application. @abalter , perhaps you could elaborate further on the advantages you see here?

combining the best of RStudio and Spyder.

Just to make sure everyone's clear, Rstudio (at least its frontend) is built on most of the same frameworks as Rodeo (i.e. web-based), whereas Spyder is pure Python. So in terms of code, virtually none of the latter would be translatable to this project, and you would have to accept losing Spyder's biggest advantage over most of the rest of the field: being written in the same language as the development community it serves, and thus it being much easier to solicit contributors (few scientists know JS, especially considering it's such a horrible language, whereas if they're using it virtually all of them have to know Python). If you have a committed dev team or corporate sponsorship, that could be worked around, but you would need to keep it in mind.

Currently the leaders are Atom and VSCode.

Currently, there is one major web-based data science IDE out there that is of paramount interest to this discussion: JupyterLab. To note, while it is usable nowadays, hundreds of thousands of dollars have been poured into it just to get it ready for initial users, and it's still not even a 1.0 release after three years of development, never mind at parity with desktop IDEs like Spyder in most areas, and it is still ultimately limited by its web-based roots (whereas Spyder has also continued to move forward despite a budget less than 1/10th of Jupyter's, e.g. integrating the same LSP architecture for completion, linting, analysis, help and introspection as Atom and VSCode, among other major improvements for Spyder 4).

So, while it's clearly been shown to be possible, and it is difficult to compare Spyder and Jupyter directly with high confidence, the evidence and reasoning suggest creating a similar web-based IDE is likely to take far more financial resources than a desktop IDE (never mind one that can do both well), because your built-in community (Python scientific developers) by and large is much less naturally equipped to volunteer and pitch in to help, and because of the increased complexities and challenges of operating in a web-based environment.

However, on the flipside, if there isn't something your proposed IDE would provide that JupyterLab doesn't (or at least is in scope for them), it would make much more sense to just use and contribute to the development of that existing platform for your web-based needs. The one thing I can think of are the serious limitations of the whole "notebook" framework, but my understanding is that JupyterLab is not limited to just the walled garden of Jupyter notebooks and can be used to work with proper Python code files as well.

It is created by a non-profit, FOSS group.

There are basically two out there currently that take on such projects in the scientific/engineering/data science realm: Jupyter (which is almost entirely corporate-sponsored, but is itself independent) and Spyder (which is mostly crowdfunded and volunteer-developed, but has much less corporate backing). Aside from that, you'd either have to seek sponsorship from someone outside the Python ecosystem, or create your own from scratch.

@abalter

abalter commented May 11, 2019

@CAM-Gerlach

I'll try to address your points one by one:

I've never really understood the benefits of a fully web-based IDE

  1. This allows you to have projects on multiple servers going at the same time.
  2. If you are running code within the IDE, you can be running code on multiple servers at the same time.
  3. It is markedly simpler than connecting to a remote server. Been there done that

JupyterLab etc.

I have tried to use JupyterLab. I've promoted it at various times. Right now I'm doing biomedical data science, and I have to use R. I can use R in JupyterLab. I can even use Jupytext so that I can share the rmarkdown format between Jupyterlab and Rstudio. It's a great model. However, the way RStudio and Jupyter run the notebooks are somewhat different, so some stuff just doesn't "work."

More importantly, even with installing various extensions and creating custom keybindings, Jupyterlab is simply not as efficient in the way I can bounce between a command terminal for the same kernel as the .R or .Rmd I'm running. I can quickly send code back and forth. Robustly inspect variables. Test things in .R files and then put them in my notebook. Quickly open files.

I didn't want it to be true, but RStudio with rmarkdown notebooks just rocks the pants off of Jupyterlab. And you can easily run code chunks with multiple kernels without needing to install other kernels or extensions.

The one thing I can think of are the serious limitations of the whole "notebook" framework

I'm not sure what to say to that. Notebooks have become the de facto way in which reproducible research is done. ALL of the work my group does is in rmarkdown notebooks. One document has text, code, images, sortable and downloadable tables, etc. I can have text that says "we found 19 significant features with log fold changes between 1.2 and 3.8" where those numbers came directly from the calculations and are put in programmatically. When we change something in the code like a filtering parameter or run a different data file, we don't need to edit those numbers by hand. Etc., etc. And that's just for our work. People are using notebooks to write books, create blog pages. It's just here to stay and only going to get better.
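
A minimal Python sketch of that idea, with made-up feature names, values, and cutoff (all of them assumptions for illustration, not data from any real analysis): the summary sentence is generated from the results, so changing the filter changes the text without any hand-editing.

```python
# Hypothetical log fold changes and p-values; in a real notebook these would
# come from the analysis itself.
results = {
    "feature_a": {"log_fc": 1.2, "p": 0.010},
    "feature_b": {"log_fc": 3.8, "p": 0.001},
    "feature_c": {"log_fc": 0.4, "p": 0.300},
    "feature_d": {"log_fc": 2.5, "p": 0.040},
}


def summarize(results, alpha=0.05):
    """Build the summary sentence from whatever passes the p-value cutoff."""
    significant = [r["log_fc"] for r in results.values() if r["p"] < alpha]
    if not significant:
        return "We found no significant features."
    return (
        f"We found {len(significant)} significant features with log fold "
        f"changes between {min(significant)} and {max(significant)}."
    )


print(summarize(results))              # default cutoff
print(summarize(results, alpha=0.02))  # stricter cutoff, text updates itself
```

R Markdown does the same thing with inline code chunks in the prose; the point is only that the numbers in the sentence are computed rather than typed.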

Aside from that, you'd either have to seek sponsorship from someone outside the Python ecosystem, or create your own from scratch.

One of the reasons I'm interested in this idea is because I'm convinced the future is beyond RStudio for R, Spyder for Python, X for Y. The future is seamless integration between the best languages for the best tasks within a single notebook.

I've put forth my idea and a super, super minimal barebones example in this repo. I don't know if I'll ever have time to develop this further. But if I did, it would be easiest in an IDE such as I'm proposing rather than using Jupyter kernels as I used for my proof-of-principle.

I don't know if I've convinced you about the utility of notebooks or the need for more flexible and future-oriented workspace environments. But at least I've laid out my thoughts for anyone else who might be interested.

@CAM-Gerlach

This allows you to have projects on multiple servers going at the same time.
If you are running code within the IDE, you can be running code on multiple servers at the same time.

I'm not sure I really understand what you see as the major difference between these two claimed advantages, but a desktop IDE (like Spyder) is fully capable of both. You can have an arbitrary number of different kernels running on different conda envs/virtualenvs, Python installs, or remote servers connected to Spyder at any given time in one Spyder instance, and switch between them just as easily as code files in the editor (unlike Rstudio, at least when I last used it a few months ago, where you could only run one R instance at a time).

Also, if you prefer to have separate workspaces for each project, you can easily open separate Spyder instances and connect them to the same or different kernels as the first, and Spyder has a project system (that's poised to see major improvements in Spyder 4) that allows for loading and saving workspaces (files, window layouts, working dirs, your session, and soon preferences, consoles, Python environments and more, along with management of the same) that allows you to switch between them with just a click or use different projects with different instances.

It is markedly simpler than connecting to a remote server. Been there done that

If you're going to use an IDE hosted on a remote server, you still have to connect to said remote server somehow, and you still need substantial setup on the server itself to get everything working. Desktop IDEs like Spyder have built-in SSH support (and can remember your credentials and configuration settings), so all you need to do is install spyder-kernels in your desired Python environment on the server, enter your basic SSH details in Spyder, and further connections are just a click away. Not terribly difficult, and much simpler than setting up and running a whole server-based IDE I would think (at least without substantial work).
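
To make the moving parts concrete, here is a rough Python sketch of what tunnelling to a remote kernel over SSH involves. It is not how Spyder implements it; the connection-file values, the host name, and the `tunnel_command` helper are all hypothetical. The only thing assumed from the Jupyter side is the standard set of port keys in a kernel connection file.

```python
import json

# Port keys found in a Jupyter kernel connection file (kernel-*.json).
PORT_KEYS = ("shell_port", "iopub_port", "stdin_port", "control_port", "hb_port")


def tunnel_command(connection_json, ssh_target):
    """Return an ssh argument list forwarding each kernel port to localhost.

    connection_json: text of a kernel connection file copied from the server.
    ssh_target: hypothetical "user@host" string for the remote machine.
    """
    info = json.loads(connection_json)
    cmd = ["ssh", "-N"]  # -N: forward ports only, run no remote command
    for key in PORT_KEYS:
        port = info[key]
        cmd += ["-L", f"{port}:127.0.0.1:{port}"]
    cmd.append(ssh_target)
    return cmd


# Made-up connection file contents for illustration.
example = json.dumps({
    "shell_port": 50001, "iopub_port": 50002, "stdin_port": 50003,
    "control_port": 50004, "hb_port": 50005,
    "ip": "127.0.0.1", "transport": "tcp", "key": "not-a-real-key",
})
print(" ".join(tunnel_command(example, "user@example.org")))
```

Doing this by hand means one forwarded port per kernel channel, which is roughly the bookkeeping an IDE's built-in SSH option saves you from.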

I have tried to use JupyterLab. [...]

it would be easiest in an IDE such as I'm proposing rather than using Jupyter kernels as I used for my proof-of-principle.

I share many of your reservations about JupyterLab, as I've mentioned above, and I'm certainly not arguing it doesn't have a way to go in order to come close to meeting the vision you express. However, what I'm not sure I'm getting here is how it is fundamentally incompatible with what you are trying to do, particularly considering you seem to strongly favor the whole notebook concept JupyterLab is built around, and the primary goal of your vision is to build an IDE that can be both web-based and have a desktop client, preferably electron-based (and in fact, there exists an electron-based desktop client for JupyterLab, albeit one which appears to have paused its development).

Jupyterlab is far from a perfect solution, but the fact of the matter is that improving or adapting Jupyter (IPython) kernels and the basic infrastructure they're built on, as opposed to reinventing the wheel, is almost always going to be the better choice, considering the alternative is somehow finding the time (over a decade), money (at least hundreds of thousands, and perhaps even millions of dollars), and people (hundreds of trained developers), just to build what they already have, much less go beyond it. The whole idea of open source is building upon others' work, and having your work built upon by others (which is, incidentally, one of the biggest shortcomings of Jupyter notebooks themselves as a do-everything tool). Given the massive investment and considerable buy-in to the Jupyter ecosystem (as you mention), as well as nearly universal adoption of Rstudio and its ecosystem in the R world, and the huge companies and organizations that back them, simply offering a better version of what already exists today, but starting from scratch instead of actually leveraging and building upon them, is simply not a viable proposition in the reality of today's world.

rmarkdown notebooks

I didn't want it to be true, but RStudio with rmarkdown notebooks just rocks the pants off of Jupyterlab.

R Markdown /= Notebooks. RStudio is trying to paint them as such to jump on the hype train of the Jupyter notebooks bandwagon, but they're really quite different (as you discuss), which ultimately is a product of the fact that "R Notebooks" are merely a pretty presentation view of R Markdown documents, which are themselves a simplified (Markdown vs. LaTeX) version of full knitr documents, which are themselves an evolution of R Sweave files.

I love R Markdown (and full-on knitr even more so, with the expressive power of LaTeX combined with the executability of actual code) and have always found it a very useful tool for what it is, because it doesn't try to be (like Jupyter notebooks) what it isn't, something more than a final knitted output format rather than an end in and of itself. It's the biggest thing I largely missed switching from R to Python, although we've been working on Spyder-Reports, which integrates pWeave (directly inspired by R Sweave, knitr and R Markdown) directly into Spyder, just like R Markdown in Rstudio (we also have Spyder-Notebook, which does the same thing for Jupyter Notebooks). They are a great tool for short papers, reports, demonstrations, exercises and more, and I look forward to fully having them in Python (and, like you, greatly prefer them to the whole Jupyter notebook model). However, what's important to understand is they aren't intended to be the primary tool in one's research toolbox, but rather the format to knit together your very highest-level, tip of the stack code and final output, statistical summary, plots and visualizations.

Notebooks have become the de facto way in which reproducible research is done.

If you think notebooks are the ultimate tool for truly "reproducible" research (much less genuinely re-usable research, in the true "shoulders of giants" spirit of both science and open source), then I suggest you read the presentation I linked. Notebooks and knitr/R Markdown certainly have their place, and can be fantastic tools for sharing and communicating the output of research, and even, in many disciplines, knitting together at a very high level underlying data processing, analysis and visualization routines. However, by design, they are pathologically unsuited for being the foundation of a truly "re-usable research" workflow, where scientists encapsulate their code into usable, sharable packages that can build upon existing frameworks and tools, be easily employed and extended for future research, and in turn serve as a robust, maintainable, interoperable foundation for those of other scientists.
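
As a small illustration of the difference (the module path and function name are hypothetical): a filtering step written as an importable, testable function in a package, rather than as a cell that only exists inside one notebook.

```python
# analysis/filtering.py  (hypothetical module in a shared package)

def filter_by_fold_change(log_fold_changes, minimum=1.0):
    """Return the features whose absolute log fold change is at least `minimum`."""
    if minimum < 0:
        raise ValueError("minimum must be non-negative")
    return {
        name: value
        for name, value in log_fold_changes.items()
        if abs(value) >= minimum
    }


# Any notebook, script, or other package can now import and reuse the same code.
if __name__ == "__main__":
    data = {"feature_a": 1.2, "feature_b": -3.8, "feature_c": 0.4}
    print(filter_by_fold_change(data))
```

A notebook or R Markdown-style report can then call `filter_by_fold_change` for its high-level summary, while the logic itself stays versioned, documented, and covered by tests elsewhere.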

I don't know if I've convinced you about the utility of notebooks

As I've said, notebooks have their place in sharing results and visualizations, and sometimes knitting together, at a high level, separate pieces of external, low-level processing and analysis code in various languages (and knitr/R Markdown is a superior tool at that job); I think we agree on both of those. My point is that they are not, by design, the be-all end-all of "reproducible research" that some of their advocates and users try to paint them as.

the need for more flexible and future-oriented workspace environments

Again, other than being web-based for the sake of being web-based, I still don't really understand the really compelling advantages of such an environment, other than being able to integrate multiple languages in one document, something which RStudio, JupyterLab, etc. already do to varying degrees of success (which can be built upon and improved, rather than throwing all that work out and starting over). Furthermore, as a desktop IDE, Spyder already has most of the core infrastructure in place to support the same thing; it would merely require some additional time investment to add full support for the language(s) of interest (far less work than creating a new environment from scratch). There may be a compelling case, but so far I haven't really seen it spelled out.

@david-bonin

I think RStudio is well on its way to becoming a fully integrated IDE for Python and R. Last year, when it came out with the built-in terminal functionality, it made it easy to fire up Python from within RStudio in a way that was more natural than reticulate (to me). The beta version of RStudio 2 is very promising and the development team has said that they want to push the Python functionality even further, for instance by allowing the Python console to interact with the data viewer and chart functionality like it does in Rodeo. RStudio suits my needs very well, but I love Rodeo's neatness very much.

@stevekm

stevekm commented May 13, 2019

However, what's important to understand is they aren't intended to be the primary tool in one's research toolbox, but rather the format to knit together your very highest-level, tip of the stack code and final output, statistical summary, plots and visualizations.

I strongly agree with this. Over-reliance on the "notebook" style systems is a mistake. I use R Markdown extensively, but its best usage is reporting on results that were produced externally in some controlled manner, not for actually doing the full analysis workflows themselves (for example, my main production repo here uses a modular .Rmd-based system). I spent a long time with the "notebook-first" mentality described in the presentation linked by @CAM-Gerlach and the pitfalls shown there are spot-on.

I also agree with the sentiment that future efforts should be focused on supporting existing projects. I would lean more towards RStudio because in my experience, you don't need much more than Atom + terminal for general Python coding. RStudio's strengths are in the data viewing, which is where you start to benefit the most from an interactive environment.

@medmatix

medmatix commented May 13, 2019 via email

@abalter

abalter commented May 13, 2019

@medmatix -- I think Theia IDE is the new Eclipse. It should be easier to extend as well. Also, it is based on VS Code.

@medmatix

medmatix commented May 13, 2019 via email

@CAM-Gerlach

CAM-Gerlach commented May 13, 2019

I do note that Spyder is easy to extend with (pure-Python) plugins, and Spyder 4 will introduce a proper public API for them. In fact, behind the scenes, almost all of Spyder's own features (from the Editor and the Console to the static analyzer and the profiler) are actually just "plugins" internally, and so external plugins are essentially almost as capable as internal Spyder modules.

Right now, we have a number of officially developed plugins in various stages of development, including Spyder-Notebook, adding Jupyter notebook integration; Spyder-Reports, adding R Markdown/Sweave-like document creation (based on pWeave, which is based on Sweave for R); Spyder-Terminal, adding cross-platform system terminal integration; Spyder-Unittest, adding built-in support for Pytest, Unittest and nose; Spyder-Vim, adding Vim commands and shortcuts; Spyder-Autopep8, with automatic formatting; and Spyder-Line-Profiler and Spyder-Memory-Profiler. There are plans for others, although development on them (except for Spyder-Unittest) is mostly paused right now to focus our limited resources on Spyder 4.

We always appreciate new developers to work on them, particularly Spyder-Notebook and Spyder-Reports, which are similar to what @abalter seems to be wanting.

@abalter

abalter commented May 14, 2019

@medmatix if you do a google search for theia site:eclipse.org you will find that Theia IDE is being developed by the Eclipse project, and that they are adopting it as the editor for the Eclipse Che development environment. Quoting from here,

Today, every editor or IDE project either focusses on running in browsers (e.g. Eclipse Che’s IDE) or as a local desktop app (e.g. the classic Eclipse IDE). Technologies like Electron, however, allow to run the same code in the browser as well as in an integrated desktop app. Therefore, it is no longer a choice of either desktop or cloud IDE but both scenarios can be supported with a single tool.

@fkromer

fkromer commented Jun 30, 2019

What's the de facto tl;dr suggestion for an RStudio equivalent in Python right now? According to this R Community thread, support for Python in RStudio has already improved. However, it was not clear to me if it's really usable, e.g. in comparison with JupyterLab usability.

@abalter

abalter commented Jun 30, 2019

@fkromer Spyder. It's a great IDE. But also different. They have recently implemented a notebook format.

There is more information if you read the discussions above with @CAM-Gerlach.

@OldGuyInTheClub

What's the de facto tl;dr suggestion for an RStudio equivalent in Python right now? According to this R Community thread, support for Python in RStudio has already improved. However, it was not clear to me if it's really usable, e.g. in comparison with JupyterLab usability.

Spyder with the Notebook plugin comes the closest. They can be set to share a kernel which allows the variable inspector to show items entered in the Notebook which can be convenient. AFAIK, JupyterLab has no variable inspector (the floating widget isn't useful in my experience.) The Spyder Reports plugin while still shown on the Spyder website is no longer under development and is undocumented.

@abalter

abalter commented Jun 30, 2019

AFAIK Jupyter is basically getting superseded by JupyterLab. There are JupyterLab extensions being developed for variable explorer. I think I tried them and they work well enough. Jupyter I think wants to be a full-fledged online IDE eventually, but they are slow getting there.

I respect everything @CAM-Gerlach says about Spyder. I used Spyder for years. There has been nothing like it for doing science with Python, and the team has worked very hard to keep developing it.

That being said, I'm pretty sure Theia is going to be the way to go. For better or for worse, JS based apps that work on any platform from mobile to web server are the way of the future. Let's look at who has been working in this direction:

Adobe: Brackets
GitHub: Atom
Microsoft: VS Code (which Theia is based upon)

Also, Microsoft now owns GitHub, so I predict we will eventually see Atom retired in favor of VS Code.

Again, Spyder and the whole PythonXY ecosystem was just what we needed over the years. But I think the future is going to lie in a different direction.

@OldGuyInTheClub

I think it was the JupyterLab widget I tried and it didn't do very much. The whole Jupyter/Lab hype cycle has gotten too much for me. I don't know what Theia is. When I learned Python a few years ago it was billed as a Matlab killer for scientific and technical computing. It isn't. Every year the target market changes: Data science, machine learning, AI, now digital transformation, who knows what tomorrow. Each one of these requires different tools and the scientific/technical computing piece of it isn't well served and is unlikely to be given the funding and time required to do it right.

@abalter

abalter commented Jun 30, 2019

@OldGuyInTheClub I wouldn't frame it as a battle royale.

I don't think I've seen Python billed as a Matlab killer. Also, what makes Matlab is not just the language, but the IDE. It also has Mathworks behind it creating tons of fantastic packages. But Matlab licenses are crazy expensive. Honestly, and arrest me if you will, I pirated it for many years when I used it in research.

Python is open source and WAY better than Octave or SciLab. The syntax can be a bit more heavy with needing to import libraries and then refer to them. However, if you are brave, you can just from mylib import *. With the memory today's computers have, I think it's worth it. Plus, when you code in Python you are contributing to the open source community, which is good karma.

For stats-heavy data science, I've had to admit that R/RStudio lets me be extremely productive. More so than I think Python could ever be unless APIs are simplified and functionality increased.

So, it usually just boils down to the right tool for the job, and being able to afford that tool.

@OldGuyInTheClub

I agree with you on matching tools for the job but I guess we see/read different things. Many sites/people say that Python supersedes Matlab and for free.

I wholeheartedly agree that the Matlab IDE is excellent. I've been looking for years for something comparable in Python (hence being on this thread) and haven't found anything that comes close to it. And, yes, with Mathworks one gets what one pays for. I have access to Matlab and Simulink at work and am very grateful for them.

Not to quibble too much but import * is strongly not recommended by just about every Python site I've encountered. I don't think programming in Python is automatically a contribution to open source or a virtuous act.

If I were to do statistical work, I'd make the time to learn R. The packages seem to be written by heavyweights in that community. The ggplot approach to data display is also very intriguing.

@CAM-Gerlach

N.B. I am (at least for now) retired as a Spyder core dev, so I'm (somewhat...) less biased than before, though also less informed on a number of exciting things happening with Spyder lately.

However, if you are brave, you can just from mylib import *

As @OldGuyInTheClub correctly states, this is strongly ill-advised by every authoritative resource and is virtually never used outside of very specific circumstances by Python users who have been educated on the pitfalls involved (or already fallen into them). It takes virtually no more time to use the properly qualified names for things with any decent modern editor that has autocomplete (or to assign just the names you're using to shorter forms). Doing so avoids actively breaking said autocompletion, introspection, docstring retrieval, etc. tools; makes it much easier to find instances of specific names being used in your codebase; avoids confusion and ambiguity when reading source files; and, most importantly, avoids introducing pernicious "namespace hell" issues (common in R and other languages without proper namespaces) that can be devilish to track down.

In short, don't ever use from spam import *; always use import spam, import spam.eggs, or (if you prefer shorter names) the standard abbreviations import numpy as np, import pandas as pd, import matplotlib.pyplot as plt, import os.path as osp, etc. I used to miss the R way in terms of simpler APIs and shorter name/spaces, but now that I'm properly used to Python, I really see the benefits and don't miss the inferno that is R's syntax and semantics so much.
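To make the name-shadowing pitfall concrete, here is a minimal sketch using only the standard library. It star-imports math and then cmath into a throwaway namespace (the `scratch` dict is just an illustration device, so the example doesn't pollute its own globals) and shows that the second import silently rebinds `sqrt`.

```python
import math
import cmath

# Simulate a script that star-imports two libraries, using a scratch
# namespace so this example does not pollute its own globals.
scratch = {}
exec("from math import *", scratch)
first_sqrt = scratch["sqrt"]           # math.sqrt at this point

exec("from cmath import *", scratch)   # silently rebinds sqrt, pi, e, ...
second_sqrt = scratch["sqrt"]          # now cmath.sqrt

# Same name, different function, and nothing warned us about it.
shadowed = first_sqrt is not second_sqrt

# With qualified names there is no ambiguity about which sqrt is meant.
real_root = math.sqrt(4)
complex_root = cmath.sqrt(-4)
```

With the qualified form, a reader (and the editor's autocomplete) can always tell which library a name came from, regardless of import order.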

That being said, I'm pretty sure Theia is going to be the way to go.

I just don't understand this suggestion. It's like saying that a Tesla Model 3 is going to be the way to go over a pickup truck for hauling loads, just because the Tesla is newer and more hip to the latest trends, rather than better-suited for the task at hand. Theia is designed to be a lightweight software development IDE; it has none of the tools designed for data science found in Spyder, RStudio, the Matlab IDE, etc., and is actively worse than other options like VSCode (on which it is based) or even Pycharm in that regard, which at least include some basic scientific functionality (variable viewers, etc).

Furthermore, as much as people love to hype up "web-based", "JS-based", etc., I've never been able to get a solid explanation for what advantages that offers for this use case. You're not going to be developing your scientific code on a smartphone, and with the stagnation of Moore's law there is no reason to believe that will change within the next decade. Jupyter Notebook and JupyterLab already highlight some of the key limitations of the web based approach, and most of the advantages (being able to easily run code on remote servers) either already exist in IDEs like Spyder without the same compromises or are planned to be added for Spyder 5.

And, of course, the fact that it's all JS/TS-based rather than written in the same (or any) language actually used in scientific computing (in fact, in a language whose sole virtue and staying power is due to being the only one shipped in web browsers), makes it much less accessible to users wishing to modify, extend (e.g. with plugins) or improve it, and less easy to integrate with the rest of the PyData stack.

The Spyder Reports plugin while still shown on the Spyder website is no longer under development and is undocumented.

Development will hopefully pick up again soon once Spyder 4 is out. I offered to fund it myself, there just wasn't the dev bandwidth.

@fkromer

fkromer commented Jun 30, 2019

@abalter @OldGuyInTheClub Thanks. I'll give Spyder and Theia a try.

@abalter

abalter commented Jun 30, 2019

@CAM-Gerlach BTW, this is fun and informative for me, I hope for you too. It's good to hash out ideas in a debate to expose foregone conclusions etc.

It's like saying that a Tesla Model 3 is going to be the way to go over a pickup truck for hauling loads,

I think it's like saying an electric or hybrid pickup is going to be the way to go over an internal combustion engine.

My point is also that the JS apps appear to not just be a fad. They are clearly here to stay. They are also much more amenable to community development since you can add to it using JS which is both interpreted and much more widely known. Also, because JS based apps use web technology, there is just soooooooo much out there in terms of tools and solutions pre-built for adding features.

Finally, a comprehensive data science environment is being developed for VS Code. This would also work in Theia.

As for the web app aspect, I can only say that there are applications where it is really essential. For instance, I could open up the web app on server A, start a notebook running, open up the web app on server B and start a different notebook running, etc. It's not a use case everyone needs, but if you do need it, a remote connection from a desktop application just won't fit the bill.

Finally, while you may not do development on a cell phone, a person may very well want to check on a job. More likely, they may work on something on a tablet or chromebook. I think 5 years ago this may have seemed like a fad. But given the continued development behind this format, and the big guns behind it, it's probably for us to learn the advantages and create our own use cases rather than doubt that it's useful.

@CAM-Gerlach

I hope for you too.

Sure. I just genuinely want to understand why a substantial fraction of people seem to see "web-based" or "mobile-ready" as a meaningful advantage for a workhorse data science IDE, as opposed to some end-user "app".

I think it's like saying an electric or hybrid pickup is going to be the way to go over an internal combustion engine.

But my point is that Theia isn't a pickup (or a Tesla Semi), or even close to one. It's a Model 3, great for driving a few people a few hundred km at a time within the supercharger network with cheaper running costs and a much cooler aesthetic, but not at all designed for hauling loads (and it does a rather poor job of it). It may be eventually developed into something resembling one, but even then it only matches the existing feature set of alternatives (Spyder, RStudio, etc.), possibly with some additional limitations (range, superchargers, cost, etc.) inherent to its fundamental operating principles (JS-based/web-ready). Also, unlike climate change and fossil fuel depletion in the analogy, there isn't necessarily some overarching externality that I'm aware of pushing the drive toward a different paradigm for Theia, at least none that has been concretely explained to me.

They are clearly here to stay

So long as JS remains entrenched in the browser space, then yes, due to (as I mentioned) the simple fact that, as it is the sole scripting language of the web, hordes of web developers learn just enough of it to be dangerous and thus want to apply this skillset to other domains, while browser developers pour massive effort into optimized JS engines. Without this, there would be virtually no motivation to use it to actually build applications, due to its poor design and being demonstrably inferior to many other readily available options that are properly designed and widely used for such tasks.

They are also much more amenable to community development since you can add to it using JS which is both interpreted and much more widely known.

As I state above, this is valid for web developers, but simply not true for the scientific/engineering/data analysis community. Essentially no one does their actual data analysis in JS (as opposed to Python, R, Matlab or even C++), and thus very few in said communities are familiar with the language, as opposed to those latter ones, nor is integration with their actual stack nearly as easy for the same reasons. Ergo, my points above. Furthermore, even overall, Python is only modestly behind JS and continues to grow relative to it; in terms of traction within the data science community at large, it isn't even close.

Finally, a comprehensive data science environment is being developed for VS Code.

Neuron looks quite interesting indeed, but it is far from a comprehensive environment as opposed to a tool to generate several different types of interactive output, and there's nothing that couldn't be done just as well in a desktop IDE like Spyder (some of which it already does, and more is planned for Spyder's Viewer plugin in Spyder 5). Furthermore, it makes my very point--as VSCode already offers a number of data science features, what does Theia offer as a platform for data science above and beyond that? This is the part that I just don't understand.

As for the web app aspect, I can only say that there are applications where it is really essential.

Okay, but could you provide a specific, real-world example? Presumably, since you're interested in Theia, you would have one from your own work. Furthermore, these would need to outweigh the fundamental limitations of being a web app in all other contexts where it is not essential.

For instance, I could open up the web app on server A, start a notebook running, open up the web app on server B and start a different notebook running, etc.

Okay, but what does this accomplish? What's the practical purpose being served here? What's the real-world use case?

A remote connection from a desktop application just won't fit the bill.

Why not? What specific things in this scenario can a desktop application not accomplish? With Spyder, for instance, you can automatically start or connect to Jupyter kernels running on many different servers at once, switching between them with ease, and in Spyder 5 the plan is to add remote file editing and manipulation as well as connecting to and interacting with full-on Jupyter notebook servers to do anything you could locally or from a web-based UI. Furthermore, relying on Jupyter notebooks for the backend and the frontend also locks you into all of its limitations and pitfalls, as opposed to working with proper portable, interoperable, re-usable Python modules.

Finally, while you may not do development on a cell phone, a person may very well want to check on a job.

Checking on a job is a radically different use case than working in a full IDE; all that's needed for the former is a means of notifying the user as to its status, which can be done through something as simple as email or a webpage displaying status output. It's a huge stretch to go from this to porting your entire IDE into a cross-platform, mobile-first framework just for something so simple and tailored.
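For a rough idea of how little the "webpage displaying status output" option needs, here is a minimal sketch using only the Python standard library. The `JobStatus` record, its fields, the job names and the `render_status_page` helper are all hypothetical, made up for illustration; a real setup would write the resulting string somewhere a web server can serve it.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class JobStatus:
    """Hypothetical record describing one long-running job."""
    name: str
    state: str        # e.g. "running", "done", "failed"
    progress: float   # fraction complete, 0.0 to 1.0


def render_status_page(jobs):
    """Return a small static HTML page summarizing job status."""
    rows = "\n".join(
        f"<tr><td>{escape(job.name)}</td>"
        f"<td>{escape(job.state)}</td>"
        f"<td>{job.progress:.0%}</td></tr>"
        for job in jobs
    )
    return (
        "<!DOCTYPE html>\n"
        "<html><head><title>Job status</title></head><body>\n"
        "<table>\n"
        "<tr><th>Job</th><th>State</th><th>Progress</th></tr>\n"
        f"{rows}\n"
        "</table>\n"
        "</body></html>\n"
    )


# Example jobs (made-up names); the second checks that markup is escaped.
page = render_status_page([
    JobStatus("fit_model", "running", 0.4),
    JobStatus("make_plots <final>", "done", 1.0),
])
```

A page like this can be read from a phone or tablet browser without the analysis environment itself having to be web-based.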

It's probably for us to learn the advantages and create our own use cases rather than doubt that it's useful.

I don't see it as particularly wise to try each and every new whiz-bang workflow idea that comes along without an obvious practical benefit to a mainstream workflow, nor clearly illustrated use cases and applications that offer demonstrated advantages over current methods. This is exactly why I'm asking you, an advocate for some sort of web- or mobile-based data science IDE, to illuminate these very things regarding your proposal, so I may understand why indeed you see it to be such an attractive option.

@abalter

abalter commented Nov 15, 2019

https://github.com/abalter/theia-data-science-ide/blob/master/README.md

Project Proposal:

Open-Source, Platform Agnostic, IDE Based on the Theia Framework for (Data) Scientific Computing

TL/DR

I propose that "the community" use the Theia framework to build an open-source IDE that combines the best of RStudio, Spyder, and Jupyter (etc.) into a data science IDE that is cloud/desktop agnostic and language agnostic.

Introduction

As far as I can tell, the Data Science / Science community primarily uses three coding environments:

  • Jupyter[lab] Notebooks
  • RStudio
  • Spyder

Each has its strengths and weaknesses, pros and cons, adherents and detractors. I personally avoided R and RStudio
for a long time in favor of Python, Spyder, and Jupyter. My most recent position is in an RStudio
shop. I have discovered that RStudio is a magical world that seamlessly
integrates script files, notebooks, exploring variables, maintaining history, accessing files, loading
data, and an interactive command line (both R and bash). Fantastically, the RStudio IDE has both a cloud and desktop version.

Taking into account what I have seen in academia, and extending this to my perceptions of industry as well,
RStudio reigns predominant in terms of daily usage among these disciplines. RStudio is supported by a large, profitable organization which does a fantastic job with this product. The RStudio company does release open-sourced versions, but stripped
of some important functionality. While technically open source, these versions are not community maintained. Consequently, or by design, the roadmap
and development move forward at the sole command of the RStudio company.

I propose developing a free, open-source, data science IDE that combines the best features of the existing commercial and open source options out there.

Chart below.

The Good News

We would not need to build a one-off project from the ground up, as was done with Spyder, Jupyter Notebook, Architect, or Rodeo (which was eventually abandoned; see discussion). Quite the opposite. We would build on an existing framework and get professional support for our own development issues.

The Eclipse foundation hosts and develops an IDE Framework called Theia for building platform agnostic (cloud/desktop) IDEs (More below). This is modern,
flexible, extensible, and uses the latest build technologies. Importantly, Theia was designed by intention from the ground up to work on the desktop and in the cloud
without needing to create a parallel code base.

Theia IS being very actively developed.

Theia development activity
Development activity on GitHub.

Not only can we get support directly from Eclipse, the very act of building our IDE would likely contribute new ideas and code to the Theia project, creating the sort of positive feedback loop that is one of the shining hallmarks of open-source development.

The Bad News

While I am idealistic, passionate, and a very good scientific programmer, I am not a developer. I'm also old-ish, have a lot of
family obligations, and am trying to carve out a career for myself in a new field (namely biomedical data science). I neither have the skill nor
the bandwidth to LEAD this project. However, I swear upon all that is good and true that if some person or group
would come forward to lead the software development, I would take on a strong supporting role by responding
to issues, coding bits and pieces (menu here, UI tweak there), writing documentation, testing, fixing small problems, looking for sponsors,
etc.

More about Theia

Wikipedia tells us that

Theia was developed by TypeFox and Ericsson, with additional contributions from Red Hat, IBM, Google and Arm Holdings. It was first launched in March 2017. Since May 2018, Theia has been a project of the Eclipse Foundation.

If you search the 'net, many people refer to Theia as an IDE. However, Theia developers from Eclipse try to emphasize that Theia is a framework to build your own IDE, just as they did with their own Che editor and GitPod (which by the way is awesome) (differences between Che and Theia).

Another common misconception is that Theia is a VS Code clone. This stems from the fact that in addition to having some of VS Code's look and feel, Theia can actually use VS Code plugins. However, Theia is a completely independent code base.

Other real-life examples are Microclimate, potential GitLab integration, the new Arduino Pro IDE, and Hyperexponential's infrastructure.

Why not use Plugins?

Why not just use existing technology (Atom, VS Code, Theia, Jupyter) and build it out with plugins?

The plugin model seems like fun: everyone gets to contribute and users get lots of options. But for serious tools this model fails. Essential functionality (code linting, markdown preview, variable explorer, kernel integration, ...) becomes dependent on individuals in the community implementing versions of these features AND dedicating themselves to support them for eternity. Instead, plugins tend to stall out or become totally abandoned as quickly as they are created.

On the other side of things, the pluginverse becomes flooded with options, making it hard to know which to use. There are currently a multitude of markdown previewers for VS Code. Some have more features than others. Some work better than others. How do you pick which to use? You have to try them all first and/or read many reviews, and then hope that development continues and bugs are fixed. If things don't work out, you need to find another plugin.

Consider the data science plugins for Atom and VS Code. There are multiple ones for R with non-overlapping feature sets, and some of the most robust have already been abandoned.

(In writing this in VS Code, I tried one markdown previewer that improperly added line breaks at each newline character in the source and rendered text inside escaped square brackets \[...\] as math. I switched to another that is ok with that, but this one uses a markdown flavor that requires me to use an explicit — rather than ---.)

That is why I believe the project needs to be curated at the top level by a group of people who will take input from the community and make wise decisions. This has largely been how the Jupyter project has gone. However there are a growing number of unofficial extensions. It will remain to be seen how well this works out.

Theia does have a plugin interface, can use existing VS Code plugins, and is designed to be extended with more deeply rooted extensions. Thus, the community is welcome to add new functionality. Plugins that have shown themselves to be popular, useful, and stable could be curated (i.e. incorporated) into the main code base and be maintained by others even if the original plugin author moves on to other things.

Feature Comparison Chart (proposed)

I have filled this table in to the best of my knowledge. I do not currently use Spyder or Hydrogen, and have not fully explored data science options for VS Code. PLEASE help make this chart more complete and accurate with your suggestions and input!

Jupyter RStudio Spyder VS Code Hydrogen Proposal
IDE-like environment No Yes Yes Yes Yes
Real-time notebook rendering Yes No No Yes Yes
Visual notebook editing Yes No ?? Possibly with plugin Yes Yes
Plain text notebook editing No Yes ?? Possibly with Plugin Yes Yes
Multiple notebook formats No No No Possibly with plugin No Yes
Notebook-Focused Yes No No No Yes Yes
Development-Focused No Yes Yes Yes No Yes
Data science focused Yes Yes Somewhat Poor plugin options Yes Yes
Edit code and notebooks side by side Somewhat Yes Yes Possibly with plugins To the extent that Hydrogen runs in an IDE... Yes
Notebook linked to command line Awkward choice of console format. Yes ?? ?? N/A Yes
Shared environment for notebooks, scripts, and command line Partial--can create individual console for each notebook. Yes Yes ?? N/A Yes
Each notebook needs/gets its own console. Yes No ?? ?? N/A Yes, if wanted.
Multiple parallel execution environments (kernels) Yes No ?? ?? No Yes
Multi-language support in notebooks Yes Yes ?? ?? Yes Yes
Multi-language support in IDE Yes No No ?? N/A Yes
Variable Explorer Primitive Yes Yes No Yes Yes
Robust file browser No Yes Yes Yes ?? Yes
Easily import data in to computational environment. No Yes ?? ?? ?? Yes
Delegates important functionality to community supported plugins/extensions Yes No No Yes ?? No
Curates and includes solid implementations of important features. No Yes Yes Not for data science. Yes Yes
Maintains command history for console. No Yes Yes ?? N/A Yes
Designed for browser and Desktop No Yes No No No Yes
Works in browser Yes Yes No There is a separate browser version. No Yes
Works on desktop No Yes Yes Yes Yes Yes
Integrated Git support With extension Yes Yes Yes No Yes
Integrated Conda Support Yes with extension No No No No Yes
Robust enterprise support No Yes No Yes No No
Robust community support Yes Yes Somewhat Yes Somewhat Hopefully.
Completely free and open source Yes No Yes No Yes Yes

@CAM-Gerlach

CAM-Gerlach commented Nov 16, 2019

@abalter

Overall, your goal is laudable, but as an actual former developer of a major data science IDE (Spyder) I'm not sure how practical it is to develop this project compared to dedicating your and the community's time and effort to any one of the existing tools instead. JupyterLab is also under active development (more active than Theia; note the 10x difference in scale on the plot between Theia and JupyterLab) and has a goal very similar to what you want: building a hybrid web and desktop based, notebook-centric, data-science focused IDE with a full suite of tools and plugins. It also has the backing of some of the biggest names and largest communities in the data science field, and is used by millions of people around the world. You're going to have to make an extremely compelling case, more so than even the above (which is quite the effort), for why all that effort should be thrown out and duplicated in a different framework rather than building on everything the community has poured a massive amount of time and effort into.

Unlike with Spyder, which by design is a proper desktop program written in the same language its users code in and that it is designed to work with, and which is not easily adaptable into a web app, there is nothing fundamental preventing the features you are asking for from being added to JupyterLab, whether in core, as one or more plugins, or at the very worst in a fork, turning those "No"s into "Yes"es. There have been literally millions of dollars in corporate donations, sponsorships and dev time, hundreds of thousands of lines of code, tens of thousands of commits, tens to hundreds of person-years, and numerous paid, full-time developers on the Jupyter team who have been working on the project for years to develop JupyterLab to this point, plus contributions from hundreds of community members. Furthermore, this was all built on a large amount of prior work, a strong position in mindshare and credibility, and a history of over a decade stretching back through Jupyter Notebook, IPython Notebook and IPython itself. Ergo, it defies belief that, especially now that these tools already exist, trying to create a new community from scratch would attract enough user, developer and fiscal interest to start over with something new, at least without a very compelling marketing pitch.

Impossible? No. Highly improbable? Yes. I wish you the best of luck, but I again urge you to more thoroughly consider putting your efforts toward making the current alternatives "good enough" rather than striving for perfection and falling short of making a substantial impact at all.

It brings to mind a relevant XKCD:

[image: XKCD comic]

Spyder development history, for comparison (note scale):

[image: plot of Spyder development history]

Similarly, JupyterLab development history:

[image: plot of JupyterLab development history]

To fill in the chart for Spyder (as I've mentioned I used to be a Spyder developer, but I'll try to be as unbiased as possible):

The notebook stuff requires the first-party "Spyder-Notebook" plugin that is developed by the Spyder core team. While it is a plugin, it integrates with Spyder as fully as any other aspect of the UI, since basically every other UI pane is also implemented using essentially the same core plugin system. The Spyder team is still updating it for the forthcoming Spyder 4 release, but hopefully that should be done in the next couple of months.

Real-time notebook rendering: Yes, w/1st party plugin
Visual notebook editing: Yes, w/1st party plugin
Plain text notebook editing: Yes, w/1st party plugin
Multiple notebook formats: Possible with plugin
Data science focused: 100% (It's literally what Spyder is built for, every bit as much as RStudio, which it was in fact originally inspired by)
Edit code and notebooks side by side: Yes, w/1st party plugin
Notebook linked to command line: Yes, w/1st party plugin
Each notebook needs/gets its own console: Optional
Multiple kernels: Yes
Multi-language support in notebooks: No
Multi-language support in IDE: Partial
Easily import data: Yes [Spyder has a built-in import wizard that can import a variety of file types to lists, numpy arrays and pandas dataframes as well as save and restore individual variables or full sessions]
Integrated Conda Support: Partial, further in development

Let me know if you'd like further clarification on any of those aspects.

@OldGuyInTheClub

I don't think Spyder will ever be ready. v4 has been around the corner for over a year. The other projects such as Jupyter go long on PR but short on delivery of an environment with the kind of debugging, inspection, and visualization tools necessary for exploratory technical computing/research. Tools for developers (PyCharm, SublimeText, etc.) are different from tools for researchers (this is what Matlab gets bang on), and there is just too much churn in the FOSS community for anyone to think about usability and need before releasing the next great whatsit.

Perhaps if enough people start running Python within RStudio, there will be enough of a clamor for them to do tighter integration with Python kernels that would allow the variable inspector, dockable plots, and debugging that work well for the native R.

@CAM-Gerlach

I'm no longer a core dev with Spyder, but my main focus was UX, UI text, documentation, support, and addressing common user annoyances. While I agree Spyder 4 has taken much longer than we hoped in the beginning, and in hindsight we would likely have gone with a more incremental, modular plan, Spyder did release 7 betas over that time (each with multiple significant new features) and it's now on release candidate 1, with release candidate 2 coming out within a week or so and 4.0 final due to follow. If you look at the development history above, you can see the large and increasing amount of work that has gone into it, and most of the time was spent not on adding shiny new stuff, but on improving stability and fixing bugs and UX issues, along with the features most requested by the user community (e.g. the new debugger and LSP support, Spyder 4's two biggest features, took most of the effort and were the two most commonly requested capabilities).

Given it's an open-source community project, there was no PR budget, and every dollar of donations was spent on actually paying developers to implement the features requested by the community (both donations and expenses are fully documented on the OpenCollective). Keep in mind that unlike JupyterLab, which has big corporate sponsors to the tune of millions of dollars, Spyder 4 was funded at less than a tenth of that level and made up most of the slack with volunteers like me (I never got paid a cent, nor did I want to be). We aren't charging users hundreds or thousands of dollars for a revocable license to use Spyder, which has opened up data science to hundreds of thousands of people all around the world who could never afford Matlab or similar proprietary tools, and they've given back by contributing their own time and effort to making Spyder even better.

At the end of the day, the Python ecosystem is moving fast to keep up with the state of the art of data science in general right now. Lack of long-term stability is the price to pay for keeping up with a fast-paced field; once things settle down, the ecosystem will stabilize to match. However, if rapid change and iteration isn't your cup of tea right now, you are free to settle on a particular version and use that for as long as you wish, with the understanding that the wider world may have moved on in the meantime, the same as if you stuck with any given version of a proprietary package, or even with R these days, as even it is getting outstripped by Python in the long run (and I was previously a diehard R/RStudio user myself). But unlike with proprietary software, no one can take that freedom away from you.

@OldGuyInTheClub

OldGuyInTheClub commented Nov 17, 2019

Yes, I remember having several good discussions with you on Gitter. I kludged a mix of Spyder and Jupyter via the Notebook plugin with your assistance although you said that working within Spyder/Markdown (unsupported) would be better in the long term as the external interfaces might not always be available.

That being said, taking this into the realm of freedoms is overstating the case by quite a bit. No one, least of all me, says Spyder was squandering funds or was even remotely close to being funded to achieve its goals. Clearly, Spyder is not a donation magnet but one has to look at what it will take to meet the objective and ask how the small team can achieve all of that when it is getting pennies on the dollar of other projects who have the same ambitions and are themselves still not delivering. I continue to be surprised by and often appalled by how much money Jupyter has received for the future-ware it keeps putting out. At minimum there is no reason for Spyder to claim on its website that third party plugins provide Reports and Notebook capabilities when the former doesn't exist and the latter may not be stable from one Spyder rev to the next.

I don't know what "DataScience" actually is nor do I care. It is a moneymaker for the Universities scrambling to offer degrees in it so bully for them. There are large sets of technical problems that require exploring data and algorithms and taking answers from there to code that does something especially when interfaced to the outside world and double-especially when it will have to be mission critical. Yes, this also means talking with (ugh) hardware. Python and its "ecosystem" promised the world on that and underdelivered on multiple fronts. There are advantages to proprietary tools. If one's alternative can take them on and honestly win, that's fine. Reverting to "we're volunteers doing good works" in the face of evidence is a cop out.

Matlab is expensive for a reason. It hires people, pays them, tests before release, and provides professional support. I was asked to do some light programming at work involving playing with datasets and implementing some algorithms. Python/(Jupyter/Spyder) would have taken 2x-3x as long as it took me in Matlab where a) the tools work and b) I could get fast, knowledgeable support when I needed it. Additionally, LiveScripts are a serious challenge to Notebooks, Tables challenge Pandas, and that is on top of things that the free "ecosystem" will never do like align with certifications/industry standards.
e.g. https://www.mathworks.com/solutions/aerospace-defense/standards/do-178.html

If you were going to fly on a plane, do you want one designed with traceable and proven tools or one by some guy on Github?

@CAM-Gerlach

CAM-Gerlach commented Nov 17, 2019

That being said, taking this into the realm of freedoms is overstating the case by quite a bit.

I got a bit too ideological about that. Ultimately Spyder and Matlab, while similar in some ways, serve different groups of people, and each has its use cases as you illustrate. Many people regard the distinction between libre and proprietary software with various degrees of importance, while many others take a more pragmatic approach and just see them as tools and means to an end. There's no "right" answer or approach in this regard; it's a personal choice.

Clearly, Spyder is not a donation magnet but one has to look at what it will take to meet the objective and ask how the small team can achieve all of that when it is getting pennies on the dollar of other projects who have the same ambitions and are themselves still not delivering.

What we proposed for Spyder 4 was adding a relatively modest number of significant, oft-requested features within an existing IDE that already had the great majority of the expected functionality, whereas JupyterLab was essentially creating an entire IDE from scratch, all inside a web browser and in a language few data scientists were familiar with. The size of their core team is about the same as ours; e.g. over the past year, JupyterLab had 6 people with over 100 commits while Spyder had 8.

As explained previously, the great majority of the difficulties with Spyder 4 were not in implementing the features themselves, which were not conceptually that difficult, but in working through and resolving a large number of bugs and deficiencies, particularly with LSP, that we didn't know about a priori. Obviously, one always plans for some difficulty, but in this case it was much more than expected. I for one generally pushed for later target dates than were initially announced, though at least in the position I was in, even I didn't think it would take this long.

At minimum there is no reason for Spyder to claim on its website that third party plugins provide Reports and Notebook capabilities when the former doesn't exist and the latter may not be stable from one Spyder rev to the next.

These are first-party, not third-party, plugins (the site is mistaken on that point) and have existed since 2016, worked fine and were actively supported and developed around the time we originally put up the website. However, right around that time, Anaconda cut the funding they had given us to help develop them, so we eventually had to pause full support for them (except for Spyder Unittest and partially Spyder-Terminal) while focusing all our resources on Spyder 4. Given that, as mentioned, we have no real PR budget, evidently no one ended up adding something to the site itself stating that, although we did have a disclaimer in each of their readmes if users were to actually click the links on the site. Spyder-Reports' main problem is its incompatibility with the latest versions of pweave, not Spyder itself, while last I checked Spyder-Notebook does appear to still work with some bugs and has been minimally maintained, and Spyder 4 support has been merged.

I don't know what "DataScience" actually is nor do I care.

No one really knows for sure, but everyone sure loves to use it. It's an umbrella term, rather nebulous and certainly overused, but it describes what tools like Jupyter, Spyder, RStudio, and Rodeo, the very object of this discussion, are designed to do (the headline of RStudio's website is "Open source and enterprise-ready professional software for data science," and Rodeo's is "A Native Python IDE for Data Science"). You are welcome to use them for other things, but it shouldn't come as a huge surprise that they may not be the most suitable for something completely different, like the aerospace engineering or safety-critical applications that you mention (the vast majority of which don't, won't and probably shouldn't ever use Python over e.g. C, Ada, or other more appropriate choices).

Python and its "ecosystem" promised the world on that and underdelivered on multiple fronts.

Could you point me toward where "Python" or the core PyData stack make the promise of being "double-especially...mission critical"? Where has it overpromised and underdelivered on "talking with (ugh) hardware"? I haven't heard many claims at all about the latter, and while I certainly wouldn't use it in the embedded space, I've had good success using it to build a system of a dozen networked lightning sensors that can reliably send commands and log data from multiple devices (charge controller, sensor hardware, control computer, etc) via various low-level protocols (serial, modbus, ethernet, GPIO) and send it back to a central server from anywhere in the world, along with automated alerting, remote access, command and control, and displaying all of this in a dynamic, interactive web dashboard. Not to say I couldn't have done it in C or another language, but I found it a good fit for the application.

There are advantages to proprietary tools.

Sure there are. Different jobs, different tools. And to be honest, the single biggest weakness throughout much of the Python ecosystem, Spyder especially so, is documentation. Developers love to write code and are generally pretty good at it, but documentation? Not so much...that was another of my focuses with Spyder, but while I did do a full port and rewrite of the existing docs, I didn't end up having the time to do much in the way of very sorely needed expansion. They can also make some pretty poor UI/UX decisions sometimes, which was something I found myself arguing with the others about all too frequently.

Reverting to "we're volunteers doing good works" in the face of evidence is a cop out.

A cop out for what, sorry? I'm not sure I understand the point here, since I brought this up in the context of not expending funds to run a big PR operation like you were saying Jupyter did, and in explaining how Spyder serves a different niche than Matlab: the vast majority of our users simply couldn't afford Matlab licenses for each of the varied devices they plan to run their code on. I should have made it more clear that I didn't mean to imply you should use XXX open source tool just because it's open source if it's clearly unsuited to your application; I assumed we were discussing something resembling data science here (as is the designed application of all the tools that had theretofore been discussed: Jupyter, Spyder, Rodeo, RStudio, etc.), which has a robust open-source ecosystem and where open source itself is a distinct domain-specific advantage.

Matlab is expensive for a reason.

Sure, I didn't say they were purposely bilking people just because they could. Just that both their expense and the non-free nature of their ecosystem results in a significant niche to be filled by open-source tools even if Matlab were strictly superior in every way. Regardless, I certainly am not meaning to imply that Matlab doesn't still have a substantial niche, particularly for applications like you describe (engineering, aerospace, etc), for which it is surely well worth the money.

I could get fast, knowledgeable support when I needed it.

While e.g. Quansight and Anaconda offer something in the way of paid support for Spyder and other open source tools, it isn't exactly the same sort of thing that Matlab offers. However, if a clear market exists, it seems likely that companies will step up to fill this need at some point, since with regard to open source in general, providing paid consulting and support for open source tools is the entire business model of numerous companies and worth tens if not hundreds of billions. For example, Red Hat (recently purchased by IBM for $34 billion) built its whole company around providing paid support, validation/certification, consulting etc. for a free and open source product (RHEL) that anyone can freely distribute, use and modify (as CentOS). Anaconda, Quansight etc. do the same for data science. Of course, you pay a good deal of money, reducing the cost advantage of open source, though it's important to remember that free as in beer (gratis) isn't what the "free" in "Free and Open Source" is about...but I digress, I don't want to turn this into some ideological debate.

that is on top of things that the free "ecosystem" will never do like align with certifications/industry standards.

Not sure why the scare quotes around "ecosystem", but open source is typically at the forefront of implementing open standards over proprietary solutions, when those standards are widely applicable enough to matter to a meaningful fraction of the userbase. However, general-purpose open source cannot be expected to always implement rigorous, highly specialized regulatory requirements and certification for one specific country, which is why niche proprietary products and companies providing paid validation and certification of open-source tools will always exist. I'm not sure where I recommended using a tool (Python) for an application where it is clearly not very appropriate or accepted (aerospace design and modelling), at least for mission-critical code, an area where conversely Matlab is very well suited.

To note, there is no technical reason why it couldn't. For example, the R language provides considerable documentation to support FDA regulatory conformance, as do RStudio and other open source R tools, and third party companies provide fully validated builds of R with an extensive set of documentation, test suites and regulatory documents for various industries, as a result of R being increasingly widely used in the medical field and other such industries, including by numerous major Fortune 500 companies.

As far as I'm aware, there isn't something exactly similar for Python, since it is not as heavily used in these specific areas, but companies like the aforementioned Red Hat do offer testing and validation services for the Python builds and packages included in their distributions (and others, for a fee), and companies like Anaconda and Quansight offer a level of guaranteed support, validation and targeted development aligning with corporate priorities for clients using the PyData stack, including Spyder, that could be used to implement such.

If you were going to fly on a plane, do you want one designed with traceable and proven tools or one by some guy on Github?

I have no idea why you're bringing this up. What does some unknown CAD/validation software tool designed by "some guy on Github" and flying on a plane have to do with using Spyder, JupyterLab and RStudio for data science, each a tool developed and supported by dozens of active developers and used by hundreds of thousands? Why do you regard "traceable and proven" as contrary to open source? Software tools, open or closed source, can never fully prevent people or companies from making poor high-level decisions, particularly those that compromise long-term safety for short-term gain (cough, 737 Max MCAS, cough). However, by definition open source provides traceability by allowing anyone to examine the source for themselves, and verifiability by opening the source (and the validation test suite) to independent, community examination, testing and validation by experts around the world and the open sharing of any deficiencies found. Does this automatically mean that open source code will be rock solid simply because it's open source? Of course not, but if the software has a significant expert userbase, particularly with such demanding requirements, then open source certainly increases the opportunity for this (as e.g. R and RStudio's compliance documentation explain and justify in detail).

@t-wojciech

RStudio released another major version which introduced a few things mentioned by @abalter as missing. I think some of you will be interested. RStudio 1.4 introduces a visual markdown editor and strict Python support.

I used Rodeo for a long time as an IDE for Python, then changed to Spyder, but I was not satisfied. Now RStudio is the best choice for me, but I may be biased, as I previously used RStudio for R and Rodeo for Python. Anyway, I think it's worth testing.
