User experience report #243

Open
Erudition opened this issue Dec 21, 2023 · 5 comments

Comments

@Erudition

Erudition commented Dec 21, 2023

I just finally got SEPIA running via a docker container for testing.
As there are no YouTube videos or tutorials, I couldn't risk spending all that time on the manual setup, with all its steps, just to test it out; I wanted to see how it performs before digging further. I ran into many weird quirks and poor UX on the way, but I eventually got it working so that I can talk to the Android app with my voice.

Then I realized that almost everything I ask it, just opens a web browser, gets it wrong, or does nothing.

  • "What is the weather" - gives me the definition of "weather". Same with "forecast" and "weather outside"
  • "What is the temperature outside" - "You have no connection to a smart home right now." (facepalm)
  • "What is my name" - tells me about a song called "My Name Is"
  • "What is my location" - opens a google search for the term "my location"
  • "Set a timer for 30 seconds" - this is one that actually works! Except when it rings:
  • "Cancel the timer" - "Are you sure you want to stop the next timer" -> "There seems to be no active timer"
  • "How tall is Barack Obama" - "I'm not sure I got that right."
  • "What is 2+2" - Tells me about a Pontiac car called 2+2
  • "What is the sum of 2 and 2" - searches the web for "sum of 2 and 2"
  • "17 divided by 4" - "I'm not sure I got that right"
  • "How long is the drive to New York City" - takes me to the Maps app, but the destination is a local financial advisor business, not New York City
  • "Show me the way from Munich to Berlin by train" (pulled right from the built-in examples list) - opens map app going from the same random business as above to the address "null, null"
  • "Add potatoes to my shopping list" - this seems to work ok.

Overall I'm struggling to see how this will make my life easier, as the software is full of a LOT of configuration, settings, and cluttered interfaces, but it just doesn't seem to be "intelligent" like the website says.

@barneysspeedshop

There are closed-source, zeroconf solutions for speech processing and synthesis available that require very little user knowledge to set up. This is not that. This product is open source, free, and highly configurable. You may have to exert considerable effort to make it work the way that you want it to. The difference between this and a "turnkey" solution is that you can fully customize almost everything about it. You have to weigh the benefits against the drawbacks and determine which solution would be best for you, but your comments about being "cluttered" and "doesn't seem to be intelligent" are unhelpful and might dissuade knowledgeable users from providing assistance to this FREE product that does not harvest your personal and private information, and does not sell that information to anyone.

@Erudition
Author

Sorry you feel that way. Naturally, I'm here precisely because a libre, free, privacy-preserving solution is exactly what I want. I'm also a fan of the high configurability. That said, you seem to paint these values as in opposition to sensible defaults and a good out-of-box experience. That could not be further from the truth. I've tried half a dozen projects with the same values and similar goals, and yet somehow they provide a decent OOB experience.

Home Assistant, for example, is a Free/Libre/Open project with a voice assistant that is also fully private, extremely modular and customizable. It doesn't quite meet the experience of many of the proprietary projects you mention, but nothing about these values prevents that, and the default experience provides much more promising results than those I listed from SEPIA. Not closed source, not zeroconf, and yes, requires considerable effort to make it work the way I want to. Yet, the defaults at least make sense for what some users want, out of the box. "Customizing" functionality does not imply building functionality from scratch, in the same way that a "customizable home" does not secretly mean that you actually get a kit of building materials and no place to live until you put something together.

My opinions on the intelligence and interface are just that, opinions, which are not meant to be "helpful" except for devs looking to hear about the current experience from the perspective of prospective users - which is exactly what the title of this issue says... I see no reason why they "might dissuade knowledgeable users from providing assistance" (you mean devs, volunteering dev work? why would this stop them?), but they might keep less knowledgeable prospective users from wasting their time if their expectations are similar and they had no idea this would be the experience. Or better yet, they'd still go into it, but with more awareness and armed with knowledge of what to expect.

@barneysspeedshop

Mainly, what I feel is unhelpful is being critical of how it IS. What I believe would be helpful is a list of what you would like to see specifically.

With bug reports, it's typical to use the following format:

Steps to reproduce:

  1. Open the app
  2. Choose option (X)
  3. Enter value
  4. Save

Observed behavior:
It does not save the value that I entered

Expected behavior:
It saves the value that I entered

For feature/functionality requests, they often are best served under the following format:
As a user, I would like to see the number of times I have asked S.E.P.I.A. for a place to buy hummus

@fquirin
Contributor

fquirin commented Apr 26, 2024

Hi guys,

I'm sorry that I wasn't really involved in the discussion until now, but I had to take a little (ongoing) break from the project for a while, since it was consuming a huge amount of time and I wanted to reevaluate the influence of the new LLM hype on the future of the project as well.

Some quick comments:

  • "what is the weather?" really is an odd bug I never noticed 😅. It seems the Wikipedia service takes priority here. "How is the weather" or "what is the weather in [location]" should give you correct results.

  • Knowledge-based questions are incredibly hard to identify and answer if there is no paid service in the background. Not sure how apps like Home Assistant would do this in a legal way without using their own cloud to pay for it. That said, it is certainly possible to improve SEPIA on this point by doing extensive parsing of Wikipedia content and applying compute-intensive procedures, but that was never really the priority or scope of SEPIA, at least not for now, especially since web searches will work well enough in most cases.

  • Your timer cancel issue seems weird; I cannot reproduce this problem, at least not in my everyday usage.

  • Calculator is not a feature of SEPIA. There have been community-based experiments for it that could solve very simple calculations. But it's harder to do than you might think and not really a priority feature right now.

  • Your location based issues seem to be related to a missing GPS service (the server log might mention that, or an in-app API key note should show). The null,null error is a bit weird and would require deeper investigation. I've just tried it again in my own version and all of your examples work pretty well for me 🤔.
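On the calculator point above: the following is a minimal sketch of what "very simple calculations" could look like, written in Python purely for illustration. It is not SEPIA code and not one of the community experiments; the function name `parse_simple_calculation` and the `WORD_OPS` table are assumptions made up for this example. It only handles two operands, which hints at why anything beyond that gets harder quickly.

```python
import re

# Hypothetical mapping from spoken operator phrases to symbols (not part of SEPIA).
WORD_OPS = {
    "plus": "+",
    "minus": "-",
    "times": "*",
    "multiplied by": "*",
    "divided by": "/",
}

# Integer or decimal number, optionally negative.
NUMBER = r"(-?\d+(?:\.\d+)?)"


def parse_simple_calculation(text):
    """Return the result of a two-operand calculation, or None if text is not one."""
    query = text.lower().strip().rstrip("?")
    query = re.sub(r"^what is\s+", "", query)

    # Phrases such as "the sum of 2 and 2".
    match = re.fullmatch(r"(?:the )?sum of " + NUMBER + " and " + NUMBER, query)
    if match:
        return float(match.group(1)) + float(match.group(2))

    # Turn operator words into symbols, e.g. "17 divided by 4" -> "17 / 4".
    for phrase, symbol in WORD_OPS.items():
        query = query.replace(phrase, symbol)

    match = re.fullmatch(NUMBER + r"\s*([+\-*/])\s*" + NUMBER, query)
    if not match:
        return None  # Not a calculation this sketch understands.

    left, op, right = float(match.group(1)), match.group(2), float(match.group(3))
    if op == "+":
        return left + right
    if op == "-":
        return left - right
    if op == "*":
        return left * right
    # Division: refuse to divide by zero instead of raising.
    return left / right if right != 0 else None


if __name__ == "__main__":
    for phrase in ["What is 2+2", "What is the sum of 2 and 2", "17 divided by 4"]:
        print(phrase, "->", parse_simple_calculation(phrase))
```

Anything that is not a plain two-number expression returns `None`, so a caller could fall through to other services such as a web search.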

I'm using SEPIA every day for timers, reminders, news, smart home control, weather, lists and web searches. These services should work pretty well out of the box. Well, smart home requires you to set up your devices in addition and depends a bit on the choice of your HUB.

The NLU is a very efficient and customizable rule-based engine that does not require any fat LLM or third-party service. But since SEPIA has a larger number of services, this means there can be some cross-talk in identifying commands and some quirks if inputs don't match any parser. In this case you can overwrite the command with the Teach-UI. Extending these features was one of the priorities before I decided to take a little break.
Systems like Home Assistant work in a similar way but usually support far fewer services, which makes them less versatile, but possibly better in very specific cases like smart home control. Besides that there are multiple people working full time on HA for actual money ^^ and some things only work if you pay for their cloud ;).
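
To illustrate the kind of cross-talk described above, here is a minimal Python sketch of a rule-based matcher in which the first matching rule wins. It is not SEPIA's actual engine; the `Rule` type, the patterns, and the service names are all hypothetical and chosen only to mirror the "what is the weather" case from this thread.

```python
import re
from typing import List, NamedTuple, Optional


class Rule(NamedTuple):
    service: str          # hypothetical service name
    pattern: re.Pattern   # regular expression that triggers the service


# Hypothetical rules, checked in list order (first match wins).
KNOWLEDGE = Rule("knowledge", re.compile(r"^what is (?:the |a )?(?P<term>\w+)$"))
WEATHER = Rule("weather", re.compile(r"\b(?:weather|forecast)\b"))
TIMER = Rule("timer", re.compile(r"\btimer\b"))

RULES_KNOWLEDGE_FIRST = [KNOWLEDGE, WEATHER, TIMER]
RULES_WEATHER_FIRST = [WEATHER, TIMER, KNOWLEDGE]


def classify(text: str, rules: List[Rule]) -> Optional[str]:
    """Return the service of the first rule whose pattern matches, else None."""
    query = text.lower().strip().rstrip("?")
    for rule in rules:
        if rule.pattern.search(query):
            return rule.service
    return None


if __name__ == "__main__":
    for phrase in ["What is the weather?", "How is the weather?",
                   "What is the weather in Berlin?"]:
        print(phrase,
              "| knowledge first:", classify(phrase, RULES_KNOWLEDGE_FIRST),
              "| weather first:", classify(phrase, RULES_WEATHER_FIRST))
```

With the knowledge rule first, "What is the weather?" is captured as a definition lookup, while "How is the weather?" and "What is the weather in Berlin?" still reach the weather rule. Reordering the rules resolves this one case, but every added service increases the number of overlaps that have to be checked.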

In summary I should say I'm well aware of certain weaknesses in SEPIA and I appreciate everyone who takes their time to play with the framework and writes down their feedback! 👍
I usually collect this feedback and try to fix the most annoying bugs first while at the same time trying to improve services and add new features. It takes a lot of time, but it will improve eventually. I'm determined to work on this project for many years to come 😉

@fquirin
Contributor

fquirin commented Apr 26, 2024

One more thing:

In theory, it would be rather easy to overcome 90% of the shortcomings of SEPIA in the area of knowledge-based questions by simply adding support for the OpenAI API. Many companies have been taking this shortcut lately ^^.

Since user-choice is one of the key elements of SEPIA, I'm not completely against it, but I'm not a fan either, so I'll probably not do it myself, but I would support community developments in this area ;)


3 participants