This repository has been archived by the owner on Jan 31, 2022. It is now read-only.

testConnectivity OH Masking Issue #292

Open
1 of 2 tasks
bregnery opened this issue Jan 23, 2020 · 5 comments

@bregnery

It seems that the OH masks are not working properly in testConnectivity.py.

Say, for example, that we want to run testConnectivity with a GBT phase scan on (1,11,5), so we run it with OH mask 0x20. As we do that, we notice that the current values of all the other OHs on the same CTP7 start jumping around. testConnectivity will then fail for all of the other chambers on that CTP7, and a chamber must be power cycled to recover it before it can be tested.
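For readers unfamiliar with the mask convention, here is a minimal sketch of how the values in this report map to OH numbers, assuming bit N of the mask selects OH N (consistent with 0x20 for OH5 and 0x2 for OH1 as used in this issue). The helper names and the default of 12 OHs per CTP7 are assumptions for illustration, not part of testConnectivity.py.

```python
def ohs_from_mask(mask, n_ohs=12):
    """Return the OH indices selected by an OH mask.

    Assumes bit N of the mask corresponds to OH N; n_ohs=12 is an
    assumed number of OH links per CTP7.
    """
    return [oh for oh in range(n_ohs) if (mask >> oh) & 1]


def mask_from_ohs(ohs):
    """Build an OH mask from a list of OH indices (inverse of the above)."""
    mask = 0
    for oh in ohs:
        mask |= 1 << oh
    return mask


print(ohs_from_mask(0x20))         # [5]
print(ohs_from_mask(0x2))          # [1]
print(hex(mask_from_ohs([1, 5])))  # 0x22
```

With this convention, a mask of 0x20 should leave OH0 to OH4 and OH6 onward untouched, which is the expectation behind this report.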

Types of issue

  • Bug report (report an issue with the code)
  • Feature request (request for change which adds functionality)

Expected Behavior

We should be able to run testConnectivity one chamber at a time without affecting current/communication with masked chambers.

Current Behavior

Running testConnectivity on one chamber disrupts communication with an unprogrammed chamber.

Steps to Reproduce (for bugs)

  1. Power on all chambers on a CTP7
  2. Run testConnectivity with a GBT phase scan for one chamber (for example OH5, mask 0x20)
  3. Observe the current value of other OHs on that CTP7
  4. Run testConnectivity on another chamber on that same CTP7 (for example OH1); this should fail.

Possible Solution (for bugs)

Context (for feature requests)

Your Environment

  • Version used:

Name : gempython_vfatqc
Arch : x86_64
Version : 2.7.7

Name : gempython_vfatqc
Arch : noarch
Version : 1.0.5

  • Shell used: Bash
@mexanick
Contributor

This is known behavior. The current values are jumping because all the OHs receive a hard reset and get reprogrammed. However, this should not cause any issues unless you're doing something on them in parallel. The communication to the front end should remain stable (the GBT config should not be affected). Could you please elaborate on the subsequent communication failures on other chambers?

@bregnery
Author

After testConnectivity.py with a GBT phase scan for 0x20 finished, we ran testConnectivity.py for 0x2 and got the following error:

[gempro@kvm-s3562-1-ip151-74 ~]$ testConnectivity.py 1 11 0x2 --skipScurve --skipDACScan
Open pickled address table if available  /opt/cmsgemos/etc/maps//amc_address_table_top.pickle...
Initializing AMC gem-shelf01-amc11
====================
Step 1: Checking GBT Communication
====================
Checking GBT Communication (Before Programming GBTs)
GBT Communication was not established successfully
        Try checking:
                1. Fibers from GE1/1 patch-panel to OH have correct jacket color ordering
                2. Fibers from GE1/1 patch-panel to OH are fully inserted
                3. OH3 screw is properly screwed into standoff
                4. OH3 standoff on the GEB is not broken
                5. Voltage on OH3 standoff is within range [1.47,1.59] Volts
Connectivity Testing Failed
If Vmon = 8.0V then Imon must be 1.71 +/- 0.01A; if not the GBT's are not locking to the fiber link
Goodbye

This happens with any of the other OHs on shelf01-amc11 at p5. The only way to recover one of the other chambers is to power cycle it.

@bregnery
Author

Okay, so something strange is happening.

I first noticed this last week at Point 5 on a different AMC, and I was able to reproduce it twice at p5 today. But just now, after we ran recover.sh on the CTP7, I tried to reproduce the behavior again and it no longer occurs.

@mexanick mexanick reopened this Jan 23, 2020
@mexanick
Contributor

Hmmm... this is really interesting. @evka85, please take a look. Under normal operation we are supposed to get resyncs from time to time as well as hard resets, which will be forwarded to the front end, and it is important to restore the system properly...

@lpetre-ulb
Contributor

lpetre-ulb commented Jan 23, 2020

@bregnery, since you could reproduce the issue, did you read any registers and/or make any dumps, such as the GTH transceiver or GBT link statuses (before and after a link reset for the latter)? It is rather difficult to do a post-mortem analysis with very little information.
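As a rough illustration of the kind of before/after dump that would help here, the sketch below snapshots a set of registers twice and reports only the ones that changed. Everything in it is an assumption for illustration: `read_register` stands for whatever register-read call is available on the system, and the register names are placeholders, not real address-table entries.

```python
def snapshot(read_register, names):
    """Read every named register once and return a {name: value} dict.

    read_register is a caller-supplied callable (hypothetical here).
    """
    return {name: read_register(name) for name in names}


def diff_snapshots(before, after):
    """Return {name: (old, new)} for registers whose value changed."""
    return {
        name: (before[name], after.get(name))
        for name in before
        if before[name] != after.get(name)
    }


# Stand-in for real hardware: a dict with placeholder register names.
fake_hw = {
    "PLACEHOLDER.OH1.GBT0.READY": 1,
    "PLACEHOLDER.OH1.GBT0.WAS_NOT_READY": 0,
}
names = list(fake_hw)

before = snapshot(fake_hw.get, names)
fake_hw["PLACEHOLDER.OH1.GBT0.WAS_NOT_READY"] = 1  # simulate a link glitch
after = snapshot(fake_hw.get, names)

print(diff_snapshots(before, after))
```

Taking one snapshot before running testConnectivity on the masked OH and one after, then attaching the diff to the issue, would give a starting point for the analysis.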

We have been using testConnectivity.py for electronics tests on 6 chambers at ULB for months and have never experienced such behavior...

Also, the IPBus issue is not related, since it does not concern the same link and there are no IPBus-related error messages in the current issue.
