Troubleshooting


Hercules onboarding - Client creation failure

Problem statement : OAuth client creation failure during Hercules onboarding.

Fix/Validation steps :

  1. Check whether the hercules_service application (as Relying Party) is added to your domain.
    Do NOT delete the application from the domain. If it has been deleted, the only current way to fix it is with a DML change.
  2. Check whether the cipher_sb_admin role is present.
    Note: Also verify that cipher_sb_admin has the proper role entries. To check this, use a token from your domain for an auth profile that has this role, and call the Get Enriched API with realm.zeta.in as the objectType JID. For details on the required actions, see Enabling oauth v2 clients for realm. Sandbox configurations can be reconfigured using sandbox-setup.jar; make sure to use the latest YAML file from Bitbucket.
  3. Make sure the sso_admin application is present in the domain.
Tenant/Sandbox Mismatch

Problem statement : Occurrence of the sandboxId/tenantId mismatch error.

Fix/Validation steps :

  1. If you get the sandboxId/tenantId mismatch error, pass the correct sandboxId and tenantId combination for the client being used as query params in the URL.
  2. In rare scenarios the error is caused by multithreading and abstraction issues in spring-boot-commons that have since been fixed. Update spring-boot-commons to version 1.4.7 or later to pick up these fixes.
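Step 1 can be scripted when reproducing the error. The Python sketch below sets the tenantId and sandboxId query parameters on a login URL while keeping the other parameters; the host, path and IDs in the example are hypothetical, and only the parameter names come from this page. The second helper is a rough check for step 2 and assumes a plain dotted version string such as 1.4.10 (no -SNAPSHOT style suffixes).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_tenant_and_sandbox(url: str, tenant_id: str, sandbox_id: str) -> str:
    """Return `url` with tenantId/sandboxId query params set for the client in use.

    Other query params are kept; any stale tenantId/sandboxId values are replaced.
    """
    parts = urlsplit(url)
    params = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
              if k not in ("tenantId", "sandboxId")]
    params += [("tenantId", tenant_id), ("sandboxId", sandbox_id)]
    return urlunsplit(parts._replace(query=urlencode(params)))


def commons_version_ok(version: str, minimum: str = "1.4.7") -> bool:
    """True if a plain dotted version such as "1.4.10" is >= `minimum`."""
    def as_tuple(v: str) -> tuple:
        return tuple(int(p) for p in v.split("."))
    return as_tuple(version) >= as_tuple(minimum)


if __name__ == "__main__":
    # Hypothetical host and IDs, for illustration only.
    print(with_tenant_and_sandbox(
        "https://sso.example.com/login?clientId=abc&tenantId=1", "296793", "42"))
    # https://sso.example.com/login?clientId=abc&tenantId=296793&sandboxId=42
    print(commons_version_ok("1.4.6"), commons_version_ok("1.4.10"))  # False True
```

Replacing the two parameters rather than appending them avoids leaving a stale tenantId in the URL.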
Oauth Client creation failure

Problem statement : Failed to create an Oauth Client.

Fix/Validation steps :

  1. Ensure the setup has been done for your domain and sandbox.
  2. Search Kibana for the debugId returned in the error response.
  3. Check whether the sso_admin application is missing from the domain or its authorization mechanisms are configured incorrectly.
  4. Check for improper sandbox configuration; if found, follow the sandbox configuration document referenced above to configure the sandbox correctly.
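Before searching Kibana (step 2), the debugId has to be copied out of the failed response. The sketch below assumes a JSON error body and, because the field's position in the response is not documented here, searches the whole document for the first debugId key. The sample body is hypothetical.

```python
import json
from typing import Any, Optional


def find_debug_id(body: str) -> Optional[str]:
    """Return the first "debugId" value found anywhere in a JSON response body."""
    def walk(node: Any) -> Optional[str]:
        if isinstance(node, dict):
            if "debugId" in node:
                return str(node["debugId"])
            children = list(node.values())
        elif isinstance(node, list):
            children = node
        else:
            return None
        for child in children:
            found = walk(child)
            if found is not None:
                return found
        return None

    try:
        return walk(json.loads(body))
    except json.JSONDecodeError:
        return None  # Not JSON: search the raw response text manually instead.


if __name__ == "__main__":
    sample = '{"error": {"message": "client creation failed", "debugId": "abc-123"}}'
    print(find_debug_id(sample))  # abc-123
```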
OMS Cipher service timing out

Problem statement : OMS Cipher service timing out.

Fix/Validation steps :

  1. Check the Microservices Chatty dashboard to see whether the timeouts are reported by the service itself or by its clients. If they are client-reported, check whether the client is OMS or non-OMS.
    If OMS: Check the Olympus proxy metrics; if they are fine, report that.
    If non-OMS: Involve the compute infrastructure team, and ask the client to check whether it is using the cipher proteus URL.
  2. If the service is reporting timeouts:
    a. Check database latency.
    b. Check whether downstream services are timing out.
    c. Check whether a recent deployment was done; if so, review the changes.
    d. Check whether a particular API is timing out.
    e. Check whether the timeouts are limited to a specific instance.
    f. Check for resource constraints.
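Steps 2d and 2e ask whether timeouts concentrate on one API or one instance. As a rough sketch, assuming you can export request records as hypothetical (instance, api, timed_out) tuples from your metrics or logs, the helper below flags any instance or API responsible for more than half of all timeouts; the record layout and the 0.5 threshold are illustrative choices, not values from the Chatty dashboard.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Record = Tuple[str, str, bool]  # (instance, api, timed_out): hypothetical layout


def timeout_hotspots(records: Iterable[Record], threshold: float = 0.5) -> List[str]:
    """Return instances/APIs that account for more than `threshold` of all timeouts."""
    by_instance: Counter = Counter()
    by_api: Counter = Counter()
    total = 0
    for instance, api, timed_out in records:
        if timed_out:
            by_instance[instance] += 1
            by_api[api] += 1
            total += 1
    if total == 0:
        return []
    hotspots = [f"instance:{name}" for name, n in by_instance.items() if n / total > threshold]
    hotspots += [f"api:{name}" for name, n in by_api.items() if n / total > threshold]
    return hotspots
```

For example, three timeouts split 2:1 across pod-1 and pod-2, all on /token, would flag instance:pod-1 and api:/token, pointing at both a specific instance and a specific API.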
Public keys not found for CA and PKID

Problem statement : No public keys found for the given CA and PKID.

Fix/Validation steps :
The application needs to be added as a CA in certstore using the following steps:

  1. Use this jar to generate a key pair: https://drive.google.com/file/d/1PT1tkOxfvHp167cY9349cFEH_uGbmpVh/view?usp=sharing

  2. Add the generated key pair to the CA application.

  3. Add the corresponding public key to certstore's keys.txt file for preprod and staging in the format ={"publicKey":, "extendedCertTypes":["PUBLIC_KEY"]}.

  4. Possible values for extendedCertTypes are PUBLIC_KEY and PRIVILEGE. Use PUBLIC_KEY if the CA will issue public key certificates, and PRIVILEGE if it will issue privilege certificates.

  5. For k8s environments, keys.txt is mounted via Vault at /secrets/secrets/show/zone/common/cipher/external-cert.

  6. The JID (issuer JID) will be provided by the application that is being made a CA. That application must also generate the key pair.
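Hand-editing keys.txt is an easy place to introduce typographic quotes or an unsupported cert type. This Python sketch builds only the JSON value that follows "=" in step 3, assuming the public key is stored as a JSON string (for example, Base64-encoded); the part of the line before "=" is not spelled out on this page, so the helper leaves it to the caller.

```python
import json

ALLOWED_CERT_TYPES = {"PUBLIC_KEY", "PRIVILEGE"}  # values listed in step 4


def keys_txt_value(public_key: str, cert_type: str = "PUBLIC_KEY") -> str:
    """Build the JSON value of a certstore keys.txt entry (the part after "=")."""
    if cert_type not in ALLOWED_CERT_TYPES:
        raise ValueError(f"extendedCertTypes must be one of {sorted(ALLOWED_CERT_TYPES)}")
    if not public_key.strip():
        raise ValueError("public key must not be empty")
    # separators=(", ", ":") mirrors the spacing of the format shown in step 3.
    return json.dumps(
        {"publicKey": public_key.strip(), "extendedCertTypes": [cert_type]},
        separators=(", ", ":"),
    )
```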

AuthProfileNotFoundException

Problem statement : AuthProfileNotFoundException on the deleteAuthProfile API.

Fix/Validation steps :
Error logs state that the resource was not found, and the stack trace shows the error raised from the auth profile lookup (getIdentity) inside the deleteAuthProfile call (deleteIdentity):

in.zeta.oms.exception.authProfiles.AuthProfileIdentityNotFoundException: resource not found
at in.zeta.oms.cerberus.dao.authProfiles.AuthProfileDaoWrapper.lambda$null$19(AuthProfileDaoWrapper.java:143)
at java.util.Optional.orElseThrow(Optional.java:290)
at in.zeta.oms.cerberus.dao.authProfiles.AuthProfileDaoWrapper.lambda$getIdentity$20(AuthProfileDaoWrapper.java:143)
at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
at java.util.concurrent.CompletableFuture.uniApplyStage(CompletableFuture.java:628)
at java.util.concurrent.CompletableFuture.thenApply(CompletableFuture.java:1996)
at java.util.concurrent.CompletableFuture.thenApply(CompletableFuture.java:110)
at in.zeta.oms.cerberus.dao.authProfiles.AuthProfileDaoWrapper.getIdentity(AuthProfileDaoWrapper.java:140)
at in.zeta.oms.cerberus.dao.authProfiles.AuthProfileDaoWrapper.deleteIdentity(AuthProfileDaoWrapper.java:149)
at in.zeta.oms.handler.authProfiles.DeleteIdentityHandler.on(DeleteIdentityHandler.java:27)
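When scanning many logs, the signature above (the not-found exception raised from getIdentity while deleteIdentity is on the stack) can be matched mechanically. This is a rough Python sketch based only on the class and method names in the trace shown here; other versions of the service may log different frames.

```python
def is_delete_auth_profile_not_found(trace: str) -> bool:
    """True if a stack trace matches the deleteAuthProfile not-found signature."""
    lines = [line.strip() for line in trace.splitlines() if line.strip()]
    if not lines or "AuthProfileIdentityNotFoundException" not in lines[0]:
        return False
    frames = [line for line in lines[1:] if line.startswith("at ")]
    get_idx = next((i for i, f in enumerate(frames) if ".getIdentity(" in f), None)
    delete_idx = next((i for i, f in enumerate(frames) if ".deleteIdentity(" in f), None)
    # Innermost frames come first, so the failing lookup must appear before the delete call.
    return get_idx is not None and delete_idx is not None and get_idx < delete_idx
```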
Cipher SSO request failed

Problem Statement : When trying to access any center, the SSO request fails.

Fix/Validation Steps :

  1. When a Cipher SSO request fails, a traceId is appended to the URL.
    e.g. /oauth/error?error=xx&traceId=1ac1de2a-b6c3-46a0-9aa9-8e1db532d897
  2. Search Kibana for this UUID (1ac1de2a-b6c3-46a0-9aa9-8e1db532d897 in the example). In most cases, the logs for this trace show why the request failed.
  3. The scope is not attached to the client (invalid scope: for client:).
  4. Invalid client: make sure the clientId, domainId, tenantId and sandboxId params are passed correctly.
  5. The auth plan is not added to the scope.
  6. The cerberus_oauth_agent application is not added to the cerberus2 domain, or it does not have the correct authorization mechanism required to communicate with SSO.
  7. The IDP certificate is malformed or incorrect. Make sure you have added the correct PKC (verify it using openssl x509 -in cert.pem -text -noout).
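Steps 1 and 2 can be combined into a small helper that pulls the traceId out of the error redirect so it can be pasted into Kibana. The URL shape comes from the example in step 1; validating the value as a UUID is an assumption based on that example.

```python
import uuid
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def sso_trace_id(error_url: str) -> Optional[str]:
    """Return the traceId from a Cipher SSO error redirect, or None if absent or invalid."""
    values = parse_qs(urlsplit(error_url).query).get("traceId")
    if not values:
        return None
    try:
        return str(uuid.UUID(values[0]))  # normalises case and confirms it is a UUID
    except ValueError:
        return None


if __name__ == "__main__":
    print(sso_trace_id("/oauth/error?error=xx&traceId=1ac1de2a-b6c3-46a0-9aa9-8e1db532d897"))
    # 1ac1de2a-b6c3-46a0-9aa9-8e1db532d897
```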
READONLY You can't write against a read only replica
Problem Statement : Getting the error "READONLY You can't write against a read only replica" when entering login credentials to access any center.

Fix/Validation Steps :
This error occurs in SSO when SSO or its dependent services are having issues with the Redis cluster. Involve the DevOps team to investigate further.
Notification Setting issue

Problem Statement : Notification settings not getting updated.

Fix/Validation Steps :
Check the following points:

  1. Check that you are logging in from the domain/realm and sandbox for which the bot_admin role is defined.
  2. If you get a 401 on expiryTracker APIs, check whether your token has the expiryTracker.create action. See the unauthorized debug section.
  3. If the action is not available, run botAdminModule.yaml from the accesscontrol repo.
  4. If a 401 still occurs with your token, check the object group condition; a domain-level JS filter may have been added that is causing the failures.
SSO login screen not opening

Problem Statement : When you try to access any Center, the login screen is blank.

Fix/Validation Steps :

  1. Check the SSO logs and the health check URL.
  2. Redis/Jedis errors -> contact DevOps.
  3. No logs -> check for recent deployment failures and try restarting.
  4. Roll back if a recent deployment was done.
  5. If you get the sandboxId/tenantId mismatch error, pass the correct sandboxId and tenantId combination for the client being used as query params in the URL. (Still not working? See the Tenant/Sandbox Mismatch section.)
  6. If you get a domain/sandbox mismatch error, map the domain and sandbox in the access management dashboard.
  7. If the sso.error screen appears without any error or debug ID, the issue is likely related to the login screen template, i.e. the branding config data may be configured incorrectly.
Idp Cert errors cerberus2

Problem Statement : ImproperCertException {"message":"certificate is not valid for the signature","type":"ImproperCertException"}

Fix/Validation Steps :

  1. Make sure you are using the correct private key to sign the ICRC.
  2. Make sure the IDP has the correct X.509 certificate corresponding to the private key used for signing.
Auth Plan Issue
Problem Statement : Correct Auth Plan is not getting triggered for an employee.

Fix/Validation Steps :
After creating auth plans, check that they are associated with scopes.
OTP Not Sent

Problem Statement : OTP is not being sent to the selected identity type.

Fix/Validation Steps :

  • Search Kibana for "p:+91<10-digit phone number>".
    • You should find a log in one of Mercury's applications.
  • Check the communication provider config for the given domain and identity type.
    • To get the IDP used for an identity type, use a query such as:
      • select * from identity_config_view WHERE domain_id = '<domain>' and identity_type = 'phoneNumber'
    • Then find the communication provider configuration used with that IDP in the idpcommunicationprovider table.
    • If the communication provider config has a 'toggle' field set to 'true', the OTP is sent via a direct Mercury call; otherwise it is sent via the Atropos webhook.
      • If no communication provider is configured, Inbox is being used.
    • If Atropos is being used, verify in AKHQ that the event is being generated (the zone-specific URL can be found in tools.corp.zeta.in).
      • The topic name is _tenant_<tenantId>_identity.
      • Also check for any lag in the webhook consumer in AKHQ.
    • Once you have verified that the Atropos side is fine (or that Mercury is being called directly), search for the event in the Mercury logs.
      • In Kibana, filter on app 'mercury-router' and 'parsedMessage.title' 'SendCommunicationRequestPayload'.
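The checks above can be partly scripted. The sketch below builds the Kibana search term for a phone number, decides which delivery path to inspect from a communication provider row, and derives the Atropos identity topic name. The row is modelled as a hypothetical dict read from the idpcommunicationprovider table, and the assumption that toggle may be stored either as a boolean or as the string "true" is not confirmed by this page.

```python
from typing import Mapping, Optional


def kibana_phone_query(phone: str) -> str:
    """Kibana search term for an Indian mobile number, e.g. "p:+919876543210"."""
    digits = "".join(ch for ch in phone if ch.isdigit())
    if len(digits) == 12 and digits.startswith("91"):
        digits = digits[2:]  # drop a country code that was already included
    if len(digits) != 10:
        raise ValueError("expected a 10-digit phone number")
    return f"p:+91{digits}"


def otp_delivery_path(provider_config: Optional[Mapping[str, object]]) -> str:
    """Which path to inspect next, following the toggle rules above."""
    if provider_config is None:
        return "inbox"  # no communication provider configured
    if str(provider_config.get("toggle")).lower() == "true":
        return "mercury-direct"  # check mercury-router logs in Kibana
    return "atropos-webhook"  # check the identity topic and consumer lag in AKHQ


def identity_topic(tenant_id: str) -> str:
    """Atropos topic carrying identity events for a tenant."""
    return f"_tenant_{tenant_id}_identity"
```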
Sign-up flow troubleshooting

Problem Statement : An employee gets errors in the sign-up flow.

Fix/Validation Steps :

  • Failed to decode challengeCert
    • Make sure you are sending the ICRC in the correct format: the challengeCert must be Base64-encoded and then URL-encoded in the signIn URL (/signIn?challengeCert=UrlEncoded(Base64 of challengeCert)).
  • ImproperCertException {"message":"certificate is not valid for the signature","type":"ImproperCertException"}
    • Make sure you are using the correct private key to sign the ICRC.
    • Make sure the IDP has the correct X.509 certificate corresponding to the private key used for signing.
    • The IDP certificate is cached for two days, so changing the certificate for an existing IDP can cause issues. Contact the Cipher on-call to clear that cache.
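The challengeCert encoding described above (Base64 first, then URL encoding) can be checked with the standard library. This sketch assumes the standard Base64 alphabet rather than the URL-safe variant, and a hypothetical base URL; because standard Base64 output can contain +, / and =, all three must be percent-encoded, which is what quote(..., safe="") does.

```python
import base64
from urllib.parse import quote, unquote


def sign_in_url(base_url: str, challenge_cert: bytes) -> str:
    """Build <base>/signIn?challengeCert=UrlEncoded(Base64(challengeCert))."""
    b64 = base64.b64encode(challenge_cert).decode("ascii")
    return f"{base_url.rstrip('/')}/signIn?challengeCert={quote(b64, safe='')}"


def decode_challenge_cert(encoded_param: str) -> bytes:
    """Reverse the encoding, e.g. to check what the server will decode."""
    return base64.b64decode(unquote(encoded_param), validate=True)


if __name__ == "__main__":
    print(sign_in_url("https://sso.example.com/", b"\xfb\xff"))
    # https://sso.example.com/signIn?challengeCert=%2B%2F8%3D
```

If your HTTP client already URL-encodes query parameters, pass it the raw Base64 string instead of the output of quote; otherwise the value will be encoded twice and fail to decode.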
Auth Profile Sync Issue

Problem Statement : Auth Profile Sync Issue

Fix/Validation Steps :

  • The Profile Provider query in the operation table must be different for each domain for the sync to work.
How to get OTP?
  1. Getting the OTP for a number through an API
    • Add a new entry in the Firebase Realtime Database: https://console.firebase.google.com/u/2/project/apollo-test-apurva/database/apollo-test-apurva/data
    • Make sure you can write data to the database with a PUT API call on the endpoint shown in the Realtime Database dashboard.
      • The endpoint will be: <base URL shown in the dashboard>/<databaseName>.json
      • For example: https://apollo-test-apurva.firebaseio.com/AUTOMATION-OTPS.json
    • To read the OTP, make a GET request on the same endpoint.
    • Add a webhook subscription in Atropos with the subscription URL set to the same endpoint.
      • Make sure to add a transformer so that only the required event is passed and all other events are filtered out (return '{}' in the transformer for all other cases).
      • The webhook subscription will be a custom HAR with a PUT request on the same endpoint.
      • Sample ticket: https://zeta-tm.atlassian.net/browse/CII-392
    • Once all the above steps are done successfully, you will be able to get the latest OTP.
  2. If the OTP was consumed by Mercury, it should be available in Kibana. Search with the filter "p:<mobile number with country code>" or "e:<email address>".
  3. If the OTP is not available in Kibana, you can retrieve it from the Atropos topic, e.g.
    https://akhq-appinfra.internal.mum1.zetaapps.in/prod-olympusworld/topic/_tenant_296793_identity?sort=NEWEST
    OR
    https://akhq-appinfra.internal.mum1.zetaapps.in/prod-olympusworld/topic/_system_DEFAULT_communication?search=<mobile number>
    These examples are for the prod test tenant ID 296793. There will be a similar topic for your tenant in the respective zone as well.
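The endpoint convention in step 1 maps onto the Firebase Realtime Database REST API, where appending .json to a database path gives an endpoint that accepts PUT (write) and GET (read). The sketch below only builds the requests with the standard library; the payload shape is hypothetical (whatever your transformer emits), and whether an auth query parameter is needed depends on the database's security rules, which this page does not specify.

```python
import json
import urllib.request


def firebase_endpoint(base_url: str, database_name: str) -> str:
    """<base URL shown in the dashboard>/<databaseName>.json"""
    return f"{base_url.rstrip('/')}/{database_name}.json"


def put_request(endpoint: str, payload: dict) -> urllib.request.Request:
    """PUT that overwrites the data at `endpoint`, as the webhook subscription does."""
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def get_request(endpoint: str) -> urllib.request.Request:
    """GET that reads back the last value written, i.e. the latest OTP."""
    return urllib.request.Request(endpoint, method="GET")
```

To send a request, pass it to urllib.request.urlopen(req, timeout=10) and parse the JSON response body.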