firethorn

changeset 4313:ba9eff2eed8a

Added some notes on recent issue with the TAP services
author Stelios <stv@roe.ac.uk>
date Fri Oct 30 18:19:55 2020 +0000
parents 7345ff52db44
children a544ca0d35c9
files doc/notes/stv/20201030-TAP-Service-issue.txt
     1.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     1.2 +++ b/doc/notes/stv/20201030-TAP-Service-issue.txt	Fri Oct 30 18:19:55 2020 +0000
     1.3 @@ -0,0 +1,76 @@
     1.4 +#
     1.5 +# <meta:header>
     1.6 +#   <meta:licence>
     1.7 +#     Copyright (c) 2020, ROE (http://www.roe.ac.uk/)
     1.8 +#
     1.9 +#     This information is free software: you can redistribute it and/or modify
    1.10 +#     it under the terms of the GNU General Public License as published by
    1.11 +#     the Free Software Foundation, either version 3 of the License, or
    1.12 +#     (at your option) any later version.
    1.13 +#
    1.14 +#     This information is distributed in the hope that it will be useful,
    1.15 +#     but WITHOUT ANY WARRANTY; without even the implied warranty of
    1.16 +#     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    1.17 +#     GNU General Public License for more details.
    1.18 +#  
    1.19 +#     You should have received a copy of the GNU General Public License
    1.20 +#     along with this program.  If not, see <http://www.gnu.org/licenses/>.
    1.21 +#   </meta:licence>
    1.22 +# </meta:header>
    1.23 +#
    1.24 +
    1.25 +
    1.26 +# Issue with Live WFAU TAP Services (October)
    1.27 +
    1.28 +# On the weekend of October 17th, our health checker started producing a series of error messages:
    1.29 +
    1.30 +..
    1.31 +Health Check Results for: http://tap.roe.ac.uk/firethorn/system/info
    1.32 +HTTP Error 503: Service Unavailable /
    1.33 +
    1.34 +Health Check Results for: http://tap.roe.ac.uk/firethorn/system/info
    1.35 +HTTP Error 503: Service Unavailable /
    1.36 +..
    1.37 +
    1.38 +
    1.39 +# Check 1: Sync query
    1.40 +# -------------------
    1.41 +
    1.42 +# The first check was to run a simple query against the sync endpoint of one of our TAP services:
    1.43 +# firefox http://tap.roe.ac.uk/osa/sync?REQUEST=doQuery&QUERY=SELECT+TOP+1+*+from+ATLASDR1.Filter&LANG=ADQL
    1.44 +
    1.45 +
    1.46 +# Result: HTTP 503 (Service Unavailable) error
    1.47 +
    1.48 +
    1.49 +# Check 2: Docker ps
    1.50 +# ------------------
    1.51 +
    1.52 +# The second check was to log into the VM and see whether the Docker containers were up and running:
    1.53 +
    1.54 +# ssh Stevedore@Lothigometh
    1.55 +
    1.56 +# ..ssh exception
    1.57 +# We were not able to log into the machine at all
    1.58 +# It looks like the VM had locked up, an issue we have seen before
    1.59 +
    1.60 +
    1.61 +
    1.62 +# See if we can restart the machine
    1.63 +# ---------------------------------
    1.64 +
    1.65 +# The VM did not come back up after the kvm restart command
    1.66 +
    1.67 +
    1.68 +# Recreate VM and run import
    1.69 +# ------------------------------
    1.70 +
    1.71 +# The service recovered after we recreated Lothigometh and ran the import scripts described here:
    1.72 +# http://wfau.metagrid.co.uk/code/firethorn/file/310132961970/doc/notes/stv/20200514-TAP-Swarm-deploy-2.1.36.txt
    1.73 +
    1.74 +# As the resulting endpoints matched those already configured in the Apache proxy, we did not need to edit the proxy config
    1.75 +
    1.76 +
    1.77 +# Service back up & running ..
    1.78 +
    1.79 +
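The two HTTP-level checks in these notes can be reproduced with a short script. The first is the health check against /firethorn/system/info. The second is the sync query in Check 1. The Python below is a minimal sketch and is not part of the firethorn code or the WFAU health checker. The function names, the 30-second timeout and the result strings are assumptions chosen for illustration.

The script does two things:

- It builds a TAP sync URL with the REQUEST=doQuery, QUERY and LANG=ADQL parameters used in Check 1.
- It reports the HTTP status of the request.

It keeps an HTTP error status such as 503 separate from a connection-level failure where nothing answers at all. Behind the Apache proxy mentioned in the notes, a 503 usually means the proxy is up but cannot reach the backend.

```python
"""Minimal sketch of the HTTP checks described in the notes above.

Assumptions (not from the original notes): function names, the timeout
value and the format of the returned status strings are illustrative.
"""
from urllib.error import HTTPError, URLError
from urllib.parse import urlencode
from urllib.request import urlopen


def build_sync_url(base, adql):
    """Build a synchronous TAP query URL using the parameters from Check 1."""
    params = {"REQUEST": "doQuery", "QUERY": adql, "LANG": "ADQL"}
    # urlencode uses quote_plus, so spaces become '+' and '*' becomes '%2A'.
    return base.rstrip("/") + "/sync?" + urlencode(params)


def check_endpoint(url, opener=urlopen, timeout=30):
    """Return a one-line status string for a single GET request to url.

    `opener` defaults to urllib's urlopen; it is a parameter only so the
    function can be exercised without network access.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return f"OK {response.status}: {url}"
    except HTTPError as err:
        # The server (or proxy) answered with an error status, e.g. 503.
        return f"HTTP Error {err.code}: {err.reason} {url}"
    except URLError as err:
        # Nothing answered: DNS failure, connection refused, timeout, etc.
        return f"Connection error: {err.reason} {url}"
```

For example, check_endpoint(build_sync_url("http://tap.roe.ac.uk/osa", "SELECT TOP 1 * from ATLASDR1.Filter")) repeats Check 1. A 503 reply like the one recorded above is reported as an "HTTP Error 503" line. The percent-encoded %2A decodes to the same * that appears in the original URL.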