Thursday 3 March 2022

GCP DAGs: "Given * file, /home/airflow/dags/*, could not be opened."

Although this post is about solving my specific issue with locating the 'googleads.yaml' file and its 'private_key' file, it will also be useful to anyone who wants to access any other file they have uploaded.

I had the following code that I could not get to work.  


from googleads import ad_manager

FOLDER_PATH = '~/dags/'

def gam_traffic():
    client = ad_manager.AdManagerClient.LoadFromStorage(FOLDER_PATH + 'googleads.yaml')

I had tried other folder paths, such as the gs:// bucket address and the full bucket name (europe-west1-composer-XXX-XXXX-bucket), all to no avail.


RED HERRING ALERT

Using the gs:// bucket address worked fine for saving and retrieving .csv files with the Python pandas library, but I couldn't get the address to work with Python's open() function. My assumption now is that pandas contains some 'magic' that processes the address when it sees that the URI starts with 'gs://'. As far as I can tell, pandas hands URLs like this to the fsspec/gcsfs libraries, whereas the built-in open() only understands local file paths.
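A quick way to see the difference is to call open() directly on the kinds of paths I had been trying. This is a minimal sketch; `try_open` and the bucket name are hypothetical, made up for illustration.

```python
import os


def try_open(path):
    """Return True if the built-in open() can read `path`, False if it raises OSError."""
    try:
        with open(path):
            return True
    except OSError:
        return False


# open() treats a gs:// URI as an ordinary relative local path
# (a folder called "gs:", then "hypothetical-bucket", ...), so it normally fails.
print(try_open("gs://hypothetical-bucket/dags/googleads.yaml"))

# open() does not expand '~' either; that only happens if you call
# os.path.expanduser() yourself.
print(try_open("~/dags/googleads.yaml"))
print(os.path.expanduser("~/dags/googleads.yaml"))
```

So neither '~/dags/' nor a gs:// address can be passed to code that ends up calling open() on a local path.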

To find the path I needed, I ran the following tasks to show which folder the tasks run in, and to list all the files and folders beneath it.


import logging
import os

# Log the current working directory of the task
def print_working_dir():
    directory = os.getcwd()
    logging.info(directory)
    return "success"

 

import glob
import logging

# Log every file and folder below the working directory (recursive)
def list_working_dir():
    for filename in glob.iglob("./**/*", recursive=True):
        logging.info(filename)



The second task takes a long time to run, as it iterates through a large number of folders. In our case, we read the logs while it was still running, since we could already see the information we needed.
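If you already know the file name, a faster variation is to stop as soon as the file is found and not descend too deep. This is a sketch, not what I actually ran; `find_file` and its `max_depth` limit are hypothetical choices, and it uses os.walk rather than glob so that whole subtrees can be skipped.

```python
import logging
import os


def find_file(name, root=".", max_depth=4):
    """Return the first path under `root` whose file name equals `name`, or None.

    Hypothetical helper: stops at the first match and does not descend more
    than `max_depth` folder levels below `root`.
    """
    root = os.path.abspath(root)
    for dirpath, dirnames, filenames in os.walk(root):
        if name in filenames:
            found = os.path.join(dirpath, name)
            logging.info("Found %s", found)
            return found
        # Depth of the current folder relative to root (root itself is 0)
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        if depth >= max_depth:
            # Emptying dirnames in place stops os.walk going any deeper here
            dirnames[:] = []
    return None
```

In a task this would be called as `find_file('googleads.yaml')` and the result read from the logs, just like the listing task.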

In our case, the file path we needed was:

'./gcsfuse/dags/googleads.yaml'
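One way to use that path is to build it in one place and fail early with a clear message if the file isn't there. This is a minimal sketch under the assumption that the tasks keep the same working directory as the listing task; `resolve_dag_file` is a hypothetical helper, not part of Airflow or googleads.

```python
import os

# Assumption: the folder reported by the listing task, relative to the
# task's working directory.
FOLDER_PATH = "./gcsfuse/dags/"


def resolve_dag_file(filename, folder=FOLDER_PATH):
    """Return an absolute path to `filename` in `folder`, raising if it is missing."""
    path = os.path.abspath(os.path.join(folder, filename))
    if not os.path.isfile(path):
        raise FileNotFoundError(f"{filename} not found at {path}")
    return path
```

Inside the DAG, the original call would then become `ad_manager.AdManagerClient.LoadFromStorage(resolve_dag_file('googleads.yaml'))`.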
