<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Interview &#8211; ODF Sweden</title>
	<atom:link href="https://oceandatafactory.se/category/interview/feed/" rel="self" type="application/rss+xml" />
	<link>https://oceandatafactory.se</link>
	<description>Data driven innovation and collaboration for a sustainable Blue Growth.</description>
	<lastBuildDate>Wed, 13 Apr 2022 09:34:59 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.3</generator>

<image>
	<url>https://oceandatafactory.se/wp-content/uploads/2020/03/cropped-logo-32x32.png</url>
	<title>Interview &#8211; ODF Sweden</title>
	<link>https://oceandatafactory.se</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Interview with Francis Freire from the PLAN-SUBSIM project</title>
		<link>https://oceandatafactory.se/interview-with-francis-freire-from-plan-subsim-project/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=interview-with-francis-freire-from-plan-subsim-project</link>
		
		<dc:creator><![CDATA[Felicia Ridderbjelke]]></dc:creator>
		<pubDate>Mon, 04 Apr 2022 11:17:19 +0000</pubDate>
				<category><![CDATA[citizen science]]></category>
		<category><![CDATA[Innovation]]></category>
		<category><![CDATA[Interview]]></category>
		<guid isPermaLink="false">https://oceandatafactory.se/?p=3065</guid>

					<description><![CDATA[<p>Francis Freire works at the governmental agency Geological Survey of Sweden (SGU) and was interviewed by the ODF Sweden Team&#8230;</p>
<p>The post <a rel="nofollow" href="https://oceandatafactory.se/interview-with-francis-freire-from-plan-subsim-project/">Interview with Francis Freire from the PLAN-SUBSIM project</a> appeared first on <a rel="nofollow" href="https://oceandatafactory.se">ODF Sweden</a>.</p>
]]></description>
										<content:encoded><![CDATA[		<div data-elementor-type="wp-post" data-elementor-id="3065" class="elementor elementor-3065" data-elementor-settings="[]">
						<div class="elementor-inner">
							<div class="elementor-section-wrap">
							<section class="elementor-section elementor-top-section elementor-element elementor-element-7b5cd46a elementor-section-boxed elementor-section-height-default elementor-section-height-default" data-id="7b5cd46a" data-element_type="section">
						<div class="elementor-container elementor-column-gap-default">
							<div class="elementor-row">
					<div class="elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-669a7bfe" data-id="669a7bfe" data-element_type="column">
			<div class="elementor-column-wrap elementor-element-populated">
							<div class="elementor-widget-wrap">
						<div class="elementor-element elementor-element-46a07710 elementor-widget elementor-widget-text-editor" data-id="46a07710" data-element_type="widget" data-widget_type="text-editor.default">
				<div class="elementor-widget-container">
								<div class="elementor-text-editor elementor-clearfix">
				<p><!-- wp:cover {"url":"https://oceandatafactory.se/wp-content/uploads/2022/04/kso_profile_image.png","id":3090,"dimRatio":50} --><!-- /wp:cover --><!-- wp:tadv/classic-paragraph --></p>
<p><span style="font-weight: 400;">Francis Freire works at the governmental agency <a href="https://www.sgu.se/en/">Geological Survey of Sweden</a> (SGU) and was interviewed by the ODF Sweden Team Members Yixin Zhang (Department of Applied IT at Gothenburg University, Responsible for WP 2 Continuous evaluation &amp; innovation) and Felicia Ridderbjelke (Community Curator at ODF) for his contribution to the project </span><a href="https://oceandatafactory.se/plan-subsim/"><span style="font-weight: 400;">PLAN-SUBSIM</span></a><span style="font-weight: 400;">. The project is a national implementation of a PLatform for ANalysis of SUBSea IMages to develop methods for monitoring and analysing the status of the subsea habitats. The project will leverage existing methods, knowledge and infrastructure in the field of subsea image analysis and implement these for applications in marine resource management.&nbsp;</span></p>
<p><span style="font-weight: 400;">Francis is a marine geologist who surveys the Swedish coastal waters using hydroacoustic techniques and collecting sediment samples and high-resolution underwater images to produce full-coverage benthic habitat maps. Analysing these data is extremely time-consuming, and ODF will therefore use our machine-learning approach to speed up and scale out the analysis of the surveys.&nbsp;</span></p>
<p><b><i>Yixin Zhang: Can you tell us a little bit about yourself and the project?</i></b></p>
<p><b>Francis Freire: </b><span style="font-weight: 400;">I&#8217;m a marine geologist by background and use marine geophysical methods to study seafloor geology. Together with people from my department, we survey the Swedish coastal waters to determine what type of seafloor can be found where. We do habitat mapping based on the data that we collect in the surveys, and that&#8217;s basically how we came in contact with ODF. The Geological Survey of Sweden is the government agency assigned to do the geophysical surveys in Swedish territorial waters, and we wanted to make the surveying more efficient and to characterize the seafloor geology.</span></p>
<p><span style="font-weight: 400;">When we go out into the field, we use different geophysical equipment and have our own boat. The trips can be around 2 weeks and we are about 10 people on the boat to map a particular place. We also do geophysical sampling, which means that we use acoustic multi-beam systems to determine and characterize the seafloor. We use other acoustic systems to investigate what&#8217;s underneath the seafloor, the subseafloor. We take samples of the sediments in the coastal waters and do a lot of chemical analysis to determine the level of toxins.&nbsp;</span></p>
<p><span style="font-weight: 400;">I am probably more interested in the habitat mapping part of the seafloor, the part where we record underwater videos and acoustic data. When we collect the data, we use machine learning to come up with habitat maps of the area that we survey. For example, we had a big project at Hoburgs bank, which is in the Baltic part of Swedish waters. It was a very comprehensive survey and we were there for almost two months.&nbsp;</span></p>
<p><b><i>Yixin Zhang: I once read that we actually know more about planet Mars compared to the seafloor. How little or much do we actually know about the seafloor?</i></b></p>
<p><b>Francis Freire: </b><span style="font-weight: 400;">That is probably true. We try to map as much as we can and we already know quite a lot about Swedish waters. But overall, there are still a lot of areas that are not mapped. The biggest gap is probably not in Europe or in the US but in the bigger ocean areas, for example in the middle of the Atlantic.</span></p>
<p><b><i>Yixin Zhang: What are habitat mapping and geophysical surveying?</i></b></p>
<p><b>Francis Freire:</b><span style="font-weight: 400;"> Habitat mapping is when you try to identify all the habitats that can be found in the waters. There are around 40 different classified habitats depending on the coverage of the area. An example of this is the project called </span><a href="https://resource.sgu.se/dokument/publikation/sgurapport/sgurapport202034rapport/s2034-rapport.pdf"><span style="font-weight: 400;">HELCOM</span></a><span style="font-weight: 400;"> in the Baltic Sea, where the key component is habitat mapping. There are so many ways to define the habitats, and I think it is our ethical mandate to follow this directive and create habitat maps.&nbsp;</span></p>
<p><span style="font-weight: 400;">Regarding geophysical surveying, that is when we go out into the field and collect geophysical data, mostly acoustic data. We send out an acoustic pulse which will bounce back from the seafloor to the boat and give us information about the seafloor. We also measure the amount of sound that comes back, which gives us an idea of the seafloor material. With this acoustic system, we send out 10 or 15 samples for every square meter. We then get very dense data, with information down to a decimetre. So for every decimetre of the seafloor, we get an acoustic data point. We also collect underwater videos and photos and sometimes also just collect “real samples” by going out in the ocean and grabbing anything that is there to see what materials can be collected.</span></p>
<p><span style="font-weight: 400;">Finally, we then use machine learning to interpolate all this data to get a more clear picture of the habitat maps. But briefly, from the acoustic data, we collect information for the whole area. But for the underwater videos, pictures, and samples, we only receive information for specific points. Then we interpret and combine all this data.&nbsp;</span></p>
<p><span style="font-size: 12pt;"><i><span style="font-weight: 400;"><a href="https://resource.sgu.se/dokument/publikation/sgurapport/sgurapport202034rapport/s2034-rapport.pdf">A report</a> that describes the habitat mapping process in the HELCOM project:&nbsp; </span></i></span><i><span style="font-weight: 400;"><span style="font-size: 10pt;">High-resolution benthic habitat mapping of Hoburgs bank, Baltic Sea (2020), Gustav Kågesten, Finn Baumgartner, and Francis Freire. Available at:</span></span></i></p>
<p><img decoding="async" fetchpriority="high" class="alignnone size-full wp-image-3071" src="https://oceandatafactory.se/wp-content/uploads/2022/04/Skärmavbild-2022-04-04-kl.-12.46.31.png" alt="" width="1418" height="1254" srcset="https://oceandatafactory.se/wp-content/uploads/2022/04/Skärmavbild-2022-04-04-kl.-12.46.31.png 1418w, https://oceandatafactory.se/wp-content/uploads/2022/04/Skärmavbild-2022-04-04-kl.-12.46.31-768x679.png 768w" sizes="(max-width: 1418px) 100vw, 1418px" /><span style="font-size: 12pt;"><i><span style="font-weight: 400;">A figure from the report (Kågesten, Baumgartner, and Freire, 2020) illustrates ocean surveying and the instruments.</span></i></span><img decoding="async" class="alignnone size-full wp-image-3072" src="https://oceandatafactory.se/wp-content/uploads/2022/04/Skärmavbild-2022-04-04-kl.-12.59.03.png" alt="" width="1384" height="834" srcset="https://oceandatafactory.se/wp-content/uploads/2022/04/Skärmavbild-2022-04-04-kl.-12.59.03.png 1384w, https://oceandatafactory.se/wp-content/uploads/2022/04/Skärmavbild-2022-04-04-kl.-12.59.03-768x463.png 768w" sizes="(max-width: 1384px) 100vw, 1384px" /><span style="font-size: 12pt;"><i><span style="font-weight: 400;">Figures from the report (Kågesten, Baumgartner, and Freire, 2020) illustrate the photo mosaic of seafloor habitat.</span></i></span></p>
<p><b><i>Yixin Zhang: What motivated you to join the Subsim-project?</i></b></p>
<p><strong><span style="font-family: impact, sans-serif;">&nbsp;</span></strong><span style="font-weight: 400;"><strong>Francis Freire:</strong> I think the motivation for the whole team was that we wanted to facilitate the processing of our collected underwater pictures and videos. The idea behind the project is to use an algorithm and a fast computer to do the identifying work for us. To collect our data and feed it all into a computer would save us a lot of time.&nbsp;</span></p>
<p><b><i>Yixin Zhang: Considering all the data that you are gathering, processing and analyzing, how many hours do you actually invest in analyzing a five or ten-minute long video?&nbsp;</i></b></p>
<p><b>Francis Freire: </b><span style="font-weight: 400;">For example, we have collected close to 600 sampling points around the Hoburgs bank. For each of the photos, we need to find a way to upload it to our software and then identify the percentage for everything that covers the seafloor. For example, how large is the percentage of mussels? It probably takes around 30 minutes to one hour to analyze one picture. Then you also have to do some cross-checking afterward, so another person clarifies that the identification is right.</span><span style="font-family: impact, sans-serif;"><img decoding="async" class="alignnone size-full wp-image-3073" src="https://oceandatafactory.se/wp-content/uploads/2022/04/Skärmavbild-2022-04-04-kl.-13.00.39.png" alt="" width="1386" height="598" srcset="https://oceandatafactory.se/wp-content/uploads/2022/04/Skärmavbild-2022-04-04-kl.-13.00.39.png 1386w, https://oceandatafactory.se/wp-content/uploads/2022/04/Skärmavbild-2022-04-04-kl.-13.00.39-768x331.png 768w" sizes="(max-width: 1386px) 100vw, 1386px" /></span><img decoding="async" class="alignnone size-full wp-image-3074" src="https://oceandatafactory.se/wp-content/uploads/2022/04/Skärmavbild-2022-04-04-kl.-13.00.54.png" alt="" width="1398" height="794" srcset="https://oceandatafactory.se/wp-content/uploads/2022/04/Skärmavbild-2022-04-04-kl.-13.00.54.png 1398w, https://oceandatafactory.se/wp-content/uploads/2022/04/Skärmavbild-2022-04-04-kl.-13.00.54-768x436.png 768w" sizes="(max-width: 1398px) 100vw, 1398px" /><img decoding="async" class="alignnone size-full wp-image-3075" src="https://oceandatafactory.se/wp-content/uploads/2022/04/Skärmavbild-2022-04-04-kl.-13.01.09.png" alt="" width="1452" height="976" srcset="https://oceandatafactory.se/wp-content/uploads/2022/04/Skärmavbild-2022-04-04-kl.-13.01.09.png 1452w, https://oceandatafactory.se/wp-content/uploads/2022/04/Skärmavbild-2022-04-04-kl.-13.01.09-768x516.png 768w" sizes="(max-width: 1452px) 100vw, 1452px" /></p>
<p><span style="font-size: 12pt;"><i><span style="font-weight: 400;">Figures from the report (Kågesten, Baumgartner, and Freire, 2020) illustrate camera and sensor set up, and underwater images mosaic.</span></i></span></p>
<p><b><i>Yixin Zhang: If I understand it right, you monitor 600 different observation points. How many pictures do you take per site?</i></b></p>
<p><span style="font-weight: 400;"><strong>Francis Freire:</strong> Depends on the area. The more diverse areas require more pictures. When the area is small, the resolution is high and the pictures become very clear. Before we start to photograph, we receive information from our multi-beam system or backscatter system that can give an idea about the depth.</span></p>
<p><b><i>Yixin Zhang: What challenges do you expect for similar projects in the future?</i></b></p>
<p><span style="font-weight: 400;"><strong>Francis Freire:</strong> For now, I think the biggest challenge is to be able to use artificial intelligence in our algorithm so we can make the identification even faster. If the algorithm can identify with good enough confidence, we can monitor more and bigger areas by feeding more pictures into our software and processing them efficiently. The algorithm can then identify what percentage of the area is covered by, for example, algae, so that we can later do the habitat mapping for the sites.</span></p>
<p><b><i>Yixin Zhang: Has machine learning been used in the work you do?</i></b></p>
<p><b>Francis Freire:</b><span style="font-weight: 400;"> I do not think we have enough manpower to do that just yet. We have been looking for some partners who can help us with this for a long time now, so we were happy to get in contact with ODF.</span></p>
<p><span style="font-family: impact, sans-serif;">&nbsp;</span><b><i>Felicia Ridderbjelke: From all of the photos you have collected, is there anything in particular that has surprised you?</i></b><span style="font-family: impact, sans-serif;">&nbsp;</span></p>
<p><b>Francis Freire:</b><span style="font-weight: 400;"> One interesting finding is that many seafloor covers are not permanent. The seafloor changes depending on the season. However, it is quite repetitive and looks similar for every season. We have also found a high percentage of mussels in Baltic areas, which differ from the species on the West Coast. Something that is not part of the project, but that interests me, is some of the shipwrecks that we have found.&nbsp;</span></p>
<p><span style="font-family: impact, sans-serif;">&nbsp;</span><b><i>Felicia Ridderbjelke: During which months have you done these trips?&nbsp;</i></b></p>
<p><span style="font-weight: 400;"><strong>Francis Freire:</strong> We started the project in 2016, and our survey season runs from April until October.</span></p>
<p><b><i>Yixin Zhang: What are the challenges with your ocean trips?</i></b></p>
<p><strong><span style="font-family: impact, sans-serif;">&nbsp;</span></strong><span style="font-weight: 400;"><strong>Francis Freire:</strong> Well, you work 24 hours a day and if you get the night shift you work from 8 pm to 5 am. There was this one time, I think I threw up like three times during one trip. Also, the problem with our ship is that sometimes it makes a lot of noise which makes it hard to sleep. Sometimes the boat is shaking too much which makes it hard to eat. But when you have collected the data and produced the product, you get really satisfied and happy.&nbsp;</span></p>
<p><!-- /wp:tadv/classic-paragraph --></p>					</div>
						</div>
				</div>
						</div>
					</div>
		</div>
								</div>
					</div>
		</section>
						</div>
						</div>
					</div>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Interview with ODF Sweden data scientist Jurie “Jannes” Germishuys</title>
		<link>https://oceandatafactory.se/interview-with-odf-sweden-data-scientist-jurie-jannes-germishuys/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=interview-with-odf-sweden-data-scientist-jurie-jannes-germishuys</link>
		
		<dc:creator><![CDATA[Torsten Linders]]></dc:creator>
		<pubDate>Wed, 04 Mar 2020 21:58:07 +0000</pubDate>
				<category><![CDATA[Interview]]></category>
		<guid isPermaLink="false">https://oceandatafactory.se/?p=542</guid>

					<description><![CDATA[<p>Jannes is from Combine Control Systems AB, and he was interviewed by ODF Sweden Team Members, Yixin Zhang from the&#8230;</p>
<p>The post <a rel="nofollow" href="https://oceandatafactory.se/interview-with-odf-sweden-data-scientist-jurie-jannes-germishuys/">Interview with ODF Sweden data scientist Jurie “Jannes” Germishuys</a> appeared first on <a rel="nofollow" href="https://oceandatafactory.se">ODF Sweden</a>.</p>
]]></description>
										<content:encoded><![CDATA[		<div data-elementor-type="wp-post" data-elementor-id="542" class="elementor elementor-542" data-elementor-settings="[]">
						<div class="elementor-inner">
							<div class="elementor-section-wrap">
							<section class="elementor-section elementor-top-section elementor-element elementor-element-3463fb3b elementor-section-boxed elementor-section-height-default elementor-section-height-default" data-id="3463fb3b" data-element_type="section">
						<div class="elementor-container elementor-column-gap-default">
							<div class="elementor-row">
					<div class="elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-6f6c9ecb" data-id="6f6c9ecb" data-element_type="column">
			<div class="elementor-column-wrap elementor-element-populated">
							<div class="elementor-widget-wrap">
						<div class="elementor-element elementor-element-76ff85d6 elementor-widget elementor-widget-text-editor" data-id="76ff85d6" data-element_type="widget" data-widget_type="text-editor.default">
				<div class="elementor-widget-container">
								<div class="elementor-text-editor elementor-clearfix">
				<!-- wp:paragraph -->
<p>Jannes is from <a href="https://combine.se/">Combine Control Systems AB</a>, and he was interviewed by ODF Sweden Team Members, Yixin
Zhang from the Department of Applied IT, Gothenburg University, and Adrian
Bumann from the Entrepreneurship and Strategy Department at Chalmers University
of Technology.</p>
<!-- /wp:paragraph -->

<!-- wp:paragraph -->
<p>Yixin and
Adrian are responsible for continuous evaluation within the ODF Sweden
project and are conducting interviews as part of this process. </p>
<!-- /wp:paragraph -->

<!-- wp:paragraph -->
<p><strong>Yixin: Could
you guide us through the process of the first ODF Sweden Innovation Cycle?</strong></p>
<!-- /wp:paragraph -->

<!-- wp:paragraph -->
<p><strong>Jannes:</strong> Roughly six
months ago, ODF Sweden started to work on our first innovation cycle focused on
the use case: Predicting the presence of the invasive species Dikerogammarus
Villosus (aka the Killer Shrimp) in the Baltic Sea region.</p>
<!-- /wp:paragraph -->

<!-- wp:image {"id":543,"sizeSlug":"large"} -->
<figure class="wp-block-image size-large"><img decoding="async" width="600" height="500" src="https://oceandatafactory.se/wp-content/uploads/2020/03/killer-shrimp.png" alt="škamp" class="wp-image-543" srcset="https://oceandatafactory.se/wp-content/uploads/2020/03/killer-shrimp.png 600w, https://oceandatafactory.se/wp-content/uploads/2020/03/killer-shrimp-300x250.png 300w" sizes="(max-width: 600px) 100vw, 600px" /></figure>
<!-- /wp:image -->

<!-- wp:paragraph -->
<p>The process can
be roughly broken down into the following:</p>
<!-- /wp:paragraph -->

<!-- wp:paragraph -->
<p><em>Data collection
and feature selection</em>: The data were downloaded from various open APIs,
including EMODnet and Marine Copernicus. Features were selected based on
input from marine experts: temperature, salinity, depth, substrate and
wave activity.</p>
<!-- /wp:paragraph -->

<!-- wp:paragraph -->
<p><em>Data
preparation and cleaning</em>: Missing data were removed, and features were
visualized. In this case, we noticed that the data were very skewed towards the
absence class, which meant there was an extremely high class imbalance. To address
this, we used an oversampling method that increased the instances of the
“presence” class by creating synthetic cases based on the original presence
cases.</p>
<!-- /wp:paragraph -->
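
<!-- wp:paragraph -->
<p>Editorial illustration: the interview does not name the oversampling method, so the following is only a minimal sketch of the general idea, in the spirit of SMOTE but without its nearest-neighbour search. New “presence” cases are created by interpolating between randomly chosen pairs of real ones. The function name <em>oversample_minority</em> and the example values are hypothetical.</p>
<!-- /wp:paragraph -->

```python
import random

def oversample_minority(points, n_new, seed=0):
    """Create n_new synthetic points by interpolating between random pairs
    of real minority-class points (hypothetical helper, not ODF's code)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(points, 2)   # two distinct real presence cases
        t = rng.random()               # position on the segment from a to b
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic

# Hypothetical presence cases: (temperature in deg C, salinity in PSU)
presence = [(8.0, 6.5), (9.5, 7.0), (10.0, 6.0)]
new_cases = oversample_minority(presence, n_new=5)
print(new_cases)
```

<!-- wp:paragraph -->
<p>Because every synthetic case lies between two real ones, the new points stay inside the range of conditions where the species was actually observed.</p>
<!-- /wp:paragraph -->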

<!-- wp:paragraph -->
<p><em>Setup of
training and test sets:</em> The training set is what we use to train the model,
whilst the test set is an independent dataset used for evaluation. An 80/20
split was used in this case.</p>
<!-- /wp:paragraph -->
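
<!-- wp:paragraph -->
<p>Editorial illustration: an 80/20 split can be sketched in a few lines of plain Python. This is an assumption about the general technique, not the project's actual code; the helper name <em>train_test_split</em> and the stand-in data are hypothetical (scikit-learn offers a function of the same name for real use).</p>
<!-- /wp:paragraph -->

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and return (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(100))   # stand-in for 100 labelled observations
train, test = train_test_split(rows)
print(len(train), len(test))
```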

<!-- wp:paragraph -->
<p><em>Choosing the
model</em>: Primarily tree-based models were used: a single
decision tree and a random forest (shown below). The main difference between them
can be put this way: if you ask a single person a question, there is a smaller
chance of getting the right answer (assuming they’re not an expert) than if you
average the opinion of a whole group (just like in the show “Who wants to be a
millionaire”). </p>
<!-- /wp:paragraph -->
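
<!-- wp:paragraph -->
<p>Editorial illustration: the “ask the audience” intuition can be made concrete with a small calculation. Assuming, as a simplification, that each voter is right independently with the same probability, the chance that the majority is right grows quickly with the size of the group. Trees in a real random forest are not fully independent, so this is an idealised sketch; the function name <em>majority_correct</em> is hypothetical.</p>
<!-- /wp:paragraph -->

```python
from math import comb

def majority_correct(p, n):
    """Probability that more than half of n independent voters are right,
    when each voter is right with probability p (n should be odd)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

# One voter versus increasingly large groups, each voter right 60% of the time
for n in (1, 11, 101):
    print(n, round(majority_correct(0.6, n), 3))
```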

<!-- wp:image {"id":544,"sizeSlug":"large"} -->
<figure class="wp-block-image size-large"><img decoding="async" width="1024" height="522" src="https://oceandatafactory.se/wp-content/uploads/2020/03/model-1024x522.png" alt="A close up of a map

Description automatically generated" class="wp-image-544" srcset="https://oceandatafactory.se/wp-content/uploads/2020/03/model-1024x522.png 1024w, https://oceandatafactory.se/wp-content/uploads/2020/03/model-300x153.png 300w, https://oceandatafactory.se/wp-content/uploads/2020/03/model-768x391.png 768w, https://oceandatafactory.se/wp-content/uploads/2020/03/model-1536x783.png 1536w, https://oceandatafactory.se/wp-content/uploads/2020/03/model-1600x815.png 1600w, https://oceandatafactory.se/wp-content/uploads/2020/03/model.png 1727w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>
<!-- /wp:image -->

<!-- wp:paragraph -->
<p><em>Training the
models: </em>All the models chosen were trained with their standard
configurations in scikit-learn and fast.ai Python libraries for easy
replication. </p>
<!-- /wp:paragraph -->

<!-- wp:paragraph -->
<p><em>Evaluating the
models: </em>The models were scored on their ability to correctly
predict the locations where the killer shrimp would be present, which is termed
<em>recall</em>. </p>
<!-- /wp:paragraph -->
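
<!-- wp:paragraph -->
<p>Editorial illustration: recall is the share of true presence locations that the model also flags as presence. A minimal sketch with made-up labels (1 = presence, 0 = absence); the function and data are illustrative only.</p>
<!-- /wp:paragraph -->

```python
def recall(y_true, y_pred, positive=1):
    """Fraction of truly positive cases that were also predicted positive."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    false_neg = sum(1 for t, p in pairs if t == positive and p != positive)
    if true_pos + false_neg == 0:
        return 0.0   # no positive cases; recall is undefined, 0.0 used here by convention
    return true_pos / (true_pos + false_neg)

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]   # four real presence sites
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]   # the model finds three of them
print(recall(y_true, y_pred))             # prints 0.75
```

<!-- wp:paragraph -->
<p>Recall ignores false alarms (the ninth site above), which suits a use case where missing an invasive species is costlier than checking a site unnecessarily.</p>
<!-- /wp:paragraph -->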

<!-- wp:paragraph -->
<p><em>Interpreting
model output: </em>Using our model, we are able to get a probability that
a particular point belongs to our presence class and produce an interactive web
app to showcase the outputs. </p>
<!-- /wp:paragraph -->

<!-- wp:paragraph -->
<p><em>Continue until
output is actionable: </em>Throughout the entire process, we had to adapt our methods
as new information became available and we learned more about the problem,
which is almost always the case in machine learning problems.&nbsp; </p>
<!-- /wp:paragraph -->

<!-- wp:paragraph -->
<p><strong>Yixin: Why
tree-based models?</strong></p>
<!-- /wp:paragraph -->

<!-- wp:paragraph -->
<p><strong>Jannes: </strong>For several
reasons:</p>
<!-- /wp:paragraph -->

<!-- wp:paragraph -->
<p><em>Simplicity:</em> No feature
selection is needed (as they have been expertly chosen), no need to pre-process
features (avoid unnecessary biases).</p>
<!-- /wp:paragraph -->

<!-- wp:paragraph -->
<p><em>Interpretability</em>: Black box
methods seem great on paper but in practice they lack transparency when
evaluating model output. Tree methods allow us to look into each decision and
see what influenced its output. </p>
<!-- /wp:paragraph -->

<!-- wp:paragraph -->
<p><em>Incremental
models</em>: We wanted to start off simple and show the
shortcomings of simple decision trees to justify more complex model choices
such as Random Forest.</p>
<!-- /wp:paragraph -->

<!-- wp:paragraph -->
<p><em>Hint of
experience</em>: Tree-based models have been shown to work well for
tabular datasets such as in our case.</p>
<!-- /wp:paragraph -->

<!-- wp:paragraph -->
<p><strong>Yixin: Why did
you add a deep neural network? </strong></p>
<!-- /wp:paragraph -->

<!-- wp:paragraph -->
<p><strong>Jannes</strong>: We added a
deep neural network to show the value of this method in extracting complex
features. </p>
<!-- /wp:paragraph -->

<!-- wp:paragraph -->
<p><strong>Yixin: What were
the challenges when working with ocean data? </strong></p>
<!-- /wp:paragraph -->

<!-- wp:paragraph -->
<p><strong>Jannes: </strong>I think there
are several challenges to consider.</p>
<!-- /wp:paragraph -->

<!-- wp:paragraph -->
<p>First, I would
say that <em>ocean data can be quite intimidating</em>. Working with geospatial
information means not only looking at the data but looking at it in the right
way. &nbsp;</p>
<!-- /wp:paragraph -->

<!-- wp:paragraph -->
<p>One of our challenges was <em>understanding coordinate reference
systems (CRS),</em> which determine where points are located on a map. Since the
Earth is spherical, each CRS represents a projection onto a flat 2D surface for
visualisation. We are all familiar with one such system, the latitudes and
longitudes we see on our Google Maps, also known as WGS84 (or EPSG:4326). But as it turns out, each data provider has its own
favourite CRS, and so re-projecting between these is often necessary when
performing comparisons and calculations. Luckily, many Python packages such as
GDAL and Rasterio help us to simplify this process.&nbsp; </p>
<!-- /wp:paragraph -->
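
<!-- wp:paragraph -->
<p>Editorial illustration: to show what re-projecting means in practice, the sketch below converts WGS84 longitude and latitude (EPSG:4326) into Web Mercator metres (EPSG:3857) using the standard spherical Mercator formulas. The choice of target CRS is an assumption made for the example, and the interview does not say which systems the project converted between. Real work would normally rely on a library such as those named above rather than hand-written formulas.</p>
<!-- /wp:paragraph -->

```python
import math

EARTH_RADIUS_M = 6378137.0   # WGS84 semi-major axis, used as the sphere radius by Web Mercator

def wgs84_to_web_mercator(lon_deg, lat_deg):
    """Convert longitude/latitude in degrees (EPSG:4326) to x/y in metres (EPSG:3857)."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y

# Approximate position of Gothenburg: 11.97 E, 57.71 N
x, y = wgs84_to_web_mercator(11.97, 57.71)
print(round(x), round(y))
```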

<!-- wp:paragraph -->
<p>Another major
challenge was <em>interpreting inland data</em>. Since we had no information
available about inland water sources, we had to match these to the closest
ocean, which proved to be difficult and inaccurate because we had to make
assumptions such as “inland water is just as salty as sea water”.&nbsp; This led to large biases in our initial results,
so we revisited the assumption and ultimately abandoned it when we
obtained additional presence data in the Baltic Sea.&nbsp;&nbsp; </p>
<!-- /wp:paragraph -->

<!-- wp:paragraph -->
<p><strong>Yixin: What
were the initial reactions when you presented the ML model to the team? </strong></p>
<!-- /wp:paragraph -->

<!-- wp:paragraph -->
<p><strong>Jannes:</strong> I would say an
equal mix of intrigue and confusion. Although the model results seem
impressive, it is difficult to understand what these metrics mean until you
have had an opportunity to work with the data and modelling yourself. </p>
<!-- /wp:paragraph -->

<!-- wp:paragraph -->
<p><strong>Yixin: Was it
difficult to find the relevant data?</strong></p>
<!-- /wp:paragraph -->

<!-- wp:paragraph -->
<p><strong>Jannes:</strong> The data exist
and are plentiful on open data platforms. But the data lie on multiple siloed systems with no
central access point or methodology, so the difficulty
lies more in extracting the relevant data in the correct format.</p>
<!-- /wp:paragraph -->

<!-- wp:image {"id":545,"sizeSlug":"large"} -->
<figure class="wp-block-image size-large"><img decoding="async" width="1001" height="1024" src="https://oceandatafactory.se/wp-content/uploads/2020/03/emodent-1001x1024.png" alt="A screenshot of a cell phone

Description automatically generated" class="wp-image-545" srcset="https://oceandatafactory.se/wp-content/uploads/2020/03/emodent-1001x1024.png 1001w, https://oceandatafactory.se/wp-content/uploads/2020/03/emodent-293x300.png 293w, https://oceandatafactory.se/wp-content/uploads/2020/03/emodent-768x785.png 768w, https://oceandatafactory.se/wp-content/uploads/2020/03/emodent.png 1146w" sizes="(max-width: 1001px) 100vw, 1001px" /></figure>
<!-- /wp:image -->

<!-- wp:paragraph -->
<p>EMODnet Central
Portal, <a href="https://www.emodnet.eu/portals">https://www.emodnet.eu/portals</a></p>
<!-- /wp:paragraph -->

<!-- wp:paragraph -->
<p><strong>Yixin: As a
data science expert, what do you consider as limitations of the ML solution in
this context of predicting invasive species?</strong></p>
<!-- /wp:paragraph -->

<!-- wp:paragraph -->
<p><strong>Jannes: </strong>The output from
any ML model is only as good as its assumptions and the data used. So, one
limitation of this model is that we have insufficient data to make high
confidence predictions. It is also limited as a predictor using only data
points and not entire grids, which could be useful as areas in the grid close
to each other usually have a strong relationship to one another.</p>
<!-- /wp:paragraph -->

<!-- wp:paragraph -->
<p><strong>Yixin: Is there
anything you would like to share with data scientists who start working with
ocean data?</strong></p>
<!-- /wp:paragraph -->

<!-- wp:paragraph -->
<p><strong>Jannes:</strong> I would say
that data scientists should be very critical of any methods they are
“comfortable” with when shifting to geospatial ocean data. For example, if you
simply sample data points from a large area in the ocean and then split your
datasets into training and test sets, the distribution of the training and test
data will be so similar that the test set effectively “leaks” into the training
set, which leads us to be overconfident in our model predictions. </p>
<!-- /wp:paragraph -->
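
<!-- wp:paragraph -->
<p>Editorial illustration: one common way to reduce this kind of leakage is to split by location instead of by individual point, so that whole grid blocks go either to training or to testing. The sketch below is an assumption about how that could look, not the approach the project used; the helper name <em>spatial_block_split</em>, the one-degree block size and the random observations are all hypothetical.</p>
<!-- /wp:paragraph -->

```python
import random

def spatial_block_split(points, block_size_deg=1.0, test_fraction=0.2, seed=0):
    """Assign whole grid blocks to train or test, so that neighbouring
    points (which tend to look alike) never land on both sides."""
    blocks = {}
    for lon, lat, label in points:
        key = (int(lon // block_size_deg), int(lat // block_size_deg))
        blocks.setdefault(key, []).append((lon, lat, label))
    keys = sorted(blocks)
    random.Random(seed).shuffle(keys)       # random but reproducible block order
    n_test = max(1, round(len(keys) * test_fraction))
    test = [p for k in keys[:n_test] for p in blocks[k]]
    train = [p for k in keys[n_test:] for p in blocks[k]]
    return train, test

# Hypothetical (lon, lat, presence) observations over a 10 x 10 degree area
rng = random.Random(1)
points = [(rng.uniform(10, 20), rng.uniform(54, 64), rng.randint(0, 1))
          for _ in range(500)]
train, test = spatial_block_split(points)
print(len(train), len(test))
```

<!-- wp:paragraph -->
<p>The test score from such a split is usually lower than from a purely random split, but it is a more honest estimate of how the model behaves in areas it has not seen.</p>
<!-- /wp:paragraph -->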

<!-- wp:image {"id":546,"sizeSlug":"large"} -->
<figure class="wp-block-image size-large"><img decoding="async" width="1024" height="1012" src="https://oceandatafactory.se/wp-content/uploads/2020/03/map-killer-shrimp-1024x1012.png" alt="" class="wp-image-546" srcset="https://oceandatafactory.se/wp-content/uploads/2020/03/map-killer-shrimp-1024x1012.png 1024w, https://oceandatafactory.se/wp-content/uploads/2020/03/map-killer-shrimp-300x296.png 300w, https://oceandatafactory.se/wp-content/uploads/2020/03/map-killer-shrimp-768x759.png 768w, https://oceandatafactory.se/wp-content/uploads/2020/03/map-killer-shrimp-1536x1518.png 1536w, https://oceandatafactory.se/wp-content/uploads/2020/03/map-killer-shrimp-1600x1581.png 1600w, https://oceandatafactory.se/wp-content/uploads/2020/03/map-killer-shrimp.png 1675w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>
<!-- /wp:image -->

<!-- wp:paragraph -->
<p><strong>Map of Killer
Shrimp distribution in the Baltic Sea</strong></p>
<!-- /wp:paragraph -->

<!-- wp:paragraph -->
<p><strong>Yixin: What
relation to the ocean and ocean data did you have before this project? How was
your experience, as a data scientist, of working with ocean data?</strong></p>
<!-- /wp:paragraph -->

<!-- wp:paragraph -->
<p><strong>Jannes:</strong> I had never
worked with ocean data specifically, so this was all rather new to me. It was
very rewarding, since working with ocean data expanded my toolkit: I can now
deal with a broader range of datasets and tools in future use cases (especially
those involving geospatial data).</p>
<!-- /wp:paragraph -->

<!-- wp:image {"id":547,"sizeSlug":"large"} -->
<figure class="wp-block-image size-large"><img decoding="async" width="1024" height="851" src="https://oceandatafactory.se/wp-content/uploads/2020/03/web-application-output-1024x851.png" alt="A picture showing text, a map, outdoors (automatically generated description)" class="wp-image-547" srcset="https://oceandatafactory.se/wp-content/uploads/2020/03/web-application-output-1024x851.png 1024w, https://oceandatafactory.se/wp-content/uploads/2020/03/web-application-output-300x249.png 300w, https://oceandatafactory.se/wp-content/uploads/2020/03/web-application-output-768x638.png 768w, https://oceandatafactory.se/wp-content/uploads/2020/03/web-application-output-1536x1277.png 1536w, https://oceandatafactory.se/wp-content/uploads/2020/03/web-application-output-1600x1330.png 1600w, https://oceandatafactory.se/wp-content/uploads/2020/03/web-application-output.png 1837w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>
<!-- /wp:image -->

<!-- wp:paragraph -->
<p><strong>Web application
output: http://odf-open-data.herokuapp.com</strong></p>
<!-- /wp:paragraph -->

<!-- wp:paragraph -->
<p><strong>Yixin: Is there
anything you would like to share with ocean data experts who are starting to learn AI?</strong></p>
<!-- /wp:paragraph -->

<!-- wp:paragraph -->
<p><strong>Jannes:</strong> Always
question the output of the model and trust your gut because as a subject expert
you have the experience to judge what is reasonable. </p>
<!-- /wp:paragraph -->

<!-- wp:paragraph -->
<p><strong>Yixin: Looking
back, what part of the work process was most time-consuming?</strong></p>
<!-- /wp:paragraph -->

<!-- wp:paragraph -->
<p><strong>Jannes:</strong> Extracting
data from the respective sources took up the bulk of the time, since there is
no central place to get all the information we need. </p>
<!-- /wp:paragraph -->

<!-- wp:paragraph -->
<p><strong>Yixin: What
would you suggest to ocean data providers about how the data could be better
prepared for use, in terms of accessibility, format, or other aspects?</strong></p>
<!-- /wp:paragraph -->

<!-- wp:paragraph -->
<p>One of our main
goals within ODF Sweden is to encourage and enable FAIR data practices. This
means that any and all data we use should be findable, accessible,
interoperable and reusable. This includes open data sources through open APIs,
open code sharing on GitHub and public notebooks on Kaggle. With this in mind,
we would recommend that all data providers improve and align their documentation
standards. We also hope that datasets will become more searchable and that new
datasets will be promoted to boost research efforts.</p>
<!-- /wp:paragraph -->

<!-- wp:paragraph -->
<p><strong>Yixin: Finally,
the next step will be to publish parts of the ML model on Kaggle, an online
machine-learning and problem-solving community. What results do you hope for?</strong></p>
<!-- /wp:paragraph -->

<!-- wp:paragraph -->
<p><strong>Jannes:</strong> I hope to
engage with a broad audience from diverse backgrounds who are interested in
learning more about ocean data and data science, or in contributing their
expertise and insights to build on and improve our models. I also hope to showcase
what we have done in ODF Sweden and to share our data and insights with a large
and active online community.</p>
<!-- /wp:paragraph -->

<!-- wp:image {"id":548,"sizeSlug":"large"} -->
<figure class="wp-block-image size-large"><img decoding="async" width="1024" height="466" src="https://oceandatafactory.se/wp-content/uploads/2020/03/kaggle-1024x466.png" alt="A screenshot of a social media post

Description automatically generated" class="wp-image-548" srcset="https://oceandatafactory.se/wp-content/uploads/2020/03/kaggle-1024x466.png 1024w, https://oceandatafactory.se/wp-content/uploads/2020/03/kaggle-300x137.png 300w, https://oceandatafactory.se/wp-content/uploads/2020/03/kaggle-768x349.png 768w, https://oceandatafactory.se/wp-content/uploads/2020/03/kaggle-1536x699.png 1536w, https://oceandatafactory.se/wp-content/uploads/2020/03/kaggle-2048x932.png 2048w, https://oceandatafactory.se/wp-content/uploads/2020/03/kaggle-1600x728.png 1600w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>
<!-- /wp:image -->					</div>
						</div>
				</div>
						</div>
					</div>
		</div>
								</div>
					</div>
		</section>
						</div>
						</div>
					</div>
		<p>The post <a rel="nofollow" href="https://oceandatafactory.se/interview-with-odf-sweden-data-scientist-jurie-jannes-germishuys/">Interview with ODF Sweden data scientist Jurie “Jannes” Germishuys</a> appeared first on <a rel="nofollow" href="https://oceandatafactory.se">ODF Sweden</a>.</p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
